In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes in big data platforms. We first present an algorithm which handles the problem of mixed data. We then utilize big data platforms to implement the algorithm. This provides us with a solid basis ...
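The snippet above does not say which dissimilarity measure the authors use for mixed data. As a minimal sketch, assuming a k-prototypes-style measure (squared Euclidean distance on the numeric attributes plus a weighted count of categorical mismatches), the assignment and update steps might look like the following. The names `mixed_distance`, `assign`, `update_prototype`, and the weight `gamma` are hypothetical and do not come from the paper.

```python
from collections import Counter


def mixed_distance(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """Squared Euclidean distance on numeric parts plus gamma * categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    categorical = sum(1 for x, y in zip(a_cat, b_cat) if x != y)
    return numeric + gamma * categorical


def assign(points, prototypes, gamma=1.0):
    """Return, for each (numeric, categorical) point, the index of its nearest prototype."""
    labels = []
    for num, cat in points:
        dists = [mixed_distance(num, cat, p_num, p_cat, gamma)
                 for p_num, p_cat in prototypes]
        labels.append(dists.index(min(dists)))
    return labels


def update_prototype(members):
    """New prototype: per-attribute mean for numeric parts, mode for categorical parts."""
    n = len(members)
    num = [sum(m[0][d] for m in members) / n for d in range(len(members[0][0]))]
    cat = [Counter(m[1][j] for m in members).most_common(1)[0][0]
           for j in range(len(members[0][1]))]
    return num, cat
```

The weight `gamma` trades off the numeric and categorical parts. It has to be tuned to the scale of the numeric attributes, which is one reason numeric columns are usually standardized first.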
Research on parallelization of K-Means algorithm in security situation awareness system. Jiang Jiaxi, Xie Yinghua. School of Information Science and Technology, Donghua University, Shanghai 201620, China. Abstract: With the emergence of network security events in a big data environment, the application of security...
In this paper, several experiments are performed to compare and analyze the performance of the algorithm. The analysis shows that the proposed algorithm outperforms the existing algorithms. Keywords: Big data; outlier detection; SMK-means; Mini Batch K-means; simulated annealing ...
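The Application of K-means Clustering Algorithm Based on Big Data in Network Security Detection. Authors: Zheng Zhixian (郑志娴) [1]; Wang Min (王敏) [1]. Affiliation: [1] Department of Information Engineering, Fujian Chuanzheng Communications College (福建船政交通职业学院), Fuzhou 350007. Journal: Journal of Hubei University of Education (湖北第二师范学院学报). Pages: 36-40. Year/Issue: 2016, No. 2. Subject terms: big data; network security detection; K-means clustering...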
The K-means algorithm has several limitations: the initial cluster centres are chosen at random, it is overly sensitive to noise and outliers, and it is not applicable when clusters differ greatly in shape. To address these deficiencies, drawing on the experience of the molecular interaction model with the text simulated ...
times. The R implementation of the k-means algorithm, kmeans in the stats package, is pretty fast. Running the example above on my PC (1.87 GHz Dell laptop with 8 GB of RAM) on 10,000,000 points took about 4.3 seconds. But what if you have a data set that won’t fit into memory?
Big data; Environmental monitoring; Kalman filter; Improved k-means algorithm; BP neural network. The embedded operating system based on the STC12C5A MCU is designed to facilitate real-time monitoring and control of the home environment, in order to solve some problems such as low acquisition frequency, poor real-...
The following example demonstrates how to run the k-means clustering algorithm in R.
library(ggplot2)
# Prepare data
data <- mtcars
# We need to scale the data to have zero mean and unit variance
data <- scale(data)
# Determine number of clusters
wss <- (nrow(data) - 1) * sum(apply(data, 2, var))
for(ii...
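To address the insufficient clustering accuracy and slow convergence of the K-means clustering algorithm in big data environments, a K-means algorithm based on optimized sampling clustering (OSCK) is proposed. First, the algorithm draws multiple samples from the massive data set by probability sampling; second, based on the Euclidean-distance similarity principle of the optimal cluster centers, it models and evaluates the clustering results of the samples and removes the suboptimal solutions among them; finally, it integrates the evaluated clustering results by weighting to obtain the final k cluster centers, and takes these k cluster...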
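The sample, evaluate, and merge steps described above can be illustrated with a loose sketch on one-dimensional data. This is not the OSCK algorithm itself, since the abstract gives no formulas. The quantile-based starting centers, the SSE scoring, the `keep` cutoff, the 1/SSE weights, and the alignment of centers by sorting are all assumptions made for illustration, and the function names are hypothetical.

```python
import random


def kmeans_1d(values, k, max_iter=50):
    """Plain Lloyd iterations on 1-D data; returns (sorted centers, SSE)."""
    ordered = sorted(values)
    # Assumption: spread the starting centers over the sample's quantiles.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: (v - centers[i]) ** 2)
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    sse = sum(min((v - c) ** 2 for c in centers) for v in values)
    return sorted(centers), sse


def sampled_kmeans(data, k, n_samples=5, sample_size=40, keep=3, seed=0):
    """Cluster several random samples, drop the worst runs, merge the rest."""
    rng = random.Random(seed)
    runs = [kmeans_1d(rng.sample(data, sample_size), k) for _ in range(n_samples)]
    # Assumption: score each run by its SSE and keep only the best ones.
    best = sorted(runs, key=lambda run: run[1])[:keep]
    # Assumption: weight each kept run by the inverse of its SSE.
    weights = [1.0 / (sse + 1e-9) for _, sse in best]
    total = sum(weights)
    # Centers are sorted, so position j refers to the same cluster in every run.
    return [sum(w * centers[j] for w, (centers, _) in zip(weights, best)) / total
            for j in range(k)]
```

Sorting the centers only aligns runs in one dimension. In higher dimensions the centers from different samples would need an explicit matching step before they can be averaged.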
K-means has been a widely used clustering algorithm in the field of data mining across different disciplines for the past fifty years. However, k-means depends heavily on the positions of the initial centers, and randomly chosen starting centers may lead to poor clustering quality. Motivated by this, ...
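The dependence on the initial centers can be seen in a small example. The sketch below runs plain Lloyd iterations (not the method proposed in the paper, which the snippet truncates) from two hand-picked starting configurations on the same six points. The function names and the toy data are illustrative assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iter=100):
    """Standard k-means iterations from the given starting centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Three well-separated pairs of points on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]

# Two starting centers inside the first pair: the algorithm is stuck immediately.
_, bad_sse = lloyd(points, [(0.0,), (1.0,), (15.5,)])
# One starting center per pair: the algorithm finds the natural grouping.
_, good_sse = lloyd(points, [(0.0,), (10.0,), (20.0,)])

print(bad_sse, good_sse)  # 101.0 1.5
```

Both runs converge, but the first stops at a local optimum whose SSE is far higher than that of the second. This is the behaviour that seeding strategies and multiple restarts are meant to avoid.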