Big data analyticsApache spark Credit card fraudLarge Hedron ColliderA novel parallel implementation of the Evolving Clustering Method (ECM) is proposed in this paper. The original serial version of the ECM is the clustering method which computes online and with a......
Big Data analytics are recently coming up as prominent research area in the field of data science. Apache Spark is an open source distributed data processing platform that uses distributed memory...doi:10.1007/978-3-319-74690-6_41Omar Hesham Mohamed...
"Big Data" can mean different things to different people. The scale and challenges of Big Data are often described using three attributes, namely volume, v... C Wu,R Buyya,K Ramamohanarao - 《Big Data》 被引量: 19发表: 2016年 Use of machine learning in big data analytics for insider...
Referring to the earlier explained challenges, researchers are trying to create structures, methodologies and new approaches for managing, controlling and processing this volume of data, which has led to the use of data mining tools. One of the important methods of data mining is clustering (cluste...
The Python extension library sklearn is an open source library for data analysis and machine learning and machine learning that encapsulates common machine learning methods, including clustering, regression, dimensionality reduction, and classification. The general process of machine learning is shown in ...
Because of the distributed and parallel characteristics of the proposed algorithm, it shows acceptable scalability and a drastic speedup in comparison with recently proposed MapReduce based clustering methods. In a comprehensive review, the contributions of this paper include: (1) Compatibility with the...
Another reason for the use of the canonical polyadic decomposition is that the canonical polyadic decomposition is easily implemented by existing decomposition methods for instance alternating least squares [[10], [11]]. Finally, the traditional fuzzy c-means approach is extended to a high-order ...
Clustering is a standard method for data analysis and many clustering methods have been proposed [29]. Some of the most well-known clustering algorithms are DBSCAN [9], k-means clustering [23], and CLIQUE [1, 2]. Yet, they have in common that they do not perform well with big data,...
Methods To apply our method to a specific dataset, users need to provide a data matrix M and the desired number of cluster K. The objective function to minimize is the nuclear norm of the pooled within class residual. The nuclear norm of a matrix is defined as the sum of singular values...
The algorithmic methods for clustering are simple. One of the most popular clustering algorithms is the k-means algorithm, which assigns any number of data objects to one of k clusters.107 The number k of clusters is provided by the user. The algorithm is easy to describe and to understand...