Moreover, we design a new learning algorithm to cluster mixed datasets. The proposed algorithm achieves clustering accuracies of 89.2% on the heart disease dataset and 89.4%, 84.9%, 85.5%, and 91.2% on the kaggle, factors, kinase, and UV chemoinformatics datasets, respectively. Also, it is compared with ...
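Cluster analysis, also known as group analysis, is a multivariate statistical method that classifies samples or indicators on the principle that "birds of a feather flock together". It deals with large numbers of samples, which must be grouped sensibly according to their own characteristics, with no model to refer to or follow; that is, it is carried out without prior knowledge. Cluster analysis originated in taxonomy. In early taxonomy, people relied mainly on experience and ...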
Judging from the results below, clustering the population into two classes appears fairly appropriate, as follows:
fit.m <- Mclust(exp_sample)
summary(fit.m)
## ---
## Gaussian finite mixture model fitted by EM algorithm
## ---
##
## Mclust EEI (diagonal, equal volume and shape) model with 5 components:
##
## log-likelihood n df BIC ICL
##...
reps (resamplings, number of subsamples) : 50
clusterAlg (agglomerative hierarchical clustering algorithm) : 'hc' (hclust)
distance : 'pearson' (1 - Pearson correlation)
Run as follows:
library(ConsensusClusterPlus)
title = tempdir()
results <- ConsensusClusterPlus(as.matrix(TumorMat), maxK = 6,...
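The 'pearson' distance named in the parameter list above is defined there as 1 minus the Pearson correlation. The following is a minimal Python sketch of that definition only, not of ConsensusClusterPlus itself; the function name `pearson_distance` is hypothetical and chosen for illustration.

```python
import math


def pearson_distance(x, y):
    """Return 1 - Pearson correlation for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Centre both vectors on their means.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - covariance / norm
```

Under this definition, perfectly correlated profiles have distance 0 and perfectly anti-correlated profiles have distance 2, so samples are grouped by the shape of their profiles rather than by absolute magnitude.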
Automated Deployment of a Spark Cluster with Machine Learning Algorithm Integration. Keywords: Big data analytics; Apache Spark; Machine learning; Cluster deployment. The vast amount of data stored nowadays has turned big data analytics into a very trendy research field. The Spark distributed computing platform has emerged as...
The function kmeans performs K-Means clustering, using an iterative algorithm that assigns objects to clusters so that the sum of distances from each object to its cluster centroid, over all clusters, is a minimum. Used on Fisher's iris data, it will find the natural groupings among iris ...
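The iterative assignment described above can be sketched in plain Python. This is not the kmeans function the snippet refers to. It is a minimal version of the standard Lloyd iteration, assuming squared Euclidean distance and random initial centroids drawn from the data; the names `kmeans` and `squared_distance` and the parameters `max_iter` and `seed` are illustrative assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; return (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids
```

Each step can only lower (or keep) the total within-cluster squared distance, which is why the loop terminates; the result is a local minimum that depends on the initial centroids.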
To overcome the communication bottleneck, we propose the ClusterGrad algorithm to compress gradients which can considerably reduce the volume of communicated computations. Our design is based on the fact that there is only a small fraction of gradients whose values are far away from the origin in ...
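The observation above, that only a small fraction of gradient values lie far from the origin, is also what plain magnitude-based sparsification relies on. The sketch below shows that simpler, generic technique (top-k sparsification), not the ClusterGrad algorithm, whose details are not given here; `sparsify_top_k` and `densify` are hypothetical helper names.

```python
def sparsify_top_k(gradient, k):
    """Keep the k largest-magnitude entries as sorted (index, value) pairs."""
    if not 0 <= k <= len(gradient):
        raise ValueError("k must be between 0 and len(gradient)")
    # Rank positions by absolute value, largest first.
    order = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    kept = sorted(order[:k])
    return [(i, gradient[i]) for i in kept]


def densify(pairs, length):
    """Rebuild a dense gradient, filling dropped entries with zero."""
    dense = [0.0] * length
    for index, value in pairs:
        dense[index] = value
    return dense
```

Only the (index, value) pairs would be communicated in such a scheme; the receiver treats every dropped entry as zero, so the saving depends on how concentrated the gradient is near the origin.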
Based on an analysis of the deficiencies of the BIRCH algorithm, a new clustering algorithm named CLAP (Clustering algorithm based on rAndom-samPling and cluster-feature) is proposed. CLAP preprocesses some of the data extracted from the database by random sampling, which decreases the running...
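For context, the cluster feature that BIRCH is built on is commonly defined as the triple (N, LS, SS): the number of points, their per-dimension linear sum, and the sum of their squared norms. The sketch below illustrates that standard definition and its additivity, assuming SS is stored as a scalar; it does not reproduce CLAP's sampling or preprocessing, and the class and method names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterFeature:
    n: int               # number of points summarised
    linear_sum: list     # per-dimension sum of the points (LS)
    square_sum: float    # sum of squared norms of the points (SS)

    @classmethod
    def from_point(cls, point):
        """Build the cluster feature of a single point."""
        return cls(1, list(point), sum(x * x for x in point))

    def merge(self, other):
        """Additivity: the feature of a union is the component-wise sum."""
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.square_sum + other.square_sum,
        )

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root-mean-square distance of the points from the centroid."""
        c = self.centroid()
        variance = self.square_sum / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error
```

Because merging needs only the three summary values, centroids and radii can be updated without revisiting the raw points.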
(parameters: reps = 100, pItem = 0.8, pFeature = 1). Ward.D2 linkage and Euclidean distance were used as the clustering algorithm and distance metric, respectively, with k = 3. Median expression levels of coexpressed glycolytic and cholesterogenic genes were used to assign quiescent (glycolytic ...
This study presents a machine learning approach based on the C5.0 decision tree (DT) model and the K-means cluster algorithm to produce a regional landslide susceptibility map. Yanchang County, a typical landslide-prone area located in northwestern China, was taken as the area of interest to ...