The data sets generated from such studies are large and require sophisticated tools for proper analysis. In this chapter we review several techniques employed in clustering data sets of this type. Clustering can often reveal broad patterns which show that certain genes or proteins are performing ...
However, DP requires computing the distance between every pair of input points, therefore incurring quadratic computation overhead, which is prohibitive for large data sets. In this paper, we propose an efficient distributed algorithm LSHDDP, which is an approximate algorithm that exploits Locality ...
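The quadratic cost mentioned in this snippet can be illustrated with a minimal sketch of the local-density step of density-peaks clustering. The function name `local_density`, the cutoff parameter `d_c`, and the sample points are assumptions made for illustration; this is the exact pairwise baseline, not the LSH-based approximation the paper proposes.

```python
import math
from itertools import combinations

def local_density(points, d_c):
    """Count, for each point, the neighbours closer than the cutoff d_c.

    Hypothetical helper: returns (rho, number_of_distance_evaluations).
    """
    rho = [0] * len(points)
    evaluations = 0
    # Every unordered pair is visited once: n * (n - 1) / 2 distance computations.
    for i, j in combinations(range(len(points)), 2):
        evaluations += 1
        if math.dist(points[i], points[j]) < d_c:
            rho[i] += 1
            rho[j] += 1
    return rho, evaluations

# Illustrative data only: three nearby points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
rho, evaluations = local_density(points, d_c=0.5)
```

With n points the loop performs n(n-1)/2 distance evaluations, so doubling the input roughly quadruples the work, which is the overhead the snippet describes as prohibitive for large data sets.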
The paper addresses the clustering of large numeric data sets using a hierarchical method. The BIRCH approach is used to reduce the amount of data: a hierarchical clustering method is applied to pre-process the data set. Nowadays, web information plays a prominent role in the...
Extensive experimental results on several synthetic and real-world data sets demonstrate that our method makes approximate clustering of large data sets feasible and accelerates clustering of loadable data sets. Keywords: Pairwise data, Selective sampling, Spectral clustering, Graph embedding ...
Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is best suited for implementing this operation because of its efficiency in clustering large data sets. However, working only on numeric values limits its use in data mini...
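As a rough illustration of why k-means is tied to numeric values, here is a minimal sketch of the standard assignment-and-update iteration. The function name `kmeans`, the caller-supplied initial centroids, and the sample points are assumptions for illustration, not the method of the paper quoted here.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Plain k-means iteration on numeric tuples (hypothetical helper).

    Initial centroids are supplied by the caller.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        # An empty cluster keeps its previous centroid.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Illustrative data only: two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Both steps rely on arithmetic over coordinates: the distance in the assignment step and the mean in the update step. Neither is defined for categorical attributes without modification, which is the limitation the snippet points to.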
Huang, Z. Clustering large data sets with mixed numeric and categorical values, in Proceedings of the 1st Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 21–34 (1997). Bair, E., Tibshirani, R. & Golub, T. Semi-supervised methods to predict patient survival from gene...
Our software allows clustering of much larger EST data sets than is possible with current software. Because of its speed, it also facilitates multiple runs with different parameters, providing biologists a tool to better analyze EST sequence data. Using PaCE, we clustered EST data from 23 plant ...
We propose a pragmatic and scalable version of the tight clustering method that is applicable to very large data sets, and we deduce the properties of the proposed algorithm. We validate our algorithm with an extensive simulation study and multiple real data analyses including analysis of real ...
In this paper we present an approach for clustering sets of alternatives using preferential information from a decision-maker. As clustering is dependent on the relations between the alternatives, clustering large datasets quickly becomes impractical, an issue we try to address by extending our ...