We present a new clustering algorithm for multivariate binary data. The new algorithm is based on the convex relaxation of hierarchical clustering, which is achieved by considering the binomial likelihood as a
To exploit the advantages of EM for clustering and of CEM for fast convergence, we combine the two algorithms. With Monte Carlo simulations and by varying parameters of the model, we rigorously validate the approach. We also illustrate our contribution using real datasets commonly used in document...
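The snippet above does not say how EM and CEM are combined. As a minimal sketch, assuming a mixture of independent Bernoulli variables and assuming one plausible combination (a few fast CEM sweeps with hard assignments, followed by EM sweeps with soft assignments), the idea could look like this. All function names, the clipping constant `eps`, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math

def log_bernoulli(x, theta):
    """Log-probability of binary vector x under independent Bernoulli parameters."""
    return sum(math.log(t if xi else 1.0 - t) for xi, t in zip(x, theta))

def responsibilities(data, weights, thetas):
    """E-step: posterior probability of each cluster for each row."""
    post = []
    for x in data:
        logs = [math.log(w) + log_bernoulli(x, th) for w, th in zip(weights, thetas)]
        top = max(logs)  # subtract the maximum for numerical stability
        probs = [math.exp(v - top) for v in logs]
        total = sum(probs)
        post.append([p / total for p in probs])
    return post

def harden(post):
    """C-step (CEM only): replace soft posteriors with 0/1 assignments."""
    hard = []
    for row in post:
        best = row.index(max(row))
        hard.append([1.0 if k == best else 0.0 for k in range(len(row))])
    return hard

def m_step(data, post, eps=1e-3):
    """M-step: update mixing weights and Bernoulli parameters, clipped to (eps, 1 - eps)."""
    n, d, k = len(data), len(data[0]), len(post[0])
    weights, thetas = [], []
    for c in range(k):
        mass = sum(row[c] for row in post)
        weights.append(max(mass / n, eps))
        theta = []
        for j in range(d):
            num = sum(row[c] * x[j] for row, x in zip(post, data))
            val = num / mass if mass > 0 else 0.5
            theta.append(min(max(val, eps), 1.0 - eps))
        thetas.append(theta)
    total = sum(weights)
    return [w / total for w in weights], thetas

def fit(data, init_thetas, cem_iters=5, em_iters=20):
    """Assumed schedule: CEM sweeps first, then EM sweeps to refine the parameters."""
    k = len(init_thetas)
    weights, thetas = [1.0 / k] * k, init_thetas
    for _ in range(cem_iters):
        weights, thetas = m_step(data, harden(responsibilities(data, weights, thetas)))
    for _ in range(em_iters):
        weights, thetas = m_step(data, responsibilities(data, weights, thetas))
    post = responsibilities(data, weights, thetas)
    return [row.index(max(row)) for row in post], weights, thetas

# Toy data: two groups of binary rows with opposite patterns.
data = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 1],
        [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 0]]
init = [[0.6, 0.6, 0.6, 0.4, 0.4, 0.4], [0.4, 0.4, 0.4, 0.6, 0.6, 0.6]]
labels, weights, thetas = fit(data, init)
```

The only difference between the two phases is the `harden` call, which is why the two algorithms can share the same E-step and M-step code.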
-td <str>, Path to the true/raw data/genotypes.
Model Arguments
-FN <float>, Replace <float> with the fixed error rate for false negatives.
-FP <float>, Replace <float> with the fixed error rate for false positives.
-FN_m <float>, Replace <float> with the mean for the prior for...
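To illustrate what fixed false-negative and false-positive rates usually mean for binary observations, here is a minimal sketch of the per-entry error model. It assumes that entries are independent, that a true 1 is observed as 0 with probability `fn_rate`, and that a true 0 is observed as 1 with probability `fp_rate`. The function name and arguments are hypothetical and are not taken from the tool whose options are listed above.

```python
import math

def log_likelihood(observed, truth, fn_rate, fp_rate):
    """Log-likelihood of observed binary values given the true values.

    Both rates must lie strictly between 0 and 1 (assumed, not checked).
    """
    total = 0.0
    for o, t in zip(observed, truth):
        if t == 1:
            # A true 1 is missed with probability fn_rate.
            p = 1.0 - fn_rate if o == 1 else fn_rate
        else:
            # A true 0 is falsely reported with probability fp_rate.
            p = fp_rate if o == 1 else 1.0 - fp_rate
        total += math.log(p)
    return total

# One correctly observed 1 and one missed 1: log(0.8) + log(0.2).
example = log_likelihood([1, 0], [1, 1], fn_rate=0.2, fp_rate=0.01)
```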
Binary genomic data
In biomedical research, a relevant issue is to identify time intervals or portions of an n-dimensional support where a particular event of interest is more likely to occur than expected. Algorithms that require specifying a priori the number/dimension/length of clusters assumed for ...
A partitional clustering algorithm obtains a single partition of the data instead of a clustering structure, such as the dendrogram produced by hierarchical methods. Partitional methods have advantages over hierarchical methods in applications involving large data sets for which the construction of a dendrogram ...
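As a concrete instance of a partitional method on binary data, the sketch below uses a k-modes-style procedure: rows are assigned to the nearest mode under Hamming distance, and each mode is recomputed as the per-column majority vote. The choice of k-modes, the tie-breaking rule, and the toy data are assumptions made for illustration; the snippet above does not name a specific algorithm.

```python
def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(data, init_modes, max_iter=20):
    """Return one flat partition (labels) and the final cluster modes."""
    modes = [list(m) for m in init_modes]  # copy so the caller's list is untouched
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode by Hamming distance (ties go to the lower index).
        new_labels = [min(range(len(modes)), key=lambda c: hamming(x, modes[c]))
                      for x in data]
        if new_labels == labels:
            break  # the partition stopped changing
        labels = new_labels
        # Update step: per-column majority vote within each cluster (ties become 1).
        for c in range(len(modes)):
            members = [x for x, l in zip(data, labels) if l == c]
            if members:
                modes[c] = [1 if 2 * sum(col) >= len(members) else 0
                            for col in zip(*members)]
    return labels, modes

data = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 1],
        [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 0]]
labels, modes = k_modes(data, init_modes=[data[0], data[3]])
```

The result is a single label per row, with no dendrogram to build or store, which is the property the text attributes to partitional methods.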
Fig. 1. Sparse and redundant representations implement a transformation Y=AX that increases the dimensionality of feature space from N to M and searches for suitable subspaces within the new M-dimensional feature space. Having considered the challenge of high-dimensional data, it is also important ...
For binary data you can also use measurement="binary" or measurement="binary_nan". For continuous data, you can fit a Gaussian Mixture with diagonal covariances using measurement="continuous" or measurement="continuous_nan". Set verbose=1 for a detailed output. ...
There are many different clustering algorithms as there are multiple ways to define a cluster. Different approaches will work well for different types of models depending on the size of the input data, the dimensionality of the data, the rigidity of the categories and the number of clusters with...
A common analysis of single-cell sequencing data includes clustering of cells and identifying differentially expressed genes (DEGs). How cell clusters are defined has important consequences for downstream analyses and the interpretation of results, but i
In microbiome data analysis, unsupervised clustering is often used to identify naturally occurring clusters, which can then be assessed for associations with characteristics of interest. In this work, we systematically compared beta diversity and clustering methods commonly used in microbiome analyses. We...