First, we do not need to know the expected number of clusters beforehand. Second, without the computationally expensive final clustering, the fast BIRCH algorithm will become even faster. For very large data set
in-house Python script and subsequently used for homologous searching via BLASTp with parameters of “–a 10 -m8 -e 1e−7 -F F” (Camacho et al., 2009), and the core and dispensable gene sets were further estimated via gene family clustering with OrthoFinder (Emms and Kelly, 2015)....
We have applied the k-mean partitioning algorithm, the agglomerative hierarchical clustering algorithm BIRCH, and the density based partitioning algorithm DBSCAN on the above set of daily data containing r1, 1, and r2, 2 for each day. Many interesting clusters have been identified. The cluster ...