```python
from pyspark.ml.feature import CountVectorizer

# Input data: each row is a bag of words with an ID.
df = spark.createDataFrame(
    [(0, "a b c".split(" ")),
     (1, "a b b c a".split(" "))],
    ["id", "words"])

# Fit a CountVectorizerModel from the corpus.
cv = CountVectorizer(inputCol="words", outputCol="features",
                     vocabSize=3, minDF=2.0)
model = cv.fit(df)
```
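To make the `vocabSize` and `minDF` parameters concrete, here is a plain-Python sketch of what the fit step computes (this is an illustration of the idea, not Spark's implementation; the helper names `fit_count_vectorizer` and `transform` are assumptions):

```python
from collections import Counter

def fit_count_vectorizer(docs, vocab_size=3, min_df=2):
    # Sketch of the fit step: keep terms that appear in at least
    # min_df documents, ranked by total frequency, capped at vocab_size.
    doc_freq = Counter()   # number of documents each term appears in
    term_freq = Counter()  # total occurrences across the corpus
    for words in docs:
        doc_freq.update(set(words))
        term_freq.update(words)
    eligible = [t for t in term_freq if doc_freq[t] >= min_df]
    return sorted(eligible, key=lambda t: -term_freq[t])[:vocab_size]

def transform(words, vocab):
    # Count vocabulary terms in one document (dense count vector).
    counts = Counter(words)
    return [counts[t] for t in vocab]

docs = ["a b c".split(" "), "a b b c a".split(" ")]
vocab = fit_count_vectorizer(docs)
# transform(docs[1], vocab) counts each vocabulary term in doc 1.
```

With both documents containing `a`, `b`, and `c`, all three terms pass `min_df=2` and fit within `vocab_size=3`, so the second document maps to the counts of `a`, `b`, and `c` in it.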
```r
    # (fragment: the start of the heatmap call precedes this excerpt)
    scale = "row", cex = 1,
    clustering_distance_rows = "euclidean",
    clustering_distance_cols = "euclidean",
    clustering_method = "complete",
    border_color = FALSE) -> heatof

# Make clusters by cutting the row dendrogram at h = 1.5.
sort(cutree(heatof$tree_row, h = 1.5)) -> hi_clusters

# Check if clusters are in the same order...
```
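The same dendrogram-cutting idea can be sketched in Python with SciPy, assuming SciPy is available: `linkage` with complete linkage on Euclidean distances mirrors the clustering options above, and `fcluster(..., criterion="distance")` is the analogue of `cutree(..., h = 1.5)`. The data here is a toy stand-in, not the heatmap's actual matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy rows standing in for the heatmap data (illustrative values only).
X = np.array([[0.0, 0.0],
              [0.1, 0.0],
              [5.0, 5.0],
              [5.1, 5.0]])

# Complete-linkage hierarchical clustering on Euclidean distances,
# matching clustering_method = "complete" in the R call.
Z = linkage(X, method="complete", metric="euclidean")

# Cut the tree at height 1.5 -- the analogue of cutree(tree, h = 1.5).
labels = fcluster(Z, t=1.5, criterion="distance")
```

Here the two tight pairs of rows end up in two separate clusters, because the within-pair distances fall below the cut height while the between-pair distances do not.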
The index can be extracted with the DataFrame's `.index` attribute. The rows where the year is 2007 can be filtered with a boolean condition on the year column. If the values in the series cycle through...
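A minimal pandas sketch of both steps, assuming a frame with a `year` column (the column names and values here are illustrative, not from the text):

```python
import pandas as pd

# Small illustrative frame.
df = pd.DataFrame({
    "year": [2006, 2007, 2007, 2008],
    "pop":  [1.5, 1.7, 3.6, 2.4],
})

# Boolean condition selecting the rows where the year is 2007.
df_2007 = df[df["year"] == 2007]

# Extracting the index of the filtered rows.
idx = df_2007.index
```

The boolean expression `df["year"] == 2007` produces a Series of True/False values, and indexing the frame with it keeps only the True rows; `.index` then returns their original row labels.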