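from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_model = TfidfVectorizer(binary=False, decode_error='ignore', stop_words='english')
vec = tfidf_model.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; older versions use it instead
tfidf_model.get_feature_names_out()

# 2
from sklearn.feature_extraction.text import Tf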
Step 2: With the IDF vector obtained from the computation above, the remaining work is to compute tf*idf, which uses the transform method of the IDFModel class.

val tfidf: RDD[Vector] = idf.transform(tf)

private object IDFModel {
  /**
   * Transforms a term frequency (TF) vector to a TF-IDF vector with a IDF vector
   *
   * @param idf an IDF vector
   *...
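
The tf*idf step described above amounts to an element-wise product of the two vectors. Below is a minimal Python sketch of that idea, not the Spark implementation: the function name `tfidf_transform` and the sample values are hypothetical, chosen only for illustration, and both vectors are assumed to be dense lists indexed by the same term ids.

```python
def tfidf_transform(idf, tf):
    """Scale each term frequency by the IDF weight at the same index.

    Hypothetical helper for illustration; assumes dense, equal-length vectors.
    """
    if len(idf) != len(tf):
        raise ValueError("idf and tf must have the same length")
    return [t * w for t, w in zip(tf, idf)]


# Made-up values: term 0 appears in every document, so its IDF weight is 0.
idf = [0.0, 0.5, 1.0]
tf = [3.0, 2.0, 1.0]
tfidf = tfidf_transform(idf, tf)
print(tfidf)
```

A term with a high raw count but an IDF weight of 0 contributes nothing to the result, which is the point of the weighting.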