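If your existing code calls get_feature_names, replace it with get_feature_names_out so that it keeps working on newer scikit-learn releases. In summary, if you hit the error "'countvectorizer' object has no attribute 'get_feature_names'", the most likely cause is that your scikit-learn version has been upgraded while your code has not yet been adjusted to match...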
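For code that has to run on both old and new scikit-learn releases, one option is a small compatibility wrapper. The sketch below is only an illustration: the helper name `feature_names` and the sample documents are made up, and it assumes that a release lacking get_feature_names_out still provides get_feature_names.

```python
from sklearn.feature_extraction.text import CountVectorizer


def feature_names(vectorizer):
    """Hypothetical helper: return feature names on old and new scikit-learn."""
    if hasattr(vectorizer, "get_feature_names_out"):
        # Newer releases: returns a NumPy array of strings.
        return [str(name) for name in vectorizer.get_feature_names_out()]
    # Older releases: only get_feature_names() exists and returns a list.
    return list(vectorizer.get_feature_names())


docs = ["fast processor", "high resolution display"]  # made-up sample data
vectorizer = CountVectorizer().fit(docs)
names = feature_names(vectorizer)
print(names)
```

If the project can simply require a recent scikit-learn, calling get_feature_names_out directly is simpler than carrying a wrapper.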
'high resolution display'] }
df = pd.DataFrame(data)
# Create the CountVectorizer object
vectorizer = CountVectorizer()
# Fit and transform the feature-list column
feature_vectors = vectorizer.fit_transform(df['features'])
# Print the feature vectors
print(feature_vectors.toarray())
# Print the vocabulary
print(vectorizer.get_feature_names_...
("Vocabulary:", vectorizer.get_feature_names_out())
print("Vector Matrix:\n", X.toarray())
# If you need to split the vector matrix back into the two columns
X_col1 = X[:len(df['col1']), :]
X_col2 = X[len(df['col1']):, :]
print("Vector Matrix for col1:\n", X_col1.toarray())
print("Vector Matrix...
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X = vectorizer.fit_transform(corpus_zh_out)
vectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
X.toarray()
This is the default behavior. Now we add the binary parameter:
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b", binary=True)
X = vectorizer.fit_transfo...
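To see what binary=True changes, it helps to compare both settings on the same input. The sketch below uses a made-up corpus rather than corpus_zh_out, which the excerpt does not define:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["a b a a", "b c"]  # made-up corpus with a repeated token
pattern = r"(?u)\b\w+\b"     # keeps single-character tokens

# Default: each cell holds the number of occurrences of the term.
counts = CountVectorizer(token_pattern=pattern).fit_transform(corpus).toarray()

# binary=True: every non-zero count is clipped to 1 (presence / absence).
flags = CountVectorizer(token_pattern=pattern, binary=True).fit_transform(corpus).toarray()

print(counts)
print(flags)
```

The two matrices differ only where a term occurs more than once in a document.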
vectorizer.get_feature_names_out()
Returns the words in your corpus, ordered by their column position in the sparse matrix.
Get the index of each feature name:
vectorizer.vocabulary_
Note that this does not return frequency counts; it maps each word to its column index in the matrix. ...
print(tv.get_feature_names_out())  # by default, all words are used to build the bag of words
print(tv.vocabulary_)
CountVectorizer() ...
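X = vectorizer.fit_transform(corpus_zh_out)
vectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
These are the extracted features. One open question remains: I expected every word to appear here, but "我", "你", and "吃" from the original corpus are missing, which suggests that only some features were extracted rather than all of the words. I am not sure whether feature extraction can be applied within the bag-of-words model; I will come back to this after more investigation.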
get_feature_names()
NotFittedError: CountVectorizer - Vocabulary wasn't fitted.
In [8]: vectorizer.transform(corpus)
Out[8]: <4x9 sparse matrix of type '<class 'numpy.int64'>' with 19 stored elements in Compressed Sparse Row format>
In [9]: hasattr(vectorizer, "vocabulary_")
Out[9]:...
def get_feature_names(self):
    """Array mapping from feature integer indices to feature name"""
    self._check_vocabulary()
    return [t for t, i in sorted(six.iteritems(self.vocabulary_), key=itemgetter(1))]
['我爱', '编程', '快乐', '学习', '技术']
# Create a CountVectorizer instance with an n-gram range and a custom vocabulary
vectorizer = CountVectorizer(ngram_range=(1, 2), vocabulary=custom_vocab)
# Fit and transform the text data
X = vectorizer.fit_transform(corpus)
# Print the results
print(vectorizer.get_feature_names_o...