"It is unclear to me how the words are selected."

From the documentation:

max_features : optional, None by default
    If not None, build a vocabulary that only consider the top max_features ordered by term frequency across the corpus.

All the features (in your case unigrams, bigrams and ...
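A minimal, standard-library-only sketch of the selection rule quoted above. It assumes default-like tokenization (lowercasing plus the `(?u)\b\w\w+\b` token pattern). Every n-gram in `ngram_range` is counted over the whole corpus, all of them compete in a single ranking, and only the `max_features` most frequent survive. The helper names `ngrams` and `top_features` are hypothetical. The alphabetical tie-break is an assumption of this sketch, not a statement about how scikit-learn breaks ties.

```python
import re
from collections import Counter

# Assumed to mirror scikit-learn's default token_pattern (tokens of 2+ word chars).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def ngrams(tokens, n_min, n_max):
    """Yield every n-gram of length n_min..n_max, joined by single spaces."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def top_features(corpus, max_features, ngram_range=(1, 1)):
    """Keep the max_features n-grams with the highest total count in the corpus.

    Hypothetical helper. Ties are broken alphabetically here, which is an
    assumption of this sketch.
    """
    counts = Counter()
    for doc in corpus:
        tokens = TOKEN_RE.findall(doc.lower())
        counts.update(ngrams(tokens, *ngram_range))
    # Rank by descending corpus-wide count, then alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Return the surviving terms in alphabetical order, like a sorted vocabulary.
    return sorted(term for term, _ in ranked[:max_features])


corpus = ["the cat sat", "the cat ran", "the dog ran"]
print(top_features(corpus, 1))                       # ['the']
print(top_features(corpus, 4, ngram_range=(1, 2)))   # ['cat', 'ran', 'the', 'the cat']
```

With `ngram_range=(1, 2)`, the bigram "the cat" (2 occurrences) makes the cut alongside the unigrams, while "cat sat" (1 occurrence) does not. Unigrams and bigrams are ranked together rather than each getting their own quota.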
The documentation clearly states that dtype only affects the return type of fit_transform() (and transform()):

dtype : type, optional
    Type of the matrix returned by fit_transform() or transform().

FYI:
In [7]: scipy.__version__
Out[7]: '0.13.3'
In [8]: sklearn.__version__
Out[8]: '0.16...
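A minimal check of the documented behaviour. It assumes NumPy and scikit-learn are installed, and the three-document corpus is made up for illustration. Passing `dtype` changes the element type of the sparse matrices returned by `fit_transform()` and `transform()`. The learned vocabulary and the counts themselves stay the same as with the default dtype.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat", "the cat ran", "the dog ran"]

# dtype only sets the element type of the returned sparse matrices.
vec = CountVectorizer(dtype=np.float32)
X_train = vec.fit_transform(corpus)
X_new = vec.transform(["the dog sat"])

print(X_train.dtype, X_new.dtype)  # float32 float32

# The vocabulary is identical to the one learned with the default dtype.
default_vec = CountVectorizer()
X_default = default_vec.fit_transform(corpus)
print(vec.vocabulary_ == default_vec.vocabulary_)  # True
```

`sorted(vec.vocabulary_)` is used in the test instead of `get_feature_names_out()` because the latter only exists in newer scikit-learn releases.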
Linux-4.16.5-1-ARCH-x86_64-with-arch
Python 3.7.0 (default, Jul 15 2018, 10:44:58) [GCC 8.1.1 20180531]
NumPy 1.15.0
SciPy 1.1.0
Scikit-Learn 0.19.2

jnothman (member) commented on Aug 16, 2018: Yes, it does. Please offer a change to documentation if you feel it can be ...
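An environment report in the layout above can be produced with the standard library alone. This is a sketch that assumes Python 3.8+ for `importlib.metadata`. `environment_report` is a hypothetical helper, and the distribution names are the ones published on PyPI. On scikit-learn 0.20 and later, `sklearn.show_versions()` prints a fuller report.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("numpy", "scipy", "scikit-learn")):
    """Return platform, Python and package version lines (hypothetical helper)."""
    lines = [platform.platform(), "Python " + sys.version.replace("\n", " ")]
    for dist in packages:
        try:
            lines.append(f"{dist} {version(dist)}")
        except PackageNotFoundError:
            # Report missing distributions instead of failing.
            lines.append(f"{dist} not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```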
https://github.com/rapidsai/cuml/blob/cdb14e7de6a40d8d707d29b2889a89aa553125ee/python/cuml/feature_extraction/_tfidf_vectorizer.py.

stsievert (member) commented on Sep 23, 2020: I think this issue belongs in Dask-ML. @TomAugspurger, is it possible to move this issue over there?