command line when at the root directory of the downloaded repository. This also installs extra tokenizers for Japanese, Thai, and Vietnamese. If you are only interested in sentences, or only plan to use a simpler regex tokenizer, a minimal install with poetry install --without words will also do. Next, adjust the Settings section of src/top_open_subtitles_sentences.py to your liking, optionally reading the Info section, before...
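
To illustrate what a "simpler regex tokenizer" can look like, here is a minimal sketch in Python. It is not the tokenizer used in src/top_open_subtitles_sentences.py; the pattern and the names WORD_RE, regex_tokenize, and count_words are assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not taken from the repository):
# a run of word characters, optionally joined by apostrophes or hyphens.
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")


def regex_tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and split it into word tokens with a regex."""
    return WORD_RE.findall(sentence.lower())


def count_words(sentences) -> Counter:
    """Count token frequencies over an iterable of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(regex_tokenize(sentence))
    return counts


if __name__ == "__main__":
    print(regex_tokenize("Don't stop, well-known!"))
    print(count_words(["Hello world", "hello there"]).most_common(1))
```

A pattern like this relies on words being separated by spaces or punctuation, so it is a poor fit for text that is not segmented that way, such as Japanese or Thai.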