Reminder: I have read the README and searched the existing issues.

System Info

Reproduction

Using 2048*2048 images, roughly 30k image-text pairs in total, in a sharegpt-format dataset. With preprocessing_num_workers=256 (or 128, 64, etc.), the run always stalls at "Running tokenizer on dataset" and stays stuck for a long time...
@SaulLu when I use the wikitext-103 dataset, tokenization hangs at "Running tokenizer on dataset" and shows no progress. This was not always an issue, but as of today it has become one. It will either hang at the very beginning of tokenizing or at the very end. Any idea why this would be hanging?
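For anyone debugging this: the "Running tokenizer on dataset" message is the `desc` of a `datasets.Dataset.map(...)` call that runs the tokenizer across multiple worker processes. Below is a minimal sketch of the kind of call that stalls. It is my own reconstruction, not the exact script either reporter ran; the model name, dataset config, column name, and `num_proc` value are placeholders.

```python
# Minimal sketch (not the actual training script) of the call that prints the
# "Running tokenizer on dataset" progress bar.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model

def tokenize(batch):
    # Tokenize one batch of raw text rows.
    return tokenizer(batch["text"])

ds = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

# num_proc spawns that many worker processes (the role preprocessing_num_workers
# plays in the first report). If any worker blocks, the progress bar appears to
# freeze at the very start or near the end, matching the symptoms above.
tokenized = ds.map(
    tokenize,
    batched=True,
    remove_columns=["text"],
    num_proc=64,  # placeholder; the reports used 256/128/64
    desc="Running tokenizer on dataset",
)
```

In a sketch like this, lowering `num_proc` (or dropping it entirely to run single-process) is a quick way to check whether the hang is tied to the multiprocessing workers rather than to the tokenizer itself.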