```python
from datasets import Dataset
import os

# Assume 'raw_datasets' is your original dataset
# Directory to save the tokenized dataset in chunks
output_dir = "tokenized_dataset"

# Create directory if it doesn't exist
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
```
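The snippet above was cut off before the chunked save itself. A minimal sketch of one way to finish the idea, assuming `tokenized_dataset` is the already-mapped dataset and that sharding via `Dataset.shard` plus `save_to_disk` is acceptable (both names and the chunk count are assumptions, since the original code is truncated):

```python
# Sketch only: 'tokenized_dataset' and 'num_chunks' are hypothetical;
# the original snippet was truncated before this point.
num_chunks = 8

for i in range(num_chunks):
    # shard() returns the i-th of num_chunks roughly equal slices
    shard = tokenized_dataset.shard(num_shards=num_chunks, index=i)
    shard.save_to_disk(os.path.join(output_dir, f"chunk_{i}"))
```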
@SaulLu when I use the wikitext-103 dataset, the tokenizer hangs at `Running tokenizer on dataset` and shows no progress. This was not always an issue, but as of today it has become one. It will hang either at the end of tokenizing or at the very beginning. Any idea why this would be hanging?
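For reference, a minimal sketch of the kind of call that shows the `Running tokenizer on dataset` progress bar, assuming the `wikitext-103-raw-v1` config and a GPT-2 tokenizer (both are assumptions; the report above does not show the exact code used):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumptions: wikitext-103-raw-v1 and the gpt2 tokenizer; the original
# report does not specify which tokenizer or map() arguments were used.
raw_datasets = load_dataset("wikitext", "wikitext-103-raw-v1")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def tokenize_function(examples):
    return tokenizer(examples["text"])

# map() prints "Running tokenizer on dataset"; the hang described
# above occurs during this step.
tokenized_datasets = raw_datasets.map(
    tokenize_function,
    batched=True,
    remove_columns=["text"],
)
```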