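Measured by word count against the well-known Penn Treebank (PTB), the former is 2 times as large and the latter is 110 times as large. The text is also kept together with the original articles it came from, which makes the dataset especially well suited to language modeling that requires long-term dependencies. File list: bp06PsV.zip (estimated 5 files) WikiText Long Term Dependency Language Modeling Dataset wikitext-103-raw-v1.zip 183.09MB ...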
To Reproduce: Here is the link to the notebook with the error: https://colab.research.google.com/drive/1Odac5EA0f3ozCGXYpqZs2nmrwB1Flu18?usp=sharing Expected behavior: Datasets downloaded successfully. Environment --2024-03-02 23:06:55-- https://raw.githubusercontent.com/pytorch/pytorch/ma...