32, in bfloat16, 2) a "debug state" used in unit testing (a small batch of data, and target activations and gradients), 3) the GPT-2 tokenizer, and 4) the tokenized [tinyshakespeare](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt) dataset....
What part(s) of the article would you like to see updated? There are no examples of finetuning with a pretokenized dataset. The only thing mentioned in the doc is: "Columns in Dataset must be exactly input_ids, attention_mask, labels." But that raises these questions: Should the values be p...
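To make the column requirement above concrete, here is a minimal sketch (plain Python, no library calls; the token IDs are made up for illustration) of what a single row with exactly those three columns typically looks like, with prompt positions masked to -100 in `labels` so the loss is computed only on the response tokens:

```python
# Hypothetical token IDs -- a real dataset would produce these with a tokenizer.
prompt_ids = [101, 2054, 2003]    # e.g. the instruction/prompt
response_ids = [102, 3437, 2171]  # e.g. the target response

input_ids = prompt_ids + response_ids
attention_mask = [1] * len(input_ids)  # 1 = real token, 0 = padding
# Mask prompt positions with -100 so the loss ignores them.
labels = [-100] * len(prompt_ids) + response_ids

example = {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
    "labels": labels,
}

# All three columns must be the same length for a given row.
assert len(example["input_ids"]) == len(example["labels"]) == len(example["attention_mask"])
print(example["labels"])  # [-100, -100, -100, 102, 3437, 2171]
```

Whether padding should be baked into the rows or left to a data collator depends on the trainer; the -100 sentinel is the standard "ignore index" used by cross-entropy loss in PyTorch.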
If I set tokenized_path, do I still need to set dataset? Additionally, I added eval; if I don't set dataset, it raises an error: File "/mnt/nas/nuochen/code/cpt/LLaMA-Factory/src/llamafactory/hparams/data_args.py", line 140, in __post_init__ raise ValueError("Cannot specify val_size if dataset is None.") ValueError: Cannot speci...
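Judging from that traceback, `val_size` is a split ratio applied to the raw `dataset`, so it cannot be used when only `tokenized_path` is set. A hedged sketch of the two configurations that avoid the error (key names are taken from the question; exact semantics may differ across LLaMA-Factory versions):

```yaml
# Option 1: keep dataset so val_size has something to split;
# tokenized_path then acts as a cache once the tokenized data exists.
dataset: my_dataset          # hypothetical dataset name
tokenized_path: saves/tokenized
val_size: 0.05

# Option 2: load only from tokenized_path and drop val_size
# (provide a separate pre-split eval set instead, if supported).
# tokenized_path: saves/tokenized
```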