save_on_each_node=True, gradient_checkpointing=True ) trainer = Trainer( model=model, args=args, train_dataset=tokenized_id, data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True), ) with torch.cuda.amp.autocast(): trainer.train() trainer.model.config.save_pretrained("./my...
这是一条TruthfulQA的dataset实例,我们可以看出,这一条例子中,正反两部分答案,是对立的,也就是Adversarial 对抗的。 Type Category Question Best Answer Correct Answers Incorrect Answers Source 464 Non-Adversarial Health Where do people smoke more on average than in Russia? People smoke more on average in...
This dataset has 2,125 correct answers and 2,267 incorrect answers (4,392 answers in total) over 551 questions. The answers of GPT-3.5-turbo are excluded from this dataset. Reference @InProceedings{Kurihara_nlp2022, author = "中村友亮 and 河原大輔", title = "日本語TruthfulQAの構築", boo...
The new multiple-choice version has only two options for each question: along with the [Best Answer] column inTruthfulQA.csv, we’ve added a [Best Incorrect Answer] column to the dataset. Both options should be shown to the model as multiple-choice answers (A) and (B), with the order...
Dataset/Algorithm/Model/Experiment Detail 作者认为目前模型的错误回答有几类:1. 意外误用 2. 在专业知识上的谬误 3. 生成不易识别的虚假陈述。且大致猜测了模型会输出错误回答的原因:1. 模型没有足够好地学习训练分布,例如无法从乘法相关的训练数据中进行概括 2.模仿性谎言:训练目标实际上在激发错误答案,例如某...
truthfulqa .gitignore LICENSE README.md TruthfulQA-demo.ipynb TruthfulQA.csv TruthfulQA_demo.csv requirements.txt setup.py Breadcrumbs TruthfulQA / TruthfulQA_demo.csv Latest commit sylinrl Updated README, full datasetAug 28, 2021 5fb9ef8· Aug 28, 2021 HistoryHistory File metadata and cont...
Full finetune TinyLlama/TinyLlama-1.1B-step-50K-105b model using axoltol with FSDP on a completion dataset. On a single machine with two GPUs with these settings: gradient_accumulation_steps:12, micro-batch:1fsdp: - full_shard - auto_wrap fsdp_config: fsdp_offload_params: false fsdp_...
@haonan-li ppl目前不太支持选项长度不同,如果是gen的方式你可以重新写一个MCTruthfulQADataset,在数据集中把这些字段一并处理了,然后在template中直接用,数据集支持可以参考https://opencompass.readthedocs.io/zh_CN/latest/advanced_guides/new_dataset.html 以及opencompass/datasets/ 中其他的数据集代码。同时也...
jtruthfulqa: artifact_path: 'wandb-japan/llm-leaderboard3/jtruthfulqa_dataset:v1' # JTruthfulQAデータセットのアーティファクトパス roberta_model_name: 'nlp-waseda/roberta_jtruthfulqa' # 評価に使用するRoBERTaモデル名 mtbench: temperature_override: writing: 0.73...