The BART paper proposes a pre-training method well suited to generation tasks. BART stands for Bidirectional and Auto-Regressive Transformers: a bidirectional encoder paired with an auto-regressive decoder.
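As a minimal sketch of using a pre-trained BART checkpoint for generation, assuming the Hugging Face `transformers` library and the public `facebook/bart-base` checkpoint (neither is named in the original text, so both are illustrative choices):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# hypothetical checkpoint choice; any pre-trained BART model would do
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# the bidirectional encoder reads the input text, and the
# auto-regressive decoder generates the output token by token
inputs = tokenizer(
    "BART is pre-trained by corrupting text and learning to reconstruct it.",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```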
At the end of the `DataCollatorSpeechSeq2SeqWithPadding` class's `__call__` method there is one subtle detail: if the decoder start token was already prepended during the earlier tokenisation step, it is cut from the labels here, since it is appended again during training. The check compares the first label token against `self.decoder_start_token_id`:

```python
        # if the decoder start token was prepended in the tokenisation step,
        # cut it here, as it is appended again later anyway
        if (labels[:, 0] == self.decoder_start_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels

        return batch
```

Let's initialise the data collator we've just defined:
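For context, here is a complete sketch of such a collator and its initialisation. Only the fragment above is quoted from the original; the surrounding padding logic and the `openai/whisper-small` checkpoint are assumptions about how the earlier, elided parts of the walkthrough are set up:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# assumed setup: a processor (feature extractor + tokenizer) and a seq2seq model
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any
    decoder_start_token_id: int

    def __call__(
        self, features: List[Dict[str, Union[List[int], torch.Tensor]]]
    ) -> Dict[str, torch.Tensor]:
        # audio inputs and text labels need different padding, so treat them separately:
        # pad the log-Mel input features into a batched torch tensor
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # pad the tokenised label sequences to the longest in the batch
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # replace padding token ids with -100 so they are ignored by the loss
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)

        # cut the decoder start token if it was prepended during tokenisation,
        # as it is appended again during training
        if (labels[:, 0] == self.decoder_start_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels
        return batch


# initialise the collator with the processor and the model's decoder start token
data_collator = DataCollatorSpeechSeq2SeqWithPadding(
    processor=processor,
    decoder_start_token_id=model.config.decoder_start_token_id,
)
```

The collator can then be passed to a `Seq2SeqTrainer` via its `data_collator` argument, so each training batch is padded on the fly rather than ahead of time.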