There are no examples of fine-tuning with a pretokenized dataset. The only thing the docs mention is: "Columns in Dataset must be exactly input_ids, attention_mask, labels." But that raises these questions: Should the values be pre-padded? What should the ignore index for the labels be? Ad...
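The docs quoted above do not answer either question. Below is a minimal sketch, assuming a PyTorch / Hugging Face-style causal language model where `labels` has the same length as `input_ids` and the model shifts the labels internally (as Hugging Face causal LM classes do). It uses -100, the default `ignore_index` of `torch.nn.CrossEntropyLoss`, to mask both prompt and padding positions. It also pads per batch ("dynamic padding") rather than pre-padding the whole dataset, which is a common choice but not something the quoted docs require. `make_example` and `pad_batch` are hypothetical helpers, not part of any library.

```python
from typing import Dict, List

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss; label
# positions holding this value contribute nothing to the loss.
IGNORE_INDEX = -100


def make_example(prompt_ids: List[int], response_ids: List[int]) -> Dict[str, List[int]]:
    """Build one unpadded example where only the response tokens are trained on."""
    input_ids = prompt_ids + response_ids
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        # Mask the prompt so the loss covers only the response tokens
        "labels": [IGNORE_INDEX] * len(prompt_ids) + response_ids,
    }


def pad_batch(examples: List[Dict[str, List[int]]], pad_token_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad every column to the longest sequence in this batch."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        pad = max_len - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [pad_token_id] * pad)
        batch["attention_mask"].append(ex["attention_mask"] + [0] * pad)  # 0 marks padding
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * pad)  # padding is never scored
    return batch
```

For example, `pad_batch([make_example([1, 2, 3], [4, 5]), make_example([6], [7])], pad_token_id=0)` pads the second example to length 5, giving it the labels `[-100, 7, -100, -100, -100]`. For encoder-decoder models the labels are usually a separate target sequence of a different length, so this alignment assumption would not hold there.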
If the pre-tokenized data is saved locally, it can be loaded directly the next time the same data is used, which speeds up this process.

I don't really understand the answer. I already have a pre-tokenized dataset (with input_ids and labels); I don't need to set up prompt...
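The caching idea described above can be sketched as "tokenize once, save, then reload on later runs". This is a minimal illustration using only the standard library and a JSON file; `load_or_tokenize`, `toy_tokenize`, and the file name `pretokenized.json` are hypothetical, and `toy_tokenize` merely stands in for a real tokenizer. If you use the Hugging Face `datasets` library, `Dataset.save_to_disk` and `datasets.load_from_disk` serve the same purpose.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_or_tokenize(
    cache_path: str, texts: List[str], tokenize: Callable[[str], List[int]]
) -> List[Dict[str, List[int]]]:
    """Return cached records if the cache file exists; otherwise tokenize and save."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)  # tokenizer is not called on a cache hit
    records = []
    for text in texts:
        ids = tokenize(text)
        records.append({"input_ids": ids, "attention_mask": [1] * len(ids)})
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    return records


def toy_tokenize(text: str) -> List[int]:
    # Hypothetical stand-in for a real tokenizer: one id per word (its length)
    return [len(word) for word in text.split()]


cache_file = os.path.join(tempfile.mkdtemp(), "pretokenized.json")
first = load_or_tokenize(cache_file, ["hello world", "a test"], toy_tokenize)   # tokenizes and writes
second = load_or_tokenize(cache_file, ["hello world", "a test"], toy_tokenize)  # reads the file
```

Note that this sketch never invalidates the cache: if the texts or the tokenizer change, the stale file is still returned, so a real setup would key the cache on the data and tokenizer version.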
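batch_pretokenized_inputs = [["Hello", ",", "world", "!"], ["This", "is", "a", "test", "."], ["Another", "example", "."]]

What happens if the input does not match the types above: if the input does not match the types above (for example, if it is an integer, a float, or a complex non-string object), then the function or method that processes this input...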
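The accepted input shapes can be made concrete with a small type check: a single `str`, a `List[str]`, or a `List[List[str]]` such as `batch_pretokenized_inputs`. This is a hypothetical validator written to mirror the error message, not the tokenizer library's own implementation; unlike a check that only inspects the first element, it verifies every element. With a Hugging Face tokenizer, pretokenized word lists are also normally passed with `is_split_into_words=True` so they are not treated as a batch of separate sentences.

```python
from typing import Any


def is_valid_text_input(t: Any) -> bool:
    """True for str, List[str], or List[List[str]]; empty lists are allowed."""
    if isinstance(t, str):
        return True  # single example
    if isinstance(t, (list, tuple)):
        if len(t) == 0:
            return True
        if isinstance(t[0], str):
            # batch of strings, or a single pretokenized example
            return all(isinstance(x, str) for x in t)
        if isinstance(t[0], (list, tuple)):
            # batch of pretokenized examples: every inner item must be a str
            return all(
                isinstance(x, (list, tuple)) and all(isinstance(w, str) for w in x)
                for x in t
            )
    return False  # int, float (including NaN), dicts, other objects


def check_text_input(t: Any) -> None:
    # Paraphrases the ValueError text quoted in the issue title
    if not is_valid_text_input(t):
        raise ValueError(
            "text input must be of type `str` (single example), `List[str]` "
            "(batch or single pretokenized example) or `List[List[str]]` "
            "(batch of pretokenized examples)."
        )
```

An integer, a float, or a list mixing strings with other values all fail this check, which is the failure mode described above.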
ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples). · Issue #39 · Shivanandroy/simpleT5
Open · j0st opened this issue on Aug 18, 2021 · 5 comments
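The snippet does not show what caused the error in this issue, so the following is only one plausible scenario, not the confirmed resolution. The error rejects any value that is not a string, and a common source of non-string values in training data is a missing cell: pandas reads an empty CSV cell as a float `NaN`, not a `str`. Below is a minimal sketch, assuming rows shaped like the `source_text` / `target_text` columns simpleT5 uses for its training DataFrame. `clean_pairs` is a hypothetical helper that keeps only rows where both fields are strings.

```python
from typing import Any, Dict, List, Tuple


def clean_pairs(rows: List[Dict[str, Any]]) -> Tuple[List[Dict[str, str]], int]:
    """Keep rows whose source_text and target_text are both str; count the rest."""
    kept: List[Dict[str, str]] = []
    dropped = 0
    for row in rows:
        src = row.get("source_text")  # None if the key is missing
        tgt = row.get("target_text")
        if isinstance(src, str) and isinstance(tgt, str):
            kept.append({"source_text": src, "target_text": tgt})
        else:
            # e.g. float('nan') from an empty CSV cell, an int, or None
            dropped += 1
    return kept, dropped
```

With pandas directly, `df.dropna(subset=["source_text", "target_text"])` removes the NaN rows, but it does not catch other non-string values such as integers. Casting with `astype(str)` avoids the error but turns NaN into the literal text `"nan"`, which then ends up in the training data.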