padding_side="left")>>>tokenizer.pad_token=tokenizer.eos_token>>>prompts=["hello llama","who are you?"]>>>tokenizer(prompts,return_tensors="pt",padding=True){'input_ids':tensor([[2,1,22172,11148,3304],│···[1,1058,526,366,29973]]),'attention...
```python
llama_tokenizer = LlamaTokenizer.from_pretrained("./link_model/llama2-7b-hf/", padding_side="left")
llama_tokenizer.pad_token = llama_tokenizer.eos_token
print(llama_tokenizer.padding_side)  # output: left
tokens = llama_tokenizer(input_text, padding="longest", return_tensors="pt")
print(tokens)
```
Revert "[Tokenizer] Enable padding_side as call time kwargs" #9192 Merged ZHUI added a commit that referenced this pull request Sep 25, 2024 Revert "[Tokenier] Enable padding_side as call time kwargs (#9161)" (#… … c4d3a2f Sign up for free to join this conversation on GitHub...
ZHUIcommentedSep 25, 2024 Revert "[Tokenier] Enable padding_side as call time kwargs (#9161)" 49c56f3 This reverts commitc5e6db5. Collaborator DrownFish19left a comment ZHUImerged commitc4d3a2fintodevelopSep 25, 2024 6 of 12 checks passed ...
Clearly, the tokenizer did not pad these examples. We can fix this by simply using the UNK token as the padding token, as shown below:

```python
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.unk_token
input = tokenizer(prompts, padding='max_length', max_length=20, return_tensors="pt")
```
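One quick way to check that the padding actually landed on the left (a sketch, reusing the `input` batch built above) is to look at the attention mask, where zeros mark the pad positions:

```python
for ids, mask in zip(input["input_ids"], input["attention_mask"]):
    n_pad = int((mask == 0).sum())
    # With padding_side="left", all pads should sit at the start of the row.
    assert (mask[:n_pad] == 0).all(), "expected pad tokens on the left"
    print(f"{n_pad} pad tokens, text: {tokenizer.decode(ids[n_pad:])!r}")
```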
```diff
@@ -409,7 +409,7 @@ def _init_tokenizer(self, model_id_or_path: str):
     path_to_use = model_id_or_path if self.peft_config is None else self.peft_config.base_model_name_or_path
     self.tokenizer = AutoTokenizer.from_pretrained(
         path_to_use,
-        padding_size="left",
+        padding_side="left",
```
```python
input_ids = F.pad(input_ids, (0, padding_length), 'constant', tokenizer.pad_token_id)
attention_mask = F.pad(attention_mask, (0, padding_length), 'constant', 0)
```

Suggested change:

```python
pad_tuple = (0, padding_length) if padding_right else (padding_length, 0)
```

Collaborator baoleai commented on Jul 16, 2024: LGTM
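The suggested `pad_tuple` line generalizes the right-padding-only calls above to either side. A self-contained torch sketch of the same idea (the function name and the `padding_right` flag are illustrative, carried over from the snippet):

```python
import torch
import torch.nn.functional as F

def pad_to_length(input_ids, attention_mask, padding_length, pad_token_id, padding_right=True):
    # F.pad takes (left, right) amounts for the last dimension, so flipping
    # the tuple switches between right and left padding.
    pad_tuple = (0, padding_length) if padding_right else (padding_length, 0)
    input_ids = F.pad(input_ids, pad_tuple, "constant", pad_token_id)
    attention_mask = F.pad(attention_mask, pad_tuple, "constant", 0)
    return input_ids, attention_mask

ids = torch.tensor([[101, 102, 103]])
mask = torch.ones_like(ids)
print(pad_to_length(ids, mask, 2, 0, padding_right=False))
# (tensor([[  0,   0, 101, 102, 103]]), tensor([[0, 0, 1, 1, 1]]))
```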
PR types: Function optimization
PR changes: APIs
Description: Enable padding_side as a call-time kwarg. This PR is based on #9161 and is compatible with the function self._pad without the argument padding_side.
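In practice, the feature lets callers override the padding side per call instead of rebuilding the tokenizer, roughly like this (a sketch; `model_name` and `prompts` are placeholders, and the per-call kwarg is exactly what this PR adds on top of the construction-time default):

```python
from paddlenlp.transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)  # construction-time default side

right_padded = tokenizer(prompts, padding=True)                      # uses the default
left_padded = tokenizer(prompts, padding=True, padding_side="left")  # per-call override
```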