🐛 Describe the bug

Two tokens are decoded in this example. Ideally, the output feature of the first token should be the same regardless of the sequence length, since a square subsequent mask is applied. Here are two ways to generate the tgt ...
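For reference, the square subsequent mask for two decoded positions puts `-inf` above the diagonal, so position 0 cannot attend to position 1. A minimal illustration, built with `torch.triu` (which matches what `generate_square_subsequent_mask` returns):

```python
import torch

# equivalent to nn.Transformer.generate_square_subsequent_mask(2):
# -inf above the diagonal blocks attention to future positions
mask = torch.triu(torch.full((2, 2), float("-inf")), diagonal=1)
print(mask)
# tensor([[0., -inf],
#         [0., 0.]])
```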
Here, because unk_idx = 0, a 0 is prepended and appended to the mask; the entries in between record, for each token of the original tgt sentence in turn, whether that token can be found in src. If a tgt token is found in src, the corresponding mask position is set to that token's idx in src_ex_vocab. Note that src_ex_vocab is its own Vocab, independent of the large global vocab: a vocabulary list specific to the src sentence, containing only the current src tokens. For example: src = "this ...
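A short sketch of this mask construction. The `build_copy_mask` helper is hypothetical, and indexing src_ex_vocab from 1 is an assumption made here so that 0 stays reserved for unk_idx:

```python
def build_copy_mask(src_tokens, tgt_tokens, unk_idx=0):
    # per-sentence vocab: only the current src tokens, independent of the global vocab
    # (indexing from 1 is an assumption, keeping 0 free for unk_idx)
    src_ex_vocab = {tok: i for i, tok in enumerate(dict.fromkeys(src_tokens), start=1)}
    # for each tgt token: its idx in src_ex_vocab if it occurs in src, else unk_idx
    middle = [src_ex_vocab.get(tok, unk_idx) for tok in tgt_tokens]
    # prepend and append unk_idx (= 0) for the added boundary positions
    return [unk_idx] + middle + [unk_idx]

print(build_copy_mask("this is a test".split(), "a test indeed".split()))
# [0, 3, 4, 0, 0]
```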
In this case, the causal mask (tgt_mask, per the nn.Transformer documentation) is computed automatically from the sequence length and combined with the attention_mask that is passed in:
```python
out_sequence_list = []
for i in range(1, tgt.size(0) + 1):
    seq_tgt = tgt[:i]  # decode with a growing target prefix
    tgt_mask = transformer.generate_square_subsequent_mask(seq_tgt.size(0))
    seq_out = transformer(src=src, tgt=seq_tgt, tgt_mask=tgt_mask)
    latest_out = seq_out[-1, :, :].unsqueeze(0)  # keep only the newest position
    out_sequence_list.append(latest_out)
    # AssertError when seq_out.size(0) >= 2
    # In...
```
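A minimal, self-contained way to check the expectation stated above — that the first-token output is unchanged by target length under a square subsequent mask — is sketched below. The model dimensions and random inputs are illustrative, not from the report, and `eval()` matters because dropout would otherwise make the comparison fail:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
transformer = nn.Transformer(d_model=16, nhead=4)
transformer.eval()  # disable dropout so the two decodes are comparable

src = torch.rand(5, 1, 16)  # (src_len, batch, d_model)
tgt = torch.rand(2, 1, 16)  # two target tokens, as in the report

with torch.no_grad():
    out1 = transformer(src=src, tgt=tgt[:1],
                       tgt_mask=transformer.generate_square_subsequent_mask(1))
    out2 = transformer(src=src, tgt=tgt[:2],
                       tgt_mask=transformer.generate_square_subsequent_mask(2))

# position 0 must not see position 1, so its output should match
print(torch.allclose(out1[0], out2[0], atol=1e-6))
```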
```python
self.tgt_generation_mask[:] = 1
pre_caches_length = 0 if not self.config.export_precache else self.pre_caches[0].shape[-2]
if self.tokenizer.chat_template is not None:
```

@@ -468,15 +466,6 @@ def _preprocess(self, source):