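Unlike Greedy Decoding, Beam Search keeps k candidates (partial sequences, or beams) at each step, where the beam width k plays the same role as the k in Top-k selection. Suppose k=2. The search then proceeds as follows: in the Decoder, the input is “我” (“I”), and two candidate tokens, “喜” and “爱”, are generated; suppose their log-probabilities are -6 and -7 respectively. Next, “喜” generates two candidates, “欢” and “来”, and likewise suppose the probabilities of these two tokens...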
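The k=2 walk-through above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation. The toy next-token table is hypothetical: only the log-probabilities of “喜” (-6) and “爱” (-7) come from the example. Everything else is invented, including the second-step values that the excerpt cuts off and the extra tokens “是”, “你” and “他”. The values were chosen so that beam search finds a better sequence than greedy decoding. A sequence's score is the sum of its token log-probabilities, and setting beam_width=1 reduces the procedure to greedy decoding.

```python
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]
Beam = Tuple[Tokens, float]  # (tokens so far, summed log-probability)


def beam_search(
    next_log_probs: Callable[[Tokens], Dict[str, float]],
    prefix: Tokens,
    beam_width: int,
    steps: int,
) -> List[Beam]:
    """Expand every beam, then keep the beam_width best sequences overall."""
    beams: List[Beam] = [(prefix, 0.0)]
    for _ in range(steps):
        candidates: List[Beam] = []
        for seq, score in beams:
            # Score of an extended sequence = old score + log P(token | seq).
            for token, logp in next_log_probs(seq).items():
                candidates.append((seq + (token,), score + logp))
        if not candidates:  # no beam can be extended any further
            break
        # Rank the expansions of all beams together; keep only the top beam_width.
        candidates.sort(key=lambda beam: beam[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Hypothetical toy decoder: only -6 and -7 come from the example above;
# every other token and log-probability is invented for illustration.
TOY_LOG_PROBS: Dict[Tokens, Dict[str, float]] = {
    ("我",): {"喜": -6.0, "爱": -7.0, "是": -9.0},
    ("我", "喜"): {"欢": -2.0, "来": -4.0},
    ("我", "爱"): {"你": -0.5, "他": -3.0},
}


def toy_model(seq: Tokens) -> Dict[str, float]:
    """Return {next_token: log_prob} for a prefix, or {} if the prefix is unknown."""
    return TOY_LOG_PROBS.get(seq, {})


if __name__ == "__main__":
    print(beam_search(toy_model, ("我",), beam_width=2, steps=2))  # beam search, k=2
    print(beam_search(toy_model, ("我",), beam_width=1, steps=2))  # greedy decoding
```

With these invented numbers, beam_width=2 ranks 我爱你 first (score -7.5) and keeps 我喜欢 second (score -8.0). Greedy decoding commits to “喜” at the first step because -6 beats -7, so it can only reach 我喜欢 (score -8.0). Keeping the runner-up “爱” alive is what lets beam search recover the better overall sequence.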
We could probably take the num_return_sequences top beams in the case of having beam search + greedy decoding; otherwise this option is not useful in this case.

rajarsheem (Author) commented Jan 10, 2020:
Thanks @thomwolf for the clarification. So, in case of greedy decoding, you ...
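The suggestion in the first comment, returning the num_return_sequences top beams, can be sketched as a selection step that runs after beam search finishes. The function name top_return_sequences and the (tokens, score) beam representation are hypothetical and are not the transformers API. The sketch only shows why the option is meaningful when several beams exist and adds nothing under greedy decoding, which leaves a single beam.

```python
from typing import List, Tuple

Beam = Tuple[Tuple[str, ...], float]  # (tokens, summed log-probability)


def top_return_sequences(final_beams: List[Beam], num_return_sequences: int) -> List[Beam]:
    """Return the num_return_sequences highest-scoring finished beams.

    Hypothetical helper: with greedy decoding final_beams has one entry,
    so any request for more than one sequence is rejected.
    """
    if not 1 <= num_return_sequences <= len(final_beams):
        raise ValueError(
            f"num_return_sequences must be between 1 and {len(final_beams)}, "
            f"got {num_return_sequences}"
        )
    ranked = sorted(final_beams, key=lambda beam: beam[1], reverse=True)
    return ranked[:num_return_sequences]


if __name__ == "__main__":
    # Hypothetical finished beams from a search with beam width 3.
    beams = [(("a",), -1.0), (("b",), -0.5), (("c",), -2.0)]
    print(top_return_sequences(beams, 2))
```

The sketch raises an error instead of padding with duplicates. Repeating the single greedy output would make num_return_sequences look useful when it is not, which is the concern raised in the comment.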