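However, choosing k in top-k sampling is tricky. If k is too large, the sampler may pick low-probability long-tail tokens and produce incoherent sentences. If k is too small, it degenerates into near-deterministic decoding much like Beam Search; with k = 1 it is exactly greedy decoding.

Nucleus Sampling (Top-p Sampling)

Nucleus Sampling was proposed to address this problem: The key intuition of Nucleus Sampling is that the vast majority of probability mass at each time step is concent...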
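To make the contrast concrete, here is a minimal sketch comparing top-k and top-p filtering on two toy next-token distributions. It assumes the model's next-token distribution is already available as a plain dict mapping token to probability. The helpers `top_k_filter`, `top_p_filter` and `sample`, as well as the two toy distributions, are hypothetical and for illustration only; this is not the paper's implementation. Top-p keeps the smallest set of most probable tokens whose cumulative probability reaches p and renormalizes over that set, so the candidate set shrinks when the distribution is peaked and grows when it is flat. A fixed k cannot adapt in this way.

```python
import random
from typing import Dict, Optional

# Hypothetical helpers for illustration: a "distribution" is a dict token -> probability.


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalize (k = 1 is greedy decoding)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}


def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest set of most probable tokens whose cumulative
    probability is >= p (the "nucleus"), then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = []
    cumulative = 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        cumulative += prob
        if cumulative >= p:  # stop as soon as the threshold is reached
            break
    total = sum(prob for _, prob in nucleus)
    return {tok: prob / total for tok, prob in nucleus}


def sample(probs: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Draw one token according to the (filtered) distribution."""
    rng = rng or random.Random()
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Toy distributions (made up): one peaked step, one flat step.
peaked = {"the": 0.85, "a": 0.10, "zebra": 0.03, "quantum": 0.02}
flat = {"cat": 0.22, "dog": 0.21, "bird": 0.20, "fish": 0.19, "frog": 0.18}

if __name__ == "__main__":
    # Fixed k = 3: keeps the long-tail "zebra" on the peaked step,
    # but cuts plausible tokens on the flat step.
    print(sorted(top_k_filter(peaked, 3)))  # ['a', 'the', 'zebra']
    print(sorted(top_k_filter(flat, 3)))    # ['bird', 'cat', 'dog']
    # p = 0.9: the nucleus shrinks or grows with the shape of the distribution.
    print(sorted(top_p_filter(peaked, 0.9)))  # ['a', 'the']
    print(sorted(top_p_filter(flat, 0.9)))    # ['bird', 'cat', 'dog', 'fish', 'frog']
    print(sample(top_p_filter(peaked, 0.9), random.Random(0)))
```

In a real decoder the same filtering would be applied to the model's probability vector at every generation step before sampling; the dict representation here only keeps the sketch dependency-free.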
The figure above compares two decoding methods, Beam Search and Pure Sampling: Beam Search leads to repetitive generation, while Pure Sampling leads to incoherent, erroneous output.

The paper's analysis of Beam Search: This may seem counter-intuitive, as one would expect that good models would assign higher probability to more human-like, grammatical text. Indeed, language models do ...