Notes on the KL-divergence retrieval formula and Dirichlet prior smoothing
ChengXiang Zhai
March 11, 2007

1 The KL-divergence measure

Given two probability mass functions p(x) and q(x), the Kullback-Leibler divergence (or relative entropy) between p and q, D(p || q), is defined as

D(p || q) = Σ_x p(...
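The standard definition of this quantity is D(p || q) = Σ_x p(x) log(p(x) / q(x)). The short Python sketch below computes it for two pmfs over a finite set of outcomes. It assumes the natural logarithm (so the result is in nats), represents each pmf as a dict from outcome to probability, and uses a hypothetical helper name, `kl_divergence`. It follows the usual conventions that terms with p(x) = 0 contribute nothing and that the divergence is infinite if p(x) > 0 where q(x) = 0.

```python
import math


def kl_divergence(p, q):
    """Hypothetical helper: D(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats.

    p and q are dicts mapping outcomes to probabilities (an assumed
    representation). Outcomes with p(x) = 0 are skipped (0 * log 0 = 0).
    If p(x) > 0 but q(x) = 0, the divergence is infinite.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # zero-probability outcomes under p contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # q assigns no mass where p does
        total += px * math.log(px / qx)
    return total


p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}
print(kl_divergence(p, p))  # 0.0: a pmf has zero divergence from itself
print(kl_divergence(p, q))  # differs from kl_divergence(q, p): D is not symmetric
print(kl_divergence(q, p))
```

Comparing the last two calls illustrates that D(p || q) is not a distance in the metric sense: swapping the arguments generally changes the value.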
First, it is shown that, when the criterion for selecting a pmf from the MLS is the KL-divergence, the selected conditional pmf naturally has a back-off form, except for a ceiling on the probability of high-frequency symbols that are not seen in particular contexts. Second, the pmf has ...