For a dataset with finitely many samples, we assume a discrete uniform distribution over the data, \(p(\mathbf{x})=\frac{1}{M}\sum_{i=1}^{M}\delta(\mathbf{x}-\mathbf{x}_{i})\), where \(M\) is the size of the whole dataset (train+test). Then, the ...