A numerical experiment in code makes this clear:

from scipy.special import softmax
import numpy as np

def test_gradient(dim, time_steps=50, scale=1.0):
    # Assume components of the query and keys are drawn from N(0, 1) independently
    q = np.random.randn(dim)
    ks = np.random.randn(time_steps, dim)
    x = np.sum(q * ks, axis=1) / ...
We have the following formula: \mathbb{E}[|X|] = \sigma \sqrt{2/\pi}
Therefore, the difference v_i - v_j \sim \mathcal{...