log-derivative trick

2025-05-05 05:21:26


RL Study Notes: Policy Gradient & Log Derivative Trick - Zhihu

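This is the Log Derivative Trick at work: the gradient \nabla_\theta p(x;\theta), which cannot be estimated directly by Monte Carlo, is rewritten as p(x;\theta)\nabla_\theta \log p(x;\theta). This puts the probability function p(x;\theta) back into the expression, so the expectation can be estimated by Monte Carlo sampling. The derivation of Policy Gradient relies mainly on this trick.
Log Derivative Trick - Zhihu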
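To make the rewriting step explicit, here is a short derivation sketch. The test function f(x) is a hypothetical placeholder that does not appear in the snippet, and the sketch assumes that the gradient and the integral can be interchanged and that p(x;\theta) > 0 wherever the integrand is nonzero.

```latex
\begin{align*}
\nabla_\theta \, \mathbb{E}_{p(x;\theta)}[f(x)]
  &= \nabla_\theta \int p(x;\theta)\, f(x)\, dx \\
  &= \int \nabla_\theta p(x;\theta)\, f(x)\, dx \\
  &= \int p(x;\theta)\, \nabla_\theta \log p(x;\theta)\, f(x)\, dx
     && \text{since } \nabla_\theta \log p = \frac{\nabla_\theta p}{p} \\
  &= \mathbb{E}_{p(x;\theta)}\!\left[ f(x)\, \nabla_\theta \log p(x;\theta) \right]
\end{align*}
```

The second line is not an expectation under p(x;\theta), which is why it cannot be estimated by sampling; the last line is one, so averaging f(x)\nabla_\theta \log p(x;\theta) over samples x drawn from p(x;\theta) gives an estimate.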

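Sampling from the distribution p(z) is easy, which is what Monte Carlo evaluation of the integral requires. Researchers in many other fields have studied the log-derivative trick and given it names tied to their own problem formulations, including: Score function estimators: our derivation lets us turn the gradient of an expectation into the expectation of the score function \nabla_\theta \log p(z ; \theta), which makes it natural to...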
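A score function estimator of this kind can be sketched numerically. The setup below is an illustrative assumption, not taken from the snippet: p(z;\theta) is a unit-variance Gaussian with mean \theta, the integrand is f(z) = z^2, and the function name `score_function_gradient` is hypothetical. For this choice the score is \nabla_\theta \log p(z;\theta) = z - \theta and the exact gradient of E[z^2] = \theta^2 + 1 is 2\theta, which gives something to compare against.

```python
import numpy as np


def score_function_gradient(f, theta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dtheta E_{z ~ N(theta, 1)}[f(z)].

    Uses the log-derivative trick: the gradient of the expectation equals
    the expectation of f(z) times the score d/dtheta log p(z; theta).
    """
    rng = np.random.default_rng(seed)
    # Sampling from p(z; theta) is the easy part the estimator relies on.
    z = rng.normal(loc=theta, scale=1.0, size=n_samples)
    # Score of a unit-variance Gaussian with respect to its mean.
    score = z - theta
    return np.mean(f(z) * score)


theta = 1.5
estimate = score_function_gradient(lambda z: z**2, theta)
exact = 2 * theta  # closed form for this illustrative choice of f and p
print(f"estimate={estimate:.3f}, exact={exact:.3f}")
```

The estimate is noisy because the score function estimator has high variance; in practice a baseline is often subtracted from f(z) to reduce it.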
...probabilistic models with the CatLog-Derivative trick

CatLog-Derivative trick: This repository contains the code and data generation for the experiments presented in the paper "Differentiable Sampling of Categorical Distributions Using the CatLog-Derivative Trick". The two synthetic experiments can be found in synthetic. The code for our discrete variational...