( /usr/local/lib/python3.10/dist-packages/gym/utils/passive_env_checker.py:174: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator. logger.warn( /usr/local/lib/...
random.choice(range(prob_weights.shape[1]), p=prob_weights.ravel()) return action 存储回合 之前说过,policy gradient是在一个完整的episode结束后才开始训练的,因此,在一个episode结束前,我们要存储这个episode所有的经验,即状态,动作和奖励。 def store_transition(self, s, a, r): self.ep_obs....