The PPO model (from stable_baselines3) is unable to learn how to navigate randomly generated worlds even after 5 million training timesteps (each environment reset creates a new random world layout). TensorBoard shows only a very slight increase in average reward after all of this training.
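For reference, a minimal sketch of the setup described above, assuming a hypothetical environment id ("RandomWorldEnv-v0" is a placeholder, not the actual environment): PPO with TensorBoard logging enabled, trained for 5 million steps.

import gymnasium as gym
from stable_baselines3 import PPO

# "RandomWorldEnv-v0" is a placeholder for the custom environment whose reset()
# generates a new random world layout each episode.
env = gym.make("RandomWorldEnv-v0")

# Enable TensorBoard logging so the average episode reward can be inspected.
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="./ppo_random_worlds/")
model.learn(total_timesteps=5_000_000)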
import torch as th
from stable_baselines3.common.callbacks import BaseCallback

class QValueLoggingCallback(BaseCallback):  # class name not shown in the snippet; placeholder
    def __init__(self, verbose: int = 0):
        super().__init__(verbose)

    def _on_step(self) -> bool:
        # Log Q-values for the last stored observation in the replay buffer
        obs = th.tensor(self.locals["replay_buffer"].observations[-1],
                        device=self.model.device).float()
        q_values = self.model.q_net(obs)
        avg_q_values = q_values.mean().item()
        # Logger key was truncated in the original; "train/avg_q_values" is assumed
        self.logger.record("train/avg_q_values", avg_q_values)
        return True
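A usage sketch, assuming the callback is attached to a DQN-family model (the snippet reads q_net and the replay buffer, which only exist for off-policy value-based algorithms); the environment and step budget are illustrative:

from stable_baselines3 import DQN

model = DQN("MlpPolicy", "CartPole-v1", verbose=1)
# Pass the callback to learn(); _on_step() is then called at every environment step
model.learn(total_timesteps=10_000, callback=QValueLoggingCallback())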
import gymnasium as gym  # "import gym" on older versions
from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold

# Separate evaluation env
eval_env = gym.make('Pendulum-v1')
# Stop training when the model reaches the reward threshold
# (threshold value truncated in the original; -200 is the value used in the SB3 docs example)
callback_on_best = StopTrainingOnRewardThreshold(reward_threshold=-200, verbose=1)
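A sketch of how these pieces are typically wired together, following the pattern in the SB3 callback documentation (the SAC/Pendulum pairing and the step budget are illustrative):

from stable_baselines3 import SAC

# EvalCallback runs periodic evaluations; callback_on_new_best fires when a new best
# mean reward is found, so StopTrainingOnRewardThreshold can halt training early.
eval_callback = EvalCallback(eval_env, callback_on_new_best=callback_on_best, verbose=1)
model = SAC("MlpPolicy", "Pendulum-v1", verbose=1)
# Large budget; training stops early once the reward threshold is reached
model.learn(total_timesteps=int(1e10), callback=eval_callback)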
Please see the Stable Baselines3 documentation for alternatives.

Docker Images

Build docker image (CPU):
make docker-cpu

GPU:
USE_GPU=True make docker-gpu

Pull built docker image (CPU):
docker pull stablebaselines/rl-baselines3-zoo-cpu

GPU image: ...
1.4.4. While we strongly recommend that you update to 2.0, in case you require the old API, you can install the last stable version with pip: pip install pyRDDLGym==1.4.4, or directly from GitHub: pip install git+https://github.com/pyrddlgym-project/pyRDDLGym@version_1.4.4_stable.
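If you pin 1.4.4, a quick way to confirm which version actually ended up in the environment is a minimal check using only the standard library (this sketch assumes the distribution is installed under the name pyRDDLGym):

from importlib.metadata import version

# Prints the installed distribution version; expect "1.4.4" if the pin took effect
print(version("pyRDDLGym"))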