I am working on building an LSTM-based reinforcement learning model and trying to understand how sb3-contrib's Recurrent PPO works. Here is a simplified example of the code:

```python
import gym
from gym import spaces
import torch
import numpy as np
from sb3_contrib import RecurrentPPO

class env_LSTM(gym.Env):
    def __init__(self, qnt_steps...
```
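Since the snippet above is cut off, here is a minimal, self-contained sketch of a custom environment trained with RecurrentPPO. It is written against the gymnasium API that recent sb3-contrib releases expect (the original snippet imports the older gym package); the environment dynamics, the `qnt_steps` default, and the training budget are illustrative assumptions, not the original code.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

from sb3_contrib import RecurrentPPO


class EnvLSTM(gym.Env):
    """Toy environment with a 1-D observation; the task itself is only meant
    to exercise the RecurrentPPO API, not to be a meaningful benchmark."""

    def __init__(self, qnt_steps=100):  # qnt_steps assumed to be the episode length
        super().__init__()
        self.qnt_steps = qnt_steps
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.obs = self.np_random.uniform(-1.0, 1.0, size=(1,)).astype(np.float32)
        return self.obs, {}

    def step(self, action):
        # Toy reward: +1 when the discrete action matches the sign of the observation.
        reward = 1.0 if (action == 1) == bool(self.obs[0] > 0) else 0.0
        self.t += 1
        self.obs = self.np_random.uniform(-1.0, 1.0, size=(1,)).astype(np.float32)
        terminated = False
        truncated = self.t >= self.qnt_steps
        return self.obs, reward, terminated, truncated, {}


model = RecurrentPPO("MlpLstmPolicy", EnvLSTM(), verbose=1)
model.learn(total_timesteps=5_000)
```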
lstm_hidden_size is set to 1024, meaning each LSTM layer has 1024 hidden units. n_lstm_layers specifies how many LSTM layers the network has; in this example there is a single layer (n_lstm_layers=1). A multi-layer LSTM gives a deeper network and may model long-term dependencies better, but it also makes training harder and increases the computational cost.
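In sb3-contrib these values are passed to the recurrent policy through policy_kwargs. Below is a minimal sketch; the CartPole-v1 environment and the timestep budget are placeholders, not part of the original example.

```python
from sb3_contrib import RecurrentPPO

# Sketch: configure the recurrent policy's LSTM via policy_kwargs.
model = RecurrentPPO(
    "MlpLstmPolicy",
    "CartPole-v1",
    policy_kwargs=dict(
        lstm_hidden_size=1024,  # 1024 hidden units per LSTM layer
        n_lstm_layers=1,        # a single LSTM layer
    ),
    verbose=1,
)
model.learn(total_timesteps=10_000)
```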
- 9 implementation details for robotics tasks (with continuous action spaces)
- 5 LSTM implementation details
- 1 MultiDiscrete action spaces implementation detail

High reproducibility: to validate our reproduction, we show on classic control, Atari, MuJoCo, LSTM, and real-time strategy (RTS) game tasks that our implementations closely match the results of the original implementations.
This allows SB3 to maintain a stable and compact core, while still providing the latest features, like Recurrent PPO (PPO LSTM), Truncated Quantile Critics (TQC), Quantile Regression DQN (QR-DQN) or PPO with invalid action masking (Maskable PPO).
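As a rough illustration of that split, the contrib algorithms are imported from sb3_contrib but otherwise follow the familiar SB3 API. The environments and timestep budgets below are placeholders chosen for illustration:

```python
from sb3_contrib import RecurrentPPO, TQC, QRDQN, MaskablePPO

# Recurrent PPO (PPO LSTM) on a simple placeholder task
RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1).learn(total_timesteps=5_000)

# Truncated Quantile Critics on a continuous-control placeholder task
TQC("MlpPolicy", "Pendulum-v1", verbose=1).learn(total_timesteps=5_000)

# QRDQN is used the same way; MaskablePPO additionally needs an environment
# that exposes action masks (e.g. via sb3-contrib's ActionMasker wrapper).
```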
RL libraries such as Stable-Baselines3 (SB3), pytorch-a2c-ppo-acktr-gail, and CleanRL have built their PPO implementations to closely match the implementation details in ppo2 (ea25b9e). Recent papers (Engstrom, Ilyas et al., 2020; Andrychowicz et al., 2021) have studied the implementation details in ppo2 (ea25b9e) that concern robotics tasks.
Any suggestions would be greatly appreciated :)

To Reproduce
Run Command: python ppo_atari.py --gpu 0 --env Atlantis --trials 5
The hyperparameters follow those of the original PPO implementation (without LSTM).

ppo_atari.py:

```python
import argparse
import json
import os
import pathlib
import time
import uuid
import gym
...
```