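# Act with the current policy, then step the environment
# (classic Gym API: step() returns observation, reward, done, info)
action = ppo.select_action(state, memory)
state, reward, done, _ = env.step(action)

# Save the reward and terminal flag for this transition
memory.rewards.append(reward)
memory.is_terminals.append(done)

# Run a PPO update every update_timestep steps, then empty the rollout buffer
if time_step % update_timestep == 0:
    ppo.update(memory)
    memory.clear_memory()
    ti...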
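The loop above stores each transition's reward and terminal flag in a rollout buffer and, every `update_timestep` steps, calls `ppo.update(memory)` and empties the buffer. Below is a minimal, framework-free sketch of that buffer pattern. It assumes a `Memory` class with plain list fields. The `states`, `actions`, and `logprobs` fields, the `discounted_returns` helper, and the constant-reward stand-in for `env.step` are hypothetical and do not come from the source. The sketch also shows one common reason to keep `is_terminals`: resetting the discounted return at episode boundaries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    # Hypothetical rollout buffer: only `rewards`, `is_terminals` and
    # `clear_memory` appear in the snippet; the other fields are assumed.
    states: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    logprobs: list = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)
    is_terminals: List[bool] = field(default_factory=list)

    def clear_memory(self) -> None:
        # Empty every buffer in place so the next rollout starts fresh.
        for buf in (self.states, self.actions, self.logprobs,
                    self.rewards, self.is_terminals):
            buf.clear()


def discounted_returns(rewards: List[float], is_terminals: List[bool],
                       gamma: float = 0.99) -> List[float]:
    # Walk the rollout backwards; a terminal flag resets the running return
    # so rewards never leak across episode boundaries.
    returns: List[float] = []
    running = 0.0
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            running = 0.0
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    memory = Memory()
    update_timestep = 4  # hypothetical value
    batches = []
    # Stand-in for env.step: reward 1.0 every step, episode ends every 3 steps.
    for time_step in range(1, 9):
        memory.rewards.append(1.0)
        memory.is_terminals.append(time_step % 3 == 0)
        if time_step % update_timestep == 0:
            # A real agent would call ppo.update(memory) here.
            batches.append(discounted_returns(memory.rewards,
                                              memory.is_terminals, gamma=0.5))
            memory.clear_memory()
    print(batches)  # prints [[1.75, 1.5, 1.0, 1.0], [1.5, 1.0, 1.5, 1.0]]
```

In this sketch the steps after the last terminal flag in a batch are treated as if the episode ended at the batch boundary (no value bootstrap). That matches the simple loop above, but bootstrapping from a value estimate is a common refinement.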
qqadssp/PPO-Pytorch (public repository, branch master). Files: env, logdir, util, LICENSE, README.md, agent.py, main.py, ppo.py, runner.py. Latest commit by qqadssp: "Ant run", Aug 17, 2018 ...
vietnh1009/Super-mario-bros-PPO-pytorch: Proximal Policy Optimization (PPO) algorithm for Super Mario Bros (file test.py, branch master).
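The repositories referenced here implement Proximal Policy Optimization. The core of a PPO update is usually the clipped surrogate objective. It multiplies each advantage by the probability ratio between the new and old policy, clips that ratio to [1 - eps, 1 + eps], and keeps the smaller (more pessimistic) of the clipped and unclipped terms. A minimal, dependency-free sketch of that loss over a batch follows. It assumes per-step log-probabilities and advantage estimates are already available. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative choices, and nothing in the source confirms which loss variant either repository uses.

```python
import math
from typing import Sequence


def ppo_clip_loss(new_logprobs: Sequence[float],
                  old_logprobs: Sequence[float],
                  advantages: Sequence[float],
                  clip_eps: float = 0.2) -> float:
    # Hypothetical helper: clipped surrogate loss averaged over a batch.
    if not advantages:
        raise ValueError("empty batch")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate the mean so that minimizing this loss maximizes the objective.
    return -total / len(advantages)


if __name__ == "__main__":
    # Unchanged policy: ratio is 1, so the loss is minus the mean advantage.
    print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, -2.0]))  # prints 0.5
```

With a positive advantage, a ratio above 1 + eps stops increasing the objective, which removes the incentive for large policy jumps. With a negative advantage, the unclipped term is kept, so a bad action that became much more likely is still fully penalized.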