Synthetic Experience Replay
Cong Lu, Philip J. Ball, Yee Whye Teh, Jack Parker-Holder
Key: diffusion models, Data Synthesizer
ExpEnv: D4RL

Value function estimation using conditional diffusion models for control
Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm, Alexander Toshev...
A deep learning extension of deterministic policy gradients (DPG), an off-policy RL algorithm. My implementation uses both action noise and parameter noise to improve exploration, at the start of training and throughout the remaining steps.
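The two kinds of exploration noise mentioned above can be illustrated with a minimal sketch. It is not the implementation described here: the linear `policy`, the helpers `perturb_weights` and `noisy_action`, the Gaussian noise, and all numeric values are assumptions chosen only to show the difference between perturbing the policy's weights and perturbing its output.

```python
import math
import random

def policy(weights, bias, state):
    """Hypothetical deterministic policy: one linear layer followed by tanh."""
    return [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
            for row, b in zip(weights, bias)]

def perturb_weights(weights, stddev, rng):
    """Parameter noise: return a copy of the weights with Gaussian noise added."""
    return [[w + rng.gauss(0.0, stddev) for w in row] for row in weights]

def noisy_action(action, stddev, rng, low=-1.0, high=1.0):
    """Action noise: perturb the policy output, then clip to the action bounds."""
    return [min(high, max(low, a + rng.gauss(0.0, stddev))) for a in action]

rng = random.Random(0)
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]  # illustrative values only
bias = [0.0, 0.0]
state = [1.0, 0.5, -1.0]

# Act with a perturbed copy of the policy, then add noise to its output.
explore_weights = perturb_weights(weights, stddev=0.1, rng=rng)
action = noisy_action(policy(explore_weights, bias, state), stddev=0.2, rng=rng)
```

In this sketch the unperturbed `weights` are left untouched, so the learner could keep updating them while the perturbed copy is used only to collect experience.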
A natural extension that could further speed up convergence during training is to define a suitable priority function over the samples already collected, as in [14]. We, however, focus solely on the optimized exploration aspect, with each sample gathered by our method having a uniform probability...
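The contrast between uniform and priority-based sampling can be sketched as follows. This is a hedged illustration, not the method of the excerpt or of [14]: it assumes the truncated sentence refers to drawing stored samples uniformly, and the names `UniformReplayBuffer` and `sample_with_priorities`, as well as the proportional weighting scheme, are hypothetical.

```python
import random
from collections import deque

class UniformReplayBuffer:
    """Fixed-capacity buffer; every stored transition is equally likely to be drawn."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement over the stored transitions.
        return self.rng.sample(list(self.buffer), batch_size)

def sample_with_priorities(transitions, priorities, batch_size, rng):
    """Assumed alternative: draw with replacement, proportional to priority."""
    return rng.choices(transitions, weights=priorities, k=batch_size)

buffer = UniformReplayBuffer(capacity=100, seed=0)
for step in range(150):
    buffer.add((step, "placeholder transition"))  # stand-in for (s, a, r, s')
batch = buffer.sample(8)
```

A priority function would replace the uniform draw in `sample` with something like `sample_with_priorities`, at the cost of maintaining one priority value per stored transition.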