Mini-batch Gradient Descent with Buffer In this paper, we studied a buffered mini-batch gradient descent (BMGD) algorithm for training complex model on massive datasets. The algorithm studied here is designed fo
Firstly, by recalculating all samples in the experience replay buffer with the current network parameters, the policy network generates a new dataset . Secondly, by using the current network parameters and the dataset , the algorithm generates state-action features. Finally, the algorithm uses LSPI ...