However, unlike the method in [2], which focuses solely on immediate rewards, our approach optimizes over an N-step horizon to capture both short-term and long-term rewards. This allows for more informed decision-making and better resource management, leading to superior performance, particularly in high-density ...
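Concretely, such an approach typically optimizes an N-step return; a standard formulation (with discount factor $\gamma$ and value estimate $V$, notation assumed here rather than taken from the excerpt) is:

```latex
G_t^{(N)} = \sum_{k=0}^{N-1} \gamma^{k}\, r_{t+k} + \gamma^{N} V(s_{t+N})
```

Setting $N=1$ recovers the one-step, immediate-reward case, while larger $N$ folds longer-term rewards directly into the optimization target.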
Simulation results demonstrate that our method holds significant advantages over traditional methods and other deep-learning algorithms and effectively improves the communication performance of NOMA transmission.
Keywords: NOMA; deep reinforcement learning; actor–critic; power allocation;...
3. Proposed Method
In this study, we propose a learning method to efficiently train an agent in environments with continuous state and action spaces, such as autonomous flight environments. Specifically, a positive buffer is added to the SAC algorithm to reuse the experience of successful...
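A minimal sketch of such a success-aware dual-buffer scheme, assuming a standard transition-tuple replay interface (the class name, capacity, and mixing ratio below are illustrative choices, not the paper's implementation):

```python
import random
from collections import deque

class DualReplayBuffer:
    """SAC-style replay buffer plus a separate 'positive' buffer that keeps
    only transitions from successful episodes (illustrative sketch)."""

    def __init__(self, capacity=100_000, positive_ratio=0.25):
        self.main = deque(maxlen=capacity)      # all transitions
        self.positive = deque(maxlen=capacity)  # transitions from successful episodes
        self.positive_ratio = positive_ratio    # fraction of each batch drawn from successes

    def add_episode(self, transitions, success):
        # Every transition enters the main buffer; successful episodes
        # are additionally copied into the positive buffer.
        self.main.extend(transitions)
        if success:
            self.positive.extend(transitions)

    def sample(self, batch_size):
        # Mix regular and success-only experience in one training batch.
        n_pos = min(int(batch_size * self.positive_ratio), len(self.positive))
        n_main = min(batch_size - n_pos, len(self.main))
        return (random.sample(list(self.positive), n_pos)
                + random.sample(list(self.main), n_main))
```

Oversampling successful trajectories in this way biases the gradient toward behaviors that reached the goal, which is the usual motivation for a positive buffer in sparse-reward settings.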
Compared with the control framework based on the baseline algorithms, the HEP-SAC-based framework improved the accuracy of the liquid level in the mold by 4.29% and the stability of the stopper-rod opening degree by 3.17%, demonstrating the effectiveness of the proposed improvement. ...
A collision-avoidance method based on the COLREGs and a reciprocal velocity obstacles (RVO) method was proposed by Wang et al. [10]. In this algorithm, the distance at the closest point of approach (DCPA) and the time to the closest point of approach (TCPA) were used for collision risk assessment. However,...
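DCPA and TCPA follow from the standard constant-velocity closest-point-of-approach geometry; a minimal sketch (the function name and units are assumptions for illustration, not taken from [10]):

```python
import numpy as np

def cpa_metrics(own_pos, own_vel, tgt_pos, tgt_vel):
    """Compute TCPA and DCPA for two vessels moving at constant velocity.
    Positions and velocities are 2-D arrays (e.g., metres, m/s)."""
    r = np.asarray(tgt_pos, float) - np.asarray(own_pos, float)  # relative position
    v = np.asarray(tgt_vel, float) - np.asarray(own_vel, float)  # relative velocity
    v2 = float(np.dot(v, v))
    if v2 < 1e-12:                    # no relative motion: separation stays constant
        return 0.0, float(np.linalg.norm(r))
    tcpa = max(0.0, -float(np.dot(r, v)) / v2)  # negative value means CPA is in the past
    dcpa = float(np.linalg.norm(r + v * tcpa))
    return tcpa, dcpa

# Example: target 1000 m ahead, closing at 5 m/s -> TCPA 200 s, DCPA 0 m
print(cpa_metrics((0, 0), (0, 0), (1000, 0), (-5, 0)))
```

Small DCPA combined with small positive TCPA indicates an imminent close-quarters situation, which is why the pair is a common collision-risk index.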
Reinforcement learning (RL) is known to be an effective method for solving Markov decision processes. As mentioned above, the charging scheduling problem in WRSNs is NP-hard; thus, optimal labels for supervised learning are unavailable. However, the quality of a set of...
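A minimal sketch of this label-free setup, in which a candidate charging schedule is scored purely by cumulative reward rather than compared against a ground-truth label (the environment interface and discount factor below are placeholder assumptions, not the paper's model):

```python
def evaluate_schedule(env, actions, gamma=0.99):
    """Score a candidate charging schedule by its discounted cumulative
    reward. No optimal labels are needed: the reward signal alone
    ranks schedules, which is what makes RL applicable here."""
    state = env.reset()
    total, discount = 0.0, 1.0
    for action in actions:
        state, reward, done = env.step(action)  # placeholder environment interface
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total
```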