Routing selection; DRL; Multi-agent double deep Q-network. The increasing demand for high-speed data services from users has promoted the exploration of high-frequency spectrum resources, and the introduction of millimeter wave (mmWave) technology provides enormous bandwidth resources for the development of 5G...
Two independent DQN agents are trained. One agent selects operation sequences, while the other assigns jobs to machines. Du et al. (2021) [6] considered an FJSP with time-of-use electricity price constraint and dual-objective optimization for the makespan and total price and proposed a ...
For energy-efficient routing in SDN, [37] proposed a deep Q-network (DQN)-based Energy Efficient routing (DQN-EER) algorithm to find energy-aware data paths between OpenFlow switches. The RL agent is implemented with a deep convolutional neural network and modelled with an MDP to interact with the...
First, DQN, which belongs to the field of deep reinforcement learning, is compared with the multi-agent D2D communication resource allocation algorithm (MAA2C) proposed in this paper based on A2C, to further prove the reliability of deep reinforcement learning in dealing with D2D communication ...
The main contribution of this study is the development of a framework for centralized multi-agent planning problems that outperforms Multi-Agent DQNs for solving large-scale MMDPs, together with an analysis of its performance. The second contribution of the study is to apply the framework to the problem of ...
The environment calculates the reward based on the designed reward function, sends the complete transition {s, a, r, s’} to the agent, and records it in the experience replay memory. In the proposed algorithm, E-DQN uses the prioritized experience replay method to choose valuable data for training, and ...
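The replay step described above can be illustrated with a minimal sketch of a prioritized replay memory. This is not the E-DQN implementation: the excerpt does not say how priorities are computed, so the sketch assumes the common proportional scheme in which the sampling weight grows with the absolute TD error. The class name `PrioritizedReplay`, its methods, and the parameters `alpha` and `eps` are hypothetical names chosen for illustration.

```python
import random
from collections import namedtuple

# One stored experience {s, a, r, s'} as described in the text.
Transition = namedtuple("Transition", ["s", "a", "r", "s_next"])


class PrioritizedReplay:
    """Hypothetical proportional prioritized replay memory (assumed scheme)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha    # 0 gives uniform sampling; larger values skew harder
        self.eps = eps        # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0          # next slot to overwrite once the memory is full

    def push(self, s, a, r, s_next):
        # New transitions receive the current maximum priority so that
        # they are likely to be replayed before their TD error is known.
        p = max(self.priorities, default=1.0)
        t = Transition(s, a, r, s_next)
        if len(self.data) < self.capacity:
            self.data.append(t)
            self.priorities.append(p)
        else:
            self.data[self.pos] = t
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # Sample indices (with replacement) in proportion to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idx = rng.choices(range(len(self.data)), weights=weights, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update(self, indices, td_errors):
        # After a training step, refresh priorities from the new TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop one would call `push` after each environment step, `sample` to draw a batch, and `update` with the TD errors of that batch. A full implementation would usually also apply importance-sampling weights to correct the bias introduced by non-uniform sampling; that part is omitted here.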
Multi-robot path planning based on a deep reinforcement learning DQN algorithm. CAAI TRIT. 2020, 5, 177–183. Koval, A.; Mansouri, S.S.; Nikolakopoulos, G. Multi-Agent Collaborative Path Planning Based on Staying Alive Policy. Robotics 2020, 9, 101. [...
Based on this work, Hutse [17] added lead time delivery cycles to the scenario, using DQN to handle discrete actions. The environment state consisted of inventory, production, transportation, and the last m (hyperparameter) demands. The agent’s action was factory production and product ...