When the experience replay buffer does not yet meet the training requirements, our method behaves identically to the original algorithm, MATD3, and the networks do not change at this stage. After the experience replay buffer is full, every ε episodes we compare the average reward received...
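A minimal sketch of this two-stage schedule is given below. All names here (`BUFFER_CAPACITY`, `EPS_INTERVAL`, `matd3_update`, `compare_average_reward`) are illustrative placeholders rather than the authors' implementation, and because the excerpt does not state what the average reward is compared against, that step is left as an empty stub.

```python
import numpy as np

BUFFER_CAPACITY = 100_000   # assumed replay-buffer capacity
EPS_INTERVAL = 50           # assumed value of the epsilon-episode interval


def matd3_update(agents, buffer):
    """Placeholder for the unmodified MATD3 update step."""
    pass


def compare_average_reward(agents, avg_reward):
    """Placeholder: the comparison target is not specified in this excerpt."""
    pass


def on_episode_end(episode, buffer, agents, episode_rewards):
    if len(buffer) < BUFFER_CAPACITY:
        # Stage 1: buffer not yet full -- behave exactly as the original
        # MATD3; nothing beyond the standard updates changes the networks.
        matd3_update(agents, buffer)
        return

    # Stage 2: buffer is full -- keep the MATD3 updates, and every
    # EPS_INTERVAL episodes compare the recent average reward.
    matd3_update(agents, buffer)
    if episode % EPS_INTERVAL == 0:
        avg_reward = float(np.mean(episode_rewards[-EPS_INTERVAL:]))
        compare_average_reward(agents, avg_reward)
```

The sketch only captures the control flow stated in the text: an untouched MATD3 phase before the buffer fills, followed by a periodic reward comparison once it is full.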