Multi-Agent Constrained Policy Optimisation by Shangding Gu, Jakub Grudzien Kuba, Munning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois Knoll, and Yaodong Yang, 2021. Settling the Variance of Multi-Agent Policy Gradients by Kuba Jakub, Muning Wen, Linghui Meng, Shangding Gu,...
the proposed method can effectively improve users’ satisfaction while reducing the bill payment compared with traditional reinforcement learning (RL) methods (i.e., deep Q learning (DQN), deep deterministic policy gradient (DDPG), QMIX and multi-agent deep deterministic policy gradient (MADDPG)). ...
Demand response, an essential component of grid optimisation, allows consumers to adjust their electricity consumption patterns in response to market signals or system conditions [7]. By incorporating demand response mechanisms, grid-tied microgrids can effectively manage peak loads, reduce grid stress, ...
Policy-based methods mainly focus on the actor-critic architecture (see Sect. 2.3). These studies use a centralised critic to train decentralised actors. Counterfactual multiagent (COMA) (Foerster et al. 2018b) uses a centralised critic to approximate the Q-function and decentralised actors to optimi...
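As a rough illustration of how a centralised critic can train decentralised actors, the sketch below computes a COMA-style counterfactual advantage for one agent in a single fixed state. It is a minimal toy under stated assumptions, not the implementation from Foerster et al.: the function name `counterfactual_advantage`, the dictionary-based critic `q_joint`, and the numeric values are all hypothetical and chosen only for illustration.

```python
def counterfactual_advantage(q_joint, policies, joint_action, agent):
    """COMA-style advantage for one agent in a fixed state (illustrative only).

    q_joint:      dict mapping joint-action tuples to centralised critic Q-values
                  (hypothetical stand-in for a learned critic).
    policies:     list of per-agent action-probability lists (decentralised actors).
    joint_action: tuple with the action actually taken by each agent.
    agent:        index of the agent whose advantage is computed.
    """
    # Counterfactual baseline: average the critic over this agent's
    # alternative actions, weighted by its own policy, while the other
    # agents' actions stay fixed.
    baseline = 0.0
    for alt_action, prob in enumerate(policies[agent]):
        counterfactual = list(joint_action)
        counterfactual[agent] = alt_action
        baseline += prob * q_joint[tuple(counterfactual)]
    return q_joint[tuple(joint_action)] - baseline


# Toy setting (assumed values): two agents, two actions each.
q_joint = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
policies = [[0.5, 0.5], [0.25, 0.75]]

adv_agent0 = counterfactual_advantage(q_joint, policies, (1, 1), agent=0)
adv_agent1 = counterfactual_advantage(q_joint, policies, (1, 1), agent=1)
print(adv_agent0, adv_agent1)
```

Each actor would then scale its own policy-gradient term by its advantage, so the critic sees the joint action during training while each actor only needs its own policy at execution time.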
Su, P. H., Gasic, M., Mrksic, N., Rojas-Barahona, L. M., Ultes, S., Vandyke, D. et al. (2016). On-line active reward learning for policy optimisation in spoken dialogue systems. In Proceedings of the 54th annual meeting of the association for computational linguistics. Berlin: As...
These include: multi-objective weighted constraint satisfaction problems (MO-WCSPs) [141] and multi-objective constraint optimisation problems (MOCOPs) [97, 99]. Or, in their original paper, the equivalent concept of domination. Although individualised reward shaping implies that each agent receives ...