Note that Q-DPP counts as a value-based method, because Q-DPP defines the value function as

$$Q^{\boldsymbol{\pi}}(\boldsymbol{o}, \boldsymbol{a}) := \log \operatorname{det}\left(\mathcal{L}_{Y=\left\{\left(o_{1}, a_{1}\right), \ldots,\left(o_{N}, a_{N}\right)\right\}}\right), \quad Y \in \mathcal{C},$$

where $\mathcal{L}$ is the DPP kernel matrix and $Y$ is the subset of (observation, action) pairs selected by the joint policy.
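To make the definition concrete, here is a minimal numerical sketch (not the authors' implementation) of evaluating such a value as the log-determinant of a kernel submatrix; the kernel `L`, the index list `idx`, and the toy shapes are all my assumptions:

```python
import numpy as np

def q_dpp_value(L: np.ndarray, idx: list[int]) -> float:
    """Q(o, a) = log det(L_Y) for Y = {(o_1, a_1), ..., (o_N, a_N)}.

    L   : PSD kernel over the whole ground set of (observation, action) pairs.
    idx : row/column indices of each agent's chosen (o_i, a_i) pair.
    """
    L_Y = L[np.ix_(idx, idx)]              # principal submatrix indexed by Y
    sign, logdet = np.linalg.slogdet(L_Y)  # numerically stable log-determinant
    assert sign > 0, "L_Y must be positive definite for a finite Q-value"
    return logdet

# Toy usage: 3 agents choosing 3 items out of a ground set of 6 (o, a) pairs.
rng = np.random.default_rng(0)
B = rng.normal(size=(6, 4))
L = B @ B.T + 1e-3 * np.eye(6)             # PSD kernel (jitter for stability)
print(q_dpp_value(L, idx=[0, 2, 5]))
```

A larger determinant means the selected rows of the kernel are closer to orthogonal, so the same formula that scores the joint action also rewards diversity across agents.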
Paper: Multi-Agent Determinantal Q-Learning — David Mguni, Jun Wang, Kun Shao, Liheng Chen, Weinan Zhang, Yaodong Yang, Ying Wen.
Figure taken from Multi-Agent Determinantal Q-Learning. In the figure above, the authors classify a range of algorithms along two axes: exploration and centralized vs. independent learning. In this paper, the authors focus mainly on offline MARL, where the learner cannot interact with the environment in real time and must instead make full use of an existing dataset. Multi-agent actor-critic algorithms fall mainly into two classes...
Q-DPP is a novel function approximator for cooperative multi-agent reinforcement learning problems, based on the determinantal point process (DPP). Q-DPP promotes agents to acquire diverse behavioral models; this allows a natural factorization of the joint Q-function with no need for a priori structural constraints.
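The factorization the abstract alludes to can be made concrete through the standard quality–diversity decomposition of a DPP kernel; the symbols $\boldsymbol{q}$ (per-item quality) and $\boldsymbol{S}$ (similarity matrix) below are generic DPP notation rather than necessarily the paper's:

$$\mathcal{L} = \operatorname{diag}(\boldsymbol{q})\,\boldsymbol{S}\,\operatorname{diag}(\boldsymbol{q}) \quad\Longrightarrow\quad \log \operatorname{det}\left(\mathcal{L}_{Y}\right) = \underbrace{\sum_{i \in Y} 2\log q_{i}}_{\text{individual quality}} + \underbrace{\log \operatorname{det}\left(\boldsymbol{S}_{Y}\right)}_{\text{diversity}}$$

The first term rewards each agent's own (observation, action) choice, while the log-det term shrinks when agents pick similar items, which is how diversity enters the joint value without any hand-designed structural constraint.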
This paper surveys the field of deep multiagent reinforcement learning (RL). The combination of deep neural networks with RL has gained increased traction in recent years and is slowly shifting the focus from single-agent to multi-agent environments.
| Category | Algorithm | Paper | Code | Venue |
| --- | --- | --- | --- | --- |
| Independent Learning | IQL | Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents | https://github.com/oxwhirl/pymarl | ICML 1993 |
| Value Decomposition | VDN | Value-Decomposition Networks For Cooperative Multi-Agent Learning | https://github.com/oxwhirl/pymarl | AAMAS 2017 |
| … | | | | |
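As a concrete illustration of the value-decomposition row above, VDN factorizes the joint value additively, $Q_{tot}(\boldsymbol{o},\boldsymbol{a})=\sum_i Q_i(o_i,a_i)$. A minimal sketch follows; the tensor shapes are my assumptions, and this is not the pymarl implementation:

```python
import torch

def vdn_joint_q(per_agent_q: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """VDN mixing: sum each agent's chosen utility into the joint value.

    per_agent_q : (batch, n_agents, n_actions) per-agent Q-values.
    actions     : (batch, n_agents) integer action indices.
    Returns     : (batch,) joint values Q_tot = sum_i Q_i(o_i, a_i).
    """
    chosen = per_agent_q.gather(2, actions.unsqueeze(-1)).squeeze(-1)  # (batch, n_agents)
    return chosen.sum(dim=1)

# Toy usage: 4 samples, 3 agents, 5 actions each.
q = torch.randn(4, 3, 5)
a = torch.randint(0, 5, (4, 3))
print(vdn_joint_q(q, a).shape)  # torch.Size([4])
```

Because the mixing is a plain sum, the argmax of $Q_{tot}$ decomposes into per-agent argmaxes, which is exactly what makes decentralized execution possible after centralized training.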
- Q-function factorisation
- multi-agent soft learning
- networked multi-agent MDP
- stochastic potential games
- zero-sum continuous games
- online MDP
- turn-based stochastic games
- policy space response oracle
- approximation methods in general-sum games
- mean-field type learning in games with infinite agents
- …