Published at AAAI 2022, the paper introduces the Stackelberg game structure into the Actor-Critic framework to resolve the gradient cycling problem and accelerate convergence; it performs well on several classic OpenAI gym environments. Background: Stackelberg games. A Stackelberg game, also known as a leader-follower game, is a two-player general-sum game with a Leader (L) and a Follower (F). L decides before F, F maximizes its own payoff given L's decision, and likewise L anticipates...
At the 2022 AAAI conference, a paper proposed the Stackelberg Actor-Critic (SAC) algorithm, which applies the principles of Stackelberg games to reinforcement learning in order to resolve the gradient cycling problem in the Actor-Critic framework and thereby speed up convergence. SAC shows strong performance across several classic OpenAI gym environments. A Stackelberg game describes a two-player game in which one side (the Leader, L) moves first and the other...
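The leader-side update described above can be illustrated on a toy bilevel problem. This is a hypothetical quadratic example, not from the paper: the follower's best response is known in closed form, and the leader descends the *total* derivative of its loss, which adds an implicit-function term to the ordinary partial gradient — the key idea that breaks the gradient cycle.

```python
def follower_best_response(x, a=0.5):
    # Follower minimizes f2(x, y) = (y - a*x)^2, so y*(x) = a*x.
    return a * x

def stackelberg_grad(x, a=0.5, b=1.0):
    # Leader loss f1(x, y) = x^2 + b*x*y, evaluated at y = y*(x).
    # Total derivative: df1/dx = df1/dx (partial) + (dy*/dx) * df1/dy.
    y = follower_best_response(x, a)
    partial_x = 2 * x + b * y      # direct effect on the leader loss
    partial_y = b * x              # effect through the follower's action
    dy_dx = a                      # implicit-function (anticipation) term
    return partial_x + dy_dx * partial_y

x = 1.0
for _ in range(200):
    x -= 0.1 * stackelberg_grad(x)
print(round(x, 6))  # converges toward the Stackelberg equilibrium x = 0
```

A naive "simultaneous" gradient would drop the `dy_dx * partial_y` term; including it is what lets the leader account for the follower's reaction, which is exactly the structure the Stackelberg Actor-Critic imposes on the actor-critic updates.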
The two-step Stackelberg game is a widely used and practical model for formulating resource allocation and power control problems. Both in the follower games for the small cells and in the leader game for the macro cell, the cost parameters are a critical variable for the performance of Stacke...
Bao Tao, Zhang Xiaoshun, Yu Tao, et al. A Stackelberg game model of real-time supply-demand interaction and the solving method via reinforcement learning[J]. Proceedings of the CSEE, 2018, 38(10): 2947-2955. ...
The gist is to simulate strong attack behavior using reinforcement learning (RL-based attacks) in pre-training and then design a meta-RL-based defense to combat diverse and adaptive attacks. We develop an efficient meta-learning approach to solve the game, leading to a robust and adaptive FL ...
This was achieved with the help of reinforcement learning (RL) and a repeated Stackelberg game [29]. To deal with adaptive adversaries, Zhang and Zhuang [30] introduced a sequential game that accurately estimates the resources required to face several attack types. This will provide guidance on how...
In addition, traditional game-theoretic approaches need to solve the dynamic pricing problem repeatedly to keep up with the changing network traffic. In contrast, reinforcement learning (RL) algorithms have proven to be capable of training a model and making resource trading decisions accordingly (...
In this paper, the Spatio-Temporal Sequential Markov Game (STMG) is a framework for multi-agent reinforcement learning (MARL) designed to guide coordination among agents and promote the adoption of Stackelberg-equilibrium (SE) policies. An STMG can be formalized as a tuple comprising the agent set, state space, action space, state-transition probabilities, a discount factor, and one newly added term: the agents' action order (p. 3). STM...
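The tuple structure described in the snippet above can be sketched as a simple container. This is an illustrative reconstruction under stated assumptions — the field names and types are hypothetical, not taken from the paper — but it shows how the action order extends the standard Markov game tuple:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class STMG:
    """Hypothetical container for the STMG tuple; names are illustrative."""
    agents: List[str]                 # agent set
    states: List[str]                 # state space
    actions: Dict[str, List[str]]     # per-agent action spaces
    transition: Callable              # state-transition probabilities
    gamma: float                      # discount factor
    action_order: List[str]           # the newly added term: who acts first

game = STMG(
    agents=["leader", "follower"],
    states=["s0", "s1"],
    actions={"leader": ["a", "b"], "follower": ["c", "d"]},
    transition=lambda s, joint_action: "s1",
    gamma=0.95,
    action_order=["leader", "follower"],  # leader commits before follower
)
print(game.action_order[0])  # the first mover in each stage
```

Making the action order an explicit element of the tuple is what lets later agents condition on earlier agents' committed actions, which is the mechanism by which the framework steers play toward a Stackelberg equilibrium.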
This scenario can be modelled as a co-operative Stackelberg game, where the rescuer acts as a leader in signaling his intent to the rescuee. We present an efficient approach to obtain the optimal signaling policy, as well as its robust counterpart, when the topology of the rescue environment...
Autocurricular training is an important sub-area of multi-agent reinforcement learning~(MARL) that allows multiple agents to learn emergent skills in an unsupervised co-evolving scheme. The robotics community has experimented with autocurricular training on physically grounded problems, such as robust ...