A simple MPC + model-based scheme (MPC + model-based; model-based + planning). In general, if we place no limit on the number of interactions with the environment, model-free performance is the upper bound of model-based performance. So if the final performance needs to surpass model-free, planning usually has to be brought in, e.g. MCTS. AlphaZero and MuZero are examples of this; below we focus on MuZero. MuZero vs. Alpha...
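The MPC half of this recipe can be sketched as a minimal random-shooting planner over a learned dynamics model. Everything here — the `model`/`reward_fn` interfaces, the candidate counts, the action bounds — is an illustrative assumption, not any specific paper's API:

```python
import numpy as np

def random_shooting_mpc(model, reward_fn, state, horizon=10,
                        n_candidates=100, action_dim=2, seed=0):
    """Score random action sequences under a learned dynamics model and
    return the first action of the best sequence. In MPC style, this is
    re-run from scratch at every environment step."""
    rng = np.random.default_rng(seed)
    # Sample candidate open-loop action sequences uniformly in [-1, 1].
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:
            s_next = model(s, a)          # learned dynamics: s' = f(s, a)
            returns[i] += reward_fn(s, a)
            s = s_next
    # Execute only the first action of the highest-return sequence.
    return candidates[np.argmax(returns)][0]
```

Re-planning every step keeps the open-loop sequences short, which limits how far model error can compound.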
Model-Based Offline Planning. Arthur Argenson, Gabriel Dulac-Arnold. International Conference on Learning Representations.
However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either yields over-restrictive planning or leads to inferior performance. We propose a new lightweight model-based offline planning framework, ...
Other offline RL work:
Argenson, Arthur, and Gabriel Dulac-Arnold. "Model-based offline planning." arXiv preprint arXiv:2008.05556 (2020).
Kidambi, Rahul, et al. "MOReL: Model-based offline reinforcement learning." Advances in Neural Information Processing Systems 33 (2020): 21810-21823.
Yu, ...
Key points: this paper tackles offline RL with a model-based method, in two steps. Step one learns a pessimistic MDP (P-MDP) from the offline data; step two learns a near-optimal policy in that P-MDP. The properties of the P-MDP guarantee that this near-optimal policy's P-MDP performance is a lower bound on its performance in the real environment. Concretely, because the dataset cannot cover the entire state-action space, ...
offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; and (b) learning a ...
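The P-MDP construction described above can be hedged into a small sketch: treat a state-action pair as "unknown" when an ensemble of learned dynamics models disagrees on it, and redirect such pairs to a penalized absorbing HALT state. The function names, the max-std disagreement proxy, and the constants below are illustrative assumptions, not MOReL's exact construction:

```python
import numpy as np

def pmdp_step(ensemble, s, a, reward_fn, threshold=0.05, kappa=100.0):
    """One transition of a pessimistic MDP (P-MDP), MOReL-style.
    ensemble: list of learned dynamics models f(s, a) -> s'.
    Returns (next_state, reward, halted)."""
    preds = np.stack([f(s, a) for f in ensemble])
    # Use worst-case per-dimension ensemble std as an uncertainty proxy.
    disagreement = preds.std(axis=0).max()
    if disagreement > threshold:
        # (s, a) is outside the data support: absorb with a large penalty.
        return None, -kappa, True
    # Known region: use the mean ensemble prediction.
    return preds.mean(axis=0), reward_fn(s, a), False
```

Because every poorly-modeled transition is punished by `-kappa`, a policy optimized inside this P-MDP is steered back toward the dataset's support, which is what makes its P-MDP return a lower bound on its real return.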
Model-Based Offline Planning. Arthur Argenson, Gabriel Dulac-Arnold. Key: model-based, offline. OpenReview: 8, 7, 5, 5. ExpEnv: RL Unplugged (RLU), d4rl dataset.
Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation. Justin Fu, Sergey Levine. Key: model-based, offline. OpenReview...
Abstract We propose a “plan online and learn offline” framework for the setting where an agent, with an internal model, needs to continually act and learn in the world. Our work builds on the synergistic relationship between local model-based control, global value function learning, and explor...
Model-based RL was in fact already being studied in the early 1990s; Q-Planning and Dyna-Q are among the earliest model-based RL algorithms. The notion of a "model" is somewhat special in reinforcement learning: in supervised or unsupervised learning, the model is the data-prediction model itself, whereas in RL the final output is produced by the policy π, yet we never call the policy π ...
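Dyna-Q interleaves real experience with simulated transitions replayed from a learned table model. A minimal tabular sketch, where the environment interface `env_step(s, a) -> (r, s', done)` is an assumption for illustration:

```python
import random
from collections import defaultdict

def dyna_q(env_step, actions, episodes=50, n_planning=10,
           alpha=0.1, gamma=0.95, eps=0.1, start=0, seed=0):
    """Tabular Dyna-Q: after every real transition, replay n_planning
    simulated transitions from a learned deterministic model."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    model = {}                                  # (s, a) -> (r, s', done)
    for _ in range(episodes):
        s, done = start, False
        while not done:
            # Epsilon-greedy action selection.
            a = (rng.choice(actions) if rng.random() < eps
                 else max(actions, key=lambda x: Q[(s, x)]))
            r, s2, done = env_step(s, a)        # real experience
            best = max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * (0 if done else best) - Q[(s, a)])
            model[(s, a)] = (r, s2, done)       # update the table model
            # Planning: replay previously seen (s, a) pairs from the model.
            for _ in range(n_planning):
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                pbest = max(Q[(ps2, b)] for b in actions)
                Q[(ps, pa)] += alpha * (pr + gamma * (0 if pdone else pbest)
                                        - Q[(ps, pa)])
            s = s2
    return Q
```

The planning loop is where the "model" earns its keep: each real step is amplified into `n_planning` extra value-function updates at no environment cost.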
planning; data augmentation; white-box models; Value-aware and Policy-aware Model Learning; model-based methods in other RL settings: offline RL, goal-conditioned RL, multi-agent RL, meta RL; summary. This series records surveys of reinforcement-learning directions I am interested in. This post surveys the mainstream methods in model-based RL, mainly based on material from Prof. Weinan Zhang (张伟楠) of Shanghai Jiao Tong University and 南...