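The Successor Representation (SR for short) was first proposed by MIT's Peter Dayan in 1993 (https://core.ac.uk/reader/188812946) [1]. Since the core of TD algorithms is estimating the cumulative reward from the current time step into the future (the value function), Dayan argued that this value is closely tied to the similarity of successor states. If there is a good representation that can describe the current state's relation to some future state...
This paper first learns a representation (the successor representation) through random sampling on an MDP, then derives from that representation a characterization of the environment (a diffusion matrix over the state space). This yields environment-dependent auxiliary reward functions (eigenpurposes); maximizing such a reward function produces a correspondingly more abstract behavior (an eigenoption), which in turn gives an option-based hierarchical reinforcement learning algorithm (see [Reinforcement Learning...
Both of these papers build on the Successor Representation (SR) concept from reinforcement learning, so today we take a detailed look at the Successor Representation. [Background and development] Reinforcement learning algorithms generally fall into two classes: model-based and model-free. Model-based algorithms estimate the value function by learning the reward function and the state transition function; model-free algorithms ignore the explicit form of the model and, from state-...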
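The pipeline in the excerpt above (SR, then eigenpurpose, then eigenoption) can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: the 5-state chain, the lazy random-walk policy, and the function name `eigenpurpose_reward` are all assumptions, and the SR is computed in closed form as a stand-in for the sampled estimate the excerpt describes. With one-hot state features, the eigenpurpose reward for an eigenvector e reduces to e[s'] - e[s].

```python
import numpy as np

# Hypothetical 5-state chain; the walker moves left or right with prob. 0.5
# and stays in place when it would step off either end (keeps P symmetric).
n, gamma = 5, 0.9
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n - 1)] += 0.5

# Closed-form SR of the random walk (stand-in for an SR learned by sampling).
M = np.linalg.inv(np.eye(n) - gamma * P)

# M is symmetric here, so eigh applies; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(M)
# The top eigenvector is constant (no reward signal); take the next one.
e = eigvecs[:, -2]

def eigenpurpose_reward(s, s_next, e=e):
    """Intrinsic reward for a transition under tabular (one-hot) features."""
    return e[s_next] - e[s]

# Rewards for stepping right from each non-terminal state.
right_step_rewards = [eigenpurpose_reward(s, s + 1) for s in range(n - 1)]
```

On this chain the chosen eigenvector is monotone along the states, so the intrinsic reward consistently favors moving toward one end; a policy maximizing it would be the corresponding eigenoption. The sign of an eigenvector is arbitrary, so which end is favored depends on the solver.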
We examine an intermediate algorithmic family, the successor representation, which balances flexibility and efficiency by storing partially computed action values: predictions about future events. These pre-computation strategies differ in how they update their choices following changes in a task. The ...
In this paper, we argue that the successor representation, which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions. To support our claim, we take a big picture view of recent ...
general successor-style representation, together with a Bellman equation that connects multiple sources of information within this representation, including different latent states, policies, and reward functions. The new representation is highly expressive: for example, it lets us...
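The computational logic of the successor representation. Although we distinguished above between the relative merits of linear and nonlinear architectures, it turns out that any value function can be expressed as a linear combination of "predictive" features (Dayan, 1993): V(s) = Σ_{s'} M(s, s') R(s'), where M(s, s') is the SR, defined as the discounted occupancy of state s', averaged over trajectories initiated in state s. The SR can be viewed intuitively as a kind of predictive map that, according to...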
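The decomposition of the value function into the SR and a reward vector can be checked numerically. The sketch below assumes a fixed policy with a known transition matrix P, in which case the SR has the closed form M = (I - γP)^(-1); the 3-state chain, the reward vector, and the function name `successor_representation` are hypothetical choices for illustration.

```python
import numpy as np

def successor_representation(P, gamma):
    """Closed-form SR for a fixed policy: M = sum_t gamma^t P^t = (I - gamma P)^-1."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

# Hypothetical 3-state chain: 0 -> 1 -> 2, with state 2 absorbing.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
R = np.array([0.0, 0.0, 1.0])  # reward collected in each state
gamma = 0.9

M = successor_representation(P, gamma)
V = M @ R  # V(s) = sum over s' of M(s, s') * R(s')
```

Row s of M holds the expected discounted number of visits to each state when starting from s, so each row sums to 1 / (1 - γ), and V satisfies the Bellman equation V = R + γPV.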
If instead of a value function, a successor representation is learned for some fixed dynamics, then any value function defined on any linear reward function can be computed efficiently. This setting can be particularly useful for reusing options in some hierarchical planning settings. Unfortunately, ...
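A small sketch of the reuse idea in the excerpt above: the SR is learned once for fixed dynamics, and value functions for different reward vectors then cost one matrix-vector product each. Everything concrete here is an assumption made for illustration: the 4-state ring with a uniform random walk, the tabular TD(0) update, the step size, and the names `td_learn_sr`, `R_a`, and `R_b`.

```python
import numpy as np

def td_learn_sr(P, gamma, alpha=0.02, steps=60_000, seed=0):
    """Estimate the SR of a fixed policy from sampled transitions with TD(0)."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    one_hot = np.eye(n)
    M = np.zeros((n, n))
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])
        # TD target: indicator of the current state plus discounted SR row of the next.
        target = one_hot[s] + gamma * M[s_next]
        M[s] += alpha * (target - M[s])
        s = s_next
    return M

# Hypothetical 4-state ring; step left or right with probability 0.5.
n, gamma = 4, 0.9
P = np.zeros((n, n))
for s in range(n):
    P[s, (s - 1) % n] = 0.5
    P[s, (s + 1) % n] = 0.5

M_td = td_learn_sr(P, gamma)
M_exact = np.linalg.inv(np.eye(n) - gamma * P)  # reference for comparison

# Two different reward vectors evaluated with the same learned SR.
R_a = np.array([1.0, 0.0, 0.0, 0.0])
R_b = np.array([0.0, 0.0, 2.0, -1.0])
V_a, V_b = M_td @ R_a, M_td @ R_b
```

Because the step size is constant, `M_td` fluctuates around `M_exact` rather than converging to it exactly; a decaying step size would be the usual remedy. The learned SR is tied to the dynamics and policy it was trained under, so it does not transfer if those change.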
Building upon the successor representation (SR) [33], which encodes future state visitation frequencies in a small state space environment, an alternative approach has been proposed for transfer learning. The SR encodes the dynamics of the environment; therefore, it can be constructed without knowledg...
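Its core idea is to learn a state-action abstraction based on the successor representation; the abstraction provides the agent with an intrinsic reward in latent space, which in turn improves exploration efficiency. Following the questions raised, the work consists of three main parts, which we introduce in turn: (1) Abstracting states (successor representation, differentiable manner) (2) Abstracting ...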