Universal Successor Features Approximators (USFAs)
Following the earlier introduction, we define the universal value function (UVF) as Q^*(s, a, \boldsymbol{w}). Let \pi_{\boldsymbol{w}} be an optimal policy for task \boldsymbol{w}; the UVF can then also be written as Q^{\pi_{\boldsymbol{w}}}(s, a, \boldsymbol{w}).
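That is the common approach. In this paper, however, the authors take the universal value function approximator (UVFA) technique and combine it with successor features (SFs) and generalised policy improvement (GPI) to propose universal successor features approximators (USFAs). USFAs can learn a more general value function, one that takes each task together with the policy corresponding to that task and ...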
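To make the combination of SFs and GPI a little more concrete, here is a minimal sketch in plain Python. It assumes the standard successor-features decomposition, in which the value of a policy on task \boldsymbol{w} is the dot product of its successor features with \boldsymbol{w}, and a USFA-style approximator \psi(s, a, \boldsymbol{z}) that is additionally conditioned on a policy vector \boldsymbol{z}. The names `gpi_action` and `toy_psi` are hypothetical, and `toy_psi` is a hand-written stand-in for a learned network, not anything taken from the paper.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Plain dot product, so the sketch needs no third-party libraries."""
    return sum(a * b for a, b in zip(x, y))


def gpi_action(
    psi: Callable[[int, int, Vector], Vector],
    state: int,
    actions: Sequence[int],
    policy_vectors: Sequence[Vector],
    w: Vector,
) -> Tuple[int, float]:
    """GPI over successor features.

    For every action a and candidate policy vector z, evaluate
    Q(s, a, w) = psi(s, a, z) . w and return the action (and value)
    that maximises it over both a and z.
    """
    best_action, best_value = actions[0], float("-inf")
    for a in actions:
        for z in policy_vectors:
            q = dot(psi(state, a, z), w)
            if q > best_value:
                best_action, best_value = a, q
    return best_action, best_value


def toy_psi(state: int, action: int, z: Vector) -> Vector:
    """Hypothetical stand-in for a learned USFA (assumption, not the paper's model).

    Action 0 accumulates feature 0 and action 1 accumulates feature 1;
    a policy vector z that favours a feature accumulates more of it.
    """
    base = [1.0, 0.0] if action == 0 else [0.0, 1.0]
    return [b * (0.5 + max(zi, 0.0)) for b, zi in zip(base, z)]


if __name__ == "__main__":
    candidates = [(1.0, 0.0), (0.0, 1.0)]  # policy vectors to improve over
    print(gpi_action(toy_psi, 0, [0, 1], candidates, (0.2, 1.0)))  # (1, 1.5)
    print(gpi_action(toy_psi, 0, [0, 1], candidates, (1.0, 0.0)))  # (0, 1.5)
```

Changing the task vector `w` changes the chosen action without touching `toy_psi`, which is the point of separating the task from the successor features: the same approximator can be reused across tasks.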
Papers: In depth: Successor Representation for transfer reinforcement learning
Papers: Paper walkthrough: Successor Features for Transfer in Reinforcement Learning
Papers: Paper walkthrough: Transfer in DRL Using Successor Features and Generalised Policy Improvement
Papers: Paper walkthrough: Universal Successor Features Approximators...
playing around with universal successor feature approximators (USFAs), universal value function approximators (UVFAs), successor features and generalized policy iteration (SF&GPI) - tomov/MTRL