Universal Successor Features Approximators (USFAs): Following the introduction above, we define universal value functions (UVFs) as Q^*(s, a, \boldsymbol{w}). Letting \pi_{\boldsymbol{w}} be an optimal policy for task \boldsymbol{w}, this UVF can also be written as Q^{\pi_{\bol...
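For reference, the standard successor-features (SF) decomposition that USFAs build on is given below. The symbols \boldsymbol{\phi} (reward features), \boldsymbol{\psi} (successor features), \gamma (discount), \boldsymbol{z} (policy embedding) and the candidate set \mathcal{C} follow the usual SF/USFA notation and are assumptions here, since the text above is cut off before introducing them. If rewards are linear in the features, the value of any policy on task \boldsymbol{w} is a dot product, and a USFA approximates \boldsymbol{\psi} for the policy indexed by \boldsymbol{z}:

```latex
r(s, a, s') = \boldsymbol{\phi}(s, a, s')^\top \boldsymbol{w}, \qquad
\boldsymbol{\psi}^{\pi}(s, a) = \mathbb{E}^{\pi}\Big[\sum_{i=0}^{\infty} \gamma^{i}\, \boldsymbol{\phi}_{t+i+1} \,\Big|\, S_t = s, A_t = a\Big]

Q^{\pi_{\boldsymbol{w}}}(s, a, \boldsymbol{w}) = \boldsymbol{\psi}^{\pi_{\boldsymbol{w}}}(s, a)^\top \boldsymbol{w}

\tilde{Q}(s, a, \boldsymbol{w}, \boldsymbol{z}) = \tilde{\boldsymbol{\psi}}(s, a, \boldsymbol{z})^\top \boldsymbol{w}, \qquad
\pi(s) \in \arg\max_{a} \max_{\boldsymbol{z} \in \mathcal{C}} \tilde{\boldsymbol{\psi}}(s, a, \boldsymbol{z})^\top \boldsymbol{w}
```

The last line is the generalised policy improvement (GPI) step: the agent acts greedily with respect to the best value among all candidate policies \boldsymbol{z} \in \mathcal{C}.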
Universal Successor Features Approximators [Borsa et al., 2019] and Universal Successor Representations [Ma et al., 2018] combine the benefits of SF and UVFA to further generalise across goals.
Playing around with universal successor feature approximators (USFAs), universal value function approximators (UVFAs), successor features and generalized policy improvement (SF&GPI) - tomov/MTRL
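The above is a common approach, but in this paper the authors adopt universal value function approximators (UVFAs) and combine them with successor features (SFs) and generalised policy improvement (GPI) to propose universal successor features approximators (USFAs). USFAs can learn a more general value function: each task and the policy corresponding to that task are param...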
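To make the GPI step concrete, here is a minimal NumPy sketch, assuming a trained USFA is available as a callable `psi_fn(state, action, z)` that returns the successor-feature vector \tilde{\boldsymbol{\psi}}(s, a, \boldsymbol{z}). The names `usfa_gpi_action` and `toy_psi`, and the toy candidate set, are hypothetical illustrations, not the authors' implementation. Each action is scored by \max_{\boldsymbol{z} \in \mathcal{C}} \tilde{\boldsymbol{\psi}}(s, a, \boldsymbol{z})^\top \boldsymbol{w}, and the best-scoring action is returned.

```python
import numpy as np


def usfa_gpi_action(psi_fn, state, w, candidate_zs, num_actions):
    """Select an action by GPI over a universal successor features approximator.

    psi_fn(state, action, z) is an assumed callable returning the d-dimensional
    successor-feature vector psi(s, a, z) of the policy indexed by embedding z.
    """
    w = np.asarray(w, dtype=float)
    # q[i, a] = psi(s, a, z_i)^T w: value of action a under candidate policy z_i on task w
    q = np.array([[np.dot(psi_fn(state, a, z), w) for a in range(num_actions)]
                  for z in candidate_zs])
    # GPI: take the best candidate policy for each action, then act greedily over actions
    return int(np.argmax(q.max(axis=0)))


# Hypothetical toy successor features (not a learned model):
# action 0 returns z itself, action 1 returns half of z reversed.
def toy_psi(state, action, z):
    z = np.asarray(z, dtype=float)
    return z if action == 0 else 0.5 * z[::-1]


w = [0.0, 1.0]                 # task vector
z_a, z_b = [1.0, 0.0], [0.0, 1.0]  # candidate policy embeddings
print(usfa_gpi_action(toy_psi, None, w, [z_a], num_actions=2))       # 1
print(usfa_gpi_action(toy_psi, None, w, [z_a, z_b], num_actions=2))  # 0
```

With only z_a the greedy action is 1 (value 0.5), but adding z_b, which here equals the task vector w, lets GPI pick action 0 (value 1.0). Including \boldsymbol{z} = \boldsymbol{w} in the candidate set is a natural default, since it queries the policy the USFA associates with the current task directly.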