Reward-free RL via Sample-Efficient Representation Learning. Lecture abstract: As reward-free reinforcement learning (RL) becomes a powerful framework for a variety of multi-objective applications, representation learning arises as an effective technique for dealing with the curse of dimensionality in reward-free RL...
Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new "reward-free RL" framework. In the exploration phase, the agent first collects trajectories from an MDP M without a pre-specified reward function. After exploration, it is tasked with computing near-optimal policies for M for a...
The paradigm proposed in the paper performs only reward-free exploration in its first phase. This interaction differs from standard RL interaction only in that the environment returns no reward; everything else matches standard RL, e.g., there is a fixed initial state distribution, and the agent visits states from that distribution according to the transition dynamics. 3. Overview ...
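The two-phase protocol described above can be sketched concretely on a toy tabular MDP. This is a minimal illustration, not code from any of the quoted papers: the 3-state environment, the uniformly random exploration policy, and all function names are assumptions chosen for the sketch. Phase one collects reward-free trajectories and fits an empirical transition model; phase two, given an arbitrary reward function only afterwards, plans on that learned model.

```python
import numpy as np

def make_env():
    # Toy deterministic 3-state, 2-action MDP: action 0 stays, action 1 cycles forward.
    S, A = 3, 2
    P = np.zeros((S, A, S))
    for s in range(S):
        P[s, 0, s] = 1.0
        P[s, 1, (s + 1) % S] = 1.0
    return S, A, P

def explore(P, S, A, H, episodes, rng):
    # Phase 1: reward-free exploration with a uniformly random policy;
    # the environment never returns a reward, only transitions.
    counts = np.zeros((S, A, S))
    for _ in range(episodes):
        s = 0  # fixed initial state distribution (point mass at state 0)
        for _ in range(H):
            a = rng.integers(A)
            s2 = rng.choice(S, p=P[s, a])
            counts[s, a, s2] += 1
            s = s2
    # Empirical model, with a uniform fallback for unvisited (s, a) pairs.
    n = counts.sum(axis=2, keepdims=True)
    return np.where(n > 0, counts / np.maximum(n, 1), 1.0 / S)

def plan(P_hat, r, H):
    # Phase 2: finite-horizon value iteration on the learned model,
    # for a reward function r[s, a] supplied only after exploration.
    S, A = r.shape
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = r + P_hat @ V          # Q[s, a] = r(s, a) + E[V(s')]
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

def rollout(P, r, pi, H, rng):
    # Evaluate the planned policy on the true MDP.
    s, ret = 0, 0.0
    for h in range(H):
        a = pi[h, s]
        ret += r[s, a]
        s = rng.choice(len(P), p=P[s, a])
    return ret

rng = np.random.default_rng(0)
S, A, P = make_env()
H = 3
P_hat = explore(P, S, A, H, episodes=500, rng=rng)
r = np.zeros((S, A))
r[2, :] = 1.0                      # reward revealed only at planning time
pi = plan(P_hat, r, H)
ret = rollout(P, r, pi, H, rng)
```

Because the toy dynamics are deterministic and the random policy covers every (s, a) pair with high probability, the learned model is exact here and the planned policy recovers the optimal return; in the general framework the exploration phase must instead be designed to guarantee such coverage.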
Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs. We study reward-free reinforcement learning (RL) with linear function approximation, where the agent works in two phases...
The paper poses the problem of compressing the policy space: the goal is to find a compression method that yields a compressed policy space on which the standard RL tasks of prediction/control can be carried out efficiently and without loss of optimality. The motivation is clearly reasonable and significant, but this is by no means a simple problem. Preliminaries
During RL, we need to evaluate the agent A many times. If we want to use a learned reward function, we may need to evaluate A more times still. And if we want to train a policy that remains benign off the training distribution, we may need to evaluate A even more times (e.g. since we ...
In reward-free reinforcement learning (RL), an agent explores the environment first without any reward information, in order to achieve certain learning goals afterwards for any given reward. In this paper we focus on reward-free RL under low-rank MDP models, in which both the representation ...
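In a low-rank MDP the transition kernel factorizes as P(s' | s, a) = ⟨φ(s, a), μ(s')⟩ for a d-dimensional feature map φ and measures μ, with both unknown to the agent. The sketch below is only a toy numpy illustration of the model class itself (the dimensions and the simplex-valued construction of φ and μ are assumptions made so that the factorization yields valid probabilities); it shows that the flattened kernel has rank at most d, which is what representation learning exploits.

```python
import numpy as np

rng = np.random.default_rng(1)
d, S, A = 2, 4, 3  # toy latent dimension, state and action counts

# phi(s, a): a d-dimensional feature vector, here drawn from the probability
# simplex -- one convenient choice that guarantees <phi, mu> is a distribution.
phi = rng.dirichlet(np.ones(d), size=(S, A))   # shape (S, A, d)

# mu: d measures over next states, here each a distribution over S.
mu = rng.dirichlet(np.ones(S), size=d)         # shape (d, S)

# Low-rank transition kernel P(s' | s, a) = <phi(s, a), mu(s')>.
P = phi @ mu                                   # shape (S, A, S)

rows_sum_to_one = np.allclose(P.sum(axis=2), 1.0)
kernel_rank = np.linalg.matrix_rank(P.reshape(S * A, S))
```

Here every (s, a) row of P is a valid distribution, yet the (S·A) × S matrix of transition probabilities has rank at most d = 2 rather than S = 4, so learning the d-dimensional representation is enough to describe all of the dynamics.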