Reinforcement learning has succeeded in many domains; for example, DreamerV3 can autonomously collect diamonds in Minecraft. However, these image-based RL control models all perform poorly on the iGibson OOD tests.
Part 2: Motivation. This paper argues that image-based control models must have seen every possible example, or otherwise need extra mechanisms to handle OOD and sim-to-real tasks. Language, by contrast, can represent high-level concepts well...
(2) Zero-shot reward model results: the learned reward is used directly to drive RL. The Oracle baseline takes whether the agent has reached the target xyz position as the reward; Curiosity-RL, like the proposed method, only has access to images. (3) Multi-task policy results: the learned policy is rolled out, results are averaged over 50 episodes, and reward is reported with the Oracle as the metric. The proposed model performs best. A minimal sketch of the two reward signals being compared is given below.
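To make the comparison concrete, here is a minimal Python sketch of the two reward signals, under stated assumptions: the success radius, the `reward_model.score` interface, and the function names are all hypothetical and not the paper's actual code.

```python
import numpy as np

# Assumed success radius for the Oracle reward; not specified in the notes above.
GOAL_THRESHOLD = 0.1


def oracle_reward(agent_xyz: np.ndarray, goal_xyz: np.ndarray) -> float:
    """Oracle baseline: reward 1.0 only when the agent reaches the target xyz position."""
    return float(np.linalg.norm(agent_xyz - goal_xyz) < GOAL_THRESHOLD)


def zero_shot_reward(image: np.ndarray, instruction: str, reward_model) -> float:
    """Zero-shot reward: score the current image against the language goal.

    `reward_model` stands in for any image/language-conditioned scorer;
    its `score` method is an illustrative placeholder, not a real API.
    """
    return float(reward_model.score(image, instruction))
```

In the RL loop, the only difference between the two settings is which of these functions supplies the per-step reward; the policy and environment interface stay the same.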
To run baseline models (rule, IL, RL, IL+RL) from the paper, please refer to the README.md in the baseline_models folder. To read more about how the sim-to-real transfer of agents trained on WebShop to other environments works, please refer to the README.md in the transfer folder. ...