Citation

@misc{logic-rl,
  author = {Tian Xie and Qingnan Ren and Yuqian Hong},
  title = {Logic-RL},
  howpublished = {https://github.com/Unakar/Logic-RL},
  note = {Accessed: 2025-02-03},
  year = {2025}
}
[project.urls]
Homepage = "https://github.com/volcengine/verl"

# ---
# tool.setuptools - Additional config
# ---
[tool.setuptools]
# True means `setuptools` will attempt to include all relevant files in package_data automatically.
# This corresponds to `include_package_data=True` in setup.py.
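For reference, a minimal sketch of the legacy setup.py call that this TOML flag corresponds to (the metadata values here are placeholders for illustration, not verl's actual setup):

```python
# Legacy-setuptools equivalent of the [tool.setuptools] flag above.
from setuptools import setup, find_packages

setup(
    name="example-package",        # placeholder metadata, not verl's real values
    packages=find_packages(),
    include_package_data=True,     # same effect as the pyproject.toml setting
)
```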
Rule-based reward design

We refined our rules iteratively, repeatedly inspecting model outputs and playing a cat-and-mouse game with the model; the code is in our GitHub: https://github.com/Unakar/Logic-RL/blob/main/verl/utils/reward_score/kk.py. Frankly, the rules in many open-source reproductions are not strict enough; with loose rules you would expect all kinds of bizarre reward hacks to show up. For example, we observed that if you rely only on putting the thinking...
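As a rough illustration of what "strict" means here, below is a minimal format-gate sketch in Python. It is not the repo's kk.py code: the function name, the exact tag rules, and the reward values are assumptions; the real scorer also grades answer content.

```python
import re

def strict_format_reward(response: str) -> float:
    """Toy format gate: exactly one <think> block followed by exactly one
    <answer> block, with nothing but whitespace outside them.
    A sketch only -- not the actual kk.py scoring logic."""
    # Each tag must appear exactly once; duplicated tags are a common hack.
    for tag in ("<think>", "</think>", "<answer>", "</answer>"):
        if response.count(tag) != 1:
            return -1.0
    # Tags must appear in order, with no trailing junk after </answer>.
    pattern = r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*"
    if re.fullmatch(pattern, response, flags=re.DOTALL) is None:
        return -1.0
    return 1.0
```

A looser check that merely greps for the tags is easy to hack by comparison: the model can emit the tags once and then append arbitrary extra text, or nest multiple answer blocks.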
https://github.com/Unakar/Logic-RL
https://github.com/Jiayi-Pan/TinyZero
GitHub - volcengine/verl: verl: Volcano Engine Reinforcement Learning for LLMs
As a beginner in reinforcement learning (RL), you often want to apply RL theory to a real environment. Take Super Mario as an example: when you watch the AI you trained gradually adapt to the environment, score higher and higher, and finally dodge every obstacle and speed through the level, you will certainly feel the appeal of the algorithm and a real sense of achievement! This article does not dwell on the deeper principles of the DQN (Deep Q-Learning Network) algorithm...
GitHub address: GitHub - Unakar/Logic-RL: Reproduce R1 Zero on Logic Puzzle

1. Features of Logic-RL

Building on rule-based reinforcement learning, the Logic-RL project strengthens the following aspects (a tracking sketch follows the list):

>> Uncertainty Marking: flags ambiguous steps for later verification.
>> Progressive Summarization: maintains intermediate conclusions.
>> Self-verification ...
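These are behaviors observed in the model's reasoning traces rather than code-level features. One crude way to track how often they appear in outputs is to count marker phrases, as in the sketch below; the marker lists and the function name are illustrative guesses, not taken from the Logic-RL code.

```python
# Toy tracker for the behaviors listed above: count marker phrases
# in a generated response. Markers are illustrative assumptions.
BEHAVIOR_MARKERS = {
    "uncertainty_marking": ("not sure", "ambiguous", "might be"),
    "progressive_summarization": ("so far", "to summarize", "we now know"),
    "self_verification": ("let me verify", "double-check", "re-check"),
}

def count_behaviors(response: str) -> dict:
    """Return a per-behavior count of marker-phrase occurrences."""
    text = response.lower()
    return {
        name: sum(text.count(marker) for marker in markers)
        for name, markers in BEHAVIOR_MARKERS.items()
    }
```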