Reproduce R1 Zero on Logic Puzzle. Contribute to kekewind/Logic-RL development by creating an account on GitHub.
Qwen2.5-7B-Logic-RL (ours) 0.99 0.99 0.94 0.92 0.91 0.80 0.67

Installation

conda create -n logic python=3.9
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e . # ...
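After running the install commands above, it can help to confirm that the pinned versions actually landed in the environment. The following is a minimal sketch, not part of the repository: the helper name `check_pins` and the `PINS` table are hypothetical, and only the two versions pinned above (torch 2.4.0, vllm 0.6.3) are checked. It assumes that a wheel from the cu121 index may report a local suffix such as `+cu121`, which is stripped before comparing.

```python
from importlib import metadata

# Pins copied from the install commands above (hypothetical table name).
PINS = {"torch": "2.4.0", "vllm": "0.6.3"}


def check_pins(pins, lookup=metadata.version):
    """Compare installed package versions against the expected pins.

    Returns a dict mapping package name -> (expected, found, ok), where
    `found` is None when the package is not installed.
    """
    report = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Assumption: a local version suffix (e.g. "+cu121") should be
        # ignored when comparing against the pin.
        ok = found is not None and found.split("+")[0] == expected
        report[name] = (expected, found, ok)
    return report
```

Calling `check_pins(PINS)` inside the `logic` environment returns one entry per package; any entry whose last field is `False` points at a missing or mismatched install.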
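In addition, curriculum-style data design still appears to be useful. We fixed the data mixing ratio and controlled the order in which the difficulty levels appear, and found that a gradual progression, with each stage slightly harder than the previous one, noticeably helps RL convergence.
Rule-based reward design
We refined our rules by repeatedly inspecting the model's outputs and matching wits with it. For the code, see our GitHub: https://github.com/Unakar/Logic-RL/blob...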
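The curriculum ordering described above (a fixed mixing ratio, with stages presented from easier to harder) can be sketched as follows. This is an illustration under stated assumptions, not the authors' data pipeline: the function name `build_curriculum`, the `(difficulty, item)` example format, and the sample ratios are all hypothetical.

```python
import random
from collections import defaultdict


def build_curriculum(examples, ratio, total, seed=0):
    """Order training examples into stages of increasing difficulty.

    examples: list of (difficulty, item) pairs (hypothetical format).
    ratio:    dict mapping difficulty -> fraction of the final dataset;
              this is the fixed mixing ratio.
    total:    number of examples to draw overall.
    """
    rng = random.Random(seed)

    # Group the available examples by difficulty level.
    by_level = defaultdict(list)
    for difficulty, item in examples:
        by_level[difficulty].append(item)

    schedule = []
    # Visit levels from easiest to hardest, so each stage is harder
    # than the one before it.
    for level in sorted(ratio):
        count = round(total * ratio[level])
        pool = by_level[level]
        if count > len(pool):
            raise ValueError(f"not enough examples at difficulty {level}")
        # Sampling without replacement also shuffles within the stage.
        stage = rng.sample(pool, count)
        schedule.extend((level, item) for item in stage)
    return schedule
```

Changing the iteration order over `ratio` while leaving the ratio itself untouched gives a different presentation order over the same data mix, which matches the comparison the text describes.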
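GitHub repository: GitHub - Unakar/Logic-RL: Reproduce R1 Zero on Logic Puzzle
1. Features of Logic-RL
Building on rule-based reinforcement learning, the Logic-RL project strengthens the following aspects:
>> Uncertainty Marking: flags ambiguous steps so that they can be verified.
>> Progressive Summarization: maintains intermediate conclusions.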
https://github.com/Unakar/Logic-RL
https://github.com/Jiayi-Pan/TinyZero
GitHub - volcengine/verl: verl: Volcano Engine Reinforcement Learning for LLMs
Reproduce R1 Zero on Logic Puzzle. Contribute to Unakar/Logic-RL development by creating an account on GitHub.
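github.com/THU-KEG/Long
Microsoft Research Asia and Ubiquant propose a rule-based reinforcement learning method that strengthens the reasoning ability of a 7B model through synthetic logic puzzles, achieving stable training and generalization to challenging math benchmarks. [Bohr close reading] j1q.cn/xFv5rfIS [arXiv link] arxiv.org/abs/2502.1476 [Code] unakar666@gmail.com
Tsinghua University, Nanjing University, and the University of Maryland propose...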