We replicated DeepSeek-R1-Zero and DeepSeek-R1 training on a 7B model using only 8,000 examples, and the results are surprisingly strong. Starting from Qwen2.5-Math-7B (the base model), we perform RL on it directly. No SFT, no reward model: just 8,000 math examples used for verification. The final model reaches a 33.3% pass rate on AIME, 62.5% on AMC, and 77.2% on MATH, surpassing Qwen2.5-Math-7B...