https://github.com/OpenLMLab/MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
Ablustrund/moss-rlhf-reward-model-7B-zh · Hugging Face
小虎AI珏爷: Reinforcement Learning from Human Feedback (RLHF): A Simple Explanation …
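To address this problem, the MOSS-RLHF framework was developed, and applying the PPO algorithm is central to it.

1. Introduction to the MOSS-RLHF Framework

MOSS-RLHF (Model-Oriented Science Studies - Reinforcement Learning with Human Feedback) is a human-centered approach to AI alignment. Its core idea is to train AI models with reinforcement learning from human feedback (RLHF) so that their behavior is consistent with human...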
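Reinforcement learning from human feedback (RLHF) is regarded as the key technology supporting this goal. The RLHF pipeline typically consists of a reward model that measures human preferences, Proximal Policy Optimization (PPO) to optimize the outputs of the policy model, and process supervision to improve step-by-step reasoning. Among these, PPO plays a critical role. This article analyzes the PPO algorithm in the MOSS-RLHF framework in depth, exploring its...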
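For reference, the standard objectives behind two of these components, the reward model and PPO, can be written as follows. This is a sketch of the common InstructGPT-style formulation and may differ from the exact variant MOSS-RLHF implements. The notation is introduced here for illustration: r_ϕ is the reward model; (x, y_w, y_l) is a prompt with a preferred and a rejected response; σ is the logistic function; s_t is the prompt plus the response tokens before position t, a_t is the token at position t, and T is the last response token; β is a KL-penalty coefficient; π^SFT is the frozen supervised fine-tuned reference policy; Â_t is an advantage estimate; and ε is the clipping range.

```latex
\begin{aligned}
% Reward model: pairwise (Bradley-Terry) ranking loss over human preference pairs
\mathcal{L}_{\text{RM}}(\phi) &= -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\big[\log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\big] \\
% Per-token reward: reward-model score on the final token, minus a KL penalty
% that keeps the rollout policy close to the SFT reference
r_t &= \mathbb{1}[t = T]\, r_\phi(x, y) - \beta \log \frac{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}{\pi^{\text{SFT}}(a_t \mid s_t)} \\
% Probability ratio between the current policy and the policy that generated the rollout
\rho_t(\theta) &= \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)} \\
% PPO clipped surrogate objective (maximized with respect to theta)
L^{\text{CLIP}}(\theta) &= \mathbb{E}_t\Big[\min\big(\rho_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\big)\Big]
\end{aligned}
```

The clip removes any incentive to push ρ_t(θ) outside [1 − ε, 1 + ε] within one update, which keeps each policy step conservative compared with a plain policy-gradient update.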
```bash
# SPDX-License-Identifier: Apache-2.0
# Launch PPO training on 7 GPUs (CUDA devices 0-6) with Hugging Face Accelerate.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 \
accelerate launch \
    --config_file accelerate_config.yaml \
    train_ppo.py \
    --tokenizer_name_or_path models/moss-rlhf-reward-model-7B-zh \
    --policy_model_path models/sft_model \
    --critic_model_path models/...
```
The repository's files include train_ppo.py, train_ppo_en.sh, train_ppo_zh.sh, train_rm.py, train_rm.sh, and utils.py.
```bash
...model-7B-en/recover \
    --critic_model_path models/moss-rlhf-reward-model-7B-en/recover \
    --model_save_path outputs/models/ppo/ppo_model_en \
    --data_path data/ppo_data \
    --seed 42 \
    --maxlen_prompt 2048 \
    --maxlen_res 512 \
    --lr 5e-7 \
    --critic_lr 1.5e-6 \
    --gamma 1...
```
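In PPO with a learned critic (configured here through `--critic_model_path` and `--critic_lr`), the discount factor set by `--gamma` typically enters generalized advantage estimation (GAE). The recursion below is a sketch of standard GAE, not a transcription of train_ppo.py. It assumes a GAE parameter λ whose flag, if the script has one, falls in the truncated part of the command. V is the critic's value estimate, r_t is the per-token reward, and T is the last response token.

```latex
\begin{aligned}
% TD residual at token t, with V(s_{T+1}) := 0 after the last response token
\delta_t &= r_t + \gamma\, V(s_{t+1}) - V(s_t) \\
% Advantage: exponentially weighted sum of TD residuals, computed backward from t = T
\hat{A}_t &= \delta_t + \gamma \lambda\, \hat{A}_{t+1}, \qquad \hat{A}_{T+1} := 0 \\
% Return used as the critic's regression target
\hat{R}_t &= \hat{A}_t + V(s_t)
\end{aligned}
```

With γ = 1, rewards are not discounted within a response. λ trades bias against variance: small λ leans on the critic's estimates, while λ → 1 approaches Monte Carlo returns.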