A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware: an implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Essentially ChatGPT, but built on ChatGLM - jackaduma/ChatGLM-LoRA-RLHF-PyTorch
| RLHF method | Batch size | Mode | GRAM | Speed |
| ----------- | ---------- | ---- | ---- | ----- |
| LoRA (r=8) + rm  | 1 | INT8 | 11GB | - |
| LoRA (r=8) + ppo | 4 | FP16 | 23GB | - |
| LoRA (r=8) + ppo | 1 | INT8 | 12GB | - |

Note: `r` is the LoRA rank, `p` is the number of prefix tokens, `l` is the number of trainable layers, and `ex/s` is the number of examples per second during training. The `gradient_accumulation_steps` parameter is set to 1.
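The memory figures above are dominated by the frozen base model; LoRA itself adds very few trainable parameters. A back-of-the-envelope sketch (the 4096 hidden size matches ChatGLM-6B; the specific projection adapted is an illustrative assumption):

```python
# Why LoRA at r=8 is cheap: for a frozen weight matrix W of shape
# (d_out, d_in), LoRA trains two low-rank factors B (d_out x r) and
# A (r x d_in), i.e. only r * (d_in + d_out) extra parameters.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix."""
    return r * (d_in + d_out)

def full_params(d_in: int, d_out: int) -> int:
    """Parameters of the full matrix (what full fine-tuning would train)."""
    return d_in * d_out

# Illustrative example: one square 4096x4096 attention projection
# (ChatGLM-6B's hidden size is 4096) adapted with rank r=8.
d = 4096
added = lora_params(d, d, r=8)   # 8 * (4096 + 4096) = 65,536
full = full_params(d, d)         # 4096 * 4096 = 16,777,216
print(f"LoRA adds {added:,} params vs {full:,} full ({added / full:.2%})")
```

For this single matrix LoRA trains well under 1% of the weights, which is why the dominant VRAM cost in the table is the base model's precision (FP16 vs INT8), not the adapter.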
Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and other methods.
RLHF training:

```shell
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage ppo \
    --model_name_or_path path_to_your_chatglm_model \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --resume_lora_training False \
    --checkpoint_dir path_to_sft_checkpoint \
    --reward_mode...
```
RLHF-stage data: in this stage the data itself is not the focus; training only requires a `prompt` field. The core ingredients are the fine-tuned model from stage one and the reward model trained in stage two; PPO is then used to optimize the fine-tuned model so that its outputs align with human intent. Reference code: DeepSpeedExamples/applications/DeepSpeed-Chat/training/utils/data/data_utils.py
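The PPO signal described above can be sketched numerically. A minimal illustration (an assumption about the common RLHF recipe, not the DeepSpeed-Chat code itself): the reward model's scalar score is credited at the last response token, while every token pays a KL penalty against the frozen SFT reference, which keeps the tuned policy close to the stage-one model:

```python
# Minimal sketch of the per-token RLHF reward used with PPO (illustrative):
# the reward-model score lands on the final token, and each token is
# penalized by how far the policy's log-prob drifts from the frozen
# SFT reference model's log-prob.

def rlhf_rewards(policy_logprobs, ref_logprobs, rm_score, kl_coef=0.1):
    """Per-token rewards for one sampled response.

    policy_logprobs / ref_logprobs: log-probs of the sampled tokens under
    the PPO policy and the frozen SFT reference model, respectively.
    rm_score: scalar score from the reward model for the full response.
    kl_coef: weight of the KL penalty (a hand-picked toy value here).
    """
    rewards = [-kl_coef * (lp - lr)
               for lp, lr in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += rm_score  # reward model's score on the final token
    return rewards

# Toy numbers: a 3-token response where the policy has drifted slightly.
r = rlhf_rewards(
    policy_logprobs=[-1.0, -0.5, -2.0],
    ref_logprobs=[-1.2, -0.5, -1.0],
    rm_score=1.5,
)
print(r)
```

PPO then maximizes these rewards; the KL term is what prevents the policy from collapsing into reward-model exploits and forgetting the stage-one fine-tuning.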
References:
- https://github.com/zhangsheng93/cMedQA
- https://github.com/hiyouga/ChatGLM-Efficient-Tuning
- https://github.com/jackaduma/ChatGLM-LoRA-RLHF-PyTorch
- https://github.com/THUDM/ChatGLM-6B