--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "epoch" \  # "steps" also works, but then save_steps must be configured as well
--save_total_limit 5 \  # maximum number of checkpoints to keep
--learning_rate 2e-5 \
--weight_decay 0. \  # AdamW optimizer parameter
--warmup_ratio 0.03 \
--lr_scheduler_type "l...
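The warmup_ratio and lr_scheduler_type flags above control the learning-rate schedule. A minimal sketch of linear warmup followed by linear decay (an approximation for illustration, not the Trainer's exact implementation):

```python
# Minimal sketch of a linear-warmup + linear-decay schedule, approximating
# what warmup_ratio and lr_scheduler_type control. Note also that the
# effective batch size = per_device_batch * gradient_accumulation_steps * n_gpus.
def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.03):
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # ramp linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # then decay linearly from base_lr down to 0
    remaining = total_steps - warmup_steps
    return base_lr * max(0.0, (total_steps - step) / max(1, remaining))
```

With total_steps=1000 and warmup_ratio=0.03, the first 30 steps ramp up and the rest decay toward zero.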
logging_dir="logs",
logging_strategy="steps",
logging_steps=50,  # log every 50 steps
evaluation_strategy="steps",
eval_steps=500,  # evaluate every 500 steps
save_steps=500,
save_total_limit=2,
load_best_model_at_end=True,
deepspeed=deepspeed_config,  # path to (or dict holding) the DeepSpeed config
report_to...
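The `deepspeed_config` passed above can be a dict as well as a JSON path. A minimal ZeRO-2 sketch, assuming the HF Trainer integration ("auto" lets the Trainer fill values in from TrainingArguments; the exact choices depend on your hardware):

```python
# Hypothetical minimal DeepSpeed config for the deepspeed= argument above.
# "auto" defers the value to the corresponding TrainingArguments setting.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 2,  # ZeRO stage 2: shard optimizer state and gradients
        "offload_optimizer": {"device": "cpu"},
    },
    "bf16": {"enabled": "auto"},
    "gradient_clipping": "auto",
}
```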
--save_steps 100 \
--eval_steps 100 \
--learning_rate 5e-5 \
--max_grad_norm 0.5 \
--num_train_epochs 2.0 \
--dev_ratio 0.01 \
--evaluation_strategy steps \
--load_best_model_at_end \
--plot_loss \
--fp16 \
For environment setup and so on, refer to each open-source repo's README; a particularly well-written one is Tigerbot...
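With --load_best_model_at_end, the trainer tracks the best eval metric seen at each eval_steps interval and reloads that checkpoint after training. A minimal sketch of the bookkeeping, with hypothetical names:

```python
# Hypothetical sketch of the bookkeeping behind --load_best_model_at_end:
# keep the checkpoint with the lowest eval loss, then reload it when training ends.
def best_checkpoint(eval_history):
    """eval_history: list of (checkpoint_path, eval_loss) recorded every eval_steps."""
    return min(eval_history, key=lambda pair: pair[1])[0]

history = [("ckpt-100", 1.9), ("ckpt-200", 1.4), ("ckpt-300", 1.6)]
# best_checkpoint(history) -> "ckpt-200"
```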
deepspeed --master_addr 10.255.19.82 --master_port 29500 --hostfile=$hostfile fine-tune.py \
--report_to "none" \
--data_path "/data2/xinyuuliu/Baichuan2-main/fine-tune/data/全网评价总结训练数据.json" \
--model_name_or_path "/data1/xinyuuliu/Baichuan2-13B-Chat" \
--output_dir "output_lora_summary" \
--model_...
--save_strategy epoch \
--learning_rate 2e-4 \
--lr_scheduler_type constant \
--adam_beta1 0.9 \
--adam_beta2 0.98 \
--adam_epsilon 1e-8 \
--max_grad_norm 1.0 \
--weight_decay 1e-4 \
--warmup_ratio 0.0 \
--logging_steps 1 \
--gradient_checkpointing True \
--deepspeed ds_config.json \
--bf16 True \
--tf...
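The adam_beta1/adam_beta2/adam_epsilon and weight_decay flags above map directly onto the AdamW update. A single-parameter scalar sketch for illustration (real training uses torch.optim.AdamW, not this):

```python
# Minimal scalar AdamW step illustrating adam_beta1/adam_beta2/adam_epsilon
# and decoupled weight_decay; a sketch, not the fused implementation.
import math

def adamw_step(param, grad, state, lr=2e-4, beta1=0.9, beta2=0.98,
               eps=1e-8, weight_decay=1e-4):
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad   # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    param -= lr * weight_decay * param                            # decoupled weight decay
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param

state = {"t": 0, "m": 0.0, "v": 0.0}
p = adamw_step(1.0, 0.5, state)  # a positive gradient moves the parameter down
```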
"save_strategy": "epoch",
"learning_rate": 1e-6,
"save_total_limit": 1,
"num_train_epochs": 1,
"warmup_ratio": 0.05,
"weight_decay": 0.01,
"plot_loss": true,
"accelerator_config": {"dispatch_batches": false},
"use_fast_tokenizer": true,
"resume_from_checkpoint": false,
"report_to": "wandb",
"deepspeed...
save_total_limit=4,
logging_steps=5,
save_strategy='steps',
weight_decay=0,
push_to_hub=False,
disable_tqdm=True,
no_cuda=not config.get('platform').get('use_gpu'),
gradient_checkpointing=True,
output_dir="./outputs_ray",
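save_total_limit=4 means only the four most recent checkpoints survive; older ones are pruned after each save. A hypothetical sketch of that pruning logic:

```python
# Hypothetical sketch of save_total_limit pruning: keep only the newest N
# checkpoint directories, ordered by the global step in their name.
def prune_checkpoints(checkpoints, save_total_limit=4):
    """checkpoints: names like 'checkpoint-500'; returns the ones to keep."""
    by_step = sorted(checkpoints, key=lambda c: int(c.split("-")[-1]))
    return by_step[-save_total_limit:]

kept = prune_checkpoints(["checkpoint-500", "checkpoint-1000", "checkpoint-1500",
                          "checkpoint-2000", "checkpoint-2500"])
# kept -> the four most recent, checkpoint-1000 through checkpoint-2500
```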
print(f"Eval Epoch {epoch}/{self.epochs} loss {loss_mean}")
# epoch_bar.update()

def save_model(self, path: str, only_rank0: bool = False,
              tokenizer: Optional[PreTrainedTokenizerBase] = None) -> None:
    self.strategy.save_model(model=self.model, path=path, only_rank0=only_rank0, tokenizer...
torch.save(model_state_dict, path)  # note: torch.save takes (obj, path), not (path, obj)

3. rwkv-lora fine-tuning

The main work in fine-tuning rwkv lies in preparing the data (converting it into a format the model can train on), setting up the training environment, modifying the training code, and evaluating the resulting model; how to fine-tune for the best possible results is beyond the scope of this article. rwkv supports 2 data formats: one concatenates question+answer, the other instruction+in...
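The question+answer concatenation format mentioned above can be sketched as follows; the field names and separators here are illustrative assumptions, not rwkv's exact training spec:

```python
# Illustrative sketch of the question+answer concatenation format; the
# "Question:"/"Answer:" markers and field names are assumptions for this example.
def to_training_text(sample):
    return f"Question: {sample['question']}\n\nAnswer: {sample['answer']}\n\n"

sample = {"question": "What is RWKV?", "answer": "An RNN-style language model."}
text = to_training_text(sample)
```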