The basic idea is as follows: we fine-tune the Gemma-7B model with ORPO + QLoRA + Flash Attention 2, using Hugging Face's Transformers, Transformer Reinforcement Learning (TRL), the Parameter-Efficient Fine-Tuning (PEFT) framework, QLoRA, and TRL's ORPOTrainer, with wandb for monitoring training metrics. The dataset used is wenbopan/Chinese-dpo-pairs.
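The setup described above can be sketched roughly as follows. This is a minimal illustration, not the post's exact script: hyperparameter values, the output directory, and the LoRA target modules are assumptions, and the dataset is assumed to expose the `prompt`/`chosen`/`rejected` columns that ORPOTrainer expects; API details follow recent TRL releases.

```python
# Sketch: ORPO + QLoRA + Flash Attention 2 fine-tuning of Gemma-7B.
# Hyperparameters and paths below are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import ORPOConfig, ORPOTrainer

model_id = "google/gemma-7b"

# 4-bit NF4 quantization -- the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # needs flash-attn and an Ampere+ GPU
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA adapters on the attention projections (target list is an assumption).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# ORPO trains on preference pairs; assuming the dataset provides
# "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("wenbopan/Chinese-dpo-pairs", split="train")

orpo_args = ORPOConfig(
    output_dir="gemma-7b-orpo",        # illustrative path
    beta=0.1,                          # weight of the odds-ratio term in the loss
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    num_train_epochs=1,
    logging_steps=10,
    report_to="wandb",                 # stream metrics to Weights & Biases
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Because ORPO folds the preference objective into a single supervised loss, no separate frozen reference model is needed, which keeps the QLoRA memory footprint low.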
- Fine Tuning the Model: Function
- Validating the Model Performance: Function
- Main Function
- Initializing WandB
- Importing and Pre-Processing the domain data
- Creation of Dataset and Dataloader
- Neural Network and Optimizer
- Training Model and Logging to WandB
- Validation and generation of Summary
- Examples of the...