    --lora_target_modules ALL \
    --model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope \
    --deepspeed default-zero3

# Example runnable on a single A10/3090 (Qwen2.5-7B-Instruct)
# GPU memory usage: 24GB
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen2_5-7b-instruct \
    --model_id_or_path ...
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.01 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    ...
Fine-tuning script: LoRA
# https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_32b_chat/lora_mp/sft.sh
# Experimental environment: A100
# 2*49GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type qwen1half-32b-chat \
    --sft_type lora \
    --tuner_backend swif...
target_modules: the names of the model layers to train, mainly the attention layers. The layer names differ from model to model; the value can be a list, a plain string, or a regular expression.
r: the LoRA rank; see the LoRA paper for details.
lora_alpha: the LoRA alpha; see the LoRA paper for its exact role.
So what is LoRA's scaling factor? It is not r (the rank): the scaling is lora_alpha / r, so in this LoraConfig the scaling...
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "...
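The snippet above is truncated, so here is a minimal, self-contained sketch using Hugging Face peft. The module names beyond those shown above ("o_proj") are illustrative assumptions for LLaMA/Qwen2-style models, and the rank/alpha values are examples only; it also prints the lora_alpha / r scaling described above.

# Minimal sketch, assuming the peft library and LLaMA/Qwen2-style module names;
# module names past the truncated list above are illustrative.
from peft import LoraConfig, TaskType

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # list, string, or regex
    r=8,              # LoRA rank
    lora_alpha=32,    # LoRA alpha
    lora_dropout=0.05,
)

# The scaling applied to the LoRA update is lora_alpha / r:
print(config.lora_alpha / config.r)  # 32 / 8 = 4.0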
[INFO:swift] Setting lora_target_modules: ['c_proj', 'w2', 'c_attn', 'w1']
Traceback (most recent call last):
  File "/root/autodl-tmp/swift/examples/pytorch/llm/scripts/qwen_14b_chat_int8/qlora/../../../llm_sft.py", line 7, in <module>
    best_ckpt_dir = sft_main()
  File...
lora_rank: 64
lora_alpha: 16
lora_dropout: 0.05
target_modules: '.*wq|.*wk|.*wv|.*wo|.*w1|.*w2|.*w3'
freeze_exclude: ["*wte*", "*lm_head*"]
# configuration items copied from Qwen
rotary_pct: 1.0
rotary_emb_base: 10000
...
lora_alpha: 16
lora_dropout: 0.05
target_modules: '.*wq|.*wk|.*wv|.*wo|.*w1|.*w2|.*w3'
# freeze_exclude: ["wte", "lm_head"]
# configuration items copied from Qwen
rotary_pct: 1.0
rotary_emb_base: 10000
kv_channels: 128
...
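The target_modules value in these configs is a regular expression over module names rather than a plain list. As a quick sanity check of what such a pattern selects, here is a plain-Python sketch; the layer names are hypothetical Qwen-style names, and frameworks that accept a regex here typically apply it with full-match semantics against each module's dotted name.

# Which (hypothetical) module names does the target_modules regex select?
import re

pattern = r'.*wq|.*wk|.*wv|.*wo|.*w1|.*w2|.*w3'
module_names = [
    "transformer.h.0.attn.wq",
    "transformer.h.0.attn.wk",
    "transformer.h.0.attn.wv",
    "transformer.h.0.attn.wo",
    "transformer.h.0.mlp.w1",
    "transformer.h.0.mlp.w2",
    "transformer.wte",   # not matched: embeddings stay frozen
    "lm_head",           # not matched
]

selected = [name for name in module_names if re.fullmatch(pattern, name)]
print(selected)  # only the attention and MLP projection layers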
lora_register_forward_hook ... ['word_embeddings', 'input_layernorm']
lora_target_modules ... []
loss_scale ... None
loss_scale_window ... 1000
lr ... None
lr_decay_iters ...