# You can set these packing optimizations AFTER starting a training run at least once.
# The trainer will provide recommended values for these fields.
eval_sample_packing:
sample_packing_eff_est:
total_num_tokens:

# Set to 'lora' or 'qlora', or leave blank to train all parameters...
Training script with DeepSpeed ZeRO-3: finetune.sh. If you do not have enough GPU memory, use LoRA: finetune_lora.sh. We are able to fit 13B training on 8x A100-40G or 8x A6000, and 7B training on 8x RTX 3090. Make sure per_device_train_batch_size*gradient_accumulation_steps is the...
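When shrinking per_device_train_batch_size to fit memory, the product above is what must stay fixed: the global batch size per optimizer step is the per-device micro-batch times the accumulation steps times the number of GPUs. A minimal sketch of that arithmetic (the function name and the sample numbers are illustrative assumptions, not part of the training scripts):

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int) -> int:
    """Global batch size seen by the optimizer per update step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# e.g. micro-batch 4 with 4 accumulation steps on 8 GPUs
# gives the same global batch as micro-batch 16 with 1 step on 8 GPUs
print(effective_batch_size(4, 4, 8))   # 128
print(effective_batch_size(16, 1, 8))  # 128
```

So if you halve per_device_train_batch_size to avoid OOM, double gradient_accumulation_steps to keep training dynamics unchanged.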