ghost: I am trying to call `.merge_and_unload()` on a Llama 2 PEFT model to use it for inference in Databricks. Here is my code for training the model; I do add a pad token, which I think is the cause of the error:

```python
target_modules = ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj'...
```
```python
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(
    args.output_dir,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)

# Merge LoRA and base model
merged_model = model.merge_and_unload()
# S...
```
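If the failure is a size mismatch on the embedding / lm_head weights (the usual symptom of adding a pad token after the fact), one workaround is to resize the base model's embeddings to the training-time vocabulary before attaching the adapter. A minimal sketch, assuming the tokenizer with the added pad token was saved to `args.output_dir` and that `base_model_id` points at the base checkpoint you trained on (both assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Llama-2-7b-hf"  # assumption: your actual base checkpoint

# The tokenizer saved during training already contains the extra pad token
tokenizer = AutoTokenizer.from_pretrained(args.output_dir)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
)
# Grow the embedding matrix so its shape matches the fine-tuned checkpoint
base_model.resize_token_embeddings(len(tokenizer))

# With matching shapes the adapter loads cleanly and can be merged
model = PeftModel.from_pretrained(base_model, args.output_dir)
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_model")
```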
It would be helpful to describe both within the peft documentation. More specifically:

- Highlight that `merge_and_unload` does not work with `AutoModelForCausalLM`.
- Clarify how `AutoModelForCausalLM` actually loads the adapter (I assume it keeps the adapter weights unmerged with the base model, hence slower inference)...
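To make the distinction concrete, a minimal sketch of the two loading paths (assuming a recent transformers release with the PEFT integration; the adapter id is a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import AutoPeftModelForCausalLM

adapter_id = "my-user/my-lora-adapter"  # placeholder

# transformers' PEFT integration: the adapter is injected but kept separate
# from the base weights, so every forward pass pays the extra LoRA cost
model = AutoModelForCausalLM.from_pretrained(adapter_id)
# model.merge_and_unload()  # AttributeError: this is not a PeftModel

# peft's loader returns a PeftModel wrapper, which does expose merge_and_unload
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
merged = model.merge_and_unload()
```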
I have fine-tuned the model using LoRA; the config is available here: "Lukee4/biogpt-2020_2labels". I used `BioGptForSequenceClassification` and the fine-tuning worked fine: the results on the test data improved after fine-tuning in comparison...
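For the sequence-classification case the same merge recipe should apply. A sketch, assuming the adapter in "Lukee4/biogpt-2020_2labels" was trained on top of `microsoft/biogpt` with two labels (an assumption; substitute your actual base checkpoint):

```python
from transformers import BioGptForSequenceClassification
from peft import PeftModel

# assumption: microsoft/biogpt is the base checkpoint the adapter was trained on
base = BioGptForSequenceClassification.from_pretrained("microsoft/biogpt", num_labels=2)
model = PeftModel.from_pretrained(base, "Lukee4/biogpt-2020_2labels")

# Fold the LoRA weights into the base model for plain transformers inference
model = model.merge_and_unload()
model.save_pretrained("merged_biogpt")
```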
```python
model = AutoModelForXXXXX.from_pretrained()
model = PeftModel.from_pretrained(model, peft_model_id)
model = model.merge_and_unload()
model.save_pretrained("merged_model")

model = AutoModelForXXXXX.from_pretrained("merged_model", load_in_8bit=True)
# do inference
```

cc @younesbelkada for...
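(If I read the recipe right, the reason for the save-and-reload dance is that LoRA deltas cannot be folded into already-quantized weights, so the merge has to happen on the full-precision model first; the merged checkpoint is then reloaded with `load_in_8bit=True` purely for inference.)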