- Run Locally: instructions for running LLMs locally on CPU and GPU, with frameworks like llama.cpp and Ollama;
- Deployment: a demonstration of how to deploy Qwen for large-scale inference with frameworks like vLLM, TGI, etc. (see the sketch after this list);
- Quantization: the practice of quantizing LLMs with GPTQ, AWQ...
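As a taste of the deployment workflow, here is a minimal sketch of offline batched inference with vLLM. It assumes vLLM is installed and uses the Hugging Face model ID `Qwen/Qwen-7B-Chat` as a stand-in; swap in your own model name and sampling settings.

```python
# Minimal vLLM offline-inference sketch; model name and sampling
# parameters are illustrative placeholders.
from vllm import LLM, SamplingParams

# Load the model; trust_remote_code is needed for models that ship
# custom modeling code, as Qwen checkpoints do.
llm = LLM(model="Qwen/Qwen-7B-Chat", trust_remote_code=True)

sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

# A batch of prompts; vLLM schedules them with continuous batching.
prompts = ["Give me a short introduction to large language models."]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```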
(Note: You do not need to quantize a Q-LoRA fine-tuned model, because it is already quantized.) If you use LoRA, please follow the above instructions to merge your model before quantization. We recommend using `auto_gptq` to quantize the fine-tuned model:

```bash
pip install auto-gptq optimum
```
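Once the packages are installed, the quantization itself follows auto_gptq's standard flow: load the merged checkpoint, run GPTQ calibration over a few sample inputs, and save the quantized weights. The sketch below assumes a merged model at the placeholder path `path/to/merged_model` and uses a tiny hypothetical calibration set; in practice you should calibrate on data representative of your fine-tuning task.

```python
# GPTQ quantization sketch with auto_gptq; paths and calibration texts
# are placeholders, not values from the original instructions.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "path/to/merged_model"        # your merged LoRA checkpoint
quantized_path = "path/to/quantized_model"  # output directory

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# auto_gptq expects calibration examples as dicts of
# input_ids / attention_mask tensors.
calibration_texts = [
    "Hello, how are you?",
    "Give me a short introduction to large language models.",
]
examples = [
    {"input_ids": enc.input_ids, "attention_mask": enc.attention_mask}
    for enc in (tokenizer(t, return_tensors="pt") for t in calibration_texts)
]

# 4-bit weights with group size 128 are common GPTQ settings.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

model = AutoGPTQForCausalLM.from_pretrained(
    model_path, quantize_config, trust_remote_code=True
)
model.quantize(examples)

model.save_quantized(quantized_path, use_safetensors=True)
tokenizer.save_pretrained(quantized_path)  # ship the tokenizer with the weights
```

A few dozen to a few hundred calibration samples are typical; too few can noticeably hurt the quantized model's quality.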