"WARNING 07-16 14:51:24 config.py:244] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models 使用awq量化模型,启动时,报图中警告,并且ModelScope模型推理速度非常慢,这应该怎么办?"展开 小小爱吃香菜 2024-07-23 22:44:26 110 0 1 条回答 写回答 为...
$ cd /data/sda/deploy/vllm/vllm
(vllm) ailearn@gpts:/data/sda/deploy/vllm/vllm$ python examples/llm_engine_example.py --model /data/sda/models/vicuna-7b-v1.5-awq --quantization awq
WARNING 01-14 20:09:03 config.py:175] awq quantization is not fully optimized yet. ...
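For reference, here is a minimal sketch (not the author's exact setup) of the same run through the vLLM Python API rather than the llm_engine_example.py script; the model path is the one from the shell session above and the sampling settings are arbitrary:

# A minimal sketch: loading the same AWQ checkpoint through the vLLM Python API
# instead of llm_engine_example.py. The model path comes from the shell session
# above; adjust it for your machine.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/data/sda/models/vicuna-7b-v1.5-awq",  # local AWQ checkpoint
    quantization="awq",                           # select vLLM's AWQ kernels
)
params = SamplingParams(temperature=0.8, max_tokens=64)  # arbitrary sampling settings
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)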
WARNING 01-02 19:00:12 config.py:171] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
02. Warning: AWQ quantization has not been fully optimized yet; it can be slower than the non-quantized model.
assert linear_method is None -> AssertionError
03. The assertion that linear_method is None failed, raising an AssertionError. Reference: github.com...
WARNING 01-02 20:21:59 config.py:179] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 01-02 20:21:59 llm_engine.py:73] Initializing an LLM engine with config: model='/Yi/quantized_model', tokenizer='/Yi/quantized_model', tokenizer...
WARNING 12-17 21:04:57 config.py:440] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
WARNING 12-17 21:04:57 config.py:446] Using AWQ quantization with ROCm, but VLLM_USE_TRITON_AWQ is not set, enabling VLLM_USE_TRITON_AWQ.
INFO...
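In the ROCm case above the engine enables VLLM_USE_TRITON_AWQ on its own. A minimal sketch of setting the variable yourself before the engine starts, assuming "1" is read as a truthy value and with a placeholder model path:

# Sketch for the ROCm warning above: set VLLM_USE_TRITON_AWQ before the engine
# is constructed so vLLM does not have to flip it on at startup.
import os
os.environ["VLLM_USE_TRITON_AWQ"] = "1"  # assumed truthy value; placeholder path below

from vllm import LLM
llm = LLM(model="/path/to/awq-model", quantization="awq")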
Answer: Try comparing it against the non-quantized model. (This answer was compiled from the DingTalk group "魔搭ModelScope开发者联盟群 ①".)
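One rough way to act on that suggestion is to time the same prompts on both checkpoints. The sketch below is hypothetical: the model paths are placeholders, and in practice each bench() call should run in its own process so the first engine's GPU memory is released before the second one starts.

# Hypothetical comparison sketch: time the same prompts on the AWQ checkpoint and
# on an unquantized counterpart and report generated tokens per second.
import time
from vllm import LLM, SamplingParams

prompts = ["Tell me a short story about a robot."] * 8
params = SamplingParams(temperature=0.0, max_tokens=128)

def bench(model_path, **engine_kwargs):
    llm = LLM(model=model_path, **engine_kwargs)
    start = time.time()
    outputs = llm.generate(prompts, params)
    elapsed = time.time() - start
    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"{model_path}: {generated / elapsed:.1f} generated tokens/s")

bench("/data/sda/models/vicuna-7b-v1.5-awq", quantization="awq")
bench("/data/sda/models/vicuna-7b-v1.5")  # unquantized counterpart (assumed path)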
WARNING 04-11 18:00:30 config.py:211] gptq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 04-11 18:00:30 llm_engine.py:74] Initializing an LLM engine (v0.4.0.post1) with config: model='./data/models/Qwen1.5-32B-Chat-GPTQ-Int4...
[128-half-expected_outputs0-/root/autodl-tmp/InternVL2-26B-AWQ]
WARNING 08-09 11:08:27 config.py:1483] Casting torch.bfloat16 to torch.float16.
WARNING 08-09 11:08:27 config.py:286] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models. ...
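The bfloat16-to-float16 message above comes from the config casting the checkpoint dtype for the AWQ kernels. A minimal sketch that requests float16 explicitly, assuming the model path from the log line; whether trust_remote_code is required for this checkpoint is also an assumption:

# Sketch for the dtype warning above: request float16 explicitly so the config
# does not need to cast bfloat16 on the fly.
from vllm import LLM

llm = LLM(
    model="/root/autodl-tmp/InternVL2-26B-AWQ",
    quantization="awq",
    dtype="float16",          # matches the cast reported in the warning
    trust_remote_code=True,   # assumption: custom model code is needed
)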
=False, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 04-28 20:25:27 config.py:177] awq quantization is not fully optimized yet. ...