WARNING 01-14 20:09:03 config.py:175] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models. INFO 01-14 20:09:03 llm_engine.py:73] Initializing an LLM engine with config:
WARNING 01-02 19:00:12 config.py:171] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
02. Warning: AWQ quantization is not yet fully optimized; it will be somewhat slower than a non-quantized model.
03. `assert linear_method is None` fails with AssertionError (the assertion that linear_method is None does not hold). Reference: github.com...
WARNING 12-17 21:04:57 config.py:440] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
WARNING 12-17 21:04:57 config.py:446] Using AWQ quantization with ROCm, but VLLM_USE_TRITON_AWQ is not set, enabling VLLM_USE_TRITON_AWQ.
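On ROCm, the second warning above means vLLM falls back to its Triton AWQ kernels and enables the flag for you. To make this explicit (and silence the warning), the environment variable can be set before launching. A minimal sketch, assuming the OpenAI-compatible server entrypoint; the model path is a placeholder:

```shell
# Set the flag explicitly so vLLM does not have to auto-enable it at startup.
export VLLM_USE_TRITON_AWQ=1
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/quantized_model \
    --quantization awq
```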
When using vLLM from Python code, pass the `quantization="awq"` parameter, for example:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0....
```
```
 322  322          "%s quantization is not fully "
 323  323          "optimized yet. The speed can be slower than "
 324  324          "non-quantized models.", self.quantization)
 325     +     if (self.quantization == "awq" and is_hip()
 326     +             and not envs.VLLM_USE_TRITON_AWQ):
 327     +         logger.warning(
 328     ...
```
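The guard added in the diff above can be sketched as a standalone function. This is a hypothetical simplification, not the actual `config.py` code: the function name, its arguments, and the boolean return value are illustrative, and the real implementation reads `envs.VLLM_USE_TRITON_AWQ` rather than `os.environ` directly.

```python
import logging
import os

logger = logging.getLogger("vllm.config")


def maybe_enable_triton_awq(quantization: str, is_hip: bool) -> bool:
    """Hypothetical standalone version of the patched guard: on ROCm
    (is_hip=True), AWQ requires the Triton kernels, so if
    VLLM_USE_TRITON_AWQ is not already set, enable it and warn,
    mirroring the logged message seen earlier in this page."""
    if (quantization == "awq" and is_hip
            and not os.environ.get("VLLM_USE_TRITON_AWQ")):
        logger.warning(
            "Using AWQ quantization with ROCm, but VLLM_USE_TRITON_AWQ "
            "is not set, enabling VLLM_USE_TRITON_AWQ.")
        os.environ["VLLM_USE_TRITON_AWQ"] = "1"
        return True
    return False
```

The function is idempotent: once the variable is set, subsequent calls return `False` and log nothing, which matches the warning appearing only once per engine initialization.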