### 🐛 Describe the bug

When I serve llama3.1-70B quantized w4a16 with the following parameters:

- `--max-model-len`: 127728
- `--enable-prefix-caching`: True
- `--enable-chunked-prefill`: False
- `--kv-cache-dtype`: fp8_e4m3
- `VLLM_ATTENTION_BACKEND`: FLASHINFER
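For reference, a minimal sketch of how this configuration maps onto vLLM's Python API (the report itself uses the server CLI flags above; the model path below is a placeholder, and the offline `LLM` entry point is assumed as an equivalent way to exercise the same engine arguments):

```python
# Sketch of the reported configuration via vLLM's offline LLM API.
# Assumes a recent vLLM; the model path is a placeholder, not from the report.
import os

# Select the attention backend before the engine is constructed.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM

llm = LLM(
    model="path/to/llama3.1-70B-w4a16",  # placeholder path
    max_model_len=127728,
    enable_prefix_caching=True,
    enable_chunked_prefill=False,
    kv_cache_dtype="fp8_e4m3",
)
```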
*Status: closed as not planned.*

### Your current environment

- accelerate 0.27.2
- torch 2.1.2
- transformers 4.38.2
- pydantic 2.6.1
- pydantic_core 2.16.2
- pydantic-settings 2.0.3
- vllm 0.3.2
- ...