### 🐛 Describe the bug

When I serve llama3.1-70B quantized w4a16 with the following parameters:

- `--max-model-len`: 127728
- `--enable-prefix-caching`: True
- `--enable-chunked-prefill`: False
- `--kv-cache-dtype`: fp8_e4m3
- `VLLM_ATTENTION_BACKEND`: FLASHINFER
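For reference, a minimal sketch of how this configuration maps onto vLLM's Python API (the report itself uses the server CLI flags above; the model path below is a placeholder, and the offline `LLM` entry point is assumed as an equivalent way to exercise the same engine arguments):

```python
# Sketch of the reported configuration via vLLM's offline LLM API.
# Assumes a recent vLLM; the model path is a placeholder, not from the report.
import os

# Select the attention backend before the engine is constructed.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM

llm = LLM(
    model="path/to/llama3.1-70B-w4a16",  # placeholder path
    max_model_len=127728,
    enable_prefix_caching=True,
    enable_chunked_prefill=False,
    kv_cache_dtype="fp8_e4m3",
)
```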
*Status: closed as not planned.*

### Your current environment

- accelerate 0.27.2
- torch 2.1.2
- transformers 4.38.2
- pydantic 2.6.1
- pydantic_core 2.16.2
- pydantic-settings 2.0.3
- vllm 0.3.2
- ...