...prevent AsyncEngineDeadError on input exceeding max_model...
To reproduce, run vLLM with a recent Mistral model, reduce `max_model_len` and enable chunked prefill, then submit requests longer than 1000 tokens. { "model": "mistralai/Mistral-Small-24B-Instruct-2501", "disable_
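A minimal reproduction sketch, assuming the server is started with something like `vllm serve mistralai/Mistral-Small-24B-Instruct-2501 --max-model-len 2048 --enable-chunked-prefill` and exposes the usual OpenAI-compatible endpoint on port 8000 (the port, `max-model-len` value, and prompt text are illustrative assumptions, not from the report):

```python
import json

# Build a prompt well beyond 1000 tokens; the repeated phrase is arbitrary.
long_prompt = "Tell me about token limits. " * 400

# Request body for vLLM's OpenAI-compatible /v1/completions endpoint.
payload = {
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "prompt": long_prompt,
    "max_tokens": 64,
}
body = json.dumps(payload)

# POST `body` to http://localhost:8000/v1/completions (e.g. with urllib or
# curl). On affected versions, an over-length input like this can kill the
# engine with AsyncEngineDeadError instead of being rejected cleanly.
print(len(long_prompt.split()))
```

The word count printed is a rough lower bound on the token count; anything comfortably above the reduced `max_model_len` should trigger the issue.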