[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already. · Issue #5060 · vllm-project/vllm (github.com) 以及添加参数 ENGINE_ITERATION_TIMEOUT_S ## 设置为 180 timeout=configuration.request_timeout or 180.0...
VLLM_ENGINE_ITERATION_TIMEOUT_S=180 \ GLOO_SOCKET_IFNAME=enp5s0 \TP_SOCKET_IFNAME=enp5s0 \ NCCL_SOCKET_IFNAME=enp5s0 \ NCCL_DEBUG=info \ NCCL_NET=Socket \ NCCL_IB_DISABLE=0 WORKDIR /server COPY . . RUN apt-get update && apt -y install \ dos2unix tzdata vim tree curl wget \...
return_value = task.result() File "/usr/local/lib/python3.10/dist-packages/vllm-0.5.3.post1+cpu-py3.10-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 630, in run_engine_loop async with asyncio_timeout(ENGINE_ITERATION_TIMEOUT_S): File "/usr/local/lib/python3.10/dist-...
ERROR:asyncio:Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f4124ad08b0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f412c39bac0>>) handle: <Handle functools.partial(<function _...
设置环境变量VLLM_ENGINE_ITERATION_TIMEOUT_S为更大的值(如180秒),以延长引擎每次迭代的超时时间。 在请求端配置中延长请求时间。 禁用自定义AllReduce:在启动参数中添加--disable-custom-all-reduce,可能有助于解决某些并发请求导致的错误。 4. 应用解决方案 根据具体情况选择上述解决方案中的一种或多种进行尝试...
FROMvllm/vllm-openai:v0.6.2ENVTZ=Asia/Shanghai\DEBIAN_FRONTEND=noninteractive\VLLM_ENGINE_ITERATION_TIMEOUT_S=180\GLOO_SOCKET_IFNAME=eth0\TP_SOCKET_IFNAME=eth0\NCCL_SOCKET_IFNAME=eth0\NCCL_DEBUG=info\NCCL_NET=Socket\NCCL_IB_DISABLE=0WORKDIR/serverCOPY. .RUNapt-get update && apt -y in...
我添加了额外的环境变量:VLLM_CPU_KVCACHE_SPACE=4和额外的启动参数:python3 -m vllm.entrypoints....
VLLM_ENGINE_ITERATION_TIMEOUT_S=180 \ GLOO_SOCKET_IFNAME=ens18 \ TP_SOCKET_IFNAME=ens18 \ NCCL_SOCKET_IFNAME=ens18 \ NCCL_DEBUG=info \ NCCL_NET=Socket \ NCCL_IB_DISABLE=0 \ NODE_TYPE=worker \ HEAD_NODE_ADDRESS=127.0.0.1
当有请求时,会按照PP配置数量,创建requests_in_progress,持有PP_size个engine.step(ve)的消息。 后续会通过: async with asyncio_timeout(ENGINE_ITERATION_TIMEOUT_S): done, _ = await asyncio.wait( requests_in_progress, return_when=asyncio.FIRST_COMPLETED) ...
[Engine iteration timed out. This should never happen! ] [Bug]: Using CPU for inference, an error occurred. [Engine iteration timed out. This should never happen! ] Aug 21, 2024 Contributor ilya-lavrenov commented Aug 21, 2024 Hi @liuzhipengchd You can try to run on CPU via ...