runpod-vllm-nccl-diagnostic

This repository demonstrates and reproduces a specific multi-GPU environment issue encountered when running NCCL tests on NVIDIA L40S GPUs in certain data centers. While most environments work seamlessly, some pods exhibit a potential hardware configuration difference that ...
export NCCL_P2P_DISABLE=1

Backend-Specific Flags: Depending on your LLM backend, additional flags may further mitigate or confirm P2P connectivity problems:
vLLM: --disable-custom-all-reduce
SGLang: --enable-p2p-check or --disable-custom-all-reduce
(Consult your backend’s documentation for more details.)
For example, if vLLM has TP (tensor-parallel) processes and your DeepSpeed process group also has TP processes, they can share cudaTensors without copying.
The model executes correctly but gives this error at the end. Does anyone know what might be the issue?

Exception ignored in: <function NCCLCommunicator.__del__ at 0x7fb652b996c0>
Traceback (most recent call last):
  File "/workspace/vllm...
NOTE: This repo is deprecated with this fix to the main vLLM repo.

vllm-nccl (Apache-2.0 license)
Manages the vllm-nccl dependency.
Define package_name, nccl_version, vllm_nccl_version
Run python setup.py sdist
Run twine upload dist/*
if "vllm-nccl-cu12" in req:
    req = req.replace("vllm-nccl-cu12", f"vllm-nccl-cu{cuda_major}")
elif ("vllm-flash-attn" in req
      and not (cuda_major == "12" and cuda_minor == "1")):
    ...
environ["VLLM_INSTALL_NCCL"].split("+")
assert nccl_major_version in ["2.20", "2.18", "2.17", "2.16"], f"Unsupported nccl major version: {nccl_major_version}"
assert cuda_major_version in ["11", "12"], f"Unsupported cuda major version: {cuda_major_version}"
...
NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)
Traceback (most recent call last):
(VllmWorkerProcess pid=235) ERROR 08-23 08:25:25 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/multiproc_worker_utils.py", line ...
Note: If you are only using a single GPU, these issues are unlikely to occur, as no inter-GPU communication via NCCL is necessary.

Contr...
Releases: v0.1.0 (Latest), Apr 10, 2024
Contributors: youkaichao