Hi, could you please turn on NCCL_DEBUG=INFO to see what error it is? It is hard to tell whether it is a NCCL error from the current log. Also, which NCCL and CUDA versions are you using? NCCL_DEBUG=INFO should also tell you that. In terms of the number of GPUs supported, I ...
IB_GID_INDEX=3 export NCCL_SOCKET_IFNAME=eth export NCCL_DEBUG=INFO export NCCL_IB_HCA=mlx5 export NCCL_IB_TIMEOUT=22 export NCCL_IB_QPS_PER_CONNECTION=8 export NCCL_NET_PLUGIN=none ml.gu8xf.8xlarge-gu108 export NCCL_IB_TC... 模型导出 模型导出组件实现EasyRec模型导出(export)功能。
PyTorch version: 2.5.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.3 LTS (x86_64) GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Clang version: 14.0.0-1ubuntu1.1 CMake version: version 3.30.5 Libc version: ...
also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u0" E0910 14:06:11.450000 1210439 torch/fx/experimental/recording.py:298] If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1 E0910 14:06:11.450000 1210439 torch/fx/experimental/recording.py:298] For more d...
We read every piece of feedback, and take your input very seriously. Include my email address so I can be contacted Cancel Submit feedback Saved searches Use saved searches to filter your results more quickly Cancel Create saved search Sign in Sign up Reseting focus {...
Search or jump to... Search code, repositories, users, issues, pull requests... Provide feedback We read every piece of feedback, and take your input very seriously. Include my email address so I can be contacted Cancel Submit feedback Saved searches Use saved searches to filter your...
Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4 LTS (x86_64) GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Clang version: Could not collect CMake version: version 3.24.0 ...
PyTorch version: 2.5.1+cu124 Is debug build: False CUDA used to build PyTorch: 12.4 ROCM used to build PyTorch: N/A OS: Ubuntu 20.04.6 LTS (x86_64) GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 Clang version: Could not collect CMake version: version 3.26.3 Libc version: ...
[pip3] nvidia-nccl-cu12==2.20.5 [pip3] nvidia-nvjitlink-cu12==12.5.82 [pip3] nvidia-nvtx-cu12==12.1.105 [pip3] torch==2.3.0a0+git63d5e92 [pip3] triton==2.3.1 [conda] numpy 1.24.4 pypi_0 pypi [conda] nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi [conda] nvidia-cuda-cupti...
=1, TORCH_VERSION=1.9.0, USE_CUDA=0, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON,