这个错误可能是由于NCCL_P2P_LEVEL设置不正确导致的。你可以尝试将NCCL_P2P_LEVEL设置为0,然后重新运行...
Hi, I have a 10x Quadro RTX 8000 server and want to use all GPUs for a TensorFlow training job. I understand NCCL supports only up-to 8 GPU per server while NVSwitch is not available. After some search it seems setting NCCL_P2P_DISABLE=1...