Troubleshooting steps:
strace -f -e 'trace=!poll' mpirun ... (and make a gist of the output)
ssh localhost

Possible causes:

Q: ssh localhost prompts for a password.
A: Passwordless SSH is not configured for Open MPI. See https://github.com/horovod/horovod/issues/125

CUDA driver is a stub library
misc/strongstream.cc:60 NCCL WARN Cuda failure 'CUDA driver is a stub librar...
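
For the case where ssh localhost asks for a password, the following is a minimal sketch of one way to set up and verify key-based login. It is not taken from the original notes: the helper names authorize_local_key and check_passwordless_ssh are hypothetical, and it assumes the OpenSSH client tools are installed, an sshd is running locally, and an ed25519 key with an empty passphrase is acceptable in your environment.

```shell
#!/usr/bin/env bash
# Sketch only: enable and verify passwordless SSH to localhost.
# Assumptions (not from the original notes): OpenSSH is installed, sshd is
# running locally, and a key with an empty passphrase is acceptable.

# authorize_local_key [DIR]
# Create DIR/id_ed25519 if it is missing and list its public key in
# DIR/authorized_keys. DIR defaults to ~/.ssh.
authorize_local_key() {
    local dir="${1:-$HOME/.ssh}"
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    if [ ! -f "$dir/id_ed25519" ]; then
        ssh-keygen -q -t ed25519 -N "" -f "$dir/id_ed25519" || return 1
    fi
    touch "$dir/authorized_keys" && chmod 600 "$dir/authorized_keys" || return 1
    # Append the public key only if it is not already listed.
    grep -qxF -f "$dir/id_ed25519.pub" "$dir/authorized_keys" \
        || cat "$dir/id_ed25519.pub" >> "$dir/authorized_keys"
}

# check_passwordless_ssh [HOST]
# Succeed only if ssh can log in without any interactive prompt.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "${1:-localhost}" true
}

# Example usage (commented out so sourcing this file changes nothing):
# authorize_local_key && check_passwordless_ssh localhost && echo "passwordless SSH OK"
```

BatchMode=yes makes ssh fail instead of prompting, so the check also fails if localhost is not yet in known_hosts. In that case, run ssh localhost once interactively to accept the host key, then repeat the check.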