During training, without the CUDA_DEVICE_MAX_CONNECTIONS=1 environment variable set, the loss drops from 3+ to 1+. With CUDA_DEVICE_MAX_CONNECTIONS=1 set, the loss drops from 3+ to 0.1+, which is much faster. What is the reason for this? Expected Behavior: No response. Steps To Reproduce: ...
To achieve more concurrent stream parallelism, I’m using the environment variable CUDA_DEVICE_MAX_CONNECTIONS, which seems to be working as of CUDA 12.1. However, I could find traces of this variable being defined in the CUDA Toolkit 5.5 but not in the latest one. Is th...
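
For anyone comparing runs with and without this variable, here is a minimal Python sketch of one way to set it per process. It is not taken from either post. The helper names `env_with_max_connections` and `run_child` are hypothetical, and the 1 to 32 range check is an assumption based on the CUDA Programming Guide's environment-variable table, so verify it against your toolkit version. The child process here only echoes the value it sees and does not touch the GPU.

```python
import os
import subprocess
import sys


def env_with_max_connections(n, base=None):
    """Return a copy of an environment with CUDA_DEVICE_MAX_CONNECTIONS set to n.

    Hypothetical helper. The 1..32 bound is an assumption (see the lead-in).
    """
    if not 1 <= n <= 32:
        raise ValueError("CUDA_DEVICE_MAX_CONNECTIONS is assumed to be in 1..32")
    # Copy so the parent's own environment is left untouched.
    env = dict(os.environ if base is None else base)
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = str(n)
    return env


def run_child(n):
    """Start a child Python process with the variable set and return what it sees.

    In a real comparison the command would be the training script instead of
    this one-liner, launched once per setting with the same seed and data.
    """
    code = "import os; print(os.environ['CUDA_DEVICE_MAX_CONNECTIONS'])"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env_with_max_connections(n),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_child(1))
```

Setting the variable in the launching environment, rather than from inside an already running program, avoids the risk that the CUDA runtime has been initialized before the value is in place.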