To achieve more concurrent stream parallelism I'm using the environment variable CUDA_DEVICE_MAX_CONNECTIONS, which seems to work as of CUDA 12.1. However, I could only find this variable documented in the CUDA Toolkit 5.5 documentation, not in the latest one. Is this variable still supported?
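For context, a minimal sketch of how one might exercise this variable from Python; the value 32 and the workload are purely illustrative, and the key assumption is that the variable is read when the CUDA context is created, so it must be set before the first CUDA call:

```python
import os

# Must be set before the CUDA context is created (i.e. before the first CUDA call).
# The value 32 is only for illustration.
os.environ["CUDA_DEVICE_MAX_CONNECTIONS"] = "32"

import torch

streams = [torch.cuda.Stream() for _ in range(8)]
tensors = [torch.randn(4096, 4096, device="cuda") for _ in range(8)]

# Issue independent matmuls on separate streams; with more hardware work
# queues (connections) available, more of these kernels can be queued and
# executed concurrently instead of being serialized behind one another.
for s, x in zip(streams, tensors):
    with torch.cuda.stream(s):
        torch.matmul(x, x)

torch.cuda.synchronize()
```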
1. If CUDA_DEVICE_MAX_CONNECTIONS=1 is set, the loss decreases normally, whether or not the inputs are padded to the max length (e.g. 4096).
2. If CUDA_DEVICE_MAX_CONNECTIONS=1 is not set and the inputs are padded to the max length, the loss decreases normally.
3. If CUDA_DEVICE_MAX_CONNECTIONS=1 is not set and only batch padding (not global padding) is used, the loss decreases very slowly (a sketch of the two padding schemes follows this list) ...
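For readers unfamiliar with the two padding schemes mentioned above, a minimal illustration; the helper and variable names are made up for this sketch and not taken from any particular codebase:

```python
import torch

def pad_to(seqs, target_len, pad_id=0):
    # Right-pad every sequence in the batch to target_len and stack them.
    return torch.stack([
        torch.nn.functional.pad(s, (0, target_len - s.numel()), value=pad_id)
        for s in seqs
    ])

seqs = [torch.randint(1, 100, (n,)) for n in (17, 250, 1024)]

# Global padding: every batch is padded to the model's max length (e.g. 4096),
# so every micro-batch has an identical shape.
global_padded = pad_to(seqs, 4096)

# Batch padding: pad only to the longest sequence in this batch,
# so shapes vary from batch to batch.
batch_padded = pad_to(seqs, max(s.numel() for s in seqs))
```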
set CUDA_DEVICE_MAX_CONNECTIONS=1 #1113 — merged by zhyncs into main from deadlock-2 on Aug 15, 2024 (+1 −0). merrymercy commented on Aug 15, 2024: This can reduce the chance of deadlock in multi-node runs ...
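The diff adds a single line; I haven't reproduced the actual change here, but a one-line addition of this kind would presumably look like the following (the placement near process startup is an assumption):

```python
import os

# Hypothetical placement: at process startup, before any CUDA context exists.
# Serializing the device's hardware work queues makes the kernel launch order
# seen by the GPU match the issue order on every rank, which reduces the
# chance of ordering-dependent deadlocks across nodes.
os.environ["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"
```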
I have a question: why is it necessary to set CUDA_DEVICE_MAX_CONNECTIONS=1 after enabling seq_parallel? The note is written in the bwd_compute function, and it says the communication is launched first so that it can overlap with the computation ...
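A minimal sketch of the launch-order pattern that note seems to describe, assuming a Megatron-style sequence-parallel backward step; the function name and tensor shapes are illustrative, and an initialized torch.distributed process group with a CUDA backend is assumed:

```python
import os
os.environ["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"  # must precede CUDA initialization

import torch
import torch.distributed as dist

def backward_like_step(grad_chunk, weight):
    # Issue the (sequence-parallel) all-gather first, then the independent GEMM.
    # With a single hardware connection, the GPU executes work in exactly this
    # issue order, so the communication kernel is not queued behind the GEMM
    # and the two can overlap instead of the GEMM blocking the collective.
    world = dist.get_world_size()
    gathered = torch.empty(world * grad_chunk.shape[0], *grad_chunk.shape[1:],
                           device=grad_chunk.device, dtype=grad_chunk.dtype)
    handle = dist.all_gather_into_tensor(gathered, grad_chunk, async_op=True)

    # Independent computation that can run while the all-gather is in flight
    # (shapes of grad_chunk and weight are assumed to be compatible).
    partial = grad_chunk @ weight

    handle.wait()
    return gathered, partial
```

With more than one connection, the driver may interleave the queues so the compute kernel reaches the device first and occupies it, which is why pinning the connection count to 1 makes the intended communication-first ordering reliable.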