[2024-03-05 15:37:54,398] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-05 15:37:54,398] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/opt/conda/envs/ptca/lib/python3.8/site...
Setting ds_accelerator to cuda (auto detect)
Running gcloud compute tpus tpu-vm ssh test-tpu --zone us-central1-a --command cd /usr/share; pip install accelerate -U; echo "hello world"; echo "this is a second command" --worker all

Expected behavior
A configurable option to silence th...
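As a minimal user-side sketch of the requested behavior, the snippet below uses the standard `logging.Filter` class to drop records that start with "Setting ds_accelerator to" whenever an opt-in environment variable is set. It assumes the message is emitted through Python's `logging` module (the `[INFO] [real_accelerator.py:...]` prefix suggests this but does not confirm it). The variable name `SILENCE_ACCELERATOR_INFO` and the class `AcceleratorDetectFilter` are hypothetical and are not options provided by DeepSpeed or Accelerate. The filter only affects records passing through the handler it is attached to, and it must be installed before the message is logged.

```python
import logging
import os

# Hypothetical opt-in switch; NOT a real DeepSpeed or Accelerate setting.
SILENCE_ENV_VAR = "SILENCE_ACCELERATOR_INFO"


class AcceleratorDetectFilter(logging.Filter):
    """Drop 'Setting ds_accelerator to ...' records when the env var is '1'."""

    def __init__(self, env_var: str = SILENCE_ENV_VAR) -> None:
        super().__init__()
        self.env_var = env_var

    def filter(self, record: logging.LogRecord) -> bool:
        # Re-read the variable on every record so it can be toggled at runtime.
        if os.environ.get(self.env_var, "0") != "1":
            return True  # silencing disabled: keep every record
        # Returning False tells the handler to discard this record.
        return not record.getMessage().startswith("Setting ds_accelerator to")
```

Attaching the filter to a handler (rather than to a logger) means it also applies to records propagated from child loggers that reach that handler.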
🐛 Describe the bug
Hello, while training a model with DDP, I found that using a multi-task loss together with gradient checkpointing can make gradient synchronization between GPUs fail, which in turn causes the parameters...