When training a model that contains Mixture of Experts (MoE) layers using DeepSpeed ZeRO stage 2 offload with the DeepSpeedCPUAdam optimizer, the following runtime error is thrown during the parameter update step:

│ /home/kyle/.conda/envs/llama2-chat/lib/python3.11/site-packages/lightning/fabr...
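
For reference, here is a minimal sketch of the kind of setup that hits this code path, assuming Lightning Fabric's `DeepSpeedStrategy`; the `TinyMoE` module is a hypothetical stand-in for the real MoE layers:

```python
# Minimal sketch, assuming Lightning Fabric + DeepSpeed are installed.
# TinyMoE is a toy stand-in for the actual MoE layers in the model.
import torch
import torch.nn as nn
from deepspeed.ops.adam import DeepSpeedCPUAdam
from lightning.fabric import Fabric
from lightning.fabric.strategies import DeepSpeedStrategy


class TinyMoE(nn.Module):
    """Toy mixture-of-experts block: a softmax gate over two expert MLPs."""

    def __init__(self, dim: int = 32, num_experts: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)               # (batch, experts)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, dim, experts)
        return torch.einsum("be,bde->bd", weights, outs)


fabric = Fabric(
    accelerator="cuda",
    devices=1,
    # ZeRO stage 2 with optimizer-state offload to CPU, which is what
    # calls for the CPU-capable DeepSpeedCPUAdam optimizer.
    strategy=DeepSpeedStrategy(stage=2, offload_optimizer=True),
)
fabric.launch()

model = TinyMoE()
optimizer = DeepSpeedCPUAdam(model.parameters(), lr=1e-4)
model, optimizer = fabric.setup(model, optimizer)

batch = torch.randn(8, 32, device=fabric.device)
loss = model(batch).pow(2).mean()
fabric.backward(loss)
optimizer.step()  # the RuntimeError surfaces here, at the parameter update
optimizer.zero_grad()
```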