Explain why use_cache=True is incompatible with gradient checkpointing use_cache=True is typically used in sequence-generation tasks (such as text generation) to cache the key-value pairs from previous time steps and speed up the computation of later steps. It does this by avoiding recomputation of hidden states that have already been computed, which accelerates inference. Gradient checkpointing, however, is a technique for reducing...
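The conflict can be illustrated without any framework: checkpointing frees activations after the forward pass and re-runs the forward during backward, on the assumption that the re-run reproduces the same computation from scratch. A layer that serves key/value tensors from a cache violates that assumption, because the "recomputed" forward returns objects produced outside the re-run, so they are not fresh nodes in the autograd graph. The following is a minimal sketch; `KVCachingLayer` and `checkpointed_backward` are illustrative names, not from any library.

```python
class KVCachingLayer:
    """Toy layer that caches the key/value built on its first call
    (mimicking use_cache=True in a decoder block)."""
    def __init__(self):
        self.cache = None

    def forward(self, x, use_cache=True):
        if use_cache and self.cache is not None:
            kv = self.cache              # cache hit: reuse stale objects
        else:
            kv = (("k", x), ("v", x))    # stand-in for fresh K/V projections
            if use_cache:
                self.cache = kv
        return x + 1, kv

def recompute_reuses_cache(layer, x):
    """Checkpointing discards activations, then re-runs the forward during
    backward. Here we call forward twice to mimic original pass + recompute
    and check whether the second pass returned the very same kv object."""
    _, kv_original = layer.forward(x)    # original forward (activations freed)
    _, kv_recomputed = layer.forward(x)  # "recomputation" inside backward
    return kv_recomputed is kv_original

# With caching on, the recompute silently reads the stale cache:
print(recompute_reuses_cache(KVCachingLayer(), 0))   # True

# With use_cache=False, every pass builds fresh tensors, as
# checkpointing requires:
layer = KVCachingLayer()
_, a = layer.forward(0, use_cache=False)
_, b = layer.forward(0, use_cache=False)
print(a is b)                                        # False
```

In a real framework this mismatch surfaces as missing or wrong gradients, or as backward-pass assertion failures like those in the reports below; this is presumably why Transformers-style implementations force use_cache=False whenever gradient checkpointing is enabled during training.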
Describe the bug I tried to train a ControlNet with both DeepSpeed Stage-3 and gradient checkpointing, but unexpected errors occur. There is no problem using either of these alone; the errors seem to happen in the loss backward:...
Describe the bug During Step 2 - Reward Model of DeepSpeed-Chat, an AssertionError occurs in the backward process for ZeRO stage 3 if gradient_checkpointing is enabled, while it works if gradient_checkpointing is disabled Log output Traceback (most recent call last): File "run_bloom.py", li...