解释use_cache=true与梯度检查点(gradient checkpointing)不兼容的原因 use_cache=true通常用于在序列生成任务(如文本生成)中缓存上一时间步的键值对(key-value pairs),以加速后续时间步的计算。这通过避免重复计算已经计算过的隐藏状态来实现,从而提高推理速度。 然而,梯度检查点(gradient checkpointing)是一种用于减少...
Need to explicitly set use_reentrant when calling checkpoint#26969 Closed 4 tasks younesbelkadamentioned this pull requestOct 23, 2023 [core] Refactor ofgradient_checkpointing#27020 Merged younesbelkadaclosed thisNov 18, 2023 wwxFromTjumentioned this pull requestFeb 5, 2024 ...
However, when we uncommentmodel.gradient_checkpointing_enable(), we get this error: (venv) username@server:~/project$ accelerate launch --use_fsdp -m train_multi The following values were not passed to`accelerate launch`and had defaults used instead:`--num_processes`wassetto a value of`2`...