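1. Understanding what use_cache=True means and its role in model training. In deep learning frameworks, particularly when using the Hugging Face Transformers library for natural language processing tasks, use_cache=True is a commonly used parameter. When it is set to True, the model caches intermediate outputs from the forward pass (the attention key and value states). The purpose is to compute the probability of the next token efficiently in generation tasks (such as text generation) without having to...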
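The caching idea described above can be sketched with a toy single-head attention in plain Python. This is only an illustration under stated assumptions, not the Transformers implementation: the `project` helper is a hypothetical stand-in for learned linear projections, and the function names are invented for this example. It compares recomputing keys and values for the whole prefix at every step against appending to a cache.

```python
import math

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

def project(x, scale):
    # Hypothetical stand-in for a learned linear projection (assumption).
    return [scale * xi for xi in x]

def decode_without_cache(tokens):
    """Recompute keys and values for the entire prefix at every step."""
    outputs, n_projections = [], 0
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project(x, 0.5) for x in prefix]
        values = [project(x, 2.0) for x in prefix]
        n_projections += 2 * len(prefix)
        outputs.append(attend(project(tokens[t], 1.0), keys, values))
    return outputs, n_projections

def decode_with_cache(tokens):
    """Project only the newest token and append it to the cache."""
    cache_k, cache_v = [], []
    outputs, n_projections = [], 0
    for x in tokens:
        cache_k.append(project(x, 0.5))
        cache_v.append(project(x, 2.0))
        n_projections += 2
        outputs.append(attend(project(x, 1.0), cache_k, cache_v))
    return outputs, n_projections

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full, n_full = decode_without_cache(tokens)
cached, n_cached = decode_with_cache(tokens)

# Both strategies give the same attention outputs; only the work differs.
same = all(math.isclose(a, b)
           for fa, ca in zip(full, cached)
           for a, b in zip(fa, ca))
print(same, n_full, n_cached)
```

In this sketch the uncached loop performs a number of key/value projections that grows quadratically with sequence length, while the cached loop grows linearly. During training the whole sequence is typically processed in a single forward pass, so a cache of this kind is mainly useful for step-by-step generation.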
Describe the bug: I tried to train a ControlNet with both DeepSpeed Stage-3 and gradient checkpointing, but unexpected errors occur. There is no problem using either of these alone; the errors seem to happen in the backward pass of the loss:...
Fixes a bug where, if gradient checkpointing is enabled, SeamlessM4Tv2ConformerEncoderLayer.forward() is called with some arguments missing. Fixes #31028
Gradient hook is called twice for shared parameter with activation checkpointing #131909. Triggered via issue January 7, 2025 21:46; soulitzer commented on #81296 (3beb700).