Contributor bdhirsh commented Feb 7, 2025 • edited by pytorch-bot bot

Summary: FSDP needs to hide version counter (VC) bumps on its allgather buffer, but it does not need to do this if the allgather buffer was generated under inference mode. More details here: https://www.internalfb.com/diff/D6911...
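For readers unfamiliar with the mechanism, here is a minimal sketch of the behavior the summary describes. It is not the diff from this PR. The helper name `hide_version_bumps` is hypothetical, and `torch.autograd._unsafe_preserve_version_counter` is a private PyTorch context manager, so its behavior may change between releases. The sketch assumes that tensors created under `torch.inference_mode()` do not track a version counter, which is why the hiding step is skipped for them.

```python
import contextlib

import torch


def hide_version_bumps(t):
    """Hypothetical helper: hide in-place version bumps on `t` when it tracks them."""
    if t.is_inference():
        # Inference tensors do not track a version counter, so nothing to hide.
        return contextlib.nullcontext()
    # Private API (assumption: available in the installed PyTorch build);
    # restores the tensor's version counter when the block exits.
    return torch.autograd._unsafe_preserve_version_counter(t)


# Normal tensor: an in-place op inside the context leaves the version unchanged.
buf = torch.zeros(4)
before = buf._version
with hide_version_bumps(buf):
    buf.add_(1)
hidden_version = buf._version

# Without the context, the same in-place op bumps the version counter.
buf.add_(1)
visible_version = buf._version

# Inference tensor: the helper skips the version-counter handling entirely.
with torch.inference_mode():
    inf_buf = torch.zeros(4)
    with hide_version_bumps(inf_buf):
        inf_buf.add_(1)
```

The branch on `is_inference()` is the point of the sketch: a buffer allocated under inference mode has no version counter to preserve, so it bypasses the hiding logic.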
FSDP: avoid resetting version counter of all_gather_output in inference_mode · pytorch/pytorch@a8aa293