Are there other optimizers in this realm that I'm not aware of? If we want to support Adafactor, should we be adding an entirely new `Optimizer` subclass (i.e. a `class Adafactor(torch.optim.Optimizer)`)? Should it live in core or in pytorch-labs? Maybe down the road, we'd be able to bundle a collection of foreach ops wi...
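
For concreteness, here is a minimal sketch of what such a subclass might look like. This is not an existing API: the class name, hyperparameter defaults, and the decision to skip the paper's update clipping and relative step sizes are all my own simplifications; only the factored row/column second-moment idea comes from Adafactor (Shazeer & Stern, 2018).

```python
import torch
from torch.optim import Optimizer


class Adafactor(Optimizer):
    """Sketch of an Adafactor-style optimizer, not a full implementation.

    For 2D params it stores one row vector and one column vector
    (O(d1 + d2) memory) instead of a full d1 x d2 second moment;
    1D params fall back to an unfactored second moment.
    """

    def __init__(self, params, lr=1e-3, eps=1e-30, beta2=0.999):
        defaults = dict(lr=lr, eps=eps, beta2=beta2)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            lr, eps, beta2 = group["lr"], group["eps"], group["beta2"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                sq = grad * grad + eps

                if grad.dim() == 2:
                    # Factored case: EMA of per-row and per-column means.
                    if not state:
                        state["row"] = torch.zeros_like(grad.mean(dim=1))
                        state["col"] = torch.zeros_like(grad.mean(dim=0))
                    row, col = state["row"], state["col"]
                    row.mul_(beta2).add_(sq.mean(dim=1), alpha=1 - beta2)
                    col.mul_(beta2).add_(sq.mean(dim=0), alpha=1 - beta2)
                    # Rank-1 reconstruction of the second-moment estimate.
                    v = torch.outer(row, col) / row.mean().clamp_min(eps)
                else:
                    # Unfactored fallback for 1D params (biases, norms, ...).
                    if not state:
                        state["v"] = torch.zeros_like(grad)
                    v = state["v"]
                    v.mul_(beta2).add_(sq, alpha=1 - beta2)

                p.add_(grad / v.sqrt().clamp_min(eps), alpha=-lr)

        return loss
```

Usage would mirror any other optimizer (`opt = Adafactor(model.parameters()); opt.step()`), which is partly why subclassing `torch.optim.Optimizer` seems like the natural shape regardless of whether it lands in core or pytorch-labs.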