...my model with torchrun on multiple GPUs but without DDP...
Issue description: I want to use the "torchrun" command to train my model on multiple GPUs, but I need to set data parallel=1 in order to use sequence parallelism. What should I do? cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @g...
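
One possible setup, offered only as a minimal sketch and not as a confirmed answer: `torchrun` just launches one process per GPU and sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`, so the script can call `torch.distributed.init_process_group` and build its own process groups without ever wrapping the model in `DistributedDataParallel`. The helper names `sequence_parallel_groups`, `data_parallel_size`, and `init_sequence_parallel` are hypothetical, as is the assumption that sequence-parallel ranks are contiguous. The sequence-parallel communication itself is left to the model code.

```python
import os


def sequence_parallel_groups(world_size, sp_size):
    """Split ranks 0..world_size-1 into contiguous groups of sp_size ranks.

    Hypothetical helper: contiguous grouping is an assumption of this sketch.
    """
    if sp_size <= 0 or world_size % sp_size != 0:
        raise ValueError("world_size must be a positive multiple of sp_size")
    return [list(range(start, start + sp_size))
            for start in range(0, world_size, sp_size)]


def data_parallel_size(world_size, sp_size):
    """Number of model replicas left once each group holds sp_size ranks."""
    return world_size // sp_size


def init_sequence_parallel(sp_size=None):
    """Initialize torch.distributed under torchrun and return this rank's group.

    The model is NOT wrapped in DistributedDataParallel here.
    """
    # Imported lazily so the helpers above work without PyTorch installed.
    import torch
    import torch.distributed as dist

    # Default env:// init reads RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT,
    # all of which torchrun sets for each process.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    rank = dist.get_rank()
    world_size = dist.get_world_size()
    # Assumption: sp_size == world_size corresponds to "data parallel = 1".
    sp_size = sp_size or world_size

    my_group = None
    for ranks in sequence_parallel_groups(world_size, sp_size):
        # Every rank must call new_group for every group, in the same order.
        group = dist.new_group(ranks=ranks)
        if rank in ranks:
            my_group = group
    return my_group
```

With a hypothetical script `train.py` that calls `init_sequence_parallel()`, a launch such as `torchrun --nproc_per_node=4 train.py` would place all four ranks in one sequence-parallel group, which gives a data-parallel size of 1.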