In the previous article (Following the Code Execution Flow, Reading the Megatron Source (3): the Megatron training script training.py and its pretrain() function), we walked through the execution flow of pretrain(), whose first step is initializing Megatron's process groups and configuring the environment. This article dives into the source code of the initialize_megatron function and dissects the internal mechanism by which it sets up the distributed training environment.

Note: the reader is assumed to be familiar with 3D parallelism.

1. The initialize_megatron function's ...
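As a quick recap of where initialize_megatron sits, here is a simplified sketch (not verbatim) of its call site in megatron/training.py: pretrain() hands control to it before any model, optimizer, or data setup.

```python
# Simplified sketch of the call site in megatron/training.py (pretrain()).
from megatron.initialize import initialize_megatron

def pretrain(train_valid_test_dataset_provider, model_provider,
             model_type, forward_step_func,
             extra_args_provider=None, args_defaults={}):
    # Step 1: parse arguments, set globals, initialize the distributed env.
    initialize_megatron(extra_args_provider=extra_args_provider,
                        args_defaults=args_defaults)
    # ... timers, model/optimizer construction, and the train loop follow.
```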
```python
import torch
import torch.distributed

# `args` is Megatron's global argument namespace (from get_args()).
broadcast_single = torch.tensor([1, 2, 3]).cuda(non_blocking=True)
broadcast = torch.tensor(args.world_size * [[0, 0, 0]]).cuda(non_blocking=True)
torch.distributed.all_gather_into_tensor(broadcast, broadcast_single)
```

After the gather, `broadcast` still holds the initialized values rather than each rank's real values. Tracking down the prob...
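Whatever the root cause turns out to be, a minimal sketch of the same gather through the list-based torch.distributed.all_gather API is useful for comparison; the variable names below are illustrative, not from Megatron, and it assumes the process group is already initialized with one GPU per rank.

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group(backend='nccl') has already run
# and each rank owns one GPU.
rank = dist.get_rank()
world_size = dist.get_world_size()

local = torch.tensor([1, 2, 3], device='cuda') + rank      # rank-specific payload
gathered = [torch.zeros(3, dtype=local.dtype, device='cuda')
            for _ in range(world_size)]

dist.all_gather(gathered, local)
torch.cuda.synchronize()
# Expected: gathered[i] now holds rank i's tensor. If the
# all_gather_into_tensor variant instead leaves its output at the
# initialized zeros, the communication backend wiring is suspect.
print(rank, gathered)
```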
```python
def initialize_megatron(extra_args_provider=None,
                        args_defaults={},
                        ignore_unknown_args=False,
                        allow_no_cuda=False):
    """Set global variables, initialize distributed, and
    set autoresume and random seeds.
    `allow_no_cuda` should not be set unless using megatron for
    cpu only data processing.
    """
```
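For orientation, here is a hedged sketch of how an entry script might supply these parameters; the args_defaults value is an illustrative assumption modeled on pretrain_gpt.py, not taken from the section above.

```python
# Hedged usage sketch: how an entry script might invoke initialize_megatron.
from megatron.initialize import initialize_megatron

initialize_megatron(
    extra_args_provider=None,   # optional hook for script-specific CLI args
    args_defaults={'tokenizer_type': 'GPT2BPETokenizer'},  # assumed example
    ignore_unknown_args=False,
    allow_no_cuda=False,        # leave False except for CPU-only data processing
)
```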
Currently, `deepspeed.comm.get_rank` is called before `deepspeed.init_distributed`, leading to a DS assertion error such as: `AssertionError: DeepSpeed backend not set, please initialize it using init_process_group()`. If we replace `torch.distributed.init_process_group` with `deepspeed.init_distributed`, DS will both...
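However that sentence continues, the safe ordering is clear. A minimal sketch, assuming the NCCL backend is wanted: make deepspeed.init_distributed run before any deepspeed.comm query.

```python
import deepspeed
import deepspeed.comm as dist

# Initialize DeepSpeed's communication layer first; under the hood this also
# calls torch.distributed.init_process_group if it has not been called yet.
deepspeed.init_distributed(dist_backend='nccl')

# Only after init_distributed is it safe to query the DS comm layer.
rank = dist.get_rank()
world_size = dist.get_world_size()
```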