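For example, with pipeline_model_parallel_size set to 2 and tensor_model_parallel_size set to 4, the model is split vertically into 2 stages for pipeline parallelism, and each stage has its own tensor-parallel group that runs tensor parallelism across 4 GPUs. As the figure below shows, there are 2 stages; by column, [g0, g1, g2, g3] and [g4, g5, g6, g7] are the two tensor-parallel communication groups; by row, there are...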
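The grouping described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the framework's actual code: the function name `build_parallel_groups` is hypothetical, and the layout is an assumption (consecutive ranks form a tensor-parallel group, and ranks at the same position in each stage form a pipeline-parallel group). The integer ranks 0 to 7 stand in for g0 to g7.

```python
def build_parallel_groups(world_size, tensor_parallel_size, pipeline_parallel_size):
    """Return (tensor_groups, pipeline_groups) as lists of rank lists.

    Assumed layout (hypothetical helper, not a library API):
    - tensor-parallel groups are blocks of consecutive ranks;
    - pipeline-parallel groups pick one rank per stage at a fixed stride.
    """
    model_parallel_size = tensor_parallel_size * pipeline_parallel_size
    if world_size % model_parallel_size != 0:
        raise ValueError(
            "world_size must be divisible by "
            "tensor_parallel_size * pipeline_parallel_size"
        )

    # Each block of `tensor_parallel_size` consecutive ranks is one tensor group.
    tensor_groups = [
        list(range(start, start + tensor_parallel_size))
        for start in range(0, world_size, tensor_parallel_size)
    ]

    # Ranks that sit `stride` apart belong to the same pipeline group,
    # one rank per pipeline stage.
    stride = world_size // pipeline_parallel_size
    pipeline_groups = [
        list(range(first, world_size, stride)) for first in range(stride)
    ]
    return tensor_groups, pipeline_groups


if __name__ == "__main__":
    tensor_groups, pipeline_groups = build_parallel_groups(
        world_size=8, tensor_parallel_size=4, pipeline_parallel_size=2
    )
    print("tensor-parallel groups:  ", tensor_groups)
    print("pipeline-parallel groups:", pipeline_groups)
```

With 8 GPUs, 4-way tensor parallelism, and 2 pipeline stages, this yields two tensor-parallel groups of four ranks each and four pipeline-parallel groups of two ranks each, under the layout assumed here.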
pipeline_parallel_size to docs since it was missing. cc @strangiato