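When the NPUs on a physical machine have not been assigned IPs, /etc/hccn.conf is empty, so hccl_tools.py reads an empty file and reports an error; the NPUs must be assigned IPs first. What solution would you like to see? Add a note to the hccl_tools.py README, or have hccl_tools.py assign IPs to the machine automatically. What alternatives have you considered? Do you have other context or screenshots? Interested in contributing ...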
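The failure described above could be turned into an explicit error with a check along these lines. This is only a minimal sketch, not the actual hccl_tools.py code: it assumes /etc/hccn.conf stores one `address_<device_id>=<ip>` line per NPU, and the function names `parse_device_ips` and `require_device_ips` are hypothetical.

```python
import re

# Assumption: each NPU IP is stored as a line of the form address_<device_id>=<ip>.
_ADDRESS_LINE = re.compile(r"^address_(\d+)\s*=\s*(\S+)$")


def parse_device_ips(conf_text):
    """Return {device_id: ip} for every address_<id>=<ip> line in the text."""
    device_ips = {}
    for line in conf_text.splitlines():
        match = _ADDRESS_LINE.match(line.strip())
        if match:
            device_ips[match.group(1)] = match.group(2)
    return device_ips


def require_device_ips(conf_text):
    """Parse the text and fail with a clear message when no IPs are configured."""
    device_ips = parse_device_ips(conf_text)
    if not device_ips:
        raise ValueError(
            "no address_<id>=<ip> entries found; "
            "assign an IP to each NPU before generating the rank table"
        )
    return device_ips
```

A caller would read /etc/hccn.conf into a string and pass it to `require_device_ips`, so that an empty file produces a readable message instead of a later, less obvious failure.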
hccl_tools.py script fails to generate the hccl_8p.json file. DONE #I3DP2Y Bug-Report. xixi_han, created 2021-03-26 20:33. Bug Report: Use this template for reporting a bug (kind/bug). Environment: Hardware Environment (Ascend/GPU/CPU): Uncomment only one /device <> line, hit enter to put that in...
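Purpose: I plan to deploy chatglm3 on these two inference cards. There is no complete tutorial, so I am working it out on my own. After installing the base components, I tried to generate the hccl json file by running (ascend_py39) [root@xctest1 mindformers]# python ./mindformers/tools/hccl_tools.py --device_num "[0,8)" --server_ip=10.23.13.83 start /root/llm/mind/mindformers/./mi...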
Because the hccl header directory structure changed, mmcv fails to compile on the NPU side; this adds the missing header files. Modification: add the header files needed for compilation in setup.py. BC-breaking (Optional): Does the modification introduce changes that break the backward-compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how ...
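Motivation: Because the hccl header directory structure changed, mmcv fails to compile on the NPU side; this adds the missing header files (1.x branch). Modification: add the header files needed for compilation in setup.py. BC-breaking (Optional): Does the modification introduce changes that break the backward-compatibility of the downstream repositories?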
hccl_tools / merge_hccl.py (2.26 KB). 徐永飞 committed 3 years ago: fix merge_hccl.py value type of rank_id is not string. # Cop...
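The commit message above concerns `rank_id` values that were not strings after merging. The sketch below shows one way a merge could renumber ranks while keeping them as strings. It is not the contents of merge_hccl.py: the table layout (`server_list`, `device`, `rank_id`) is assumed from common rank table files, and `merge_rank_tables` is a hypothetical name.

```python
import copy


def merge_rank_tables(tables):
    """Concatenate the server lists and renumber rank_id globally, as strings.

    Assumption: each table is a dict with a "server_list" of servers, and each
    server has a "device" list whose entries carry a "rank_id" field.
    """
    merged = {"server_list": []}
    next_rank = 0
    for table in tables:
        for server in table["server_list"]:
            server = copy.deepcopy(server)  # leave the input tables untouched
            for device in server["device"]:
                device["rank_id"] = str(next_rank)  # string, not int
                next_rank += 1
            merged["server_list"].append(server)
    return merged
```

Storing the counter as an int and converting only on assignment keeps the arithmetic simple while guaranteeing that every emitted `rank_id` is a string.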
python hccl_tools.py --device_num "[0,8)" output: hccl_[device_num]p_[which device]_[server_ip].json Note: the Ascend accelerators used must be contiguous. For example, [0,4) means chips 0,1,2,3 are used, and [0,1) means chip 0 is used. The first four chips are...
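The half-open range notation in the note above can be illustrated with a short parser. This is a sketch of the notation only, not the parsing code used by hccl_tools.py; `parse_device_range` is a hypothetical helper.

```python
import re


def parse_device_range(spec):
    """Expand a half-open range such as "[0,4)" into the list of chip ids."""
    match = re.fullmatch(r"\[(\d+),(\d+)\)", spec.replace(" ", ""))
    if match is None:
        raise ValueError(f"expected a range like '[0,8)', got {spec!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if first >= last:
        raise ValueError(f"empty device range: {spec!r}")
    # The lower bound is included and the upper bound is excluded.
    return list(range(first, last))
```

With this reading, `parse_device_range("[0,4)")` gives chips 0 through 3, and `"[0,8)"` covers all eight chips of one server.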