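After running nvidia-smi topo -m, the output lists GPU0-7 as well as NIC0 and NIC1, and NIC1 shows SYS against every other device (could you tell me how it is actually connected?). Also, the machine has 2 CPUs, but lscpu | grep NUMA reports only one NUMA node. I also noticed that this batch of cards has no NVLink; everything is connected over PCIe. Finally, GPU03 and NIC0 are connected via PIX. Is that meant for parallelism across servers?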
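The NUMA check mentioned in the question can be scripted. The following is a minimal sketch, assuming an English-locale lscpu whose output contains a line of the form "NUMA node(s): N". The function name numa_node_count and the SAMPLE text are hypothetical and only for illustration; they are not taken from the machine in the question.

```python
import re
from typing import Optional


def numa_node_count(lscpu_text: str) -> Optional[int]:
    """Return the number on the 'NUMA node(s):' line of lscpu output, or None if absent."""
    match = re.search(r"^NUMA node\(s\):[ \t]*(\d+)[ \t]*$", lscpu_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Hypothetical lscpu excerpt, used only to exercise the parser.
SAMPLE = """\
NUMA node(s):          1
NUMA node0 CPU(s):     0-47
"""

if __name__ == "__main__":
    print(numa_node_count(SAMPLE))
```

If a dual-socket machine reports a single NUMA node here, SYS entries in the topology matrix are worth reading together with the BIOS NUMA settings, since the count comes from what the firmware exposes to the OS.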
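There are several kinds of communication links between GPUs. The command nvidia-smi topo --matrix directly reports the physical communication path between every pair of devices on a server:
SYS: GPU communication across NUMA nodes, over PCIe plus the QPI bus between sockets;
NODE: communication within a single NUMA node, over PCIe and across Host Bridges (one NUMA node can contain multiple CPU chips);
PHB: communication over PCIe through a Host Bridge (in the Root Complex)...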
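To compare link types programmatically, the codes can be given an order. This is only a sketch: the names LINK_ORDER, link_rank and closest are hypothetical, and the ordering from "closest" to "farthest" is a heuristic assumed here, not something nvidia-smi itself reports. PXB and PIX do not appear in the excerpt above; they are taken from the nvidia-smi legend.

```python
import re

# Assumed ordering, shortest path first. NV# (any NVLink count) is treated as one class.
LINK_ORDER = ["NV", "PIX", "PXB", "PHB", "NODE", "SYS"]


def link_rank(code: str) -> int:
    """Lower rank means a shorter path. Raises ValueError for unknown codes such as 'X'."""
    kind = "NV" if re.fullmatch(r"NV\d+", code) else code
    return LINK_ORDER.index(kind)


def closest(codes):
    """Return the code with the shortest path among the given link codes."""
    return min(codes, key=link_rank)


if __name__ == "__main__":
    print(sorted(["SYS", "NV2", "PHB", "PIX"], key=link_rank))
```

A ranking like this is mainly useful for picking which GPU to pair with a given NIC; it says nothing about actual bandwidth, which still has to be measured.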
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_0  mlx5_2  mlx5_1  mlx5_3  CPU Affinity
GPU0     X      NV1     NV1     NV2     NV2     SYS     SYS     SYS     PIX     SYS     PHB     SYS     0-19,40-59
GPU1    NV1      X      NV2     NV1     SYS     NV2     SYS     SYS     PIX     SYS     PHB     SYS     0-19,40-59
GPU2    NV1     NV2      X      NV2     SYS     SYS     NV1     SYS     PHB     SY...
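Seeing "ON" means driver persistence mode is enabled. Next, reboot with "sudo reboot". After the reboot, run "nvidia-smi topo -m" again; if the GPU-to-GPU connections now show NV#, NVLink has been activated successfully. If they still show SYS (i.e. a PCIe path) rather than NV#, run "nvidia-smi topo -p2p n" to check. If ...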
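The "are the GPU links NV# yet?" check after the reboot can be expressed as a small function over an already-parsed matrix. This is a sketch under assumptions: the matrix is given as a dict of dicts (row device to column device to link code), and the name nvlink_pairs and both example matrices are hypothetical.

```python
import re


def nvlink_pairs(matrix):
    """Return sorted (gpu_a, gpu_b, code) tuples for GPU pairs whose link code is NV#."""
    pairs = []
    for a, links in matrix.items():
        for b, code in links.items():
            # a < b keeps each unordered pair once and skips the diagonal.
            is_gpu_pair = a.startswith("GPU") and b.startswith("GPU") and a < b
            if is_gpu_pair and re.fullmatch(r"NV\d+", code):
                pairs.append((a, b, code))
    return sorted(pairs)


# Hypothetical matrices: one with NVLink active, one falling back to SYS.
WITH_NVLINK = {"GPU0": {"GPU0": "X", "GPU1": "NV4"}, "GPU1": {"GPU0": "NV4", "GPU1": "X"}}
WITHOUT_NVLINK = {"GPU0": {"GPU0": "X", "GPU1": "SYS"}, "GPU1": {"GPU0": "SYS", "GPU1": "X"}}

if __name__ == "__main__":
    print(nvlink_pairs(WITH_NVLINK))
    print(nvlink_pairs(WITHOUT_NVLINK))
```

An empty result corresponds to the case described above where the links still show SYS and a further check with "nvidia-smi topo -p2p n" is suggested.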
nvidia-smi topo --matrix

        GPU0    CPU Affinity
GPU0     X      0-13,28-41

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA...
* Added nvidia-smi topo interface to display the GPUDirect communication matrix (EXPERIMENTAL)
* Added support for displaying the GPU board ID and whether or not it is a multiGPU board
* Removed user-defined throttle reason from XML output
=== Changes between nvidia-smi v5.319 Update and v...
$ nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV1     NV1     NV2     NV2     SYS     SYS     SYS     PIX     SYS     0-19,40-59      0               N/A
GPU1    NV1      X      NV2     NV1     SYS     NV2     SYS     SYS     PIX     SYS     0-19,40-59      0               N/A
...
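To answer questions like "which GPUs sit next to NIC0?", the matrix text can be parsed into a lookup table. The sketch below assumes whitespace-separated output shaped like the excerpt above, with device columns named GPUn, NICn or mlx5_n. The names parse_topo and gpus_near are hypothetical helpers, and SAMPLE reuses only the two rows visible in the excerpt.

```python
import itertools
import re

DEVICE = re.compile(r"(GPU|NIC)\d+|mlx5_\d+")


def parse_topo(text):
    """Parse `nvidia-smi topo -m` style text into {row_device: {column_device: link_code}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The header is taken to be the first line that starts with a device name.
    start = next(i for i, tokens in enumerate(rows) if DEVICE.fullmatch(tokens[0]))
    # Device columns are the leading header tokens; "CPU Affinity" etc. end the run.
    columns = list(itertools.takewhile(DEVICE.fullmatch, rows[start]))
    matrix = {}
    for tokens in rows[start + 1:]:
        if tokens[0] in columns and len(tokens) > len(columns):
            matrix[tokens[0]] = dict(zip(columns, tokens[1:1 + len(columns)]))
    return matrix


def gpus_near(matrix, nic, code="PIX"):
    """Return GPUs whose link to `nic` has the given code (PIX by default)."""
    return [dev for dev, links in matrix.items()
            if dev.startswith("GPU") and links.get(nic) == code]


SAMPLE = """\
        GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV1 NV1 NV2 NV2 SYS SYS SYS PIX SYS 0-19,40-59 0 N/A
GPU1 NV1 X NV2 NV1 SYS NV2 SYS SYS PIX SYS 0-19,40-59 0 N/A
"""

if __name__ == "__main__":
    topo = parse_topo(SAMPLE)
    print(gpus_near(topo, "NIC0"))
```

Legend lines are skipped automatically because they do not start with a device name. Real output may differ in spacing or extra columns, so the parser should be checked against the actual text before relying on it.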
Then run sudo reboot. After the reboot, checking nvidia-smi topo -m again shows that the GPU-to-GPU connections are now NV#.

        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4     0-47            0               N/A
GPU1    NV4      X      0-47            0               N/A

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA node...