RDMA Shared Device Plugin supports InfiniBand (IB) and RoCE HCAs. This plugin runs as a DaemonSet.
Usage: For using the RDMA Shared Device Plugin, see the Readme.
License Agreements: This container is licensed under Apache 2.0.
Suggested Reading: Code Repository for RDMA Shared Device Plugin; NVIDIA AI Enterprise Supp...
If the git repository hosting the k8s-rdma-shared-dev-plugin-ds.yaml mentioned above is not reachable, you can use the following approach instead:
    # git clone https://github.com/Mellanox/k8s-rdma-shared-dev-plugin.git
    # cd k8s-rdma-shared-dev-plugin/
    # kubectl create -f deployment/k8s/base/daemonset.yaml
    daemonset.apps/rdma-shared-dp-ds created
...
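Device Plugin: Kubernetes defines a device plugin interface specification. Based on that specification, NVIDIA provides the NVIDIA Device Plugin to expose GPU devices to Kubernetes, and Mellanox likewise provides k8s-rdma-shared-dev-plugin for managing ConnectX-5 RoCE NICs. Network Plugin and Device Plugin correspond mainly to CNI and CDI among the container interfaces. The following mainly describes Network Plugin and Device Plugin...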
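k8s-rdma-device-plugin: This plugin implements the Kubernetes device plugin interface to bring RDMA devices into the container environment, so that RDMA devices can run seamlessly in containers. It monitors and manages the allocation of RDMA devices and integrates closely with existing infrastructure such as the libibverbs library.
k8s-rdma-shared-dev-plugin: This plugin allows an RDMA device to be shared among multiple Pods, which improves RDMA device utilization. It is likewise based on the Kubernetes device...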
RDMA Shared Device Plugin Configurations
The plugin has several configuration fields; this section explains the usage of each field.
    {
      "periodicUpdateInterval": 300,
      "configList": [
        {
          "resourceName": "hca_shared_devices_a",
          "resourcePrefix": "example_prefix",
          "rdmaHcaMax": 1000,
          "devices": ["ib0", "ib1"]
        },
        ...
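To make the configuration fields more concrete, here is a minimal Python sketch that parses a one-entry configuration and lists what each entry would advertise. It uses only the field names visible in the excerpt. The source example is truncated after its first entry, so the closed-off JSON here is an illustrative completion. The helper name `summarize` is hypothetical. The `<resourcePrefix>/<resourceName>` naming, the `rdma` fallback prefix, and the reading of `rdmaHcaMax` as an upper bound on how many times the resource can be handed out are assumptions to check against the plugin README.

```python
import json

# One-entry configuration built from the fields shown in the excerpt.
# The original example is truncated, so this closed-off form is illustrative only.
CONFIG_TEXT = """
{
  "periodicUpdateInterval": 300,
  "configList": [
    {
      "resourceName": "hca_shared_devices_a",
      "resourcePrefix": "example_prefix",
      "rdmaHcaMax": 1000,
      "devices": ["ib0", "ib1"]
    }
  ]
}
"""


def summarize(config_text):
    """Hypothetical helper: one (qualified_name, max_count, devices) tuple per entry."""
    config = json.loads(config_text)
    rows = []
    for entry in config["configList"]:
        # Assumption: resources are advertised as <resourcePrefix>/<resourceName>,
        # with "rdma" as the fallback prefix (the prefix seen in the pod spec excerpt).
        prefix = entry.get("resourcePrefix", "rdma")
        qualified_name = f"{prefix}/{entry['resourceName']}"
        rows.append((qualified_name, entry["rdmaHcaMax"], entry.get("devices", [])))
    return rows


if __name__ == "__main__":
    for name, max_count, devices in summarize(CONFIG_TEXT):
        print(f"{name}: up to {max_count} allocations backed by {devices}")
```

A check like this can be run against a ConfigMap payload before deploying it, to catch a missing `resourceName` or `rdmaHcaMax` early.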
The devices that the RDMA shared device plugin exposes depend on its configuration (usually provided via a ConfigMap).
Author vsoch commented Oct 30, 2024
Thanks! I think this would have been a helpful discussion 2 months ago, but we wound up creating a custom installer. https://github.com/converged...
GTC session: Innovative High-Performance Cisco Ethernet Fabrics for AI Infrastructure and Networks (Presented by Cisco)
NGC Containers: RDMA Shared Device Plugin
NGC Containers: NVIDIA DOCA SNAP Virtio-fs
SDK: GPUDirect Storage
    # ... or remove selector altogether
        kubernetes.io/hostname: nvnode1
      restartPolicy: OnFailure
      containers:
      - image: mellanox/cuda-perftest
        name: rdma-gpu-test-ctr
        securityContext:
          capabilities:
            add: ["IPC_LOCK"]
        resources:
          limits:
            nvidia.com/gpu: 1
            rdma/rdma_shared_device_a: 1
          requests:
            nvidia.com/gpu: 1
            rdma/rdma_shared_device_a...
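The pod spec excerpt above is cut off partway through, so the following Python sketch rebuilds a comparable manifest as a dictionary and prints it as JSON, which `kubectl` also accepts. The image, container name, capability, and resource names come from the excerpt. The pod name `rdma-gpu-test-pod`, the function `build_test_pod`, and the omission of the node selector are assumptions made for illustration. Requests and limits are kept identical because Kubernetes requires them to match for extended resources when both are given.

```python
import json


def build_test_pod(rdma_resource="rdma/rdma_shared_device_a", count=1):
    """Hypothetical helper: build a Pod manifest requesting one GPU and a shared RDMA resource."""
    # Extended resources must have requests equal to limits when both are set,
    # so the same mapping is copied into both fields.
    resources = {"nvidia.com/gpu": 1, rdma_resource: count}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "rdma-gpu-test-pod"},  # hypothetical name, not from the source
        "spec": {
            # The node selector from the excerpt is left out ("or remove selector altogether").
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "image": "mellanox/cuda-perftest",
                    "name": "rdma-gpu-test-ctr",
                    # IPC_LOCK is the capability the excerpt adds to the container.
                    "securityContext": {"capabilities": {"add": ["IPC_LOCK"]}},
                    "resources": {
                        "limits": dict(resources),
                        "requests": dict(resources),
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_test_pod(), indent=2))
```

Saving the printed output to a file and passing it to `kubectl create -f` would be one way to try it, provided the cluster actually advertises a resource with that name.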
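In addition, the FreeFlow Orchestrator must manage the mapping from the virtual addresses of application buffers inside a container to the Shared Memory pointers inside the FreeFlow Router.
Figure 9 Overall architecture of FreeFlow
In standard container production environments, the container network interface should follow the Container Network Interface (CNI) specification defined by the Cloud Native Computing Foundation (CNCF) [12]; however, FreeFlow and...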