Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM
Sreeram Potluri ...
doi:10.1007/978-3-319-73814-7_6

NVSHMEM is an implementation of OpenSHMEM for NVIDIA GPUs which allows communication to be issued from inside CUDA kernels. In this work, we present an implementation of Breadth First Search for multi-GPU systems using NVSHMEM. We analyze the benefits and bottlenecks of moving fine-grained communication into CUDA kernels. Using our implementation of BFS, we achieve up to 75% improvement in performance compared to a ...
Keywords: CUDA; BFS; Distributed algorithm; Large graphs; Graph 500 benchmark

Highlights: Simple distributed BFS algorithms lead to load imbalance among threads. Communication among tasks quickly becomes an issue. We propose a novel technique for mapping threads to data. We perform a pruning operation on the set of edges exchanged at each...
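The highlights do not say how the pruning is done, so the following is only one plausible form of it, sketched under assumptions: before an exchange, a PE drops duplicate destination vertices and any vertex it has already sent in an earlier level, since the owner has already marked such a vertex as visited. The function name `prune_outgoing` and the `already_sent` set are hypothetical.

```python
def prune_outgoing(discovered, owner, already_sent):
    """Group discovered neighbor vertices by owning PE, dropping redundant sends.

    discovered:   neighbor vertex ids found while expanding the local frontier
    owner:        function mapping a vertex id to its owning PE
    already_sent: set of vertex ids this PE has sent before (updated in place;
                  hypothetical bookkeeping structure)
    Returns {pe: list of vertices to send to that PE}.
    """
    outgoing = {}
    for v in discovered:
        if v in already_sent:
            # The owner already received v, so v cannot be newly visited there.
            continue
        already_sent.add(v)
        outgoing.setdefault(owner(v), []).append(v)
    return outgoing


sent = set()
print(prune_outgoing([4, 1, 4, 5, 1, 3], lambda v: v // 3, sent))
```

A per-PE set is the simplest choice for a sketch; on a GPU the same filter would more likely be a bitmap indexed by vertex id, trading memory for constant-time, atomic-friendly checks.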
Abstract: When run on clusters of GPUs, simple algorithms for executing a Breadth First Search on large graphs lead to load imbalance among threads and uncoalesced memory accesses, resulting in rather low performance. ...
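A widely used remedy for both problems is to assign one thread per frontier *edge* rather than per frontier vertex: an exclusive prefix sum over frontier degrees gives each vertex a range of edge slots, and each thread finds its vertex with a binary search. The abstract does not specify its mapping technique, so this is an illustrative sketch of that general approach, not the authors' method. It is simulated sequentially, and the graph is assumed to be stored in CSR form (`row_ptr`, `col_idx`).

```python
from bisect import bisect_right
from itertools import accumulate


def map_threads_to_edges(frontier, row_ptr, col_idx):
    """Assign one simulated thread per frontier edge via prefix sum + binary search.

    frontier:         vertex ids in the current frontier
    row_ptr, col_idx: CSR adjacency (row_ptr has num_vertices + 1 entries)
    Returns a list of (thread_id, source_vertex, neighbor) triples.
    """
    degrees = [row_ptr[u + 1] - row_ptr[u] for u in frontier]
    # Exclusive prefix sum: offsets[i] is the first edge slot of frontier[i].
    offsets = [0] + list(accumulate(degrees))
    total_edges = offsets[-1]
    assignments = []
    for t in range(total_edges):           # on a GPU: one thread per t
        # bisect_right skips zero-degree vertices that share the same offset.
        i = bisect_right(offsets, t) - 1
        u = frontier[i]
        k = t - offsets[i]                 # edge index within u's adjacency list
        assignments.append((t, u, col_idx[row_ptr[u] + k]))
    return assignments


row_ptr = [0, 2, 4, 6, 9, 10, 10]
col_idx = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
print(map_threads_to_edges([3, 5, 0], row_ptr, col_idx))
```

Every thread now does the same amount of work regardless of vertex degree, and threads with consecutive ids mostly read consecutive entries of `col_idx`, which is what allows those reads to coalesce on a GPU.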
We used an NVIDIA RTX 3080 GPU (8,704 CUDA cores, 10 GB of DRAM) as the accelerator. For the evaluation, we used eight graph datasets that were not used to train the MLP model. Table 2 lists the details of the graph suite used for the evaluation, all of which are abbreviated ...