topi.cuda.inclusive_scan currently works by performing an exclusive_scan and then an add operation that adds the input data back in. To eliminate the overhead of this extra addition, we should probably have an implementation designed specifically for inclusive_scan. As of now,...
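
To make the relationship between the two scans concrete, here is a minimal plain-Python sketch. It does not use TVM or topi; the function names exclusive_scan, inclusive_scan_via_exclusive and inclusive_scan_direct are hypothetical and exist only for this illustration. It shows that an inclusive scan equals an exclusive scan plus the input, and that a dedicated single-pass version can skip the extra elementwise add.

```python
from itertools import accumulate


def exclusive_scan(data):
    """Exclusive prefix sum: out[i] is the sum of data[0..i-1]."""
    out = []
    running = 0
    for x in data:
        out.append(running)  # emit the sum of everything before x
        running += x
    return out


def inclusive_scan_via_exclusive(data):
    """Two steps, as described above: exclusive scan, then add the input back."""
    ex = exclusive_scan(data)
    return [e + x for e, x in zip(ex, data)]  # the extra elementwise add


def inclusive_scan_direct(data):
    """Single pass: out[i] is the sum of data[0..i], with no separate add."""
    return list(accumulate(data))


data = [3, 1, 4, 1, 5]
via_exclusive = inclusive_scan_via_exclusive(data)
direct = inclusive_scan_direct(data)
print(via_exclusive)  # [3, 4, 8, 9, 14]
print(direct)         # [3, 4, 8, 9, 14]
```

Both paths give the same result; the difference is only that the first one touches every element a second time after the scan has finished.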
The example uses the opencv_contrib modules, but the bug is in the files scan.hpp and warp_shuffle.hpp in opencv/modules/core/include/opencv2/cuda/. The crash comes from the kernel hanging inside the warpScanInclusive call to cv::cuda::device::shfl_up(iData, i) on line 191 of scan.hpp...
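
For readers unfamiliar with shuffle-based scans, the sketch below models on the CPU, in plain Python, how a warp-level inclusive scan is commonly built from a shuffle-up primitive. It is not the OpenCV code from scan.hpp, and it says nothing about the cause of the hang. The shfl_up and warp_scan_inclusive functions here are hypothetical stand-ins, and the warp size of 32 is an assumption. The model of shfl_up follows the CUDA shuffle-up convention that a lane whose source index would fall below zero keeps its own value.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp


def shfl_up(values, delta):
    """CPU model of shuffle-up: lane i reads the value held by lane i - delta.

    Lanes with i < delta have no source lane and keep their own value.
    """
    return [values[i - delta] if i >= delta else values[i]
            for i in range(len(values))]


def warp_scan_inclusive(values):
    """Inclusive prefix sum across one warp using log2(WARP_SIZE) shuffle steps."""
    vals = list(values)
    offset = 1
    while offset < len(vals):
        shifted = shfl_up(vals, offset)
        # Only lanes that actually received a neighbour's value accumulate it.
        vals = [v + s if lane >= offset else v
                for lane, (v, s) in enumerate(zip(vals, shifted))]
        offset *= 2
    return vals


ones = [1] * WARP_SIZE
print(warp_scan_inclusive(ones)[:4])  # [1, 2, 3, 4]
print(warp_scan_inclusive(ones)[-1])  # 32
```

In this model every lane takes part in every shuffle step. On a real GPU the shuffle is a collective operation across the lanes of a warp, which is why the call to shfl_up is the natural place to look when a kernel stalls.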