With CUDA Graph enabled, the tuning code cannot run on the compute stream, and the tensors are not guaranteed to be available either, so they are reallocated.

liujuncheng added 2 commits January 16, 2023 17:34
- cutlass conv support CudaGraph (2ada25d)
- enable CudaGraphSupport (7738127)

liujuncheng added enhancement op labels Jan 16, 2023
liujuncheng requ...
@@ -987,7 +987,9 @@ Status LaunchDepthwiseConv2dBackpropInputGPU(OpKernelContext* ctx,
                                             const T* filter, T* in_backprop,
                                             TensorFormat data_format) {
   if (args.depth_multiplier == 1) {
     if (CanLaunchDepthwiseConv2dGPUSmall(args)) {
       // This kernel doesn't currently work in all cases so...
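
The constraint in the PR description can be illustrated with a small host-only sketch. It is not the PR's code: `MockStream`, `ConvTuner`, and `LaunchConv` are hypothetical names invented here, and no CUDA or CUTLASS API is called. It only models the idea that tuning runs on a separate stream with buffers it allocates itself, so a compute stream that is being captured records nothing but the final kernel launch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a GPU stream: it only records what was enqueued.
struct MockStream {
  bool capturing = false;        // true while a graph capture is active
  std::vector<std::string> ops;  // names of the enqueued operations
};

// Hypothetical tuner: tries each candidate on `tuning_stream`, using scratch
// buffers it allocates itself instead of the caller's tensors.
struct ConvTuner {
  int cached_best = -1;  // index of the chosen kernel, -1 until tuned

  int Tune(MockStream& tuning_stream, std::size_t bytes, int num_candidates) {
    if (cached_best >= 0) return cached_best;  // tune only once
    // Freshly allocated buffers: the caller's tensors may not be usable here.
    std::vector<char> scratch_in(bytes), scratch_out(bytes);
    (void)scratch_in;
    (void)scratch_out;
    for (int i = 0; i < num_candidates; ++i) {
      tuning_stream.ops.push_back("candidate_" + std::to_string(i));
    }
    cached_best = 0;  // placeholder; a real tuner would compare timings
    return cached_best;
  }
};

// Tuning work goes to `tuning_stream`; only the selected kernel is enqueued
// on `compute_stream`, which may be under graph capture.
void LaunchConv(MockStream& compute_stream, MockStream& tuning_stream,
                ConvTuner& tuner, std::size_t bytes) {
  const int best = tuner.Tune(tuning_stream, bytes, /*num_candidates=*/3);
  compute_stream.ops.push_back("conv_kernel_" + std::to_string(best));
}
```

In this sketch, calling `LaunchConv` twice leaves three candidate entries on the tuning stream and two `conv_kernel_0` entries on the compute stream, which is the separation the description asks for.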