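Take stencil computation, an important kernel in high-performance computing, as an example. A typical stencil computation follows a predefined pattern and repeatedly updates each data point along the time dimension with a weighted sum of that point and its neighbors. This pattern makes stencil computation hard to convert directly into matrix multiplication, so it cannot fully exploit the matrix-multiplication acceleration hardware that keeps emerging because of deep learning. To address this problem, this paper proposes a new stencil computation system...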
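To make the update rule concrete, here is a minimal sketch of a 1D stencil in plain Python. It is not taken from the paper: the function names, the three-point weights, and the fixed (unchanged) boundary cells are all assumptions chosen for illustration.

```python
def stencil_step(u, weights):
    """Apply one time step of a 1D stencil with the given symmetric weights.

    Assumption: cells closer than `radius` to either edge are left unchanged
    (a fixed boundary), which is only one of several common choices.
    """
    radius = len(weights) // 2
    new_u = list(u)
    for i in range(radius, len(u) - radius):
        # Weighted sum of the point and its neighbors on both sides.
        new_u[i] = sum(w * u[i + k - radius] for k, w in enumerate(weights))
    return new_u


def run_stencil(u, weights, steps):
    """Advance the grid `steps` time steps by repeated stencil updates."""
    for _ in range(steps):
        u = stencil_step(u, weights)
    return u


if __name__ == "__main__":
    grid = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(run_stencil(grid, [0.25, 0.5, 0.25], steps=1))
```

Each output cell depends on a sliding window of input cells, which is why the loop does not map onto a single dense matrix product without first rearranging the data.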
git clone https://github.com/microsoft/ConvStencil.git

Compile
Use the following commands:
mkdir -p build
cd build
cmake ..
make all -j24

Usage
You can run convstencil in the following input format.
convstencil_program shape input_size time_interation_size options ...
I'm puzzled by the speed of vslsconv. I'm trying to convolve a 1000 by 1000 image with a 10 by 10 (or smaller) stencil. Appropriate FFT algorithms should expand both to 1024 by 1024, take Fourier transforms (where the FT of the stencil of course can be done fast), multiply and do th...
I'm sorry this is getting a little technical now, but I don't think you're right: the image is 1000 by 1000, but the stencil is only 10 by 10, meaning that we need a padding on the original image of 9 pixels to guarantee that the cyclic convolution is the same as the "normal" one...
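The padding argument can be checked on a small 1D case. The sketch below is only an illustration in plain Python, not vslsconv or any MKL API, and the function names are made up for this example. It compares a direct linear convolution with a cyclic one, which is what an FFT-based method computes. For a signal of length n and a stencil of length k, padding to at least n + k - 1 makes the two agree. That is the same rule that gives 9 extra pixels for a 10-wide stencil.

```python
def linear_convolve(x, h):
    """Direct ("normal") linear convolution; output has len(x) + len(h) - 1 entries."""
    out = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def cyclic_convolve(x, h, size):
    """Cyclic convolution of x and h after zero-padding both to `size`.

    This is the operation an FFT-multiply-inverse-FFT pipeline produces.
    """
    if size < max(len(x), len(h)):
        raise ValueError("size must be at least as long as both inputs")
    xp = list(x) + [0] * (size - len(x))
    hp = list(h) + [0] * (size - len(h))
    out = [0] * size
    for i in range(size):
        for j in range(size):
            out[(i + j) % size] += xp[i] * hp[j]  # indices wrap around
    return out


if __name__ == "__main__":
    x, h = [1, 2, 3, 4], [1, 1, 1]
    print(linear_convolve(x, h))
    print(cyclic_convolve(x, h, len(x) + len(h) - 1))  # padded by k - 1: matches
    print(cyclic_convolve(x, h, len(x)))               # no padding: wraps around
```

Without the extra k - 1 entries, the tail of the result wraps around and contaminates the first k - 1 outputs. The padded size only needs to reach n + k - 1, and any larger FFT-friendly size also works.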