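Bicubic interpolation implemented in Verilog, upsampling 960×540 images to 3840×2160.

Reference
"Cubic Convolution Interpolation for Digital Image Processing"
"Why systolic architectures?"

Restrictions
This IP can only upsample 960×540 images to 3840×2160. Because the convolution kernel coefficients are fixed, the matrix multiplication inside the IP is implemented by instantiating multiple constant-coefficient multipliers...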
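The restriction follows from the fixed scale ratio. 960→3840 and 540→2160 are both exact 4× scalings, so along each axis every output pixel falls at one of only four fractional offsets t between input samples, and each offset needs its own fixed set of four kernel weights. The Python sketch below computes that table from the cubic convolution kernel in the Keys reference. It is an illustration, not the repository's Verilog. It assumes the kernel parameter a = -0.5 and pixel-center alignment (x_in = (x_out + 0.5)/4 - 0.5). The names keys_kernel, bicubic_weights, phase_offset, interp_patch and WEIGHT_TABLE are hypothetical. The IP's actual coordinate convention and fixed-point quantization may differ.

```python
import math

SCALE = 4      # 960 -> 3840 and 540 -> 2160 are both exactly 4x
KEYS_A = -0.5  # kernel parameter a; assumed value (the one Keys recommends)


def keys_kernel(x: float, a: float = KEYS_A) -> float:
    """Keys' cubic convolution kernel W(x)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def bicubic_weights(t: float, a: float = KEYS_A) -> list:
    """Weights for samples at x-1, x, x+1, x+2 when interpolating at x + t, 0 <= t < 1."""
    return [keys_kernel(1.0 + t, a), keys_kernel(t, a),
            keys_kernel(1.0 - t, a), keys_kernel(2.0 - t, a)]


def phase_offset(p: int, scale: int = SCALE) -> float:
    """Fractional offset t for output index p (mod scale).
    Assumes pixel-center alignment: x_in = (x_out + 0.5) / scale - 0.5."""
    x = (p + 0.5) / scale - 0.5
    return x - math.floor(x)


def interp_patch(patch, wx, wy) -> float:
    """Separable 2D interpolation of a 4x4 input patch: wy^T * patch * wx."""
    return sum(wy[r] * sum(patch[r][c] * wx[c] for c in range(4)) for r in range(4))


# Only SCALE distinct weight sets exist per axis, so they form a fixed table.
WEIGHT_TABLE = [bicubic_weights(phase_offset(p)) for p in range(SCALE)]

if __name__ == "__main__":
    for p, w in enumerate(WEIGHT_TABLE):
        print(f"phase {p} (t={phase_offset(p):.3f}):", " ".join(f"{v:+.6f}" for v in w))
```

interp_patch writes the interpolation as the small matrix product wyᵀ·P·wx over a 4×4 input patch P. Every entry of WEIGHT_TABLE is a constant, so in hardware each multiply can be a constant-coefficient multiplier. A different scale ratio would produce different offsets and therefore a different table, which is consistent with the restriction above.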
qiujiandong/learn-cuda: a first introduction to CUDA through optimizing matrix multiplication.
qiujiandong 📝docs: update README.md 80ed1cc · Apr 17, 2024

Shared Memory
Data allocated on the device (e.g. with cudaMalloc) resides in global memory. Global memory lives in off-chip device DRAM and is comparatively slow to access, whereas shared memory is on-chip and much faster; shared memory is similar...
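The usual way to exploit shared memory in matrix multiplication is tiling. A thread block cooperatively copies a TILE×TILE sub-block of A and of B from global memory into shared memory and synchronizes. Every thread then computes partial dot products from those on-chip copies. The sketch below emulates that loop structure in plain Python on the CPU so it can be checked against a naive product. It is not CUDA code and not the repository's kernel. TILE, matmul_naive and matmul_tiled are hypothetical names, and the comments only mark where __syncthreads() would sit in a real kernel.

```python
TILE = 4  # stands in for the thread-block / shared-memory tile width (assumed value)


def matmul_naive(A, B):
    """Reference: each C[i][j] reads a full row of A and a full column of B."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def matmul_tiled(A, B, tile=TILE):
    """CPU emulation of a shared-memory tiled kernel (not actual CUDA)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for bi in range(0, n, tile):              # plays the role of blockIdx.y
        for bj in range(0, m, tile):          # plays the role of blockIdx.x
            for bk in range(0, k, tile):      # walk tiles along the shared dimension
                # Load phase: stage one tile of A and one of B ("shared memory"),
                # zero-padding elements that fall outside the matrices.
                As = [[A[i][p] if i < n and p < k else 0 for p in range(bk, bk + tile)]
                      for i in range(bi, bi + tile)]
                Bs = [[B[p][j] if p < k and j < m else 0 for j in range(bj, bj + tile)]
                      for p in range(bk, bk + tile)]
                # __syncthreads() would go here in a real kernel.
                # Compute phase: partial dot products read only the staged tiles.
                for ti in range(tile):
                    for tj in range(tile):
                        i, j = bi + ti, bj + tj
                        if i < n and j < m:
                            C[i][j] += sum(As[ti][p] * Bs[p][tj] for p in range(tile))
                # A second __syncthreads() would precede loading the next tile.
    return C


if __name__ == "__main__":
    print(matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Within one pass, each staged element of A is reused by TILE output columns and each element of B by TILE output rows. Global-memory reads therefore drop by roughly a factor of TILE compared with the naive loop.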
Coalesce
Comparing the shared-memory-accelerated version with the cuBLAS implementation shows that there is still a lot of room for optimization. One new concept here is coalesced memory access, ...
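Coalescing concerns how the addresses issued by the 32 threads of a warp map onto memory transactions. When consecutive lanes read consecutive words, the hardware can serve the whole warp with a few transactions. When lanes are strided, each one may need its own. The sketch below counts the 32-byte sectors touched by one warp-wide float load from a row-major n×n matrix under two access patterns. The 32-byte sector size is an assumption based on recent NVIDIA GPUs. Lane t stands for threadIdx.x within a warp. offset_bytes, sectors_touched and warp_load are hypothetical helpers, not CUDA APIs.

```python
WARP_SIZE = 32
ELEM_BYTES = 4      # sizeof(float)
SECTOR_BYTES = 32   # assumed global-memory transaction granularity


def offset_bytes(row: int, col: int, ld: int) -> int:
    """Byte offset of element (row, col) in a row-major matrix with leading dimension ld."""
    return (row * ld + col) * ELEM_BYTES


def sectors_touched(addresses) -> int:
    """Number of distinct SECTOR_BYTES-sized sectors one warp-wide load touches."""
    return len({a // SECTOR_BYTES for a in addresses})


def warp_load(n: int, fixed: int, lanes_walk_columns: bool):
    """Byte addresses one warp reads from an n x n row-major matrix.
    lanes_walk_columns=True:  lane t reads (fixed, t) -> adjacent addresses.
    lanes_walk_columns=False: lane t reads (t, fixed) -> stride of n elements."""
    if lanes_walk_columns:
        return [offset_bytes(fixed, t, n) for t in range(WARP_SIZE)]
    return [offset_bytes(t, fixed, n) for t in range(WARP_SIZE)]


if __name__ == "__main__":
    n = 1024
    used = WARP_SIZE * ELEM_BYTES
    for walk_cols in (True, False):
        s = sectors_touched(warp_load(n, fixed=0, lanes_walk_columns=walk_cols))
        print(f"lanes walk {'columns' if walk_cols else 'rows'}: "
              f"{s} sectors, {used}/{s * SECTOR_BYTES} bytes useful")
```

For n = 1024 the first pattern touches 4 sectors (128 bytes moved for 128 bytes used). The second touches 32 sectors (1024 bytes moved for the same 128 bytes used). In a naive matmul kernel, which pattern the loads of A or B follow depends on whether threadIdx.x is mapped to the output column or the output row. That mapping is a design choice the text above does not specify.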