Compared with central processing units (CPUs) and graphics processing units (GPUs), CNN implementations based on field-programmable gate arrays (FPGAs) are gaining popularity owing to their flexibility and efficiency. In this work, we present an efficient, high-performance CNN accelerator based on a blocked Winograd-GEMM architecture...
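As a rough illustration of the Winograd idea this abstract names (not the paper's blocked Winograd-GEMM design, whose details are not given here), the 1D minimal filtering algorithm F(2,3) produces two outputs of a 3-tap filter with four data multiplications instead of six. The function names below are hypothetical.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap filter over four inputs."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform; on an accelerator this is precomputed once per filter.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Four element-wise multiplications (a direct computation needs six).
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: direct sliding-window computation of the same two outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    data, taps = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0]
    print(winograd_f23(data, taps), direct_f23(data, taps))
```

The 2D variants used for convolution layers nest this transform over rows and columns; the saving in multiplications is what makes the approach attractive when multipliers are the scarce resource.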
Designing an FPGA-based accelerator traditionally required a tedious register-transfer-level (RTL) design flow. To improve design productivity, the proposed work uses High-Level Synthesis (HLS), described in OpenCL, to generate the FPGA bitstream for the CNN model. The 2D ...
Recently, FPGA-based CNN accelerators have demonstrated superior energy efficiency compared to high-performance devices like GPGPUs. However, owing to constrained on-chip resources and many other factors, single-board FPGA designs may have difficulty achieving optimal energy efficienc...
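2 DeepFire2: A Convolutional Spiking Neural Network Accelerator on FPGAs Article link: arxiv.org/abs/2305.0518 Abstract: Brain-inspired spiking neural networks (SNNs) replace the multiply-accumulate operations of traditional neural networks with integrate-and-fire neurons, with the goal of achieving higher energy efficiency. In terms of power and performance, dedicated hardware implementations of these neurons...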
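To make the contrast with multiply-accumulate concrete, here is a minimal sketch of a single integrate-and-fire neuron. It is a generic textbook model, not DeepFire2's implementation; the function name, the threshold value, and the reset-by-subtraction rule are assumptions chosen for illustration.

```python
def integrate_and_fire(spike_trains, weights, threshold=1.0):
    """Simulate one integrate-and-fire neuron over discrete time steps.

    spike_trains: list of time steps, each a list of 0/1 input spikes.
    weights: one synaptic weight per input.
    Returns the neuron's output spike (0 or 1) at each time step.
    """
    potential = 0.0
    output = []
    for spikes in spike_trains:
        # Inputs are binary, so integration is a conditional add:
        # no multiplier is needed, unlike a multiply-accumulate unit.
        for spike, weight in zip(spikes, weights):
            if spike:
                potential += weight
        if potential >= threshold:
            output.append(1)
            potential -= threshold  # assumed reset-by-subtraction
        else:
            output.append(0)
    return output


if __name__ == "__main__":
    print(integrate_and_fire([[1, 0], [0, 1], [1, 1]], [0.6, 0.5]))
```

Because each synapse contributes through an addition gated by a one-bit spike, the datapath needs adders and comparators only, which is the source of the energy-efficiency argument in the abstract.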
The adoption of transformer networks has experienced a notable surge in various AI applications. However, the increased computational complexity, stemming primarily from the self-attention mechanism, parallels the manner in which convolution operations c
(CNNs). The self-attention algorithm, specifically its matrix-matrix multiplication (MatMul) operations, demands substantial memory and computation, thereby limiting the overall performance of the transformer. This paper introduces an efficient hardware accelerator for the ...
High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic. However, deploying a large-scale CNN model in an embedded system is subject to computation and memory constraints. An optimized block-floating-... X. Lian, Z. Liu, Z. Song, ... - IEEE Transactions on Very ...
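For readers unfamiliar with block floating point, the following sketch shows the general idea: a block of values shares one exponent, and each value keeps only a small signed integer mantissa. This is a generic illustration, not the format used in the paper above; the function names and the 8-bit mantissa width are assumptions.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to signed mantissas with one shared exponent."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1.
    _, exp = math.frexp(max_abs)
    # Scale so the largest magnitude uses the full signed mantissa range.
    scale_exp = exp - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [
        max(-limit, min(limit, round(x / 2.0 ** scale_exp))) for x in block
    ]
    return mantissas, scale_exp


def bfp_dequantize(mantissas, scale_exp):
    """Recover approximate floats from mantissas and the shared exponent."""
    return [m * 2.0 ** scale_exp for m in mantissas]


if __name__ == "__main__":
    mants, e = bfp_quantize([0.5, -0.25, 0.125, 1.0])
    print(mants, e, bfp_dequantize(mants, e))
```

Within a block, multiplications and additions operate on the integer mantissas alone, so the hardware cost is close to fixed-point arithmetic while the shared exponent preserves dynamic range across blocks.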
In this work, we make two contributions: (1) we propose a new neighbor sampler, the CONCAT Sampler, which can be easily accelerated at the hardware level while preserving test accuracy; (2) we design an FPGA-based CONCAT-sampler accelerator, with which the neighbor sampling process is boosted to ...
We present a new, efficient OpenCL-based accelerator for large-scale convolutional neural networks called “Fast Inference on FPGAs for Convolution Neural Network” (FFCNN). FFCNN is based on a deeply pipelined OpenCL kernel architecture. As pointed out before, high-level synthesis tools such as...
2017.12-A survey of FPGA-based neural network accelerator
2018-FITEE-Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
2018-IEEE Signal Processing Magazine-Model compression and acceleration for deep neural networks: The principles, progress, and challenges. arXiv extension
2018...