1. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
[ "牛朝旭(1998—),男,山东菏泽人,硕士研究生,2020年于东北电力大学获得学士学位,主要从事CNN加速器设计方面的研究。E-mail:niuzhaoxu20@ mails.ucas.edu.cn" ]
[ "孙海江(1980—),男,吉林辉南人,博士,研究员,2012年于中国科学院长春光学精密机械与物理研究所获得博士学位,主要从事目标识别与跟踪技术及高清视频图像增强显示方面的研究。E-mail:sunhaijing@126.com" ]
NIU Zhao-xu, SUN Hai-jiang. Design and implementation of convolution neural network accelerator for Winograd algorithm based on FPGA[J]. Chinese Journal of Liquid Crystals and Displays, 2023,38(11):1521-1530. DOI: 10.37188/CJLCD.2023-0013.
To accelerate convolutional neural networks in low-power and edge-computing scenarios, a convolutional neural network accelerator based on the Winograd algorithm is designed for a field programmable gate array (FPGA). First, the image data and weight data are quantized to 8-bit fixed-point numbers, and a quantization flow for the hardware convolution computation is designed, which increases data-transfer and computation speed. Second, an input-data buffer-reuse module is designed, which fuses data from multiple input channels before transfer and reuses the overlapping row data. Then, a Winograd pipelined convolution module is designed to realize combined reuse of column data, maximizing on-chip data reuse and reducing on-chip storage occupancy and bandwidth pressure. Finally, the accelerator is deployed on a Xilinx ZCU104 development board. Experiments show a convolution-layer computing performance of 354.5 GOPS and an on-chip DSP computing efficiency of 0.69, an improvement of more than 1.6× over related work. The accelerator completes a VGG-16-based remote sensing image classification task with a high energy-efficiency ratio.
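The abstract states that image and weight data are quantized to 8-bit fixed point but does not give the exact scheme. The following is a minimal Python sketch, assuming symmetric fixed-point quantization with power-of-two scale factors, 32-bit accumulation, and a rounding right shift for requantization; these are illustrative assumptions, not the paper's verified hardware flow.

```python
import numpy as np

def to_fixed_int8(x, frac_bits):
    """Quantize a float array to signed 8-bit fixed point with frac_bits fractional bits."""
    q = np.round(x * (1 << frac_bits))
    return np.clip(q, -128, 127).astype(np.int8)

def requantize(acc, shift):
    """Rescale a 32-bit accumulator back to int8 with a rounding arithmetic right shift."""
    q = np.right_shift(acc + (1 << (shift - 1)), shift)
    return np.clip(q, -128, 127).astype(np.int8)

# Toy 3x3 multiply-accumulate: int8 operands, int32 accumulator, int8 result.
act_frac, wgt_frac, out_frac = 5, 6, 5                     # hypothetical fixed-point formats
a = to_fixed_int8(np.random.uniform(-1, 1, (3, 3)), act_frac)
w = to_fixed_int8(np.random.uniform(-1, 1, (3, 3)), wgt_frac)
acc = np.sum(a.astype(np.int32) * w.astype(np.int32))      # exact accumulation in int32
y = requantize(acc, act_frac + wgt_frac - out_frac)        # rescale to the output format
```

Keeping every rescaling a power-of-two shift means the requantization step needs only an adder and a shifter rather than a multiplier, which is one common reason 8-bit fixed-point formats map well onto FPGA DSP slices.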
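The abstract also does not state the Winograd tile size; F(2×2, 3×3) from Lavin and Gray is the usual choice for 3×3 convolutions and is assumed here. The NumPy sketch below is a software golden model of that transform (B^T, G, A^T are the standard published matrices), not the paper's RTL pipeline.

```python
import numpy as np

# Winograd F(2x2, 3x3): a 4x4 input tile and a 3x3 filter yield a 2x2 output tile
# using 16 element-wise multiplications instead of 36 for direct convolution.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def winograd_f2x2_3x3(d, g):
    """Compute one 2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
    U = G @ g @ G.T          # filter transform (can be precomputed offline)
    V = B_T @ d @ B_T.T      # input transform
    M = U * V                # element-wise multiply: the 16 multiplications
    return A_T @ M @ A_T.T   # output transform

# Sanity check against direct sliding-window (correlation) convolution.
d = np.random.randn(4, 4).astype(np.float32)
g = np.random.randn(3, 3).astype(np.float32)
ref = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)] for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), ref, atol=1e-4)
```

For each 2×2 output tile, the element-wise product stage needs 16 multiplications instead of the 36 of direct 3×3 convolution, a 2.25× reduction in multiply count, which is what lets a Winograd pipeline deliver more effective GOPS per DSP.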
Keywords: convolutional neural network; field programmable gate array; Winograd algorithm; pipeline; parallel computing