1. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
[ "牛朝旭(1998—),男,山东菏泽人,硕士研究生,2020年于东北电力大学获得学士学位,主要从事CNN加速器设计方面的研究。E-mail:niuzhaoxu20@ mails.ucas.edu.cn" ]
[ "孙海江(1980—),男,吉林辉南人,博士,研究员,2012年于中国科学院长春光学精密机械与物理研究所获得博士学位,主要从事目标识别与跟踪技术及高清视频图像增强显示方面的研究。E-mail:sunhaijing@126.com" ]
NIU Zhao-xu, SUN Hai-jiang. Design and implementation of convolution neural network accelerator for Winograd algorithm based on FPGA[J]. Chinese Journal of Liquid Crystals and Displays, 2023,38(11):1521-1530. DOI: 10.37188/CJLCD.2023-0013.
To accelerate convolutional neural networks in low-power and edge-computing scenarios, a convolutional neural network accelerator based on the Winograd algorithm is designed for a field programmable gate array (FPGA). First, the image data and weight data are quantized to 8-bit fixed-point numbers, and a quantization flow for the hardware convolution computation is designed, which increases data-transfer and computation speed. Second, an input-data buffer-reuse module is designed, which fuses data from multiple input channels before transfer and reuses the overlapping row data. Then, a Winograd pipelined convolution module is designed to realize combined reuse of column data, maximizing on-chip data reuse and reducing on-chip storage occupancy and bandwidth pressure. Finally, the accelerator is deployed on a Xilinx ZCU104 development board. Experiments show a convolution-layer computing performance of 354.5 GOPS and an on-chip DSP computing efficiency of 0.69, an improvement of more than 1.6× over related work. The accelerator completes a VGG-16-based remote sensing image classification task with a high energy-efficiency ratio.
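The abstract states that image and weight data are quantized to 8-bit fixed point but does not give the exact scheme. The following is a minimal Python sketch, assuming symmetric fixed-point quantization with power-of-two scale factors, 32-bit accumulation, and a rounding right shift for requantization; these are illustrative assumptions, not the paper's verified hardware flow.

```python
import numpy as np

def to_fixed_int8(x, frac_bits):
    """Quantize a float array to signed 8-bit fixed point with frac_bits fractional bits."""
    q = np.round(x * (1 << frac_bits))
    return np.clip(q, -128, 127).astype(np.int8)

def requantize(acc, shift):
    """Rescale a 32-bit accumulator back to int8 with a rounding arithmetic right shift."""
    q = np.right_shift(acc + (1 << (shift - 1)), shift)
    return np.clip(q, -128, 127).astype(np.int8)

# Toy 3x3 multiply-accumulate: int8 operands, int32 accumulator, int8 result.
act_frac, wgt_frac, out_frac = 5, 6, 5                     # hypothetical fixed-point formats
a = to_fixed_int8(np.random.uniform(-1, 1, (3, 3)), act_frac)
w = to_fixed_int8(np.random.uniform(-1, 1, (3, 3)), wgt_frac)
acc = np.sum(a.astype(np.int32) * w.astype(np.int32))      # exact accumulation in int32
y = requantize(acc, act_frac + wgt_frac - out_frac)        # rescale to the output format
```

Keeping every rescaling a power-of-two shift means the requantization step needs only an adder and a shifter rather than a multiplier, which is one common reason 8-bit fixed-point formats map well onto FPGA DSP slices.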
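The abstract also does not state the Winograd tile size; F(2×2, 3×3) from Lavin and Gray is the usual choice for 3×3 convolutions and is assumed here. The NumPy sketch below is a software golden model of that transform (B^T, G, A^T are the standard published matrices), not the paper's RTL pipeline.

```python
import numpy as np

# Winograd F(2x2, 3x3): a 4x4 input tile and a 3x3 filter yield a 2x2 output tile
# using 16 element-wise multiplications instead of 36 for direct convolution.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def winograd_f2x2_3x3(d, g):
    """Compute one 2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
    U = G @ g @ G.T          # filter transform (can be precomputed offline)
    V = B_T @ d @ B_T.T      # input transform
    M = U * V                # element-wise multiply: the 16 multiplications
    return A_T @ M @ A_T.T   # output transform

# Sanity check against direct sliding-window (correlation) convolution.
d = np.random.randn(4, 4).astype(np.float32)
g = np.random.randn(3, 3).astype(np.float32)
ref = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)] for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), ref, atol=1e-4)
```

For each 2×2 output tile, the element-wise product stage needs 16 multiplications instead of the 36 of direct 3×3 convolution, a 2.25× reduction in multiply count, which is what lets a Winograd pipeline deliver more effective GOPS per DSP.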
Keywords: convolutional neural network; field programmable gate array; Winograd algorithm; pipeline; parallel computing