

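1. State Key Laboratory of Applied Optics, Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, Jilin, China
2. University of Chinese Academy of Sciences, Beijing 100049, China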
Received: 30 September 2021
Revised: 27 November 2021
Published: 05 April 2022
Ya-jie KONG, Ye ZHANG. YOLOv3 object detection method by introducing Gaussian mask self-attention module[J]. Chinese Journal of Liquid Crystals and Displays, 2022, 37(4): 539-548. DOI: 10.37188/CJLCD.2021-0250.
Object detection on driving imagery offers an inexpensive and effective way to perceive the surrounding road environment, but such real-time scenarios place high demands on both detection accuracy and detection speed. This work studies YOLOv3, a deep learning-based one-stage object detection algorithm, and combines it with the self-attention mechanism: Gaussian mask self-attention modules are embedded in the deep layers of the YOLOv3 network to mitigate the limited receptive field of convolution and to capture more global information, thereby improving detection performance. Experimental results show that the improved model, trained on the MS COCO 2017 dataset, reaches an mAP@0.5 of 56.88% and a precision of 65.31%. Compared with YOLOv3, mAP@0.5 increases by 2.56% and precision increases by 3.53%. Although detection speed decreases somewhat, the detection performance of the model improves, so the method can better support applications such as assisted driving.
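The abstract does not give the module's formulation, so the following is only a minimal NumPy sketch of one plausible reading of "Gaussian mask self-attention" on a feature map. It assumes that the mask is a Gaussian of the spatial distance between two grid positions and that it is multiplied into the attention weights (implemented by adding its logarithm to the scores before the softmax). The function names, the `sigma` parameter, and the projection matrices `wq`, `wk`, `wv` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_mask(h, w, sigma):
    """Return an (h*w, h*w) matrix of Gaussian weights.

    Entry (i, j) is exp(-d^2 / (2 * sigma^2)), where d is the Euclidean
    distance between grid positions i and j. Assumed form, not the paper's.
    """
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gaussian_masked_self_attention(feat, wq, wk, wv, sigma):
    """Self-attention over all positions of a (C, H, W) feature map.

    wq, wk, wv are (C, D) projection matrices (hypothetical parameters).
    Returns the attended features (D, H, W) and the (H*W, H*W) weights.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                      # (N, C), N = H*W
    q, k, v = x @ wq, x @ wk, x @ wv                  # each (N, D)
    scores = q @ k.T / np.sqrt(q.shape[1])            # scaled dot product
    # Adding log(mask) before the softmax is equivalent to multiplying the
    # softmax weights by the mask and renormalising each row.
    scores = scores + np.log(gaussian_mask(h, w, sigma) + 1e-12)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                    # (N, D)
    return out.T.reshape(-1, h, w), attn
```

Under these assumptions, `sigma` controls how far each position looks: a small value concentrates attention near the query position, while a very large value makes the mask nearly uniform, so the module approaches plain global self-attention. Because the attention matrix has (H·W)² entries, such a module is cheapest on the low-resolution deep feature maps, which is consistent with the abstract placing it in the deep layers of the network.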