Attention and cross-scale fusion for vehicle and pedestrian detection

LI Jian-dong; LI Jia-qi; QU Hai-cheng

doi:10.37188/CJLCD.2023-0037

Image Processing | Updated: 2024-07-28

- Chinese Journal of Liquid Crystals and Displays, Vol. 38, Issue 12, Pages 1707-1716 (2023)
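- Author affiliations:
  1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105
  2. School of Mining, Liaoning Technical University, Fuxin, Liaoning 123000
- Funding:
  the Department of Education Fund Item (LJKZ0350); of Liaoning Province (LNTU20TD-23)
- DOI: 10.37188/CJLCD.2023-0037
  CLC: TP391.4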
- Received: 06 February 2023; Revised: 13 March 2023; Published: 05 December 2023
LI Jian-dong, LI Jia-qi, QU Hai-cheng. Attention and cross-scale fusion for vehicle and pedestrian detection[J]. Chinese Journal of Liquid Crystals and Displays, 2023, 38(12): 1707-1716. DOI: 10.37188/CJLCD.2023-0037.

Abstract

Targets in road traffic scenes appear in complex environments, which leads to insufficient extraction of key features and low localization accuracy. Taking the SSD model as the basic framework, this paper studies feature extraction, key information enhancement, and non-local feature localization. First, to address the multi-scale nature of targets in road traffic scenes, a skip-type reverse feature pyramid structure is proposed to generate more discriminative features. Second, because information at different semantic levels contributes unequally to feature fusion, an attention-based adaptive feature fusion module is designed to strengthen the expression of key features at the channel level without relying on priors. Finally, a criss-cross attention module is introduced to improve the model's sensitivity to target position. Experimental results show that, compared with the original SSD model and while maintaining real-time performance, the mean average precision of the improved method increases by 2.6% on a PASCAL VOC sub-dataset and by 3.9% on a self-built road traffic dataset. Overall, the improved algorithm is broadly applicable to road vehicle and pedestrian detection tasks.
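
To make the idea of attention-based adaptive feature fusion at the channel level concrete, the following is a minimal sketch only. It is not the authors' module, whose exact structure is not given on this page. It assumes two same-shape feature maps stored as nested lists `[C][H][W]`, and the names `channel_attention_fuse`, `scale`, and `bias` are hypothetical; `scale` and `bias` stand in for parameters that a real network would learn.

```python
import math


def global_avg_pool(fmap):
    """Reduce each channel of a [C][H][W] nested-list feature map to its mean."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fuse(shallow, deep, scale=1.0, bias=0.0):
    """Fuse two same-shape feature maps with data-dependent per-channel weights.

    The weight of each channel is computed from the pooled sum of both inputs,
    so the mixing ratio is not fixed in advance. `scale` and `bias` are
    hypothetical placeholders for learned parameters.
    """
    # Element-wise sum of the two inputs, used only to derive the weights.
    summed = [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
              for ca, cb in zip(shallow, deep)]
    # One weight in (0, 1) per channel.
    weights = [sigmoid(scale * s + bias) for s in global_avg_pool(summed)]
    # Convex combination per channel: w * shallow + (1 - w) * deep.
    fused = [[[w * a + (1.0 - w) * b for a, b in zip(ra, rb)]
              for ra, rb in zip(ca, cb)]
             for w, ca, cb in zip(weights, shallow, deep)]
    return fused, weights


# Two channels of size 2x2 for each input.
shallow = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
deep = [[[-1.0, -1.0], [-1.0, -1.0]], [[2.0, 2.0], [2.0, 2.0]]]
fused, weights = channel_attention_fuse(shallow, deep)
```

In this toy input the first channel receives a weight of exactly 0.5 because its pooled sum is zero, while the second channel receives a larger weight, so the two inputs are mixed in different proportions per channel.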
