Received: 08 November 2022,
Revised: 25 November 2022,
Published: 05 August 2023
Wearing masks is an effective way of preventing COVID-19 and supporting national epidemic prevention and control. An improved YOLOv7 algorithm is proposed to address problems such as detecting whether masks are worn correctly under different shooting angles and occlusion. Based on YOLOv7, a convolutional attention mechanism is introduced into the Head region of the network so that the feature network processes the mask region in a more targeted way, enhancing its ability to learn features of the mask region. The structure of the Backbone region is optimized: the ConvNeXt network structure is improved and partial convolution is introduced into the network, which improves the detection accuracy and robustness of the model and enhances prediction accuracy without introducing a large amount of additional computation. The spatial pyramid pooling of the Head layer is improved to speed up training and accelerate model convergence. Experiments show that under complex and occluded conditions the loss function of the improved YOLOv7 decreases significantly, and the mAP on the test set is 93.8%, which is 3.6% higher than that of the original YOLOv7 algorithm. The accuracy of each category is improved: the accuracies of the no-mask, correctly worn and incorrectly worn categories increase by 6.8%, 2.1% and 1.7%, respectively. Detection errors are significantly reduced and the generalization ability is markedly improved.
Since the outbreak of COVID-19, China has adhered to a dynamic zero-COVID policy, keeping infection and fatality rates extremely low; wearing masks when going out remains an important way to prevent a resurgence of the epidemic[
With the needs of society and the development of deep learning[
This paper takes the three common situations of mask wearing (no mask, mask worn correctly, mask worn incorrectly) as detection targets and improves the YOLOv7 algorithm. A convolutional block attention module (CBAM) is added in the Head region; working on both the channel and spatial dimensions, it makes the network pay more attention to the important features of the target and improves the network's ability to learn mask-wearing targets. An improved ConvNeXt is introduced into the Backbone, and the original SPPCSPC is improved by adding serial connections to the original pooling-layer structure, which speeds up convergence and detection without reducing recognition accuracy.
The YOLOv7 algorithm was proposed by Alexey Bochkovskiy's team in 2022 and outperforms YOLOv5 in both detection accuracy and speed. The overall structure of YOLOv7 consists of four parts: the input layer, the Backbone, the Head and the prediction end; its model structure is shown in Fig. 1.
Fig. 1 YOLOv7 model structure
Mosaic data augmentation enriches the backgrounds of detection targets by randomly scaling, cropping and arranging images, which indirectly increases the effective batch size. Adaptive anchor computation sets anchors at the initial stage of network training and then outputs a prediction box; the anchors are compared with the ground-truth boxes, the error is computed and fed back repeatedly, and through continual calculation and compensation the best-fitting anchors are selected to produce the final prediction boxes[
YOLOv7 optimizes the Mosaic data augmentation method. The traditional Mosaic method selects four images for augmentation, whereas YOLOv7 compares a randomly generated value with a hyperparameter: when the random value is too small, Mosaic augmentation is disabled; when it is moderate, four images are sampled for augmentation; and when it is large, nine images are selected, increasing data diversity more flexibly.
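The three-way selection described above can be sketched as a small helper. This is an illustrative sketch only: the threshold values `low` and `high` stand in for the hyperparameters the text compares against, whose actual values are not given in the paper.

```python
def choose_mosaic_variant(rand_val, low=0.2, high=0.8):
    """Select a Mosaic augmentation variant from a random draw in [0, 1).

    `low` and `high` are placeholder thresholds standing in for the
    hyperparameters the paper compares the random value against.
    """
    if rand_val < low:
        return 0   # Mosaic disabled: use the image as-is
    elif rand_val < high:
        return 4   # classic 4-image mosaic
    else:
        return 9   # 9-image mosaic for extra diversity

# A mid-range draw selects the classic 4-image mosaic.
print(choose_mosaic_variant(0.5))  # → 4
```

The point of the comparison against a hyperparameter is that the proportion of disabled / 4-image / 9-image batches becomes tunable rather than fixed.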
The Backbone consists of CBS, ELAN and MPC-B modules. The CBS module comprises a convolution (Conv), a batch normalization layer (BN) and the SiLU activation function. The MPC-B module consists of one pooling layer and three CBS modules; it performs downsampling, and by combining convolution and pooling it can capture information from all values in a small local region, avoiding the drawback of a pooling layer that keeps only the maximum. The ELAN module, composed of multiple CBS modules, is an efficient aggregation network structure: it removes 1×1 convolutions to improve GPU computing efficiency and greatly reduce memory-access cost, and adopts the idea of gradient splitting by adding short connections directly between the output and input layers of the convolutional network, so that gradient flow propagates through different network structures. This prevents excessive expansion of the input and gradient information while controlling the shortest and longest gradient paths, allowing the network to extract more features and making training more efficient and accurate.
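The CBS block described above (Conv + BN + SiLU) can be sketched in PyTorch as follows; the kernel size, stride and channel counts here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the basic building block of the Backbone."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 3, 64, 64)
y = CBS(3, 32)(x)
print(y.shape)  # torch.Size([1, 32, 64, 64])
```

The ELAN and MPC-B modules are then compositions of this block with pooling layers and concatenations.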
The Head layer mainly consists of the SPPCSPC, ELAN-H, MPC-N, UPSample and RepConv modules. SPPCSPC is an improved spatial pyramid pooling module[
The final prediction end includes loss-function computation and bounding-box prediction. The total loss function consists of three parts: localization loss, objectness (confidence) loss and classification loss. The objectness and classification losses use BCEWithLogitsLoss, and the coordinate loss uses CIoU.
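As a minimal illustration of the coordinate loss, the CIoU metric for two axis-aligned boxes (x1, y1, x2, y2) can be computed in plain Python; the CIoU loss is then 1 − CIoU. This is a sketch of the standard CIoU formulation, not the training code itself.

```python
import math

def ciou(box1, box2):
    """CIoU between two boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2
    # Intersection and union areas
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / union
    # Squared center distance over squared enclosing-box diagonal
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((X2 - X1) / (Y2 - Y1))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v

# Identical boxes give CIoU = 1, i.e. zero localization loss.
print(ciou((0, 0, 2, 2), (0, 0, 2, 2)))  # → 1.0
```

Unlike plain IoU, CIoU still gives a useful gradient when boxes do not overlap, via the center-distance and aspect-ratio terms.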
Besides the three important factors of network depth, width and cardinality, attention mechanisms can also improve the performance of convolutional networks[
Fig. 2 Convolutional attention structure diagram
Convolution extracts information features by combining channel and spatial information. The convolutional block attention module (CBAM) starts from these two aspects, fusing a channel attention module (CAM) and a spatial attention module (SAM) to improve the learning ability of convolutional neural networks.
Channel attention Mc focuses on the channel information of features to determine which targets and channels in the image carry the main features. Channel attention Mc is computed as Eq. (1):
$$M_c(F)=\sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F))+\mathrm{MLP}(\mathrm{MaxPool}(F))\big)=\sigma\big(W_1(W_0(F^{c}_{avg}))+W_1(W_0(F^{c}_{max}))\big) \tag{1}$$
where σ is the sigmoid function, W0∈R^(C/r×C) and W1∈R^(C×C/r), and a ReLU activation is applied after W0. The spatial attention module (SAM) also uses average pooling (AP) and max pooling (MP), exploiting inter-spatial feature relationships to generate the spatial attention Ms, which determines where the main feature information lies in the spatial dimension. Ms is computed as Eq. (2):
$$M_s(F)=\sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F);\mathrm{MaxPool}(F)])\big)=\sigma\big(f^{7\times 7}([F^{s}_{avg};F^{s}_{max}])\big) \tag{2}$$
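A minimal PyTorch sketch of the two CBAM modules of Eqs. (1) and (2) follows; the reduction ratio r = 16 and the 7×7 convolution kernel are the common CBAM defaults, assumed here rather than taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Eq. (1): shared MLP over avg- and max-pooled channel descriptors."""
    def __init__(self, c, r=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(),
                                 nn.Linear(c // r, c))

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # AvgPool branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # MaxPool branch
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """Eq. (2): 7x7 conv over channel-wise avg and max maps."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

x = torch.randn(1, 32, 8, 8)
x = x * ChannelAttention(32)(x)   # channel refinement first
x = x * SpatialAttention()(x)     # then spatial refinement
print(x.shape)  # torch.Size([1, 32, 8, 8])
```

The two modules are applied sequentially (channel, then spatial), each producing a multiplicative attention map over the input feature.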
In recent years, Transformer[
Fig. 3 Improved ConvNeXt structure diagram
The main function of spatial pyramid pooling is to allow images of arbitrary size and aspect ratio to be fed to the network; its input end can accept images of any size[
Fig. 4 Diagram of the improved spatial pyramid pooling structure
(1) Apply a CBS convolution operation to the feature output of the ELAN-H module in the Backbone, i.e., convolution, batch normalization and SiLU activation in sequence.
(2) Apply two more CBS convolution operations to the features from step (1), with one max-pooling operation after each convolution; then connect the three max-pooled features serially and in parallel and fuse the feature layers.
(3) Pass the result of step (2) through two CBS convolutions, fuse it with the result of step (1), and apply one final CBS operation to obtain the output feature layer.
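One plausible reading of steps (1)-(3) can be sketched in PyTorch as follows. The channel widths, the 5×5 pooling window and the exact wiring are assumptions for illustration; the key point is that the poolings are chained serially, so each reuses the previous pooled map instead of pooling the input three times in parallel.

```python
import torch
import torch.nn as nn

def cbs(c_in, c_out, k=1):
    # Conv + BatchNorm + SiLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU())

class SPPCSPCSerial(nn.Module):
    """Sketch of an SPPCSPC variant with serially connected poolings."""
    def __init__(self, c):
        super().__init__()
        self.cv1 = cbs(c, c)                     # step (1)
        self.cv2 = cbs(c, c, 3)                  # step (2): further CBS stages
        self.cv3 = cbs(c, c)
        self.pool = nn.MaxPool2d(5, stride=1, padding=2)
        self.cv4 = cbs(4 * c, c)                 # fuse pooled features
        self.cv5 = cbs(c, c, 3)                  # step (3)
        self.cv6 = cbs(2 * c, c)

    def forward(self, x):
        x1 = self.cv1(x)
        y = self.cv3(self.cv2(x1))
        p1 = self.pool(y)            # serial chain: each pooling reuses the
        p2 = self.pool(p1)           # previous pooled map, which mimics the
        p3 = self.pool(p2)           # larger parallel windows more cheaply
        fused = self.cv5(self.cv4(torch.cat([y, p1, p2, p3], dim=1)))
        return self.cv6(torch.cat([fused, x1], dim=1))

x = torch.randn(1, 64, 16, 16)
out = SPPCSPCSerial(64)(x)
print(out.shape)  # torch.Size([1, 64, 16, 16])
```

Chaining a 5×5 pool three times reproduces the receptive fields of parallel 5/9/13 pools while sharing computation, which matches the paper's goal of faster convergence without losing accuracy.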
The algorithm in this paper is implemented in the PyCharm IDE using Python 3.9, with PyTorch 1.12.1 as the deep learning framework and CUDA 11.3 for hardware acceleration. The experimental platform uses an NVIDIA RTX 3080 GPU and an Intel(R) Core(TM) i7-12700KF @ 3.60 GHz processor, running Windows 10 with 32.0 GB of memory.
Since few mask-wearing datasets are publicly available online, and datasets of incorrectly worn masks are especially rare, more than 9 000 images were gathered from the Internet and from photographs taken by organized classmates, and annotated with LabelImg to build a mask-wearing dataset. The dataset covers a variety of scenes with three situations (no mask, mask worn correctly, mask worn incorrectly), three angles (frontal, left profile, right profile) and different backgrounds, lighting and occlusion, as shown in Fig. 5.
Fig. 5 Examples of dataset images
The data are divided into three classes (no mask, mask worn correctly, mask worn incorrectly), and the training, test and validation sets are split in an 8∶1∶1 ratio. The batch size is 16, the learning rate is 0.005, and the model is trained for 150 iterations.
| Category | Training set | Validation set | Test set | Total |
| --- | --- | --- | --- | --- |
| No-mask | 3 112 | 778 | 778 | 3 890 |
| Wearing correctly | 2 352 | 588 | 588 | 2 940 |
| Wearing incorrectly | 1 920 | 480 | 480 | 2 400 |
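The 8∶1∶1 split described above can be reproduced with a simple shuffled partition; the seed and the use of plain indices as items are arbitrary choices for illustration.

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and split items into train/val/test by the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# With ~9 000 images, an 8:1:1 split yields 7200/900/900 items.
train, val, test = split_dataset(range(9000))
print(len(train), len(val), len(test))  # → 7200 900 900
```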
Precision (P), recall (R) and mean average precision (mAP) are used as evaluation metrics to assess the model. Precision and recall are expressed as:
$$P=\frac{TP}{TP+FP} \tag{3}$$
$$R=\frac{TP}{TP+FN} \tag{4}$$
Taking the No-mask category as an example, TP is the number of unmasked targets that the trained model detects as No-mask; FP is the number of correctly and incorrectly masked targets detected as No-mask; and FN is the number of unmasked targets detected as Wearing correctly or Wearing incorrectly. Precision (P) describes how accurately the model classifies the category, and recall (R) describes the model's missed detections for that category. Average precision (AP) is the area enclosed by the P-R curve and the positive half-axes, evaluating detection performance on the category from both precision and recall. Mean average precision (mAP) is the mean of the APs of all classes and effectively evaluates the model's detection across all classes. AP and mAP are computed as Eqs. (5) and (6):
$$AP=\int_{0}^{1}P(R)\,\mathrm{d}R \tag{5}$$
$$mAP=\frac{1}{N}\sum_{i=1}^{N}AP_i \tag{6}$$
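Eqs. (3) and (4) amount to two ratios over per-class counts; a plain-Python helper makes this concrete. The counts in the example are hypothetical, not results from the paper.

```python
def precision_recall(tp, fp, fn):
    """Eqs. (3) and (4): precision and recall from per-class counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r

# Hypothetical No-mask counts for illustration only:
p, r = precision_recall(tp=700, fp=50, fn=78)
print(round(p, 3), round(r, 3))  # → 0.933 0.9
```

AP then integrates precision over recall for one class (Eq. (5)), and mAP averages the per-class APs (Eq. (6)).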
To verify the effect of the improved algorithm on mask-wearing detection, two groups of comparison experiments are conducted: the first compares the improved model with the original YOLOv7 and with models containing different subsets of the improvements; the second compares the improved algorithm with Faster R-CNN.
4.4.1 Comparison of loss-function convergence
The validation-set loss curves of YOLOv7 before and after improvement during training are shown in Fig. 6.
Fig. 6 Comparison of loss functions
4.4.2 Effect of the improvements on model performance
Controlled experiments are conducted on the detection metrics of the different improvements to analyze how each part contributes to network performance; the results are listed in the following table.
| Model | CBAM | ConvNeXt | SPPCSPC | Precision/% | mAP/% |
| --- | --- | --- | --- | --- | --- |
| YOLOv7 | | | | 90.2 | 90.2 |
| YOLOv7-A | + | | | 93.4 | 90.9 |
| YOLOv7-B | + | + | | 92.9 | 91.9 |
| YOLOv7-C | + | + | + | 92.4 | 93.8 |
4.4.3 Performance comparison with mainstream detection models before and after improvement
After training, a multi-sample detection statistics method is used to test the data in the validation set, and the results are compared with the original YOLOv7 and Faster R-CNN. The validation set includes single-person, multi-person and profile cases; some comparison results are shown in Fig. 7.
Fig. 7 Comparison of detection results of Faster R-CNN, YOLOv7 and the improved YOLOv7
The detection metrics of the proposed algorithm are compared with those of other detection algorithms; the results are listed in the following table.
| Method | No-mask/% | Wearing correctly/% | Wearing incorrectly/% | mAP/% |
| --- | --- | --- | --- | --- |
| Faster R-CNN | 74.1 | 70.9 | 72.2 | 72.4 |
| YOLOv7 | 87.4 | 94.1 | 89.2 | 90.2 |
| YOLOv7-A | 92.1 | 95.7 | 84.8 | 90.9 |
| YOLOv7-B | 94.1 | 95.6 | 88.6 | 91.9 |
| YOLOv7-C | 94.2 | 96.2 | 90.9 | 93.8 |
Aiming at the problem that some residents do not wear masks correctly, this paper proposes a mask-wearing detection algorithm based on an improved YOLOv7. Images taken by the authors and collected online enrich the dataset with incorrectly worn masks and profile/occlusion cases. Introducing a convolutional attention mechanism in the Head layer strengthens the network's focus on effective features in both the spatial and channel dimensions and improves its ability to learn mask-wearing targets. Introducing the ConvNeXt structure in the Backbone improves the performance and robustness of the network. Optimizing the SPPCSPC module in the Head layer effectively reduces the loss, raising the mean average precision from 90.2% to 93.8%. The detection accuracy of every category also improves: the no-mask, correctly worn and incorrectly worn categories gain 6.8%, 2.1% and 1.7%, respectively. Missed and false detections are reduced and the robustness of the system is improved.
MA S N, BAO G S. "Balanced anti-epidemic": a study on the prevention and control of the COVID-19 epidemic in the pre-omicron period [J]. Academic Monthly, 2022, 54(4): 78-99. (in Chinese)
CAO S Z, WEN D S, CHEN X, et al. Protective behavior of Chinese population wearing masks during the COVID-19 epidemic [J]. Research of Environmental Sciences, 2020, 33(7): 1649-1658. (in Chinese)
SITU G H. Deep holography [J]. Light: Advanced Manufacturing, 2022, 3(2): 278-300. doi: 10.37188/lam.2022.013
REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. doi: 10.1109/tpami.2016.2577031
REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6517-6525. doi: 10.1109/cvpr.2017.690
ZENG G H, YANG G Z, GUO S N, et al. Design and application of real-time mask detection system [J]. Video Engineering, 2022, 46(9): 65-67. (in Chinese)
ZHU J, WANG J L, WANG B. Lightweight mask detection algorithm based on improved YOLOv4-tiny [J]. Chinese Journal of Liquid Crystals and Displays, 2021, 36(11): 1525-1534. (in Chinese) doi: 10.37188/CJLCD.2021-0059
ZHENG X, TIAN B, LI J J. Intelligent recognition method of cervical cell cluster based on YOLO model [J]. Chinese Journal of Liquid Crystals and Displays, 2018, 33(11): 965-971. (in Chinese) doi: 10.3788/yjyxs20183311.0965
LI G Y, LI C G, WANG W J, et al. Research on multi-feature human pose model recognition based on one-shot learning [J]. Opto-Electronic Engineering, 2021, 48(2): 200099. (in Chinese) doi: 10.12086/oee.2021.200099
MA S S, WANG J, CAO S Z, et al. Overview on two-dimensional human pose estimation methods based on deep learning [J]. Computer Systems & Applications, 2022, 31(10): 36-43. (in Chinese)
LUO Y, ZHAO Y F, LI J X, et al. Computational imaging without a computer: seeing through random diffusers at the speed of light [J]. eLight, 2022, 2: 4. doi: 10.1186/s43593-022-00012-4
ZHANG R M, BI L J, WANG F B, et al. Multiscale feature fusion and anchor adaptive object detection algorithm [J]. Laser & Optoelectronics Progress, 2022, 59(12): 1215019. (in Chinese) doi: 10.3788/LOP202259.1215019
DING Y, WANG X, YAN X L. Edge adaptive four-point piecewise parabolic scaler implementation [J]. Journal of Zhejiang University (Engineering Science), 2010, 44(9): 1637-1642. (in Chinese)
HU C P, BAI X, QI L, et al. Vehicle color recognition with spatial pyramid deep learning [J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(5): 2925-2934. doi: 10.1109/tits.2015.2430892
ZUO C, QIAN J M, FENG S J, et al. Deep learning in optical metrology: a review [J]. Light: Science & Applications, 2022, 11(1): 39. doi: 10.1038/s41377-022-00714-x
FENG Y B, YANG X, QIU D W, et al. PCXRNet: pneumonia diagnosis from chest X-ray images using condense attention block and multiconvolution attention block [J]. IEEE Journal of Biomedical and Health Informatics, 2022, 26(4): 1484-1495. doi: 10.1109/jbhi.2022.3148317
DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale [C]//9th International Conference on Learning Representations. OpenReview.net, 2021: 1909-1931.
YANG X K, ZHAO J Y, ZHANG H Y, et al. Remote sensing image detection based on YOLOv4 improvements [J]. IEEE Access, 2022, 10: 95527-95538. doi: 10.1109/access.2022.3204053
TANG Y L, GONG W G, CHEN X, et al. Deep inception-residual Laplacian pyramid networks for accurate single-image super-resolution [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(5): 1514-1528. doi: 10.1109/tnnls.2019.2920852