1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, Guizhou, China
[ "张吉友(1998—),男,贵州瓮安人,硕士研究生,2021年于上海海事大学获得学士学位,主要从事计算机视觉、多模态语义分割及目标跟踪方面的研究。E-mail:1916537869@qq.com" ]
[ "张荣芬(1977—),女,贵州贵阳人,博士,教授,2014年于贵州大学获得博士学位,主要从事机器视觉、智能算法及智能硬件的研究。E-mail:rfzhang@gzu.edu.cn" ]
ZHANG Ji-you, ZHANG Rong-fen, LIU Yu-hong, et al. Multimodal image semantic segmentation based on attention mechanism [J]. Chinese Journal of Liquid Crystals and Displays, 2023, 38(7): 975-984. DOI: 10.37188/CJLCD.2022-0309.
Abstract: Many current semantic segmentation models are trained on RGB images alone, so their stability suffers in extreme environments, and they cannot meet the practical demands of nighttime autonomous driving. To address nighttime semantic segmentation, a multimodal dual encoder-decoder model incorporating a lightweight attention module is constructed, with ResNet-152 as the feature extraction network. The dual encoder extracts key information from the RGB and thermal (RGB-T) modalities, which is passed through the attention module and fused. The fused features are then sent to the decoder, which concatenates the upsampled feature maps stage by stage with the feature maps extracted by the corresponding encoder layers, extracts features through convolution layers, restores the resolution by upsampling, and finally performs semantic segmentation. Experimental results show that the model achieves a mean accuracy of 76% and a mean intersection over union (mIoU) of 55.7% on the MFNet test set, an improvement over other network models, and meets the basic requirements for accurate semantic segmentation of daytime and nighttime scenes from RGB-T images.
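The pipeline described in the abstract can be made concrete with a short sketch. The following PyTorch code is a minimal, hypothetical rendering, not the paper's released implementation: the SE-style ChannelAttention module, the additive fusion, the decoder channel widths, and the single-channel thermal stem are all assumptions made for readability. Only the overall structure follows the description above: two ResNet-152 encoder streams, attention-gated RGB-T fusion at each stage, and a decoder that concatenates upsampled maps with stage-wise skip features before restoring the full resolution.

```python
# Minimal sketch of the dual encoder-decoder described above. Module names
# and channel widths are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet152  # torchvision >= 0.13 API assumed


class ChannelAttention(nn.Module):
    """Lightweight channel attention: global average pool, then a small MLP
    producing per-channel gating weights (an SE-style assumption)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        w = self.fc(F.adaptive_avg_pool2d(x, 1).flatten(1))  # (N, C) weights
        return x * w[:, :, None, None]


def encoder_stages(in_ch):
    """Split a ResNet-152 into five feature stages; a 1-channel stem is
    substituted for the thermal stream."""
    r = resnet152(weights=None)
    stem = nn.Sequential(
        nn.Conv2d(in_ch, 64, 7, stride=2, padding=3, bias=False),
        r.bn1, r.relu)
    return nn.ModuleList([stem,
                          nn.Sequential(r.maxpool, r.layer1),
                          r.layer2, r.layer3, r.layer4])


class RGBTSegNet(nn.Module):
    def __init__(self, n_classes=9):  # MFNet: 8 object classes + background
        super().__init__()
        self.rgb_enc = encoder_stages(3)
        self.th_enc = encoder_stages(1)
        enc_ch = [64, 256, 512, 1024, 2048]  # ResNet-152 stage widths
        self.att_rgb = nn.ModuleList(ChannelAttention(c) for c in enc_ch)
        self.att_th = nn.ModuleList(ChannelAttention(c) for c in enc_ch)
        # Decoder blocks: concatenate the upsampled map with the fused skip
        # feature of the matching stage, then reduce channels by convolution.
        dec_in = [2048 + 1024, 512 + 512, 256 + 256, 128 + 64]
        dec_out = [512, 256, 128, 64]
        self.dec = nn.ModuleList(
            nn.Sequential(nn.Conv2d(i, o, 3, padding=1),
                          nn.BatchNorm2d(o), nn.ReLU(inplace=True))
            for i, o in zip(dec_in, dec_out))
        self.head = nn.Conv2d(dec_out[-1], n_classes, 1)

    def forward(self, rgb, thermal):
        skips, x, y = [], rgb, thermal
        for sr, st, ar, at in zip(self.rgb_enc, self.th_enc,
                                  self.att_rgb, self.att_th):
            x, y = sr(x), st(y)
            skips.append(ar(x) + at(y))  # attention-gated element-wise fusion
        out = skips[-1]
        for dec, skip in zip(self.dec, reversed(skips[:-1])):
            out = F.interpolate(out, size=skip.shape[-2:],
                                mode='bilinear', align_corners=False)
            out = dec(torch.cat([out, skip], dim=1))
        # Final upsampling back to the input resolution, then classification.
        out = F.interpolate(out, size=rgb.shape[-2:],
                            mode='bilinear', align_corners=False)
        return self.head(out)
```

Under these assumptions, `RGBTSegNet()(torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640))` returns a (1, 9, 480, 640) logit map, matching the 480×640 resolution of MFNet images with the thermal channel supplied as a separate single-channel input.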
Key words: night semantic segmentation; multimodal; lightweight attention module; multi-scale information