1. College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
SHI Jian-feng (1997—), male, born in Qiqihar, Heilongjiang, China; M.S. candidate. He received his B.S. degree from Heilongjiang University in 2019. His research interests include computer vision. E-mail: 2020111879@nefu.edu.cn
WANG A-chuan (1964—), male, born in Harbin, Heilongjiang, China; Ph.D., professor. He received his Ph.D. degree from Northeast Forestry University in 2011. His research interests include computer vision and high-resolution remote sensing imagery. E-mail: wangca1964@126.com
SHI Jian-feng, XIANG Ning, WANG A-chuan. High resolution scene parsing network based on semantic segmentation [J]. Chinese Journal of Liquid Crystals and Displays, 2022, 37(12): 1598-1606. (in Chinese). DOI: 10.37188/CJLCD.2022-0174.
To parse complex scenes such as urban streetscapes efficiently, this paper proposes a high-resolution scene parsing network that combines the high-resolution network (HRNet) with a pyramid pooling module (PPM) to supplement global context information. First, HRNet is used as the backbone feature-extraction network, and its heavily used residual modules are improved with atrous separable convolutions, which reduces the number of parameters while improving the segmentation of multi-scale targets. Second, multi-level dilation rates are designed following the hybrid dilated convolution (HDC) framework, which densifies the receptive field while mitigating the gridding problem. Third, a multi-stage successive upsampling structure is designed to improve the simple late-fusion mechanism of HRNetV2. Finally, an improved pyramid pooling module that adapts to different image resolutions aggregates the context information of different regions to produce high-quality segmentation maps. On the Cityscapes urban-scene dataset the network achieves 83.3% mIoU with a parameter size of only 16.4 Mbit, and it also performs well on the CamVid dataset, yielding a more reliable, accurate, and computationally lighter scene parsing method based on semantic segmentation.
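The multi-level dilation-rate design described above follows the hybrid dilated convolution (HDC) criterion of Wang et al. (cited in the references): for a stack of K×K dilated convolutions with rates r_1…r_n, the maximum gap M_i between the nonzero taps seen from layer i must satisfy M_2 ≤ K, and the rates should share no common factor. A minimal sketch of that check (the function name and the rate schedules shown are illustrative, not the paper's actual settings):

```python
from functools import reduce
from math import gcd


def hdc_ok(rates, kernel_size=3):
    """Check the HDC anti-gridding criterion for a list of dilation rates.

    M_n = r_n, and M_i = max(M_{i+1} - 2*r_i, 2*r_i - M_{i+1}, r_i);
    the schedule avoids gridding when M_2 <= kernel_size and the
    rates have no common factor greater than 1.
    """
    m = rates[-1]
    # Fold the recurrence from layer n-1 down to layer 2.
    for r in reversed(rates[1:-1]):
        m = max(m - 2 * r, 2 * r - m, r)
    no_common_factor = reduce(gcd, rates) == 1
    return m <= kernel_size and no_common_factor


# Rates [1, 2, 5] satisfy the criterion; [2, 4, 8] (common factor 2)
# and [1, 2, 9] (M_2 = 5 > 3) both produce gridding.
print(hdc_ok([1, 2, 5]), hdc_ok([2, 4, 8]), hdc_ok([1, 2, 9]))
```

Schedules like [2, 2, 2] are also rejected by the common-factor rule even though their M_2 is small, which is why both conditions are checked.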
deep learning; neural network; semantic segmentation; high resolution network; atrous convolution
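The parameter saving claimed for the atrous separable convolution substitution can be sketched with the standard counting formulas: a K×K convolution costs K·K·C_in·C_out weights, while its depthwise-separable counterpart costs K·K·C_in (depthwise) plus C_in·C_out (pointwise). The channel sizes below are illustrative, not the paper's actual layer widths:

```python
def conv_params(c_in, c_out, k=3):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_params(c_in, c_out, k=3):
    """Weight count of a depthwise-separable k x k convolution:
    one k x k depthwise filter per input channel, plus a 1 x 1
    pointwise convolution mixing channels. Dilation (the 'atrous'
    part) changes the receptive field, not the parameter count."""
    return k * k * c_in + c_in * c_out


# For a 256 -> 256 channel 3 x 3 layer, the separable form needs
# roughly 8.7x fewer parameters: 589 824 vs 67 840.
print(conv_params(256, 256), separable_params(256, 256))
```

The ratio grows with channel width, which is why the substitution pays off most in the deeper, wider stages of the backbone.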
WANG X, YU M, REN H E. Remote sensing image semantic segmentation combining UNET and FPN [J]. Chinese Journal of Liquid Crystals and Displays, 2021, 36(3): 475-483. (in Chinese). doi: 10.37188/CJLCD.2020-0116
SHI J F, GAO Z M, WANG A C. Multi-scale image semantic segmentation based on ASPP and improved HRNet [J]. Chinese Journal of Liquid Crystals and Displays, 2021, 36(11): 1497-1505. (in Chinese). doi: 10.37188/CJLCD.2021-0093
SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651. doi: 10.1109/tpami.2016.2572683
RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]//Proceedings of the 18th International Conference on Medical Image Computing and Computer-assisted Intervention. Munich: Springer, 2015: 234-241. doi: 10.1007/978-3-319-24574-4_28
BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/tpami.2016.2644615
YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions [C]//Proceedings of the 4th International Conference on Learning Representations. San Juan: ICLR, 2016.
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778. doi: 10.1109/cvpr.2016.90
CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs [C]//Proceedings of the 3rd International Conference on Learning Representations. San Diego: ICLR, 2015: 357-361.
CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/tpami.2017.2699184
CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [J]. arXiv, 2017: 1706.05587.
CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]//Proceedings of the 15th European Conference on Computer Vision. Munich: Springer, 2018: 833-851. doi: 10.1007/978-3-030-01234-2_49
ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network [C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6230-6239. doi: 10.1109/cvpr.2017.660
SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation [C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5686-5696. doi: 10.1109/cvpr.2019.00584
LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 936-944. doi: 10.1109/cvpr.2017.106
KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe: Curran Associates Inc., 2012: 1097-1105.
SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [C]//Proceedings of the 3rd International Conference on Learning Representations. San Diego: ICLR, 2015.
SUN K, ZHAO Y, JIANG B R, et al. High-resolution representations for labeling pixels and regions [J]. arXiv, 2019: 1904.04514.
WANG P Q, CHEN P F, YUAN Y, et al. Understanding convolution for semantic segmentation [C]//Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). Lake Tahoe: IEEE, 2018: 1451-1460. doi: 10.1109/wacv.2018.00163
CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding [C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 3213-3223. doi: 10.1109/cvpr.2016.350
BROSTOW G J, SHOTTON J, FAUQUEUR J, et al. Segmentation and recognition using structure from motion point clouds [C]//Proceedings of the 10th European Conference on Computer Vision. Marseille: Springer, 2008: 44-57. doi: 10.1007/978-3-540-88682-2_5