1. College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
SHI Jian-feng (1997—), male, born in Qiqihar, Heilongjiang, China; M.S. candidate. He received his B.S. degree from Heilongjiang University in 2019. His research interests include computer vision. E-mail: 2020111879@nefu.edu.cn
WANG A-chuan (1964—), male, born in Harbin, Heilongjiang, China; Ph.D., professor. He received his Ph.D. degree from Northeast Forestry University in 2011. His research interests include computer vision and high-resolution remote sensing imagery. E-mail: wangca1964@126.com
SHI Jian-feng, XIANG Ning, WANG A-chuan. High resolution scene parsing network based on semantic segmentation [J]. Chinese Journal of Liquid Crystals and Displays, 2022, 37(12): 1598-1606. (in Chinese). DOI: 10.37188/CJLCD.2022-0174.
To parse complex scenes such as urban streetscapes efficiently, this paper proposes a high-resolution scene parsing network that combines the high-resolution network (HRNet) with a pyramid pooling module (PPM) to supplement global context information. First, HRNet is used as the backbone feature-extraction network, and its heavily used residual modules are improved with atrous separable convolutions, which reduces the number of parameters while improving the segmentation of multi-scale targets. Second, multi-level dilation rates are designed following the hybrid dilated convolution (HDC) framework, which densifies the receptive field while mitigating the gridding problem. Third, a multi-stage successive upsampling structure is designed to improve the simple late-fusion mechanism of HRNetV2. Finally, an improved pyramid pooling module that adapts to different image resolutions aggregates the context information of different regions to produce high-quality segmentation maps. On the Cityscapes urban-scene dataset the network achieves 83.3% mIoU with a parameter size of only 16.4 Mbit, and it also performs well on the CamVid dataset, yielding a more reliable, accurate, and computationally lighter scene parsing method based on semantic segmentation.
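The multi-level dilation-rate design described above follows the hybrid dilated convolution (HDC) criterion of Wang et al. (cited in the references): for a stack of K×K dilated convolutions with rates r_1…r_n, the maximum gap M_i between the nonzero taps seen from layer i must satisfy M_2 ≤ K, and the rates should share no common factor. A minimal sketch of that check (the function name and the rate schedules shown are illustrative, not the paper's actual settings):

```python
from functools import reduce
from math import gcd


def hdc_ok(rates, kernel_size=3):
    """Check the HDC anti-gridding criterion for a list of dilation rates.

    M_n = r_n, and M_i = max(M_{i+1} - 2*r_i, 2*r_i - M_{i+1}, r_i);
    the schedule avoids gridding when M_2 <= kernel_size and the
    rates have no common factor greater than 1.
    """
    m = rates[-1]
    # Fold the recurrence from layer n-1 down to layer 2.
    for r in reversed(rates[1:-1]):
        m = max(m - 2 * r, 2 * r - m, r)
    no_common_factor = reduce(gcd, rates) == 1
    return m <= kernel_size and no_common_factor


# Rates [1, 2, 5] satisfy the criterion; [2, 4, 8] (common factor 2)
# and [1, 2, 9] (M_2 = 5 > 3) both produce gridding.
print(hdc_ok([1, 2, 5]), hdc_ok([2, 4, 8]), hdc_ok([1, 2, 9]))
```

Schedules like [2, 2, 2] are also rejected by the common-factor rule even though their M_2 is small, which is why both conditions are checked.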
deep learning; neural network; semantic segmentation; high resolution network; atrous convolution
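The parameter saving claimed for the atrous separable convolution substitution can be sketched with the standard counting formulas: a K×K convolution costs K·K·C_in·C_out weights, while its depthwise-separable counterpart costs K·K·C_in (depthwise) plus C_in·C_out (pointwise). The channel sizes below are illustrative, not the paper's actual layer widths:

```python
def conv_params(c_in, c_out, k=3):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_params(c_in, c_out, k=3):
    """Weight count of a depthwise-separable k x k convolution:
    one k x k depthwise filter per input channel, plus a 1 x 1
    pointwise convolution mixing channels. Dilation (the 'atrous'
    part) changes the receptive field, not the parameter count."""
    return k * k * c_in + c_in * c_out


# For a 256 -> 256 channel 3 x 3 layer, the separable form needs
# roughly 8.7x fewer parameters: 589 824 vs 67 840.
print(conv_params(256, 256), separable_params(256, 256))
```

The ratio grows with channel width, which is why the substitution pays off most in the deeper, wider stages of the backbone.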
WANG X, YU M, REN H E. Remote sensing image semantic segmentation combining UNET and FPN [J]. Chinese Journal of Liquid Crystals and Displays, 2021, 36(3): 475-483. (in Chinese). doi: 10.37188/CJLCD.2020-0116
SHI J F, GAO Z M, WANG A C. Multi-scale image semantic segmentation based on ASPP and improved HRNet [J]. Chinese Journal of Liquid Crystals and Displays, 2021, 36(11): 1497-1505. (in Chinese). doi: 10.37188/CJLCD.2021-0093
SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651. doi: 10.1109/tpami.2016.2572683
RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]//Proceedings of the 18th International Conference on Medical Image Computing and Computer-assisted Intervention. Munich: Springer, 2015: 234-241. doi: 10.1007/978-3-319-24574-4_28
BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/tpami.2016.2644615
YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions [C]//Proceedings of the 4th International Conference on Learning Representations. San Juan: ICLR, 2016.
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778. doi: 10.1109/cvpr.2016.90
CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs [C]//Proceedings of the 3rd International Conference on Learning Representations. San Diego: ICLR, 2015: 357-361.
CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/tpami.2017.2699184
CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [J]. arXiv, 2017: 1706.05587.
CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]//Proceedings of the 15th European Conference on Computer Vision. Munich: Springer, 2018: 833-851. doi: 10.1007/978-3-030-01234-2_49
ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network [C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6230-6239. doi: 10.1109/cvpr.2017.660
SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation [C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5686-5696. doi: 10.1109/cvpr.2019.00584
LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 936-944. doi: 10.1109/cvpr.2017.106
KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe: Curran Associates Inc., 2012: 1097-1105.
SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [C]//Proceedings of the 3rd International Conference on Learning Representations. San Diego: ICLR, 2015.
SUN K, ZHAO Y, JIANG B R, et al. High-resolution representations for labeling pixels and regions [J]. arXiv, 2019: 1904.04514.
WANG P Q, CHEN P F, YUAN Y, et al. Understanding convolution for semantic segmentation [C]//Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). Lake Tahoe: IEEE, 2018: 1451-1460. doi: 10.1109/wacv.2018.00163
CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding [C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 3213-3223. doi: 10.1109/cvpr.2016.350
BROSTOW G J, SHOTTON J, FAUQUEUR J, et al. Segmentation and recognition using structure from motion point clouds [C]//Proceedings of the 10th European Conference on Computer Vision. Marseille: Springer, 2008: 44-57. doi: 10.1007/978-3-540-88682-2_5