Latest Issue

    Vol. 39, Issue 8, 2024

      Image Segmentation

    • In remote sensing land-feature segmentation, the authors propose a lightweight-network improvement of DeepLabV3+ that effectively addresses detail loss and class imbalance, significantly improving segmentation performance and accuracy. (An illustrative code sketch of a class-weighted Dice loss follows this entry.)
      MA Jing,GUO Zhonghua,MA Zhiqiang,MA Xiaoyan,LI Jialong
      Vol. 39, Issue 8, Pages: 1001-1013(2024) DOI: 10.37188/CJLCD.2023-0293
      Abstract: A lightweight-network-based DeepLabV3+ method for remote sensing image land-feature segmentation is proposed to address the errors caused by loss of detail information and imbalanced categories in remote sensing image segmentation. Firstly, MobileNetV2 replaces the backbone of the original baseline network to improve training efficiency and reduce model complexity. Secondly, the dilation rates of the atrous convolutions within the ASPP structure are increased and max-pooling is incorporated in the final ASPP layer to effectively capture context information at different scales. At the same time, the SE attention mechanism is introduced into each ASPP branch, and the ECA attention mechanism is introduced after shallow feature extraction to improve the model's perception of different categories and details. Finally, the weighted Dice-Local joint loss function is used for optimization to address class imbalance. The improved model is validated on both the CCF and Huawei Ascend Cup competition datasets. Experimental results show that the proposed method outperforms the original DeepLabV3+ model on both test sets, with all metrics improving to varying degrees: mIoU reaches 73.47% and 63.43%, improvements of 3.24% and 15.11%, respectively; accuracy reaches 88.28% and 86.47%, improvements of 1.47% and 7.83%; and the F1 score reaches 84.29% and 77.04%, up 3.86% and 13.46%. The improved DeepLabV3+ model better resolves the loss of detail information and class imbalance, improving the performance and accuracy of remote sensing image feature segmentation.
      Keywords: MobileNetV2; dilated convolution; geometric figure; attention mechanism
      Published: 2024-08-16
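To illustrate the class-imbalance handling described in the abstract above, here is a minimal sketch of a class-weighted soft Dice loss in PyTorch. It is not the authors' exact "weighted Dice-Local joint loss"; the weighting scheme and any companion loss term are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def weighted_dice_loss(logits, targets, class_weights, eps=1e-6):
    """Per-class soft Dice loss, weighted to counter class imbalance.

    logits:        (N, C, H, W) raw network outputs
    targets:       (N, H, W) integer class labels (int64)
    class_weights: (C,) tensor, larger values for under-represented classes
    """
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)                 # (N, C, H, W)
    onehot = F.one_hot(targets, num_classes)             # (N, H, W, C)
    onehot = onehot.permute(0, 3, 1, 2).float()          # (N, C, H, W)

    dims = (0, 2, 3)                                     # sum over batch and space
    intersection = (probs * onehot).sum(dims)
    cardinality = probs.sum(dims) + onehot.sum(dims)
    dice_per_class = (2.0 * intersection + eps) / (cardinality + eps)

    loss_per_class = 1.0 - dice_per_class                # (C,)
    weights = class_weights / class_weights.sum()
    return (weights * loss_per_class).sum()
```

In practice the class weights would be derived from the label frequencies of the training set, e.g. inverse-frequency weights.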
    • In traditional Chinese medicine tongue diagnosis, the researchers propose a semi-supervised tongue image segmentation method based on mutual learning between two models, achieving high-precision segmentation from a small amount of labeled data and offering a new route to digitized TCM tongue analysis and diagnosis. (An illustrative sketch of a disagreement-weighted pseudo-label loss follows this entry.)
      LI Fangxu,XU Wangming,XU Xue,JIA Yun
      Vol. 39, Issue 8, Pages: 1014-1023(2024) DOI: 10.37188/CJLCD.2023-0308
      Abstract: Accurate tongue image segmentation is a crucial prerequisite for objective analysis in traditional Chinese medicine (TCM) tongue diagnosis. The widely used fully supervised segmentation methods require a large number of pixel-level annotated samples for training, while single-model semi-supervised segmentation methods lack the ability to self-correct learned errors. To address this, a semi-supervised tongue image segmentation method based on mutual learning between dual models is proposed. Firstly, models A and B undergo supervised training on the labeled dataset. They then enter a mutual learning phase that uses a designed mutual learning loss function, in which different weights are assigned according to the disagreement between the two models' predictions on the unlabeled data. Model A generates pseudo-labels for the unlabeled dataset and model B fine-tunes on both the labeled and the pseudo-labeled data; model B then generates pseudo-labels and model A fine-tunes in the same manner. After the dual-model fine-tuning process, the better-performing model is selected as the final tongue image segmentation model. Experimental results show that with labeled-data proportions of 1/100, 1/50, 1/25 and 1/8, the mean intersection over union (mIoU) achieved by the proposed method is 96.67%, 97.92%, 98.52% and 98.85%, respectively, outperforming the other typical semi-supervised methods compared. The proposed method achieves high-precision tongue image segmentation with only a small amount of labeled data, laying a solid foundation for subsequent TCM applications such as tongue color and tongue shape analysis and promoting the digitization of TCM diagnosis and treatment.
      Keywords: semi-supervised; mutual learning; tongue image segmentation; loss function; digitization of TCM
      Published: 2024-08-16
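The mutual-learning loss described above weights unlabeled pixels by the disagreement between the two models' predictions. The PyTorch sketch below shows one plausible form of such a weighting; the 0.5/1.0 weight values and the argmax-based agreement test are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mutual_learning_loss(probs_a, probs_b, pseudo_labels):
    """Disagreement-weighted pseudo-label loss for one model (here: model B).

    probs_a, probs_b: (N, C, H, W) softmax outputs of the two models on unlabeled data
    pseudo_labels:    (N, H, W) argmax labels produced by the other model (model A)
    Pixels where the two models disagree receive a smaller weight, so model B
    is not pushed hard toward pseudo-labels that may be wrong.
    """
    agree = (probs_a.argmax(dim=1) == probs_b.argmax(dim=1)).float()   # (N, H, W)
    weight = 0.5 + 0.5 * agree                                         # 1.0 if agree, 0.5 if not
    ce = F.nll_loss(torch.log(probs_b + 1e-8), pseudo_labels, reduction="none")
    return (weight * ce).mean()
```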

      Image Enhancement

    • WU Chunlin,ZHANG Yongai,LIN Zhixian,GUO Tailiang,LIN Pengfei,LIN Jianpu
      Vol. 39, Issue 8, Pages: 1024-1036(2024) DOI: 10.37188/CJLCD.2023-0255
      Abstract: Deep-learning-based high dynamic range (HDR) image processing algorithms suffer from skin-color deviation when processing images that contain people. To address this, a portrait HDR processing algorithm based on multi-feature fusion, U²HDRnet, is proposed. The algorithm consists of three parts: a skin feature extraction module, a trilateral feature extraction module and a color reconstruction module. Firstly, the skin feature extraction module separates the color and position information of the skin region. Secondly, the trilateral feature extraction module extracts local, global and semantic features of the image and fuses them with the skin features. Finally, the color reconstruction module interpolates a grid in space and color depth. An improved fusion module of self-attention and convolution is also added to improve HDR processing performance, and a PortraitHDR dataset is produced for portraits, filling a dataset gap in this field. Test results show that U²HDRnet reaches a PSNR of 31.42 dB and an SSIM of 0.985, both superior to commonly used HDR algorithms, yielding high-quality portrait HDR images while avoiding skin distortion. (A brief sketch of the PSNR metric follows this entry.)
      Keywords: deep learning; high dynamic range; skin feature extraction; attention mechanism; color reconstruction
      Published: 2024-08-16
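The entry above reports image quality as PSNR and SSIM. As a small reference, here is the standard PSNR computation in NumPy, assuming images scaled to [0, 1]; this is a generic metric, not code from the paper.

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, peak]."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                      # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```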
    • In underwater image enhancement, the researchers propose a pixel-level channel-adaptive algorithm that significantly improves image quality through per-channel feature extraction and color-cast correction. (An illustrative sketch of a classical color-cast correction follows this entry.)
      PENG Yanfei,ZHANG Tianqi,AN Tong
      Vol. 39, Issue 8, Pages: 1037-1045(2024) DOI: 10.37188/CJLCD.2023-0276
      Abstract: Existing deep-learning-based algorithms enhance underwater images in high-dimensional feature space through encoding and decoding, without considering the channel-specific degradation characteristics of underwater images, so their enhancement effect is generally poor. To solve this problem, this paper proposes a deep-learning-based pixel-level channel enhancement algorithm that enhances the underwater image per pixel in the R, G and B channels separately. The algorithm is divided into four stages, and the whole enhancement process is completed through four stages of per-channel feature extraction. The network first fixes the color channels of the context by enhancing local and global semantics and optimizing channel attenuation. Secondly, spatial and channel features are aggregated by an attention mechanism, and irrelevant color-localization skip information is suppressed. Then, adaptive features are adjusted by optimizing the attention mechanism. Finally, to improve color-cast correction, a color shift correction module is proposed and applied in the fourth stage to further correct the color cast of the image. Experimental results show that, compared with other algorithms on the UIEB and EUVP datasets, the proposed algorithm improves PSNR by 14.35%, SSIM by 5.8%, UIQM by 3.2% and UCIQE by 13.7%, and has the best subjective effect.
      Keywords: underwater image enhancement; channel enhancement; pixel-wise enhancement; deep learning
      Published: 2024-08-16
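The color-shift correction module above is learned; as a point of reference only, here is a classical gray-world color-cast correction in NumPy. It is not the paper's module, just a simple baseline that illustrates what channel-wise cast correction does.

```python
import numpy as np

def gray_world_correction(img):
    """Classical gray-world white balance for an RGB image in [0, 1].

    Each channel is rescaled so the three channel means become equal,
    which removes a global colour cast (e.g. the green/blue cast of water).
    """
    means = img.reshape(-1, 3).mean(axis=0)          # per-channel means
    gains = means.mean() / (means + 1e-6)            # push each mean to the grey level
    return np.clip(img * gains, 0.0, 1.0)
```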
    • In underwater image enhancement, this study builds on the U-Net network with multi-scale inputs and feature fusion, introducing color correction and dual attention modules to effectively improve image quality; experiments show the algorithm significantly outperforms other methods on several datasets. (An illustrative sketch of multi-scale input fusion follows this entry.)
      TAO Yang,WU Ping,LIU Yuting,FANG Wenjun,ZHOU Liqun
      Vol. 39, Issue 8, Pages: 1046-1056(2024) DOI: 10.37188/CJLCD.2023-0274
      Abstract: To solve the problems of color distortion and detail blur in underwater images, this algorithm takes the U-Net network as its basic framework, feeds multi-scale images into different encoding layers simultaneously, and fuses the feature streams between adjacent layers to better preserve detail, extracting details from coarse to fine. In addition, a color correction module and a dual attention module are introduced to effectively address color deviation and uneven detail recovery in underwater images. Experimental results show that, compared with the original images, PSNR and UIQM of the images enhanced by the proposed algorithm on the UFO, EUVP and UIEB datasets increase by 21.3% and 25.6%, respectively. The algorithm effectively improves the visual quality of underwater images and is superior to other algorithms in both subjective visual and objective evaluation indices.
      Keywords: underwater image enhancement; multi-scale feature; color correction module; attention mechanism
      Published: 2024-08-16
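One way to feed multi-scale copies of the input into different encoding layers, as the abstract above describes, is to resize the raw image to each encoder resolution and concatenate it with the features. The PyTorch sketch below is an illustration under that assumption; the channel sizes and the exact fusion are not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleInputEncoderBlock(nn.Module):
    """Encoder block that also ingests a downsampled copy of the raw RGB input."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + 3, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, feat, image):
        # Resize the raw image to the current feature resolution and fuse it.
        img_small = F.interpolate(image, size=feat.shape[-2:], mode="bilinear",
                                  align_corners=False)
        return self.conv(torch.cat([feat, img_small], dim=1))
```

A full model would stack such blocks with downsampling between them and add the color correction and dual attention modules the paper describes.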
    • In automatic recognition of electroluminescence defects in photovoltaic modules, the researchers use cycleGAN to generate solar cell EL defect images and, through comparative experiments with DCGAN, verify cycleGAN's clear advantages in image effectiveness, similarity and diversity, providing an effective solution for research in this field. (An illustrative sketch of the cycle-consistency loss follows this entry.)
      HE Xiang,YANG Aijun,LI Jiansheng,CHEN Caiyun,YOU Hongliang
      Vol. 39, Issue 8, Pages: 1057-1069(2024) DOI: 10.37188/CJLCD.2023-0234
      Abstract: To address insufficient training images and the poor quality of generated images in research on automatic recognition of electroluminescence (EL) defects in photovoltaic modules, solar cell EL defect images are generated using cycleGAN and compared with images generated by the representative DCGAN. The captured EL images are classified and augmented to form a training set, cycleGAN and DCGAN are then trained on this set, and the images generated by the two models are compared in detail from three perspectives: effectiveness, similarity and diversity. Experimental results show that the proportion of effective images generated by cycleGAN is significantly higher than that of DCGAN. Compared with captured EL images, the images generated by cycleGAN have extremely high perceptual similarity and are difficult to distinguish by eye, and their FID scores are significantly lower than those of images generated by DCGAN. A classification model trained with cycleGAN-generated images achieves 93.45% accuracy on a test set composed of captured EL images; when a small number of captured EL images are included in the training set, the accuracy improves to 98.26%, significantly higher than with DCGAN. Finally, the average MS-SSIM of images generated by cycleGAN is significantly lower than that of DCGAN. cycleGAN is therefore an effective method for data augmentation of solar cell EL images and is significantly superior to DCGAN in terms of effectiveness, similarity and diversity.
      Keywords: photovoltaic module; solar cells; electroluminescence; cycleGAN; DCGAN
      Published: 2024-08-16
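cycleGAN's defining ingredient is the cycle-consistency loss, which forces each generator to be invertible by the other. The PyTorch sketch below shows that loss in its standard form; `gen_ab` and `gen_ba` are placeholder generator modules, and the weight of 10 is the value used in the original cycleGAN paper, not necessarily the one used here.

```python
import torch.nn.functional as F

def cycle_consistency_loss(real_a, real_b, gen_ab, gen_ba, lam=10.0):
    """Standard cycleGAN cycle-consistency term.

    gen_ab maps domain A -> B (e.g. defect-free -> defective EL image),
    gen_ba maps B -> A. Translating an image and translating it back
    should reproduce the original.
    """
    rec_a = gen_ba(gen_ab(real_a))       # A -> B -> A
    rec_b = gen_ab(gen_ba(real_b))       # B -> A -> B
    return lam * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))
```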
    • Latest progress: for dehazing fundus retinal images, the authors propose an algorithm based on dark channel theory combined with a Gamma transformation that effectively improves image clarity while preserving blood vessel detail. (An illustrative dark-channel dehazing sketch follows this entry.)
      GAI Junshuai,MA Yuting,ZHANG Yunhai,YANG Haomin,LIU Yulong,XIAO Yun,WEI Tongda
      Vol. 39, Issue 8, Pages: 1070-1078(2024) DOI: 10.37188/CJLCD.2023-0289
      Abstract: This paper addresses hazy stray light in fundus retinal images, which blurs blood vessel details. A dehazing algorithm for fundus retinal images based on dark channel theory combined with Gamma transformation is proposed; it enhances image clarity while preserving blood vessel information and defogs the R, G and B channels separately. Firstly, the algorithm computes the dark channel image using adaptive-window minimum filtering and takes the mean of the top 0.1% of pixels as the atmospheric light intensity. Secondly, it estimates the rough transmittance of the image and refines it with guided filtering. Finally, it restores the haze-free image using the atmospheric scattering model and applies a Gamma transformation. Experimental results show that the information entropy and average gradient of the restored images increase by about 6.8% and 11.6% on average, respectively. The algorithm quickly and effectively removes hazy stray light from fundus retinal images, restores clear and natural images, and retains the detail of retinal blood vessels.
      Keywords: image dehazing; retinal image; dark channel prior; atmospheric scattering model; gamma correction
      Published: 2024-08-16
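The dehazing pipeline above follows the dark channel prior. Below is a NumPy/SciPy sketch of the classic steps (dark channel, atmospheric light from the brightest 0.1% of dark-channel pixels, rough transmission, scattering-model recovery, Gamma transform). The window size, ω and γ values are illustrative defaults, and the paper's adaptive window and guided-filter refinement are omitted.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dehaze_dark_channel(img, window=15, omega=0.95, t0=0.1, gamma=0.8):
    """img: float RGB image in [0, 1], shape (H, W, 3). Returns the dehazed image."""
    # Dark channel: per-pixel minimum over channels, then a local minimum filter.
    dark = minimum_filter(img.min(axis=2), size=window)

    # Atmospheric light: mean colour of the brightest 0.1% dark-channel pixels.
    n = max(1, int(0.001 * dark.size))
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = img[idx].mean(axis=0)                                  # (3,)

    # Rough transmission from the scattering model I = J*t + A*(1 - t).
    norm = img / np.maximum(A, 1e-6)
    t = 1.0 - omega * minimum_filter(norm.min(axis=2), size=window)
    t = np.clip(t, t0, 1.0)[..., None]

    # Recover the scene radiance and brighten it with a Gamma transform.
    J = (img - A) / t + A
    return np.clip(J, 0.0, 1.0) ** gamma
```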

      Object Tracking and Recognition

    • In video object tracking, the researchers propose an algorithm based on dual-branch online optimization and feature fusion that effectively improves discrimination of the tracked target along with accuracy and robustness. (An illustrative sketch of a Siamese-style correlation response follows this entry.)
      LI Xinpeng,WANG Peng,LI Xiaoyan,SUN Mengyu,CHEN Zuntian,GAO Hui
      Vol. 39, Issue 8, Pages: 1079-1089(2024) DOI: 10.37188/CJLCD.2023-0256
      Abstract: To address the inadequate discrimination of the tracked target in the D3S algorithm, a video object tracking algorithm based on dual-branch online optimization and feature fusion is proposed. Firstly, a dual-branch online-optimized classifier is constructed to re-locate the target, producing a more accurate target position response map. Secondly, the response map is fused with the search features at the feature level, with an encoder module promoting the fusion and further highlighting the features of the tracked target. Finally, the template features are updated with the encoder module to fit the differences between features, enhancing the discriminative capability of the segmentation module. Evaluations are conducted on the VOT2018 and UAV123 datasets. Compared with the original algorithm, the improved algorithm raises EAO by 2.9% on VOT2018 and increases success rate by 2.4% and accuracy by 2.9% on UAV123. The experimental results demonstrate that the method improves the algorithm's discriminative ability and further improves accuracy and robustness.
      Keywords: object tracking; object segmentation; online optimization; feature fusion; attention mechanism
      Published: 2024-08-16
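The dual-branch classifier above builds on correlating template features with search features. As background only, here is a generic Siamese-style depth-wise cross-correlation in PyTorch that turns a template feature and a search feature into a response map; it is not the D3S dual-branch online classifier itself.

```python
import torch.nn.functional as F

def correlation_response(template_feat, search_feat):
    """Depth-wise cross-correlation summed over channels.

    template_feat: (N, C, h, w) features cropped around the target
    search_feat:   (N, C, H, W) features of the search region
    Returns a (N, 1, H-h+1, W-w+1) response map; peaks indicate the target.
    """
    n, c, h, w = template_feat.shape
    search = search_feat.reshape(1, n * c, *search_feat.shape[-2:])
    kernel = template_feat.reshape(n * c, 1, h, w)
    resp = F.conv2d(search, kernel, groups=n * c)        # (1, N*C, H', W')
    resp = resp.reshape(n, c, *resp.shape[-2:])
    return resp.sum(dim=1, keepdim=True)
```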
    • OU Zhuolin,LÜ Xiaoqi,GU Yu
      Vol. 39, Issue 8, Pages: 1090-1102(2024) DOI: 10.37188/CJLCD.2023-0278
      Abstract: Image registration plays an important role in computer-aided diagnosis of brain diseases and in remote surgery. U-Net and its variants have been widely used in medical image registration, achieving good accuracy and runtime. However, existing registration models struggle to learn the edge features of small structures under complex deformations and ignore the correlation of contextual information across scales. To address these issues, a registration model based on cross-scale point matching combined with multi-scale feature fusion is proposed. Firstly, a cross-scale point matching module is introduced into the encoder to enhance the representation of prominent region features and capture the edge details of small structures. Then, multi-scale features are fused in the decoder to form a more comprehensive feature description. Finally, an attention module is integrated into the multi-scale feature fusion module to highlight spatial and channel information. Experimental results on three brain magnetic resonance (MR) datasets show that, taking OASIS-3 as an example, registration accuracy improves by 23.5%, 12.4%, 0.9% and 2.1% over Affine, SyN, VoxelMorph and CycleMorph, respectively, and the corresponding ASD values decrease by 1.074, 0.434, 0.043 and 0.076. The proposed model better captures image feature information, improving registration accuracy, which is important for the development of medical image registration. (An illustrative sketch of displacement-field warping follows this entry.)
      Keywords: medical image registration; encoder-decoder structure; feature weighting; feature matching; attention mechanism
      Published: 2024-08-16
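Deformable registration models of this family typically predict a dense displacement field and warp the moving image with it. The PyTorch sketch below shows that warping step (a VoxelMorph-style spatial transform) in 2D for brevity; the paper's 3D MR setting and its cross-scale matching modules are not reproduced here.

```python
import torch
import torch.nn.functional as F

def warp_image(moving, flow):
    """Warp a moving image with a dense displacement field.

    moving: (N, C, H, W) image to be registered
    flow:   (N, 2, H, W) displacement in pixels, flow[:, 0] = dx, flow[:, 1] = dy
    """
    n, _, h, w = moving.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(moving.device)     # (2, H, W)
    coords = base.unsqueeze(0) + flow                                  # (N, 2, H, W)

    # grid_sample expects coordinates normalised to [-1, 1] in (x, y) order.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                   # (N, H, W, 2)
    return F.grid_sample(moving, grid, align_corners=True)
```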

      Object Detection

    • In image manipulation localization, this paper proposes a framework based on dual-stream enhanced encoding and attention-optimized decoding that effectively improves localization accuracy and robustness, offering a new solution for image tampering detection. (An illustrative multi-scale dilated-convolution sketch follows this entry.)
      ZHU Ye,ZHAO Xiaoxiang,YU Yang
      Vol. 39, Issue 8, Pages: 1103-1115(2024) DOI: 10.37188/CJLCD.2023-0280
      Abstract: Mainstream image manipulation localization methods usually fuse the inconsistent features of different streams with simple operations, which leads to feature redundancy and pixel misdetection in tampered regions. We therefore propose a network with a dual-stream enhancement encoder and an attention-optimized decoder for image manipulation localization. Firstly, the dual-stream enhancement encoder self-reinforces and exchanges the extracted dual-stream multi-scale features, making full use of multiple types of tampering information so that they complement one another and more attention is paid to tampering features. Then, a multi-scale receptive-field strategy is introduced to explore multi-scale context, and an adjacent-level feature aggregation module is designed to fuse multi-scale adjacent features. Finally, localization is strengthened by exploiting both the tampered and the genuine regions: the attention-optimized decoder eliminates wrongly predicted edge pixels in the initial tamper-region prediction and refines the localization step by step. Extensive experiments on four mainstream public datasets (NIST16, Coverage, Columbia and CASIA) and two realistic challenge datasets (IMD20 and Wild) compare the method with mainstream manipulation localization approaches. The proposed method performs best on all six datasets in both the non-fine-tuned and fine-tuned settings, demonstrating that it makes full use of various forgery clues to achieve higher localization accuracy and stronger robustness.
      Keywords: image manipulation localization; dual-stream enhancement encoder; attention optimization decoder; adjacent feature aggregation module
      Published: 2024-08-16
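The multi-scale receptive-field strategy mentioned above can be realised with parallel dilated convolutions, ASPP-style. The PyTorch module below is a sketch of that idea; the dilation rates and channel widths are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiScaleContext(nn.Module):
    """Parallel 3x3 convolutions with growing dilation rates, then a 1x1 fusion."""

    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
             for r in rates])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Each branch sees a different receptive field; concatenation mixes them.
        return self.act(self.fuse(torch.cat([b(x) for b in self.branches], dim=1)))
```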
    • ZHU Siyu,ZHU Lei,WANG Wenwu,YUE Huagang
      Vol. 39, Issue 8, Pages: 1116-1127(2024) DOI: 10.37188/CJLCD.2023-0304
      Abstract: Unsupervised learning is currently the main research direction in industrial product defect detection and mainly divides into reconstruction-based and feature-based methods. The former builds content-aware mappings that map abnormal regions to normal ones and detects defects through residual images, focusing on the overall appearance of the image; the latter uses high-level semantic features to localize anomalies and pays more attention to image detail. Drawing on the strengths and weaknesses of both, a method based on the fusion of feature-based and reconstruction-based models is proposed, which combines their advantages, compensates for their shortcomings and realizes unified end-to-end learning and inference. A reconstruction model is trained first, then a normalizing flow model is adopted to learn the probability distribution of normal input samples and is integrated with the reconstruction model, effectively improving the accuracy of defect detection and localization. On the widely used MVTec AD dataset, the fusion model reaches an average image-level AUROC of 98.7% and an average pixel-level AUROC of 94.2%, the latter an increase of 3.3% over the single reconstruction model. The fusion of the feature branch with the reconstruction network significantly alleviates the defect-localization weaknesses of the reconstruction part and yields more accurate results. (An illustrative sketch of a fused anomaly score follows this entry.)
      Keywords: anomaly detection; reconstruction; memory module; flow model; fusion algorithm
      Published: 2024-08-16
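The fusion described above combines a reconstruction branch with a normalizing-flow branch. The sketch below shows one plausible way to merge a per-pixel reconstruction error with a flow log-likelihood map into a single anomaly map; the min-max normalisation and the mixing weight `alpha` are assumptions, not the paper's exact fusion rule.

```python
import torch

def fused_anomaly_map(image, reconstruction, log_prob_map, alpha=0.5):
    """Blend reconstruction error with flow-based negative log-likelihood.

    image, reconstruction: (N, C, H, W)
    log_prob_map:          (N, H, W) per-pixel log-likelihood from the flow model
    Returns an (N, H, W) anomaly map in [0, 1]; higher means more anomalous.
    """
    rec_err = (image - reconstruction).abs().mean(dim=1)   # (N, H, W)
    nll = -log_prob_map                                     # low likelihood = anomalous

    def _norm(m):
        m = m - m.amin(dim=(1, 2), keepdim=True)
        return m / (m.amax(dim=(1, 2), keepdim=True) + 1e-8)

    return alpha * _norm(rec_err) + (1.0 - alpha) * _norm(nll)
```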
    • In video anomaly detection, the researchers propose an algorithm combining hybrid convolution and multi-scale attention and introduce a structural similarity loss to optimize the model; experiments show a significant AUC improvement on the UCSD Ped2 and CUHK Avenue datasets, verifying the model's effectiveness. (An illustrative SSIM-plus-L2 loss sketch follows this entry.)
      YANG Dawei,LIU Zhiquan,WANG Hongxia
      Vol. 39, Issue 8, Pages: 1128-1137(2024) DOI: 10.37188/CJLCD.2023-0320
      Abstract: Unsupervised U-Net-style video anomaly detection models achieve good results, but because ordinary convolution is inherently local, the U-Net-style encoder cannot effectively extract global contextual information; simple skip connections do not deliver effective feature information; and the L2 loss only considers pixel-level differences and cannot capture the structural features of the image. A video anomaly detection algorithm combining hybrid convolution and multi-scale attention is therefore proposed, with a structural similarity (SSIM) loss added to optimize the model. Specifically, a hybrid convolution module is added to the last encoder layer, mixing spatial and positional features to extract global contextual information. A multi-scale attention module is added to the skip connections between encoder and decoder so that the model extracts more valuable features for effective skip connections. The weights of the SSIM loss and the L2 loss are balanced by a parameter to optimize the model more accurately. Experimental results show that the proposed algorithm achieves AUCs of 96.7% and 86.1% on the UCSD Ped2 and CUHK Avenue public datasets, improvements of 1.6% and 1.4% over the pre-improvement model, proving the effectiveness of the proposed model.
      Keywords: context information; residual connection; hybrid convolution; multi-scale attention; SSIM
      Published: 2024-08-16
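The joint objective above balances an SSIM term against an L2 term. The PyTorch sketch below uses a simplified, window-free SSIM computed per image to keep it short; real SSIM implementations use a Gaussian sliding window, and the mixing weight `alpha` here is an assumption rather than the paper's value.

```python
import torch
import torch.nn.functional as F

def ssim_l2_loss(pred, target, alpha=0.85, c1=0.01 ** 2, c2=0.03 ** 2):
    """Weighted combination of a (simplified, global) SSIM term and an L2 term.

    pred, target: (N, C, H, W) images scaled to [0, 1].
    """
    mu_p, mu_t = pred.mean(dim=(2, 3)), target.mean(dim=(2, 3))          # (N, C)
    var_p = pred.var(dim=(2, 3), unbiased=False)
    var_t = target.var(dim=(2, 3), unbiased=False)
    cov = ((pred - mu_p[..., None, None]) *
           (target - mu_t[..., None, None])).mean(dim=(2, 3))

    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return alpha * (1 - ssim.mean()) + (1 - alpha) * F.mse_loss(pred, target)
```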
    • Latest progress: researchers design an improved U-Net network model that, by introducing an attention mechanism, significantly improves the accuracy and reliability of detecting the severity of crop leaf diseases and pests, providing an effective solution for green pest control. (An illustrative attention-gate sketch follows this entry.)
      LIU Lin,LIN Shanchi,LI Xiangguo,FENG Min,XU Liang
      Vol. 39, Issue 8, Pages: 1138-1144(2024) DOI: 10.37188/CJLCD.2023-0247
      Abstract: To meet the demand of green crop pest and disease control for severity detection, an improved U-Net network model is designed for detecting the severity of crop leaf diseases and pests. First, ResNet50 is selected as the backbone of the model, and transfer learning is used to speed up training convergence and reduce computational cost. Second, an attention mechanism is introduced to optimize the feature extraction and fusion of each U-Net layer, improving the network's ability to capture key information. Experimental results show that the improved U-Net512 model has the best detection performance, with an average detection accuracy of 90.14% and a mean absolute error of 276.3. Analysis of the feature maps at different sampling depths shows that the attention mechanism enables the network to acquire and fuse two kinds of information, the overall leaf features and the diseased-area features, which further improves detection performance. The method effectively detects the pest and disease severity of crop leaves with high accuracy and reliability, supporting green prevention and control of crop diseases and pests.
      Keywords: pest detection; U-Net network; attention gate; pest control
      Published: 2024-08-16
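The attention mechanism applied to the U-Net skip connections above is commonly implemented as an additive attention gate (as in Attention U-Net). The PyTorch module below is a generic sketch of that gate; channel sizes are placeholders, and it assumes the decoder (gating) feature has already been brought to the skip feature's spatial resolution.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate on a U-Net skip connection (Attention U-Net style)."""

    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, gate, skip):
        # gate: decoder feature (coarse), skip: encoder feature at the same resolution.
        attn = self.psi(self.relu(self.w_g(gate) + self.w_x(skip)))   # (N, 1, H, W)
        return skip * attn                                            # re-weighted skip
```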