Latest Issue

    Volume 39, Issue 2, 2024

      Image Segmentation

    • LIU Lamei,DU Baochang,HUANG Huiling,ZHANG Yongjian,HAN Jun
      Vol. 39, Issue 2, Pages: 121-130(2024) DOI: 10.37188/CJLCD.2023-0036
      Abstract: To tackle the challenges posed by the cumbersome computation and intricate decoding structure of encoder-decoder semantic segmentation networks, we present DFNet, a decoder-free binary semantic segmentation model. By discarding the complex decoding structure and skip connections that are ubiquitous in conventional segmentation networks, the model adopts a convolutional remolding upsampling method to reshape the feature encoding directly into precise segmentation results, significantly streamlining the network architecture. Moreover, the encoder integrates a lightweight dual attention mechanism, EC&SA, to facilitate effective communication of channel and spatial information and bolster the network's encoding capability. To further enhance segmentation accuracy, the traditional segmentation loss is replaced with the PolyCE loss, which resolves the imbalance between positive and negative samples. Experimental results on binary segmentation datasets such as DeepGlobe road segmentation and Crack Forest defect detection show that the model reaches a mean F1 of 84.69% and a mean IoU of 73.95%, with a segmentation speed as high as 94 FPS, far exceeding mainstream semantic segmentation models and greatly improving the efficiency of the segmentation task. (A hedged sketch of the reshaping-based upsampling idea follows this entry.)
      Keywords: binary segmentation; convolution remolding upsampling; EC&SA; PolyCE; road segmentation; defect detection
      Published: 2024-02-27
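      The abstract's "convolutional remolding upsampling" suggests reshaping encoder features directly into a full-resolution class map. Below is a minimal PyTorch sketch of that idea using a 1×1 convolution followed by pixel shuffle; the class count, the stride of 16, and the channel width are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RemoldUpsampleHead(nn.Module):
    """Hypothetical decoder-free head: a 1x1 conv remolds encoder features into
    num_classes * r * r channels, then PixelShuffle rearranges them into a
    full-resolution segmentation map. A sketch of one plausible reading of
    'convolutional remolding upsampling', not the authors' implementation."""
    def __init__(self, in_channels, num_classes=2, scale=16):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_classes * scale * scale, kernel_size=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, feats):
        return self.shuffle(self.proj(feats))   # (N, num_classes, H*scale, W*scale)

x = torch.randn(1, 512, 32, 32)        # assumed encoder output at 1/16 resolution
logits = RemoldUpsampleHead(512)(x)    # -> (1, 2, 512, 512)
```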
    • LÜ Huanhuan,HUANG Yucheng,ZHANG Hui,WANG Yali
      Vol. 39, Issue 2, Pages: 131-145(2024) DOI: 10.37188/CJLCD.2023-0054
      Abstract: To make full use of the spatial-spectral features contained in hyperspectral images, a semi-supervised spatial-spectral local Fisher discriminant analysis algorithm (S4LFDA) for hyperspectral image feature extraction is proposed. Given the spatial consistency of hyperspectral datasets, the pixels are first spatially reconstructed to preserve the neighbor relationships of the hyperspectral data, and spectral information divergence is introduced to reconstruct the similarity between pixels. To exploit the large number of unlabeled samples and improve performance, the fuzzy C-means clustering algorithm is used to cluster the samples and obtain pseudo-labels. A regularization term is then added to the intra-class and inter-class scatter matrices of the local FDA algorithm to maintain the consistency of the cluster structure of the unlabeled samples. Finally, the local FDA algorithm maximizes the inter-class scatter and minimizes the intra-class scatter of the labeled samples and solves for the best projection vectors. The S4LFDA algorithm not only maintains the separability of the dataset in the spectral domain but also preserves the spatial neighbor relationships of the pixels, making rational use of both labeled and unlabeled samples and improving classification performance. Experiments on the Pavia University and Indian Pines datasets achieve overall classification accuracies of 95.60% and 94.38%, respectively, effectively improving feature classification performance compared with other dimensionality reduction algorithms. (A hedged sketch of the discriminant-analysis core follows this entry.)
      Keywords: hyperspectral image; semi-supervision; spatial-spectral; discriminant analysis; feature extraction; feature classification
      Published: 2024-02-27
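      As a rough illustration of the discriminant-analysis core that S4LFDA builds on, the sketch below constructs between-class and within-class scatter matrices from (pseudo-)labeled samples and solves the generalized eigenproblem; the spatial reconstruction, spectral information divergence weighting, and fuzzy C-means pseudo-labeling described in the abstract are not reproduced, and all parameters are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def fda_projection(X, y, n_components=10, reg=1e-3):
    """Sketch of a plain FDA step: build between-class (Sb) and within-class (Sw)
    scatter matrices from labeled (or pseudo-labeled) samples and solve the
    generalized eigenproblem Sb w = lambda Sw w. Illustrative only; not the full
    S4LFDA objective."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
        Sw += (Xc - mc).T @ (Xc - mc)
    Sw += reg * np.eye(d)                      # regularize for numerical stability
    vals, vecs = eigh(Sb, Sw)                  # generalized symmetric eigenproblem
    order = np.argsort(vals)[::-1]             # largest discriminant ratio first
    return vecs[:, order[:n_components]]       # d x n_components projection matrix

# usage: Z = X @ fda_projection(X, labels, n_components=15)
```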
    • ZHANG Huanhuan,WANG Huiqin,WANG Ke,WANG Zhan,ZHEN Gang,HE Zhang
      Vol. 39, Issue 2, Pages: 146-156(2024) DOI: 10.37188/CJLCD.2023-0110
      Abstract: Extracting line drawings of ancient frescoes with existing edge detection methods suffers from strong noise interference and loss of information. This paper proposes a method that fuses pixel difference convolution with optimal-band selection to extract mural line drawings. The minimum noise fraction method is used to separate effective information from noise in the multispectral data of the mural, and the optimal principal-component band is selected for line-drawing extraction. To address the difficulty traditional convolution has in capturing image gradient information, pixel difference convolution is introduced to strengthen gradient cues for edge detection, and a scale enhancement module (SEM) is added to the side-output network to enrich multiscale features. Meanwhile, to tackle pixel misclassification caused by pixel-level class imbalance, a Dice loss strategy based on image similarity is designed to progressively minimize the pixel distance and obtain clear image edges, and a model fine-tuned with prior knowledge of the mural dataset is used to compensate for the limited training data. The experimental results show that the method can extract clearer line drawings in faded and noisy mural scenes; the SSIM and RMSE of the line-drawing images are better than those of other algorithms, improving by 2%~10% and 2%~4%, respectively, compared with PiDiNet. The model is also validated on the public BIPED dataset, where the ODS and OIS of the proposed method improve on PiDiNet by 0.005 and 0.007, respectively. The method can extract clear and complete line drawings of faded and deteriorated murals. (A hedged sketch of pixel difference convolution follows this entry.)
      Keywords: sketch extraction; spectral imaging; pixel difference convolution; pixel-level balancing; mural
      Published: 2024-02-27
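      Pixel difference convolution, as popularized by PiDiNet-style edge detectors, filters pixel differences instead of raw intensities. The sketch below shows the central-difference form, which reduces to an ordinary convolution minus a 1×1 convolution with the summed kernel weights; kernel sizes and initialization are illustrative, not the paper's exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralPixelDiffConv(nn.Module):
    """Sketch of a central pixel-difference convolution: each weight acts on the
    difference between a neighbour and the centre pixel, i.e.
    sum_i w_i * (x_i - x_c) = conv(x, W) - x_c * sum(W)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.1)
        self.padding = padding

    def forward(self, x):
        vanilla = F.conv2d(x, self.weight, padding=self.padding)
        # 1x1 convolution with the summed kernel weights approximates x_c * sum(W)
        center = F.conv2d(x, self.weight.sum(dim=(2, 3), keepdim=True))
        return vanilla - center

edge_feat = CentralPixelDiffConv(3, 16)(torch.randn(1, 3, 64, 64))  # -> (1, 16, 64, 64)
```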
    • PENG Yanfei,WANG Jing,LIU Xiaoxuan,GONG Shengjie
      Vol. 39, Issue 2, Pages: 157-167(2024) DOI: 10.37188/CJLCD.2023-0104
      Abstract: Aiming at the poor visual quality and slow processing speed of existing image retargeting methods, a content-aware image retargeting method based on principal component analysis and blocking is proposed. First, principal component analysis is used to fuse the gradient map and the saliency map, extracting richer image features and avoiding distortion of the main content. Then, adjacent seams are replaced by their mean value to avoid pixel incoherence. Finally, according to the column energy values in the energy map, the image is divided into salient and non-salient regions, and the blocks are scaled in parallel, which focuses on image features and improves operating efficiency. Experiments are carried out on the MIT RetargetMe, DUT-OMRON and NJU2000 datasets, using subjective perception together with running time and SIFT-flow as objective evaluation indicators, and the method is compared with several commonly used algorithms. The results show that the method preserves the integrity of the image subject, and its average running time is 1/3 that of the seam carving algorithm. The proposed method not only produces better visual results but also reduces computational complexity. (A hedged sketch of the PCA-based map fusion follows this entry.)
      Keywords: principal component analysis; energy map; blocking; seams; scaling
      Published: 2024-02-27
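      One simple way to realize "PCA fusion" of a gradient map and a saliency map is to treat the two maps as two variables per pixel and weight them by the loadings of the first principal component, as sketched below; this is an assumed reading of the fusion step, not the authors' exact procedure.

```python
import numpy as np

def pca_fuse(gradient_map, saliency_map):
    """Fuse two maps by the loadings of the first principal component of their
    joint pixel distribution. Simplified illustration of PCA-based fusion."""
    g = (gradient_map - gradient_map.mean()) / (gradient_map.std() + 1e-8)
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    data = np.stack([g.ravel(), s.ravel()])           # 2 x N observation matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(data))   # 2 x 2 covariance spectrum
    w = np.abs(eigvecs[:, np.argmax(eigvals)])        # loadings of the 1st component
    w = w / w.sum()
    return w[0] * gradient_map + w[1] * saliency_map  # fused energy map

energy = pca_fuse(np.random.rand(240, 320), np.random.rand(240, 320))
```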

      Image Enhancement

    • ZHAO Zewei,CHE Jin,LÜ Wenhan
      Vol. 39, Issue 2, Pages: 168-179(2024) DOI: 10.37188/CJLCD.2023-0076
      Abstract: To solve the problem that the text encoder cannot mine text information deeply in text-to-image generation, which leads to semantic inconsistency in the generated images, a text-to-image generation method based on an improved DMGAN model is proposed. Firstly, the XLNet pre-trained model is used to encode the text. Pre-trained on a large-scale corpus, this model captures a large amount of prior knowledge from the text and mines contextual information deeply. Then, a channel attention module is added to the initial image-generation stage and the image-refinement stage of the DMGAN model to highlight important feature channels, further improving the semantic consistency and spatial layout of the generated images as well as the convergence speed and stability of the model. Experimental results show that, compared with the original DMGAN model, images generated by the proposed model on the CUB dataset gain 0.47 in the IS index and drop 2.78 in the FID index, which fully indicates that the model has better cross-modal generation ability. (A hedged sketch of a channel attention module follows this entry.)
      Keywords: text-to-image; XLNet model; generative adversarial networks; channel attention
      Published: 2024-02-27
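      A channel attention module of the squeeze-and-excitation kind is one standard way to "highlight important feature channels"; the minimal PyTorch sketch below shows the idea, with the reduction ratio and placement being assumptions rather than the exact module used in the improved DMGAN.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global average pooling,
    a two-layer bottleneck MLP, and a sigmoid gate over channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> (N, C)
        return x * w.unsqueeze(-1).unsqueeze(-1) # reweight channels

feat = ChannelAttention(64)(torch.randn(2, 64, 32, 32))
```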
    • LIAO Yonghui,ZHANG Haitao,JIN Haibo
      Vol. 39, Issue 2, Pages: 180-191(2024) DOI: 10.37188/CJLCD.2023-0107
      Abstract: Current hierarchical text-to-image generation methods use only up-sampling for feature extraction in the initial image-generation stage, yet up-sampling is essentially a convolutional operation, and the limitations of convolution cause global information to be ignored and prevent remote semantics from interacting. Although some methods add self-attention mechanisms to the models, problems such as missing image details and structural errors remain. To address these problems, a generative adversarial network model, SAF-GAN, based on self-supervised attention and image feature fusion is proposed. A self-supervised module based on CotNet is added to the initial feature-generation stage, using an attention mechanism for autonomous mapping learning between image features. The dynamic attention matrix is guided by the contextual relationships of the features, tightly combining context mining with self-attention learning and improving feature generation for low-resolution images; high-resolution images are then refined and generated through alternating training of the networks at different stages. At the same time, a feature fusion enhancement module is added: by fusing the low-resolution features of the previous stage with the features of the current stage, the generation network can make full use of the rich semantic information of the earlier features and the high-resolution detail of the later features, further guaranteeing the semantic consistency of feature maps at different resolutions and achieving realistic high-resolution image generation. Experimental results show that, compared with the baseline model (AttnGAN), SAF-GAN increases the IS score by 0.31 and decreases the FID index by 3.45 on the CUB dataset, and increases the IS score by 2.68 and decreases the FID index by 5.18 on the COCO dataset. The proposed model can effectively generate more realistic images, which proves the effectiveness of the method. (A hedged sketch of the feature-fusion step follows this entry.)
      Keywords: computer vision; generative adversarial networks; text-to-image; CotNet; image feature fusion
      Published: 2024-02-27
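      The feature fusion enhancement step described above can be illustrated by upsampling the previous stage's low-resolution features and mixing them with the current stage's features, as in the sketch below; the channel widths and nearest-neighbor upsampling are assumptions, not the exact SAF-GAN design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionEnhance(nn.Module):
    """Upsample previous-stage features, concatenate with current-stage features,
    and mix with a 3x3 convolution. Illustrative fusion sketch only."""
    def __init__(self, prev_ch, cur_ch):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(prev_ch + cur_ch, cur_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(cur_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, prev_feat, cur_feat):
        prev_up = F.interpolate(prev_feat, size=cur_feat.shape[2:], mode="nearest")
        return self.mix(torch.cat([prev_up, cur_feat], dim=1))

fused = FeatureFusionEnhance(128, 64)(torch.randn(1, 128, 32, 32),
                                      torch.randn(1, 64, 64, 64))
```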

      Object Tracking and Recognition

    • HE Zemin,ZENG Juntao,YUAN Baoxi,LIANG Dejian,MIAO Zongcheng
      Vol. 39, Issue 2, Pages: 192-204(2024) DOI: 10.37188/CJLCD.2023-0113
      Abstract: In computer vision, Siamese-network-based tracking algorithms improve accuracy and speed over traditional algorithms, but they are still affected by target occlusion, deformation and environmental changes, which degrade their performance. To gain an in-depth understanding of Siamese-network-based single-target tracking, existing Siamese tracking algorithms are summarized and analyzed along three lines: introducing attention mechanisms, hyper-parameter inference, and template updating. The survey reviews the tracking algorithms in each of these three categories and details the recent research and development of Siamese-network-based algorithms at home and abroad. Representative algorithms from the three categories are compared experimentally on the VOT2016, VOT2017, VOT2018 and OTB-2015 datasets to obtain the performance of multiple Siamese-network-based trackers. Finally, Siamese-network-based target tracking algorithms are summarized and future development directions are discussed. (A hedged sketch of the cross-correlation operation at the core of such trackers follows this entry.)
      Keywords: computer vision; target tracking; Siamese networks; deep learning
      Published: 2024-02-27
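      The operation shared by SiamFC-style Siamese trackers is a cross-correlation between the template embedding and the search-region embedding; the sketch below shows only that step, with backbone feature extraction omitted and feature shapes chosen for illustration.

```python
import torch
import torch.nn.functional as F

def siamese_xcorr(template_feat, search_feat):
    """Use the template embedding as a convolution kernel over the search-region
    embedding; the response-map peak indicates the target location."""
    # template_feat: (1, C, Ht, Wt), search_feat: (1, C, Hs, Ws)
    return F.conv2d(search_feat, template_feat)   # -> (1, 1, Hs-Ht+1, Ws-Wt+1)

response = siamese_xcorr(torch.randn(1, 256, 6, 6), torch.randn(1, 256, 22, 22))
print(response.shape)  # torch.Size([1, 1, 17, 17])
```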
    • YANG Yun,WANG Jing,JIANG Jiale
      Vol. 39, Issue 2, Pages: 205-216(2024) DOI: 10.37188/CJLCD.2023-0123
      Abstract: A deep-learning-based license plate number recognition method is proposed to address the low accuracy and missed detections of license plate recognition in hazy weather. Firstly, the AOD-Net algorithm is used to dehaze the vehicle image. Then, a license plate detection network, ACG_YOLOv5s, is designed based on the YOLOv5 network. ACG_YOLOv5s integrates the CBAM attention mechanism into the YOLOv5s network to improve the model's resistance to interference, and introduces adaptively spatial feature fusion (ASFF), which assigns weights to different feature layers according to weights the model learns adaptively, thereby highlighting important feature information. Traditional convolution is replaced with the Ghost convolution module, reducing the number of parameters during training while preserving model performance. Finally, LPRNet is used to recognize the detected license plate images. The experimental results indicate that the improved ACG_YOLOv5s network achieves a license plate detection accuracy of 99.6% and an LPRNet recognition accuracy of 96% with a small memory footprint. Combining the AOD-Net algorithm with the YOLO algorithm detects license plate numbers in hazy images more effectively. (A hedged sketch of the CBAM attention idea follows this entry.)
      Keywords: license plate number recognition; AOD-Net algorithm; YOLOv5 network; attention mechanism
      Published: 2024-02-27
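      CBAM, referenced in the abstract, chains channel attention (from average- and max-pooled descriptors) with spatial attention (a convolution over channel-pooled maps). The compact sketch below illustrates that structure; the reduction ratio and kernel size are typical defaults, not necessarily the ACG_YOLOv5s configuration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Compact CBAM sketch: channel attention from avg/max pooled vectors through a
    shared MLP, then spatial attention from channel-wise avg/max maps."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)   # channel attention
        sp = torch.cat([x.mean(dim=1, keepdim=True),
                        x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(sp))                    # spatial attention

out = CBAM(64)(torch.randn(1, 64, 40, 40))
```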
    • FU Huichen,GAO Junwei,CHE Luyang
      Vol. 39, Issue 2, Pages: 217-227(2024) DOI: 10.37188/CJLCD.2023-0127
      Abstract: Human pose estimation and motion recognition have important applications in security, medical treatment and sports. To address human pose estimation and motion recognition for various movements against complex backgrounds, an improved YOLOv7-POSE algorithm is proposed, trained on self-built datasets covering various shooting angles. Based on YOLOv7, the algorithm adds a classification branch to the original network model. The CA convolutional attention mechanism is introduced into the backbone network, improving the recognition of features important for classifying human skeleton keypoints and actions. The CBS convolution blocks of the original model are replaced with the HorNet network structure, improving the detection accuracy of human keypoints and the accuracy of action classification. The pyramid structure in the Head layer is replaced with atrous spatial pyramid pooling, which improves precision and speeds up model convergence. The CIoU regression loss for the detection box is replaced with EIoU, improving the precision of coordinate regression. Datasets of bodybuilding movements against complex backgrounds and from various shooting angles were captured by the authors, and comparison experiments are carried out on this self-made dataset. Experimental results show that the mAP of the improved YOLOv7-POSE on the test set is 95.7%, 4% higher than that of the original YOLOv7 algorithm; the recognition accuracy of all movement classes increases significantly, and missed and erroneous keypoint detections decrease markedly. (A hedged sketch of the EIoU loss follows this entry.)
      Keywords: image processing; key point detection; pose estimation; convolutional attention mechanism; atrous spatial pyramid pooling
      Published: 2024-02-27
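      The EIoU box-regression loss mentioned above adds explicit width and height penalties to the IoU and centre-distance terms; a sketch for boxes in (x1, y1, x2, y2) format is given below. It follows the commonly cited EIoU formulation and may differ in detail from the paper's implementation.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU sketch: 1 - IoU + centre-distance penalty + width/height penalties,
    each normalised by the enclosing box. pred/target: (N, 4) as (x1, y1, x2, y2)."""
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # smallest enclosing box
    cw = (torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])).clamp(min=eps)
    ch = (torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])).clamp(min=eps)
    # centre-distance, width and height differences
    dx = (pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) / 2
    dy = (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) / 2
    dw = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    dh = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])
    return 1 - iou + (dx**2 + dy**2) / (cw**2 + ch**2) + dw**2 / cw**2 + dh**2 / ch**2
```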
    • WANG Yin,JIANG Zheng,LIU Bin
      Vol. 39, Issue 2, Pages: 228-236(2024) DOI: 10.37188/CJLCD.2023-0085
      Abstract: Aiming at the high complexity of the traditional SIFT matching algorithm, its many redundant feature points and its difficulty in meeting real-time requirements, this paper proposes a fast SIFT image matching algorithm with a locally adaptive threshold. Building on SIFT, the method optimizes the construction of the Gaussian pyramid, eliminating redundant feature points by reducing the number of pyramid layers to improve detection efficiency. The threshold of the FAST detector is derived from the local contrast of the image to achieve high-quality feature point detection, and feature points with strong robustness are screened out for more accurate matching. Secondly, a Gaussian circular window is used to build a reduced 32-dimensional feature descriptor, improving the computational efficiency of the algorithm. Finally, the feature points are purified according to the geometric consistency between matching pairs, effectively reducing false matches. The experimental results show that the proposed method outperforms the SIFT algorithm and the other matching algorithms compared in overall matching accuracy and computational efficiency: matching accuracy improves by about 10% and execution time is shortened by about 49% compared with traditional SIFT, and the correct matching rate stays above 93% under changes in image scale, rotation and lighting. (A hedged sketch of such a pipeline with standard OpenCV calls follows this entry.)
      Keywords: SIFT algorithm; Gaussian pyramid; adaptive threshold; feature descriptor; image matching
      Published: 2024-02-27
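      A pipeline in the spirit of the described method, built from standard OpenCV calls, is sketched below: FAST keypoints with a contrast-derived threshold, SIFT descriptors, ratio-test matching, and RANSAC-based geometric purification. The threshold heuristic is an assumption, and the paper's reduced 32-dimensional descriptor is not reproduced.

```python
import cv2
import numpy as np

def match_with_adaptive_fast(img1, img2, ratio=0.75):
    """Sketch only: adaptive-threshold FAST detection, SIFT description,
    ratio-test matching, and RANSAC homography purification."""
    sift = cv2.SIFT_create()

    def detect(img):
        thr = max(5, int(img.std() * 0.5))                 # contrast-based threshold (assumed heuristic)
        fast = cv2.FastFeatureDetector_create(threshold=thr)
        kps = fast.detect(img, None)
        return sift.compute(img, kps)                      # (keypoints, descriptors), kept aligned

    kps1, des1 = detect(img1)
    kps2, des2 = detect(img2)
    good, pts1, pts2 = [], [], []
    for m, n in cv2.BFMatcher().knnMatch(des1, des2, k=2):
        if m.distance < ratio * n.distance:                # Lowe's ratio test
            good.append(m)
            pts1.append(kps1[m.queryIdx].pt)
            pts2.append(kps2[m.trainIdx].pt)
    # geometric-consistency purification with a RANSAC homography
    _, mask = cv2.findHomography(np.float32(pts1), np.float32(pts2), cv2.RANSAC, 3.0)
    return [g for g, keep in zip(good, mask.ravel()) if keep]
```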