1. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
[ "张 磊(1996—),男,内蒙古呼和浩特人,硕士研究生,2019年于哈尔滨工程大学获得学士学位,主要从事计算机视觉、图像处理方面的研究。E-mail: zhangleiused@163.com" ]
[ "韩广良(1968—),男,山东嘉祥人,博士,研究员,2003年于中国科学院长春光学精密机械与物理研究所获得博士学位,主要从事图像和视频信息处理、目标识别与跟踪、机器视觉与人工智能等方面的研究。E-mail: hangl@ciomp.ac.cn" ]
Received: 2022-05-25
Revised: 2022-06-06
Published in print: 2022-12-05
ZHANG Lei, HAN Guang-liang. Action recognition algorithm based on multi-scale and multi-branch features[J]. Chinese Journal of Liquid Crystals and Displays, 2022, 37(12): 1614-1625. DOI: 10.37188/CJLCD.2022-0176.
To address the insufficient, incomplete feature extraction and the resulting low recognition accuracy of action recognition based on human skeleton sequences, this paper proposes an action recognition model built on multi-branch features and multi-scale spatio-temporal features. First, the raw data are enhanced by combining several feature-generation algorithms. Second, the multi-branch input scheme is improved into multi-branch fused feature representations, which are fed into the network separately and merged after a certain depth of network modules. Then, a multi-scale spatio-temporal convolution module is constructed as the basic building block of the network to extract multi-scale spatio-temporal features. Finally, the overall network model is assembled to output the action category. Experimental results show that the recognition accuracy reaches 89.6% and 95.1% under the Cross-subject and Cross-view protocols of the NTU RGB-D 60 dataset, and 84.1% and 86.0% under the Cross-subject and Cross-setup protocols of the NTU RGB-D 120 dataset, respectively. Compared with other algorithms, the proposed method extracts more diverse, multi-scale action features and improves the recognition accuracy of action categories to a certain extent.
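As a concrete illustration of the multi-scale temporal modelling the abstract describes, the sketch below shows a generic multi-scale temporal convolution block for skeleton feature maps, written in PyTorch. This is a minimal, hypothetical example following common practice in skeleton-based action recognition (parallel dilated temporal convolutions whose outputs are concatenated), not the authors' released implementation; the class name MultiScaleTemporalConv and all hyper-parameters are illustrative assumptions, and the multi-branch input fusion and spatial (graph) convolution stages of the full model are omitted.

```python
# Hypothetical sketch (not the paper's code): a multi-scale temporal convolution
# block for skeleton feature maps of shape (N, C, T, V), where N = batch size,
# C = channels, T = frames, V = joints.
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    """Parallel temporal convolutions with different dilation rates, concatenated
    along the channel axis to capture both short- and long-range motion patterns."""
    def __init__(self, in_channels, out_channels, kernel_size=5, dilations=(1, 2, 3, 4)):
        super().__init__()
        assert out_channels % len(dilations) == 0
        branch_channels = out_channels // len(dilations)
        self.branches = nn.ModuleList()
        for d in dilations:
            pad = (kernel_size - 1) // 2 * d  # keep the temporal length unchanged
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_channels, branch_channels, kernel_size=1),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(branch_channels, branch_channels,
                          kernel_size=(kernel_size, 1),
                          padding=(pad, 0), dilation=(d, 1)),
                nn.BatchNorm2d(branch_channels),
            ))
        # Residual connection so that stacking many blocks stays trainable.
        self.residual = (nn.Identity() if in_channels == out_channels
                         else nn.Conv2d(in_channels, out_channels, kernel_size=1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (N, C, T, V)
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.relu(out + self.residual(x))

# Example: NTU RGB-D-style input with 300 frames and 25 joints per skeleton.
x = torch.randn(2, 64, 300, 25)
block = MultiScaleTemporalConv(64, 128)
print(block(x).shape)  # torch.Size([2, 128, 300, 25])
```

Stacking blocks of this kind with increasing channel widths, and feeding them with fused input branches (e.g. joint, bone, and motion features), would approximate the overall pipeline outlined in the abstract.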