1. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
[ "张 磊(1996—),男,内蒙古呼和浩特人,硕士研究生,2019年于哈尔滨工程大学获得学士学位,主要从事计算机视觉、图像处理方面的研究。E-mail: zhangleiused@163.com" ]
[ "韩广良(1968—),男,山东嘉祥人,博士,研究员,2003年于中国科学院长春光学精密机械与物理研究所获得博士学位,主要从事图像和视频信息处理、目标识别与跟踪、机器视觉与人工智能等方面的研究。E-mail: hangl@ciomp.ac.cn" ]
Received: 2022-05-25
Revised: 2022-06-06
Published in print: 2022-12-05
ZHANG Lei, HAN Guang-liang. Action recognition algorithm based on multi-scale and multi-branch features[J]. Chinese Journal of Liquid Crystals and Displays, 2022, 37(12): 1614-1625. DOI: 10.37188/CJLCD.2022-0176.
To address the insufficient, incomplete feature extraction and the resulting low recognition accuracy of action recognition based on human skeleton sequences, this paper proposes an action recognition model built on multi-branch features and multi-scale spatio-temporal features. First, the raw data are enhanced by combining several feature-generation algorithms. Second, the multi-branch input scheme is improved into multi-branch fused feature representations, which are fed into the network separately and merged after a certain depth of network modules. Then, a multi-scale spatio-temporal convolution module is constructed as the basic building block of the network to extract multi-scale spatio-temporal features. Finally, the overall network model is assembled to output the action category. Experimental results show that the recognition accuracy reaches 89.6% and 95.1% under the Cross-subject and Cross-view protocols of the NTU RGB-D 60 dataset, and 84.1% and 86.0% under the Cross-subject and Cross-setup protocols of the NTU RGB-D 120 dataset, respectively. Compared with other algorithms, the proposed method extracts more diverse, multi-scale action features and improves the recognition accuracy of action categories to a certain extent.
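As a concrete illustration of the multi-scale temporal modelling the abstract describes, the sketch below shows a generic multi-scale temporal convolution block for skeleton feature maps, written in PyTorch. This is a minimal, hypothetical example following common practice in skeleton-based action recognition (parallel dilated temporal convolutions whose outputs are concatenated), not the authors' released implementation; the class name MultiScaleTemporalConv and all hyper-parameters are illustrative assumptions, and the multi-branch input fusion and spatial (graph) convolution stages of the full model are omitted.

```python
# Hypothetical sketch (not the paper's code): a multi-scale temporal convolution
# block for skeleton feature maps of shape (N, C, T, V), where N = batch size,
# C = channels, T = frames, V = joints.
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    """Parallel temporal convolutions with different dilation rates, concatenated
    along the channel axis to capture both short- and long-range motion patterns."""
    def __init__(self, in_channels, out_channels, kernel_size=5, dilations=(1, 2, 3, 4)):
        super().__init__()
        assert out_channels % len(dilations) == 0
        branch_channels = out_channels // len(dilations)
        self.branches = nn.ModuleList()
        for d in dilations:
            pad = (kernel_size - 1) // 2 * d  # keep the temporal length unchanged
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_channels, branch_channels, kernel_size=1),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(branch_channels, branch_channels,
                          kernel_size=(kernel_size, 1),
                          padding=(pad, 0), dilation=(d, 1)),
                nn.BatchNorm2d(branch_channels),
            ))
        # Residual connection so that stacking many blocks stays trainable.
        self.residual = (nn.Identity() if in_channels == out_channels
                         else nn.Conv2d(in_channels, out_channels, kernel_size=1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (N, C, T, V)
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.relu(out + self.residual(x))

# Example: NTU RGB-D-style input with 300 frames and 25 joints per skeleton.
x = torch.randn(2, 64, 300, 25)
block = MultiScaleTemporalConv(64, 128)
print(block(x).shape)  # torch.Size([2, 128, 300, 25])
```

Stacking blocks of this kind with increasing channel widths, and feeding them with fused input branches (e.g. joint, bone, and motion features), would approximate the overall pipeline outlined in the abstract.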