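1. School of Information, Guizhou University of Finance and Economics, Guiyang 550025, Guizhou, China
2. Intelligent Middle Platform, Beijing Yunji Technology Co., Ltd., Beijing 100089, China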
Received: 11 October 2022
Revised: 25 October 2022
Published: 05 August 2023
LIU Kuan, WANG Wei, SHEN Hong-ting, et al. Behavior recognition based on time-dependent attention[J]. Chinese Journal of Liquid Crystals and Displays, 2023, 38(8): 1095-1106. DOI: 10.37188/CJLCD.2022-0330.
In action recognition, actors and action states change at different speeds, and correlations between actions are rarely modeled; both lead to weak discrimination and misclassification. To address this, a temporal correlation attention model based on the SlowFast architecture is proposed. First, optical flow is dropped and raw video is fed directly to the network, so the model can be trained end to end. Second, a temporal correlation attention mechanism is defined that combines correlation attention with temporal attention: the correlation attention extracts correlation information between actions, and its output is then passed to the temporal attention, which suppresses uninformative features. Finally, because the large convolution stride used when SlowFast fuses its pathways discards correlation between features, that step is replaced with a more effective sequence of consecutive convolutions. Experiments on the UCF101 and HMDB51 datasets show that the proposed method has advantages in accuracy and robustness over existing methods.
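The abstract's last point, that a large stride (step size) in the SlowFast pathway fusion loses correlation between features, can be illustrated with a small sketch. It is not the authors' implementation. It assumes a single temporal convolution with kernel 5 and stride 8 for the original fusion (the setting commonly described for SlowFast lateral connections) and, as a purely hypothetical stand-in for the proposed consecutive convolutions, three kernel-3, stride-2 convolutions. The function `touched_inputs` and the frame count `T_FAST` are illustrative names, not taken from the paper. The code only tracks which input frames can influence the fused output; no learned weights are involved.

```python
def touched_inputs(length, layers):
    """Return (output_length, set of original frame indices that reach any output).

    layers is a list of (kernel, stride) temporal convolutions applied in
    order, each zero-padded by kernel // 2 on both sides.
    """
    # cover[i] = original frame indices that influence position i so far
    cover = [{i} for i in range(length)]
    for kernel, stride in layers:
        pad = kernel // 2
        out_len = (len(cover) + 2 * pad - kernel) // stride + 1
        new_cover = []
        for j in range(out_len):
            start = j * stride - pad
            reached = set()
            for p in range(start, start + kernel):
                if 0 <= p < len(cover):  # positions outside are zero padding
                    reached |= cover[p]
            new_cover.append(reached)
        cover = new_cover
    return len(cover), set().union(*cover)


T_FAST = 32  # assumed number of Fast-pathway frames (hypothetical)

# Assumed original fusion: one convolution, kernel 5, stride 8.
single_len, single = touched_inputs(T_FAST, [(5, 8)])

# Hypothetical replacement: three consecutive kernel-3, stride-2 convolutions.
stacked_len, stacked = touched_inputs(T_FAST, [(3, 2), (3, 2), (3, 2)])

print(single_len, len(single))    # 4 18
print(stacked_len, len(stacked))  # 4 32
```

Both settings reduce 32 frames to 4, but under these assumptions the single stride-8 convolution never reads 14 of the 32 frames, while the stacked small-stride convolutions let every frame contribute to the fused feature.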