1. School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China
2. Intelligent Middle Platform, Beijing Yunji Technology Co., Ltd., Beijing 100089, China
LIU Kuan (born 1996), male, from Zheng'an, Guizhou, is an M.S. candidate. He received his B.S. degree from Guizhou University of Finance and Economics in 2019. His research interests include action recognition. E-mail: liukuan@mail.gufe.edu.cn
LUO Zi-jiang (born 1980), male, from Zunyi, Guizhou, Ph.D., is a professor. He received his Ph.D. from Guizhou University in 2021. His research interests include pattern recognition. E-mail: luozijiang@mail.gufe.edu.cn
LIU Kuan, WANG Wei, SHEN Hong-ting, et al. Behavior recognition based on time-dependent attention [J]. Chinese Journal of Liquid Crystals and Displays, 2023, 38(8): 1095-1106. DOI: 10.37188/CJLCD.2022-0330.
In action recognition tasks, actors and action states change at different speeds, and the correlation between actions is rarely modeled; both issues lead to weak discrimination and misclassification. To address these problems, a temporal correlation attention model based on the SlowFast architecture is proposed. First, optical flow is abandoned and raw video is fed directly into the network, so that the model can be trained end-to-end. Second, a temporal correlation attention mechanism composed of correlation attention and temporal attention is defined: the correlation attention extracts correlation information between actions, and its output is then passed to the temporal attention to suppress useless features. Finally, to counter the loss of inter-feature correlation caused by the large convolution stride in SlowFast's lateral-connection fusion, a more effective successive-convolution operation is proposed as a replacement. Experiments on the UCF101 and HMDB51 datasets show that the proposed method outperforms existing methods in accuracy and robustness.
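The abstract does not give the exact formulation of the two attention stages, so the following is only a minimal illustrative sketch. It assumes that correlation attention re-expresses each frame feature as a similarity-weighted mixture of all frames (capturing inter-action correlation), and that temporal attention then applies a softmax gate over per-frame scores to suppress uninformative frames; every function name and design choice below is hypothetical, not the paper's implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def correlation_attention(frames):
    # frames: T feature vectors of length C (list of lists).
    # Pairwise cosine similarity stands in for inter-action correlation.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(y * y for y in b)) or 1.0
        return dot / (na * nb)

    T, C = len(frames), len(frames[0])
    sim = [[cos(frames[i], frames[j]) for j in range(T)] for i in range(T)]
    # Each frame becomes a similarity-weighted mix of all frames.
    out = []
    for i in range(T):
        w = softmax(sim[i])
        out.append([sum(w[j] * frames[j][c] for j in range(T))
                    for c in range(C)])
    return out

def temporal_attention(frames):
    # Score each frame by its mean activation, then down-weight
    # low-scoring (less informative) frames via a softmax gate.
    scores = [sum(f) / len(f) for f in frames]
    gates = softmax(scores)
    return [[g * x for x in f] for g, f in zip(gates, frames)]

def temporal_correlation_attention(frames):
    # Correlation attention first, temporal attention second,
    # matching the order described in the abstract.
    return temporal_attention(correlation_attention(frames))
```

The key design point mirrored from the abstract is the ordering: correlation information is extracted first, and only then does the temporal stage decide which frames to suppress.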
Keywords: behavior recognition; SlowFast; time-dependent attention mechanism; end-to-end training; lateral connection
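The abstract's final point, replacing SlowFast's large-stride lateral-connection convolution with successive convolutions, can be illustrated with a toy 1-D temporal convolution. The kernel sizes and strides below are made up for illustration only (the abstract does not specify them); the point is that one stride-8 convolution samples the sequence sparsely, while two successive smaller-stride convolutions (4 x 2 = 8, the same overall downsampling) keep every intermediate output mixing neighbouring frames, which is the kind of inter-frame correlation the paper argues is lost.

```python
def conv1d(x, kernel, stride):
    # Valid-mode 1-D convolution (really cross-correlation) with stride.
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]

x = list(range(32))  # a toy temporal feature sequence

# A single large-stride convolution (stride = speed ratio, e.g. 8)
# produces few outputs and skips most frame neighbourhoods.
single = conv1d(x, [1.0] * 5, stride=8)

# Two successive smaller-stride convolutions achieve the same overall
# temporal downsampling, but every stage still mixes adjacent frames.
stage1 = conv1d(x, [1.0] * 3, stride=4)
double = conv1d(stage1, [1.0] * 3, stride=2)
```

Here `single` has 4 outputs drawn from 4 widely spaced windows, whereas the two-stage path first produces 8 locally mixed features (`stage1`) before reducing them, so no frame's contribution is dropped outright by a large stride.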
WANG H, SCHMID C. Action recognition with improved trajectories [C]//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney: IEEE, 2013: 3551-3558. doi: 10.1109/iccv.2013.441
PENG X J, ZOU C Q, QIAO Y, et al. Action recognition with stacked fisher vectors [C]//Proceedings of the 13th European Conference on Computer Vision. Zurich: Springer, 2014: 581-595. doi: 10.1007/978-3-319-10602-1_38
LAN Z Z, LIN M, LI X C, et al. Beyond Gaussian pyramid: multi-skip feature stacking for action recognition [C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 204-212. doi: 10.1109/cvpr.2015.7298616
WANG Y, TRAN V, HOAI M. Evolution-preserving dense trajectory descriptors [J/OL]. arXiv, 2017: 1702.04037. doi: 10.1109/fg.2018.00076
ZDRAVEVSKI E, LAMESKI P, TRAJKOVIK V, et al. Improving activity recognition accuracy in ambient-assisted living systems by automated feature engineering [J]. IEEE Access, 2017, 5: 5262-5280. doi: 10.1109/access.2017.2684913
KARPATHY A, TODERICI G, SHETTY S, et al. Large-scale video classification with convolutional neural networks [C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 1725-1732. doi: 10.1109/cvpr.2014.223
SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos [C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal: MIT Press, 2014: 568-576.
TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks [C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 4489-4497. doi: 10.1109/iccv.2015.510
CARREIRA J, ZISSERMAN A. Quo Vadis, action recognition? A new model and the kinetics dataset [C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 4724-4733. doi: 10.1109/cvpr.2017.502
XIE Z, ZHOU Y, WU K W, et al. Activity recognition based on spatial-temporal attention LSTM [J]. Chinese Journal of Computers, 2021, 44(2): 261-274. (in Chinese). doi: 10.11897/SP.J.1016.2021.00261
ZHANG H Y, AN Z. Human action recognition based on improved two-stream spatiotemporal network [J]. Optics and Precision Engineering, 2021, 29(2): 420-429. (in Chinese). doi: 10.37188/OPE.20212902.0420
PAN N, JIANG M, KONG J. Human action recognition algorithm based on spatio-temporal interactive attention model [J]. Laser & Optoelectronics Progress, 2020, 57(18): 181506. (in Chinese). doi: 10.3788/lop57.181506
ZHANG W Q, WANG Z Q, ZHANG L. Human action recognition combining sequential dynamic images and two-stream convolutional network [J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210007. (in Chinese). doi: 10.3788/lop202158.0210007
CHEN Y, GONG S M. Human action recognition network based on improved channel attention mechanism [J]. Journal of Electronics & Information Technology, 2021, 43(12): 3538-3545. (in Chinese). doi: 10.11999/JEIT200431
AFZA F, KHAN M A, SHARIF M, et al. A framework of human action recognition using length control features fusion and weighted entropy-variances based feature selection [J]. Image and Vision Computing, 2021, 106: 104090. doi: 10.1016/j.imavis.2020.104090
XU J, SONG R, WEI H L, et al. A fast human action recognition network based on spatio-temporal features [J]. Neurocomputing, 2021, 441: 350-358. doi: 10.1016/j.neucom.2020.04.150
CHEN Y X, ZHANG Z Q, YUAN C F, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition [C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 13339-13348. doi: 10.1109/iccv48922.2021.01311
CHEN Z, LI S C, YANG B, et al. Multi-scale spatial temporal graph convolutional network for skeleton-based action recognition [C]//Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2021: 1113-1122. doi: 10.1609/aaai.v35i2.16197
PLIZZARI C, CANNICI M, MATTEUCCI M. Spatial temporal transformer network for skeleton-based action recognition [C]//Proceedings of Pattern Recognition. ICPR International Workshops and Challenges. Milano: Springer, 2021: 694-701. doi: 10.1007/978-3-030-68796-0_50
LI Q, DENG Y H, WANG J. Campus violence action recognition based on lightweight graph convolution network [J]. Chinese Journal of Liquid Crystals and Displays, 2022, 37(4): 530-538. (in Chinese). doi: 10.37188/CJLCD.2021-0229
FEICHTENHOFER C, FAN H Q, MALIK J, et al. SlowFast networks for video recognition [C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6201-6210. doi: 10.1109/iccv.2019.00630
DONG M, FANG Z L, LI Y F, et al. AR3D: attention residual 3D network for human action recognition [J]. Sensors, 2021, 21(5): 1656. doi: 10.3390/s21051656
HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023. doi: 10.1109/tpami.2019.2913372
SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human actions classes from videos in the wild [J/OL]. arXiv, 2012: 1212.0402.
KUEHNE H, JHUANG H, GARROTE E, et al. HMDB: a large video database for human motion recognition [C]//Proceedings of 2011 International Conference on Computer Vision. Barcelona: IEEE, 2011: 2556-2563. doi: 10.1109/iccv.2011.6126543
ZHU Y, LAN Z Z, NEWSAM S, et al. Hidden two-stream convolutional networks for action recognition [C]//Proceedings of the 14th Asian Conference on Computer Vision. Perth: Springer, 2018: 363-378. doi: 10.1007/978-3-030-20893-6_23
HUANG M, QIAN H M, HAN Y, et al. R(2+1)D-based two-stream CNN for human activities recognition in videos [C]//Proceedings of the 40th Chinese Control Conference. Shanghai: IEEE, 2021: 7932-7937. doi: 10.23919/ccc52363.2021.9549432
CHEN L, LIU Y G, MAN Y C. Spatial-temporal channel-wise attention network for action recognition [J]. Multimedia Tools and Applications, 2021, 80(14): 21789-21808. doi: 10.1007/s11042-021-10752-z
LI J P, WEI P, ZHENG N N. Nesting spatiotemporal attention networks for action recognition [J]. Neurocomputing, 2021, 459: 338-348. doi: 10.1016/j.neucom.2021.06.088