Behavior recognition based on time-dependent attention

LIU Kuan; WANG Wei; SHEN Hong-ting; HOU Hong-tao; GUO Min-zhen; LUO Zi-jiang

doi:10.37188/CJLCD.2022-0330


Image Processing | Updated: 2023-08-10

- Chinese Journal of Liquid Crystals and Displays Vol. 38, Issue 8, Pages: 1095-1106 (2023)
- Author affiliations:
  
  1. School of Information, Guizhou University of Finance and Economics, Guiyang, Guizhou 550025, China
  2. Intelligent Middle Platform, Beijing Yunji Technology Co., Ltd. (北京云迹科技股份有限公司), Beijing 100089, China
- Funding:
  
  Supported by the National Natural Science Foundation of China (11664005) and the Science Research Project for Students (Postgraduates) of Guizhou University of Finance and Economics (2021ZXSY113)
- DOI: 10.37188/CJLCD.2022-0330
  CLC: TP391.4
- Received: 11 October 2022,
  
  Revised: 25 October 2022,
  
  Published: 05 August 2023
LIU Kuan, WANG Wei, SHEN Hong-ting, et al. Behavior recognition based on time-dependent attention[J]. Chinese Journal of Liquid Crystals and Displays, 2023, 38(8): 1095-1106. DOI: 10.37188/CJLCD.2022-0330.

Abstract

In action recognition, actors and action states change at different speeds, and the correlations between actions are rarely studied; both factors lead to weak discrimination and misclassification. To address these problems, a temporal correlation attention model based on the SlowFast architecture is proposed. First, optical flow is abandoned and the video data is used directly as the network input, so that the model can be trained end to end. Second, a temporal correlation attention mechanism is defined, composed of correlation attention and temporal attention: the correlation attention extracts correlation information between actions, and this information is then fed into the temporal attention to suppress useless features. Finally, to address the loss of correlation between features caused by the excessively large convolution stride in the path fusion of SlowFast, a more effective consecutive convolution operation is proposed as a replacement. Experimental results on the UCF101 and HMDB51 datasets show that the proposed method has advantages in accuracy and robustness over existing methods.
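The stride problem mentioned in the final step can be illustrated with simple output-length arithmetic. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes that the baseline SlowFast path fusion uses a single temporal convolution with kernel 5 and stride 8, and it assumes, hypothetically, that the consecutive convolution replacement consists of three stacked temporal convolutions with kernel 3 and stride 2. This page does not state the actual kernel sizes, strides, or frame counts, and the helper names `conv_out_len` and `covered_frames` are invented for this example.

```python
def conv_out_len(length: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along the time axis (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def covered_frames(length: int, kernel: int, stride: int, padding: int) -> set:
    """Indices of input frames that fall inside at least one kernel window."""
    covered = set()
    for i in range(conv_out_len(length, kernel, stride, padding)):
        start = i * stride - padding  # first frame index seen by window i
        for t in range(start, start + kernel):
            if 0 <= t < length:  # ignore zero-padded positions
                covered.add(t)
    return covered


FAST_FRAMES = 32  # assumed number of Fast-pathway frames

# Assumed baseline: one temporal convolution, kernel 5, stride 8, padding 2.
single_len = conv_out_len(FAST_FRAMES, kernel=5, stride=8, padding=2)
single_cov = covered_frames(FAST_FRAMES, kernel=5, stride=8, padding=2)

# Hypothetical replacement: three stacked convolutions,
# each with kernel 3, stride 2, padding 1.
stacked_lengths = []
length = FAST_FRAMES
for _ in range(3):
    length = conv_out_len(length, kernel=3, stride=2, padding=1)
    stacked_lengths.append(length)
first_layer_cov = covered_frames(FAST_FRAMES, kernel=3, stride=2, padding=1)

print(single_len, len(single_cov))            # 4 18
print(stacked_lengths, len(first_layer_cov))  # [16, 8, 4] 32
```

Under these assumptions both variants reduce 32 frames to 4, but the single stride-8 convolution reads only 18 of the 32 input frames, whereas the first stride-2 layer reads all 32. This is one plausible reading of how a large stride can discard correlations between neighboring features.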


