Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network.
Wang, Le; Zang, Jinliang; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning.
Affiliations
  • Wang L; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China. lewang@xjtu.edu.cn.
  • Zang J; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China. zjl19920904@stu.xjtu.edu.cn.
  • Zhang Q; HERE Technologies, Chicago, IL 60606, USA. qilin.zhang@here.com.
  • Niu Z; Alibaba Group, Hangzhou 311121, China. zhenxing.nzx@alibaba-inc.com.
  • Hua G; Microsoft Research, Redmond, WA 98052, USA. ganghua@microsoft.com.
  • Zheng N; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China. nnzheng@xjtu.edu.cn.
Sensors (Basel); 18(7), 2018 Jun 21.
Article in En | MEDLINE | ID: mdl-29933555
ABSTRACT
Research in human action recognition has accelerated significantly since the introduction of powerful machine learning tools such as Convolutional Neural Networks (CNNs). However, effective and efficient methods for incorporating temporal information into CNNs are still being actively explored in the recent literature. Motivated by the popular recurrent attention models in natural language processing, we propose the Attention-aware Temporal Weighted CNN (ATW CNN) for action recognition in videos, which embeds a visual attention model into a temporal weighted multi-stream CNN. This attention model is implemented simply as temporal weighting, yet it effectively boosts the recognition performance of video representations. Moreover, each stream in the proposed ATW CNN framework can be trained end to end, with both the network parameters and the temporal weights optimized by stochastic gradient descent (SGD) with back-propagation. Our experimental results on the UCF-101 and HMDB-51 datasets show that the proposed attention mechanism contributes substantially to the performance gains by focusing on the more discriminative snippets, i.e., the more relevant video segments.
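The following is a minimal sketch of the attention-as-temporal-weighting idea described in the abstract, written in PyTorch (a framework assumption; the paper does not specify one). Learnable per-snippet weights are softmax-normalized and used to fuse snippet-level class scores, and the weights are trained jointly with the backbone by SGD. All names here (TemporalAttentionFusion, snippet_logits, the snippet and class counts) are illustrative, not taken from the paper.

    # Minimal sketch of temporal-weighting attention (assumed PyTorch).
    import torch
    import torch.nn as nn

    class TemporalAttentionFusion(nn.Module):
        """Fuses per-snippet class scores with learned, softmax-normalized
        temporal weights, so more discriminative snippets contribute more."""
        def __init__(self, num_snippets: int):
            super().__init__()
            # One learnable weight per temporal snippet; zeros give uniform
            # attention at initialization. Trained by SGD with back-propagation.
            self.snippet_logits = nn.Parameter(torch.zeros(num_snippets))

        def forward(self, snippet_scores: torch.Tensor) -> torch.Tensor:
            # snippet_scores: (batch, num_snippets, num_classes)
            attn = torch.softmax(self.snippet_logits, dim=0)  # (num_snippets,)
            # Weighted sum over the temporal axis -> (batch, num_classes)
            return (snippet_scores * attn.view(1, -1, 1)).sum(dim=1)

    # Example: 3 snippets per video, 101 action classes (as in UCF-101).
    fusion = TemporalAttentionFusion(num_snippets=3)
    scores = torch.randn(8, 3, 101)   # per-snippet scores from one stream's CNN
    video_scores = fusion(scores)     # (8, 101) attention-weighted video scores

Because the softmax weights are differentiable, gradients flow through the fusion to both the temporal weights and the backbone CNN, consistent with the end-to-end training the abstract describes.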
Keywords

Full text: 1 | Database: MEDLINE | Study type: Prognostic_studies | Language: En | Publication year: 2018 | Document type: Article