Spatial-Temporal Pyramid Graph Reasoning for Action Recognition.

Geng, Tiantian; Zheng, Feng; Hou, Xiaorong; Lu, Ke; Qi, Guo-Jun; Shao, Ling

Geng, Tiantian; Zheng, Feng; Hou, Xiaorong; Lu, Ke; Qi, Guo-Jun; Shao, Ling.

IEEE Trans Image Process ; 31: 5484-5497, 2022.

Article em En | MEDLINE | ID: mdl-35943998

RESUMO

Spatial-temporal relation reasoning is a significant yet challenging problem for video action recognition. Previous works typically apply local operations like 2D or 3D CNNs to conduct space-time interactions in video sequences, or simply capture space-time long-range relations of a single fixed scale. However, this is inadequate for obtaining a comprehensive action representation. Besides, most models treat all input frames equally for the final classification, without selecting key frames and motion-sensitive regions. This introduces irrelevant video content and hurts the performance of models. In this paper, we propose a generic Spatial-Temporal Pyramid Graph Network (STPG-Net) to adaptively capture long-range spatial-temporal relations in video sequences at multiple scales. Specifically, we design a temporal attention (TA) module and a spatial-temporal attention (STA) module to learn the contribution of each frame and each space-time region to an action at a feature level, respectively. We then apply the selected key information to build spatial-temporal pyramid graphs for long-range relation reasoning and more comprehensive action representation learning. STPG-Net can be flexibly integrated into 2D and 3D backbone networks in a plug-and-play manner. Extensive experiments show that it brings consistent improvements over many challenging baselines on several standard action recognition benchmarks (i.e., Something-Something V1 & V2, and FineGym), demonstrating the effectiveness of our approach.

Texto completo

Imprimir

XML

PubMed Links

Buscar no Google

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Idioma: En Revista: IEEE Trans Image Process Ano de publicação: 2022 Tipo de documento: Article

Texto completo

Imprimir

XML

PubMed Links

Buscar no Google

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Idioma: En Revista: IEEE Trans Image Process Ano de publicação: 2022 Tipo de documento: Article