Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition.
Zhang, Haiping; Zhang, Xinhao; Yu, Dongjin; Guan, Liming; Wang, Dongjing; Zhou, Fuxing; Zhang, Wanjun.
Affiliations
  • Zhang H; School of Computer Science, Hangzhou Dianzi University, Hangzhou 310005, China.
  • Zhang X; School of Information Engineering, Hangzhou Dianzi University, Hangzhou 310005, China.
  • Yu D; School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310005, China.
  • Guan L; School of Computer Science, Hangzhou Dianzi University, Hangzhou 310005, China.
  • Wang D; School of Information Engineering, Hangzhou Dianzi University, Hangzhou 310005, China.
  • Zhou F; School of Computer Science, Hangzhou Dianzi University, Hangzhou 310005, China.
  • Zhang W; School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310005, China.
Sensors (Basel); 23(12), 2023 Jun 07.
Article in English | MEDLINE | ID: mdl-37420580
ABSTRACT
Graph convolutional networks are widely used in skeleton-based action recognition because they fit non-Euclidean data well. Conventional multi-scale temporal convolution applies several fixed convolution kernel sizes or dilation rates at every layer of the network; we argue that different layers and datasets call for different receptive fields. We therefore optimize traditional multi-scale temporal convolution with multi-scale adaptive convolution kernels and dilation rates, driven by a simple and effective self-attention mechanism, so that each network layer adaptively selects kernel sizes and dilation rates instead of keeping them fixed. Moreover, the effective receptive field of a simple residual connection is small, and deep residual networks contain considerable redundancy, which leads to a loss of context when aggregating spatio-temporal information. This article introduces a feature fusion mechanism that replaces the residual connection between the initial features and the outputs of the temporal module, effectively addressing both context aggregation and initial feature fusion. We propose a multi-modality adaptive feature fusion framework (MMAFF) that enlarges the receptive field in both the spatial and temporal dimensions: the features extracted by the spatial module are fed into the adaptive temporal fusion module, which simultaneously extracts multi-scale skeleton features in both the spatial and temporal parts. In addition, building on the current multi-stream approach, we use a limb stream to uniformly process correlated data from multiple modalities. Extensive experiments show that our model achieves results competitive with state-of-the-art methods on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets.
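As a rough illustration of the adaptive multi-scale temporal convolution described in the abstract, the PyTorch sketch below gates several temporal-convolution branches (different kernel sizes and dilation rates) with a lightweight attention score. This is a minimal sketch under assumed details, not the paper's released implementation: the class name `AdaptiveMultiScaleTemporalConv`, the branch configuration, and the squeeze-and-excitation-style gate are illustrative assumptions.

```python
import torch
import torch.nn as nn


class AdaptiveMultiScaleTemporalConv(nn.Module):
    """Sketch of an attention-gated multi-scale temporal convolution.

    Each branch applies a temporal convolution with a different
    (kernel size, dilation) pair; a lightweight gate scores the branches
    per sample, so a layer can weight receptive fields adaptively
    instead of using fixed sizes.
    """

    def __init__(self, channels, branch_cfg=((3, 1), (3, 2), (5, 1), (5, 2))):
        super().__init__()
        # One temporal conv per (kernel, dilation) pair; padding keeps T unchanged.
        self.branches = nn.ModuleList(
            nn.Conv2d(
                channels, channels,
                kernel_size=(k, 1),
                padding=((k - 1) * d // 2, 0),
                dilation=(d, 1),
            )
            for k, d in branch_cfg
        )
        # Global pooling + tiny MLP produces one softmax score per branch.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, len(branch_cfg)),
            nn.Softmax(dim=-1),
        )

    def forward(self, x):
        # x: (N, C, T, V) — batch, channels, frames, joints.
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # (N, B, C, T, V)
        w = self.attn(x)                                          # (N, B)
        return (w[:, :, None, None, None] * outs).sum(dim=1)     # (N, C, T, V)


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 25)  # 2 clips, 64 channels, 32 frames, 25 joints
    y = AdaptiveMultiScaleTemporalConv(64)(x)
    print(y.shape)  # torch.Size([2, 64, 32, 25])
```

Because the gate is computed per input, each layer can emphasize a different temporal receptive field, which is the behavior the abstract contrasts with fixed kernel sizes and dilation rates.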
Subject(s)
Keywords

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Musculoskeletal System Language: En Journal: Sensors (Basel) Year: 2023 Document type: Article Country of affiliation: China
