Search | BVS - MINISTRY OF HEALTH

Feature Boosting Network For 3D Pose Estimation.

Liu, Jun; Ding, Henghui; Shahroudy, Amir; Duan, Ling-Yu; Jiang, Xudong; Wang, Gang; Kot, Alex C.

IEEE Trans Pattern Anal Mach Intell ; 42(2): 494-501, 2020 Feb.

Article in English | MEDLINE | ID: mdl-30676946

ABSTRACT

In this paper, a feature boosting network is proposed for estimating 3D hand pose and 3D body pose from a single RGB image. In this method, the features learned by the convolutional layers are boosted with a new long short-term dependence-aware (LSTD) module, which enables the intermediate convolutional feature maps to perceive the graphical long short-term dependency among different hand (or body) parts using the designed Graphical ConvLSTM. Learning a set of features that are reliable and discriminatively representative of the pose of a hand (or body) part is difficult due to the ambiguities, texture and illumination variation, and self-occlusion in real applications of 3D pose estimation. To improve the reliability of the features for representing each body part and enhance the LSTD module, we further introduce a context consistency gate (CCG), with which the convolutional feature maps are modulated according to their consistency with the context representations. We evaluate the proposed method on challenging benchmark datasets for 3D hand pose estimation and 3D full body pose estimation. Experimental results show the effectiveness of our method, which achieves state-of-the-art performance on both tasks.

Subjects

Hand/diagnostic imaging , Imaging, Three-Dimensional/methods , Machine Learning , Posture/physiology , Humans , Reproducibility of Results

Skeleton-Based Online Action Prediction Using Scale Selection Network.

Liu, Jun; Shahroudy, Amir; Wang, Gang; Duan, Ling-Yu; Kot, Alex C.

IEEE Trans Pattern Anal Mach Intell ; 42(6): 1453-1467, 2020 Jun.

Article in English | MEDLINE | ID: mdl-30762531

ABSTRACT

Action prediction aims to recognize the class label of an ongoing activity when only a part of it is observed. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in the temporal dimension via a sliding window over the temporal axis. Since there are significant temporal scale variations in the observed part of the ongoing action at different time steps, a novel window scale selection method is proposed to make our network focus on the performed part of the ongoing action and suppress possible interference from previous actions at each step. An activation sharing scheme is also proposed to handle the overlapping computations among adjacent time steps, which enables our framework to run more efficiently. Moreover, to enhance the performance of our framework for action prediction with skeletal input data, a hierarchy of dilated tree convolutions is also designed to learn multi-level structured semantic representations over the skeleton joints at each frame. Our proposed approach is evaluated on four challenging datasets. The extensive experiments demonstrate the effectiveness of our method for skeleton-based online action prediction.
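
As a rough illustration of the dilated temporal convolution idea mentioned in the abstract above, the following minimal sketch applies a causal dilated 1D convolution to a short sequence and computes how the receptive field grows when such layers are stacked. It is not the authors' network: the function names, kernel size, and dilation rates are assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Causal dilated 1D convolution (generic illustration, not the paper's layer).

    Output at time t combines signal[t], signal[t - dilation],
    signal[t - 2 * dilation], ... weighted by kernel[0], kernel[1], ...
    Positions before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of past time steps (including the current one) seen by a stack
    of dilated layers that all share the same kernel size."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings: kernel size 2 and dilations 1, 2, 4, 8.
example_output = dilated_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2)
example_field = receptive_field(2, [1, 2, 4, 8])
```

With these assumed settings, four stacked layers cover 16 time steps, which is why dilation is a common way to widen a temporal window without adding many layers.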

NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding.

Liu, Jun; Shahroudy, Amir; Perez, Mauricio; Wang, Gang; Duan, Ling-Yu; Kot, Alex C.

IEEE Trans Pattern Anal Mach Intell ; 42(10): 2684-2701, 2020 Oct.

Article in English | MEDLINE | ID: mdl-31095476

ABSTRACT

Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, a realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding.

Subjects

Deep Learning , Human Activities/classification , Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Algorithms , Benchmarking , Humans , Semantics , Video Recording

Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates.

Liu, Jun; Shahroudy, Amir; Xu, Dong; Kot, Alex C; Wang, Gang.

IEEE Trans Pattern Anal Mach Intell ; 40(12): 3007-3021, 2018 Dec.

Article in English | MEDLINE | ID: mdl-29990167

ABSTRACT

Skeleton-based human action recognition has attracted a lot of research attention during the past few years. Recent works attempted to utilize recurrent neural networks to model the temporal dependencies between the 3D positional configurations of human body joints for better analysis of human activities in the skeletal data. The proposed work extends this idea to the spatial domain as well as the temporal domain to better analyze the hidden sources of action-related information within human skeleton sequences in both domains simultaneously. Based on the pictorial structure of Kinect's skeletal data, an effective tree-structure based traversal framework is also proposed. To deal with the noise in the skeletal data, a new gating mechanism within the LSTM module is introduced, with which the network can learn the reliability of the sequential data and accordingly adjust the effect of the input data on the updating procedure of the long-term context representation stored in the unit's memory cell. Moreover, we introduce a novel multi-modal feature fusion strategy within the LSTM unit. The comprehensive experimental results on seven challenging benchmark datasets for human action recognition demonstrate the effectiveness of the proposed method.

Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos.

Shahroudy, Amir; Ng, Tian-Tsong; Gong, Yihong; Wang, Gang.

IEEE Trans Pattern Anal Mach Intell ; 40(5): 1045-1058, 2018 May.

Article in English | MEDLINE | ID: mdl-28391189

ABSTRACT

Single modality action recognition on RGB or depth sequences has been extensively explored recently. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition. Therefore, analysis of the RGB+D videos can help us to better study the complementary properties of these two types of modalities and achieve higher levels of performance. In this paper, we propose a new deep autoencoder based shared-specific feature factorization network to separate input multimodal signals into a hierarchy of components. Further, based on the structure of the features, a structured sparsity learning machine is proposed which utilizes mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results show the effectiveness of our cross-modality feature analysis framework by achieving state-of-the-art accuracy for action classification on five challenging benchmark datasets.

Multimodal Multipart Learning for Action Recognition in Depth Videos.

Shahroudy, Amir; Ng, Tian-Tsong; Yang, Qingxiong; Wang, Gang.

IEEE Trans Pattern Anal Mach Intell ; 38(10): 2123-2129, 2016 Oct.

Article in English | MEDLINE | ID: mdl-26660700

ABSTRACT

The articulated and complex nature of human actions makes the task of action recognition difficult. One approach to handle this complexity is dividing it into the kinetics of body parts and analyzing the actions based on these partial descriptors. We propose a joint sparse regression based learning method which utilizes structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts. To represent the dynamics and appearance of parts, we employ a heterogeneous set of depth and skeleton based features. The proper structure of multimodal multipart features is formulated into the learning framework via the proposed hierarchical mixed norm, to regularize the structured features of each part and to apply sparsity between them, in favor of a group feature selection. Our experimental results demonstrate the effectiveness of the proposed learning method, which outperforms other methods on all three tested datasets and saturates one of them by achieving perfect accuracy.

Subjects

Algorithms , Human Activities , Pattern Recognition, Automated , Humans , Learning
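
To make the group-selection effect of a mixed norm (as mentioned in the abstract of this record) more concrete, here is a minimal sketch of the standard L2,1 norm over grouped weights. This is a simpler stand-in, not the hierarchical mixed norm proposed in the paper, whose exact definition is not given in the abstract. The body-part names and weight values are hypothetical.

```python
import math
from typing import Dict, List


def l21_mixed_norm(groups: Dict[str, List[float]]) -> float:
    """Standard L2,1 norm: the sum over groups of each group's L2 norm.

    The L2 norm inside a group treats its weights jointly, while the sum
    (an L1 norm) across groups penalizes the number of non-zero groups,
    which encourages whole groups to be dropped.
    """
    return sum(math.sqrt(sum(w * w for w in weights)) for weights in groups.values())


def active_groups(groups: Dict[str, List[float]], tol: float = 1e-12) -> List[str]:
    """Names of groups whose weights are not all (numerically) zero."""
    return [name for name, weights in groups.items()
            if math.sqrt(sum(w * w for w in weights)) > tol]


# Hypothetical per-part weight groups, for illustration only.
part_weights = {
    "left_arm": [3.0, 4.0],
    "right_leg": [0.0, 0.0],
    "torso": [1.0],
}
penalty = l21_mixed_norm(part_weights)
selected = active_groups(part_weights)
```

In this toy example the "right_leg" group contributes nothing to the penalty and is not selected, which mirrors the idea of modeling an action with a sparse set of body parts.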
