Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 1 de 1
Filtrar
Mais filtros

Base de dados
Ano de publicação
Tipo de documento
Intervalo de ano de publicação
1.
Artigo em Inglês | MEDLINE | ID: mdl-38865228

RESUMO

This work pays the first research effort to leverage point cloud sequence-based Self-supervised 3-D Action Feature Learning (S3AFL), under text's cross-modality weak supervision. We intend to fill the huge performance gap between point cloud sequence and 3-D skeleton-based manners. The key intuition derives from the observation that skeleton-based manners actually hold the human pose's high-level knowledge that leads to attention on the body's joint-aware local parts. Inspired by this, we propose to introduce the text's weak supervision of high-level semantics into a point cloud sequence-based paradigm. With RGB-point cloud pair sequence acquired via RGB-D camera, text sequence is first generated from RGB component using pretrained image captioning model, as auxiliary weak supervision. Then, S3AFL runs in a cross and intra-modality contrastive learning (CL) way. To resist text's missing and redundant semantics, feature learning is conducted in a multistage way with semantic refinement. Essentially, text is only required for training. To facilitate the feature's representation power on fine-grained actions, a multirank max-pooling (MR-MP) way is also proposed for the point set network to better maintain discriminative clues. Experiments verify that the text's weak supervision can facilitate performance by 10.8%, 10.4%, and 8.0% on NTU RGB + D 60, 120, and N-UCLA at most. The performance gap between point cloud sequence and skeleton-based manners has been remarkably narrowed down. The idea of transferring text's weak supervision to S3AFL can also be applied to a skeleton manner, with strong generality. The source code is available at https://github.com/tangent-T/W3AMT.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA