Búsqueda | Portal Regional de la BVS

Full-BAPose: Bottom Up Framework for Full Body Pose Estimation.

Artacho, Bruno; Savakis, Andreas.

Sensors (Basel) ; 23(7)2023 Apr 04.

Artículo en Inglés | MEDLINE | ID: mdl-37050785

RESUMEN

We present Full-BAPose, a novel bottom-up approach for full body pose estimation that achieves state-of-the-art results without relying on external people detectors. The Full-BAPose method addresses the broader task of full body pose estimation including hands, feet, and facial landmarks. Our deep learning architecture is end-to-end trainable based on an encoder-decoder configuration with HRNet backbone and multi-scale representations using a disentangled waterfall atrous spatial pooling module. The disentangled waterfall module leverages the efficiency of progressive filtering, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, it combines multi-scale features obtained from the waterfall flow with the person-detection capability of the disentangled adaptive regression and incorporates adaptive convolutions to infer keypoints more precisely in crowded scenes. Full-BAPose achieves state-of-the art performance on the challenging CrowdPose and COCO-WholeBody datasets, with AP of 72.2% and 68.4%, respectively, based on 133 keypoints. Our results demonstrate that Full-BAPose is efficient and robust when operating under a variety conditions, including multiple people, changes in scale, and occlusions.

Perceptual optimization of language: Evidence from American Sign Language.

Caselli, Naomi; Occhino, Corrine; Artacho, Bruno; Savakis, Andreas; Dye, Matthew.

Cognition ; 224: 105040, 2022 07.

Artículo en Inglés | MEDLINE | ID: mdl-35192994

RESUMEN

If language has evolved for communication, languages should be structured such that they maximize the efficiency of processing. What is efficient for communication in the visual-gestural modality is different from the auditory-oral modality, and we ask here whether sign languages have adapted to the affordances and constraints of the signed modality. During sign perception, perceivers look almost exclusively at the lower face, rarely looking down at the hands. This means that signs articulated far from the lower face must be perceived through peripheral vision, which has less acuity than central vision. We tested the hypothesis that signs that are more predictable (high frequency signs, signs with common handshapes) can be produced further from the face because precise visual resolution is not necessary for recognition. Using pose estimation algorithms, we examined the structure of over 2000 American Sign Language lexical signs to identify whether lexical frequency and handshape probability affect the position of the wrist in 2D space. We found that frequent signs with rare handshapes tended to occur closer to the signer's face than frequent signs with common handshapes, and that frequent signs are generally more likely to be articulated further from the face than infrequent signs. Together these results provide empirical support for anecdotal assertions that the phonological structure of sign language is shaped by the properties of the human visual and motor systems.

Asunto(s)

Lenguaje , Lengua de Signos , Gestos , Humanos , Reconocimiento en Psicología , Percepción Visual

UniPose+: A Unified Framework for 2D and 3D Human Pose Estimation in Images and Videos.

Artacho, Bruno; Savakis, Andreas.

IEEE Trans Pattern Anal Mach Intell ; 44(12): 9641-9653, 2022 12.

Artículo en Inglés | MEDLINE | ID: mdl-34727028

RESUMEN

We propose UniPose+, a unified framework for 2D and 3D human pose estimation in images and videos. The UniPose+ architecture leverages multi-scale feature representations to increase the effectiveness of backbone feature extractors, with no significant increase in network size and no postprocessing. Current pose estimation methods heavily rely on statistical postprocessing or predefined anchor poses for joint localization. The UniPose+ framework incorporates contextual information across scales and joint localization with Gaussian heatmap modulation at the decoder output to estimate 2D and 3D human pose in a single stage with state-of-the-art accuracy, without relying on predefined anchor poses. The multi-scale representations allowed by the waterfall module in the UniPose+ framework leverage the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Our results on multiple datasets demonstrate that UniPose+, with a HRNet, ResNet or SENet backbone and waterfall module, is a robust and efficient architecture for single person 2D and 3D pose estimation in single images and videos.

Asunto(s)

Algoritmos , Imagenología Tridimensional , Humanos , Imagenología Tridimensional/métodos

GourmetNet: Food Segmentation Using Multi-Scale Waterfall Features with Spatial and Channel Attention.

Sharma, Udit; Artacho, Bruno; Savakis, Andreas.

Sensors (Basel) ; 21(22)2021 Nov 11.

Artículo en Inglés | MEDLINE | ID: mdl-34833577

RESUMEN

We propose GourmetNet, a single-pass, end-to-end trainable network for food segmentation that achieves state-of-the-art performance. Food segmentation is an important problem as the first step for nutrition monitoring, food volume and calorie estimation. Our novel architecture incorporates both channel attention and spatial attention information in an expanded multi-scale feature representation using our advanced Waterfall Atrous Spatial Pooling module. GourmetNet refines the feature extraction process by merging features from multiple levels of the backbone through the two attention modules. The refined features are processed with the advanced multi-scale waterfall module that combines the benefits of cascade filtering and pyramid representations without requiring a separate decoder or post-processing. Our experiments on two food datasets show that GourmetNet significantly outperforms existing current state-of-the-art methods.

Asunto(s)

Procesamiento de Imagen Asistido por Computador , Redes Neurales de la Computación , Atención , Alimentos

Waterfall Atrous Spatial Pooling Architecture for Efficient Semantic Segmentation.

Artacho, Bruno; Savakis, Andreas.

Sensors (Basel) ; 19(24)2019 Dec 05.

Artículo en Inglés | MEDLINE | ID: mdl-31817366

RESUMEN

We propose a new efficient architecture for semantic segmentation, based on a "Waterfall" Atrous Spatial Pooling architecture, that achieves a considerable accuracy increase while decreasing the number of network parameters and memory footprint. The proposed Waterfall architecture leverages the efficiency of progressive filtering in the cascade architecture while maintaining multiscale fields-of-view comparable to spatial pyramid configurations. Additionally, our method does not rely on a postprocessing stage with Conditional Random Fields, which further reduces complexity and required training time. We demonstrate that the Waterfall approach with a ResNet backbone is a robust and efficient architecture for semantic segmentation obtaining state-of-the-art results with significant reduction in the number of parameters for the Pascal VOC dataset and the Cityscapes dataset.

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA