
Vision-based activity recognition in children with autism-related behaviors.

Wei, Pengbo; Ahmedt-Aristizabal, David; Gammulle, Harshala; Denman, Simon; Armin, Mohammad Ali.

Heliyon ; 9(6): e16763, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37303525

ABSTRACT

Advances in machine learning and contactless sensors have enabled the understanding of complex human behaviors in healthcare settings. In particular, several deep learning systems have been introduced to enable comprehensive analysis of neuro-developmental conditions such as Autism Spectrum Disorder (ASD). This condition affects children from their early developmental stages onwards, and diagnosis relies entirely on observing the child's behavior and detecting behavioral cues. However, the diagnostic process is time-consuming because it requires long-term behavior observation, and specialists are scarce. We demonstrate the effect of a region-based computer vision system to help clinicians and parents analyze a child's behavior. For this purpose, we adopt and enhance a dataset for analyzing autism-related actions using videos of children captured in uncontrolled environments (e.g. videos collected with consumer-grade cameras, in varied environments). The data is pre-processed by detecting the target child in the video to reduce the impact of background noise. Motivated by the effectiveness of temporal convolutional models, we propose both light-weight and conventional models capable of extracting action features from video frames and classifying autism-related behaviors by analyzing the relationships between frames in a video. By extensively evaluating feature extraction and learning strategies, we demonstrate that the highest performance is attained through the use of an Inflated 3D Convnet and Multi-Stage Temporal Convolutional Network. Our model achieved a Weighted F1-score of 0.83 for the classification of the three autism-related actions. We also propose a light-weight solution by employing the ESNet backbone with the same action recognition model, achieving a competitive 0.71 Weighted F1-score, and enabling potential deployment on embedded systems. Experimental results demonstrate the ability of our proposed models to recognize autism-related actions from videos captured in an uncontrolled environment, and thus their potential to assist clinicians in analyzing ASD.
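
The abstract reports its results as a Weighted F1-score. As a minimal sketch of how that metric is usually defined (per-class F1 averaged with weights proportional to each class's number of true samples), the following may help. It is not the authors' evaluation code: the function name `weighted_f1` and the labels `action_a`, `action_b`, `action_c` are hypothetical placeholders, since the abstract does not name the three actions.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Placeholder labels; the abstract does not list the actual action names.
y_true = ["action_a", "action_a", "action_b", "action_c"]
y_pred = ["action_a", "action_b", "action_b", "action_c"]
print(weighted_f1(y_true, y_pred))
```

Because each class is weighted by its support, frequent classes dominate the score, which is worth keeping in mind when comparing the 0.83 and 0.71 figures on an imbalanced dataset.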

TMMF: Temporal Multi-Modal Fusion for Single-Stage Continuous Gesture Recognition.

Gammulle, Harshala; Denman, Simon; Sridharan, Sridha; Fookes, Clinton.

IEEE Trans Image Process ; 30: 7689-7701, 2021.

Article in English | MEDLINE | ID: mdl-34478365

ABSTRACT

Gesture recognition is a much-studied research area with myriad real-world applications, including robotics and human-machine interaction. Current gesture recognition methods have focused on recognising isolated gestures, and existing continuous gesture recognition methods are limited to two-stage approaches where independent models are required for detection and classification, with the performance of the latter being constrained by detection performance. In contrast, we introduce a single-stage continuous gesture recognition framework, called Temporal Multi-Modal Fusion (TMMF), that can detect and classify multiple gestures in a video via a single model. This approach learns the natural transitions between gestures and non-gestures without the need for a pre-processing segmentation step to detect individual gestures. To achieve this, we introduce a multi-modal fusion mechanism that supports the integration of important information flowing from multi-modal inputs and that is scalable to any number of modes. Additionally, we propose Unimodal Feature Mapping (UFM) and Multi-modal Feature Mapping (MFM) models to map uni-modal features and the fused multi-modal features respectively. To further enhance performance, we propose a mid-point based loss function that encourages smooth alignment between the ground truth and the prediction, helping the model to learn natural gesture transitions. We demonstrate the utility of our proposed framework, which can handle variable-length input videos and outperforms the state-of-the-art on three challenging datasets: EgoGesture, IPN Hand and ChaLearn LAP Continuous Gesture Dataset (ConGD). Furthermore, ablation experiments show the importance of different components of the proposed framework.

Subject(s)

Gestures; Pattern Recognition, Automated; Algorithms; Hand; Humans
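
The TMMF abstract describes detecting and classifying multiple gestures in one pass, without a separate segmentation step. One common way to read such a model's output, assumed here purely for illustration, is as one label per frame, with consecutive identical labels grouped into gesture segments. The sketch below shows only that grouping step; the function name `frames_to_segments`, the `no_gesture` background label, and the example gesture names are hypothetical and not taken from the paper.

```python
from itertools import groupby


def frames_to_segments(frame_labels, background="no_gesture"):
    """Group per-frame labels into (label, first_frame, last_frame) segments.

    Frames carrying the background label are skipped, so only gesture
    segments are returned. Frame indices are 0-based and inclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        if label != background:
            segments.append((label, start, start + length - 1))
        start += length
    return segments


# Hypothetical per-frame predictions for a 7-frame clip.
predictions = ["no_gesture", "wave", "wave", "no_gesture",
               "point", "point", "point"]
print(frames_to_segments(predictions))
# [('wave', 1, 2), ('point', 4, 6)]
```

In this reading, detection and classification fall out of the same per-frame prediction, which is the contrast the abstract draws with two-stage pipelines.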
