ABSTRACT
Sensor orientation is a critical aspect in a Human Activity Recognition (HAR) system based on tri-axial signals (such as accelerations): different sensor orientations introduce significant errors into the activity recognition process. This paper proposes a new preprocessing module to reduce the negative impact of sensor-orientation variability in HAR. Firstly, this module estimates a consistent reference system; then, the tri-axial signals recorded from sensors with different orientations are transformed into this consistent reference system. This new preprocessing module was evaluated for its ability to mitigate the effect of different sensor orientations on classification accuracy in several state-of-the-art HAR systems. The experiments were carried out using a subject-wise cross-validation methodology over six different datasets, including movements and postures. The new preprocessing module provided robust HAR performance even when sudden sensor orientation changes occurred during data collection in the six datasets. As an example, for the WISDM dataset, sensors with different orientations caused a significant reduction in the classification accuracy of the state-of-the-art system (from 91.57 ± 0.23% to 89.19 ± 0.26%). This loss was recovered with the proposed algorithm, which increased the accuracy to 91.46 ± 0.30%, i.e., essentially the same result obtained when all sensors had the same orientation.
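A minimal sketch of one common way to build such a consistent reference system is shown below; the abstract does not detail the exact algorithm, so the gravity-alignment approach here is an assumption: the gravity direction is estimated from the mean acceleration and every sample is rotated so that gravity maps onto the z-axis.

    import numpy as np

    def align_to_gravity(acc):
        """Rotate a (N, 3) tri-axial acceleration signal so that the
        estimated gravity direction maps onto +z (hypothetical sketch,
        not the paper's exact preprocessing module)."""
        g = acc.mean(axis=0)
        g = g / np.linalg.norm(g)            # estimated gravity direction
        z = np.array([0.0, 0.0, 1.0])        # target axis
        v = np.cross(g, z)                   # rotation axis (unnormalized)
        s, c = np.linalg.norm(v), np.dot(g, z)
        if s < 1e-8:                         # already (anti)parallel to z
            return acc if c > 0 else acc * np.array([1.0, -1.0, -1.0])
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])
        R = np.eye(3) + vx + vx @ vx * ((1.0 - c) / s**2)  # Rodrigues formula
        return acc @ R.T

Note that gravity alignment fixes only the vertical axis; the heading around gravity would still need a second cue (for example, the dominant direction of horizontal motion).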
Subject(s)
Algorithms, Human Activities, Humans, Acceleration, Movement, Posture
ABSTRACT
This paper proposes, analyzes, and evaluates a deep learning architecture based on transformers for generating sign language motion from sign phonemes (represented using HamNoSys, a notation system developed at the University of Hamburg). The sign phonemes provide information about sign characteristics such as hand configuration, localization, and movements. The use of sign phonemes is crucial for generating sign motion with a high level of detail (including finger extensions and flexions). The transformer-based approach also includes a stop detection module for predicting the end of the generation process. Both aspects, motion generation and stop detection, are evaluated in detail. For motion generation, the dynamic time warping (DTW) distance is used to compute the similarity between two landmark sequences (ground truth and generated). The stop detection module is evaluated considering detection accuracy and ROC (receiver operating characteristic) curves. The paper proposes and evaluates several strategies to obtain the system configuration with the best performance, including different padding schemes, interpolation approaches, and data augmentation techniques. The best configuration of a fully automatic system obtains an average DTW distance per frame of 0.1057 and an area under the ROC curve (AUC) higher than 0.94.
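The per-frame DTW evaluation can be reproduced with a short dynamic-programming routine; the sketch below is a generic DTW with Euclidean frame distances and length normalization, not necessarily the paper's exact configuration.

    import numpy as np

    def dtw_per_frame(a, b):
        """Average DTW alignment cost per frame between two (T, D)
        landmark sequences (generic sketch)."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m] / max(n, m)           # one common per-frame normalization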
Subject(s)
Algorithms, Sign Language, Humans, Motion (Physics), Movement, Hand
ABSTRACT
The Y Balance Test (YBT) is a dynamic balance assessment typically used in sports medicine. This work proposes a deep learning approach to automatically score the YBT by estimating the normalized reach distance (NRD) using a wearable sensor that records inertial signals during the movement. This paper evaluates several signal processing techniques to extract relevant information to feed the deep neural network. This evaluation was performed using a state-of-the-art human activity recognition system based on recurrent neural networks (RNNs). This deep neural network includes long short-term memory (LSTM) layers, which learn features from time series by modeling temporal patterns, and an additional fully connected layer to estimate the NRD (normalized by the leg length). All analyses were carried out using a dataset with YBT assessments from 407 subjects, including young and middle-aged volunteers and athletes from different sports. This dataset enabled the development of a global and robust solution for scoring the YBT in a wide range of applications. The experimental setup considered a 10-fold subject-wise cross-validation using training, validation, and testing subsets. The mean absolute percentage error (MAPE) obtained was 7.88 ± 0.20%. Moreover, this work proposes specific regression systems to estimate the NRD for each direction separately, obtaining an average MAPE of 7.33 ± 0.26%. This deep learning approach was compared to a previous work using dynamic time warping and k-NN algorithms, obtaining a relative MAPE reduction of 10%.
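As an illustration of the described architecture (LSTM layers followed by a fully connected regression output), a minimal Keras sketch follows; the window length, channel count, and layer sizes are placeholders, not the paper's configuration.

    import tensorflow as tf

    # Hypothetical input: 128 time steps of 6 inertial channels (acc + gyro).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128, 6)),
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1),            # regression output: estimated NRD
    ])
    model.compile(optimizer="adam", loss="mae",
                  metrics=["mean_absolute_percentage_error"])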
Subject(s)
Deep Learning, Algorithms, Humans, Middle Aged, Movement, Neural Networks (Computer), Signal Processing (Computer-Assisted)
ABSTRACT
This paper presents the Multi-view Leap2 Hand Pose Dataset (ML2HP Dataset), a new dataset for hand pose recognition, captured using a multi-view recording setup with two Leap Motion Controller 2 devices. This dataset encompasses a diverse range of hand poses, recorded from different angles to ensure comprehensive coverage. The dataset includes real images together with precise, automatically extracted hand properties, such as landmark coordinates, velocities, orientations, and finger widths. It has been meticulously designed and curated to maintain balance in terms of subjects, hand poses, and the use of the right or left hand, ensuring fairness and parity. The content includes 714,000 instances from 21 subjects performing 17 different hand poses (including real images and 247 associated hand properties). The multi-view setup is necessary to mitigate hand occlusion, ensuring the continuous tracking and pose estimation required in real human-computer interaction applications. This dataset contributes to advancing the field of multimodal hand pose recognition by providing a valuable resource for developing advanced artificial intelligence human-computer interfaces.
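One way a two-device setup can mitigate occlusion (a hypothetical per-frame fusion rule, not documented in the abstract) is to keep the landmarks from whichever Leap device reports the more confident tracking:

    def fuse_views(frame_a, frame_b):
        """Select the hand description from the device with the higher
        tracking confidence for this frame (hypothetical fusion rule;
        'confidence' is an assumed per-frame field)."""
        if frame_a is None:
            return frame_b
        if frame_b is None:
            return frame_a
        return frame_a if frame_a["confidence"] >= frame_b["confidence"] else frame_b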
Subject(s)
Hand, Humans, Hand/physiology, Artificial Intelligence, Posture
ABSTRACT
This paper introduces Art_GenEvalGPT, a novel dataset of synthetic dialogues centered on art and generated with ChatGPT. Unlike existing datasets focused on conventional art-related tasks, Art_GenEvalGPT delves into nuanced conversations about art, encompassing a wide variety of artworks, artists, and genres, incorporating emotional interventions, and integrating speakers' subjective opinions and different roles for the conversational agents (e.g., teacher-student, expert guide, anthropic behavior, or handling toxic users). The generation and evaluation stages of the GenEvalGPT platform are used to create the dataset, which includes 13,870 synthetic dialogues covering 799 distinct artworks, 378 different artists, and 26 art styles. Automatic and manual assessments confirm the high quality of the generated synthetic dialogues. For profile recovery, promising lexical and semantic metrics are reported for objective and factual attributes. For subjective attributes, the evaluation of detecting emotions or subjectivity in the interventions achieves 92% accuracy using LLM self-assessment metrics.
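The generation stage can be pictured as one chat-completion request per dialogue; the prompt, roles, and model name below are illustrative placeholders, not the actual GenEvalGPT configuration.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    prompt = ("Generate a dialogue between an expert museum guide and a "
              "visitor about 'Las Meninas' by Velázquez, in which the "
              "visitor expresses a subjective emotional reaction.")
    response = client.chat.completions.create(
        model="gpt-4o",                      # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)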
ABSTRACT
Several sign language datasets are available in the literature. Most of them are designed for sign language recognition and translation. This paper presents a new sign language dataset for automatic motion generation. This dataset includes the phonemes for each sign (specified in HamNoSys, a transcription system developed at the University of Hamburg, Hamburg, Germany) and the corresponding motion information. The motion information includes sign videos and the sequence of extracted landmarks associated with relevant points of the skeleton (including face, arms, hands, and fingers). The dataset includes signs from three different subjects in three different positions, performing 754 signs, including the entire alphabet, numbers from 0 to 100, numbers for hour specification, months, and weekdays, as well as the most frequent signs used in Spanish Sign Language (LSE). In total, there are 6786 videos and their corresponding phonemes (HamNoSys annotations). From each video, a sequence of landmarks was extracted using MediaPipe. The dataset allows training an automatic system for motion generation from sign language phonemes. This paper also presents preliminary results in motion generation from sign phonemes, obtaining a dynamic time warping (DTW) distance per frame of 0.37.
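Landmark extraction of this kind follows the standard MediaPipe Holistic pattern; in the sketch below, the video path and the way per-frame landmarks are stored are illustrative.

    import cv2
    import mediapipe as mp

    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap = cv2.VideoCapture("sign_0001.mp4")   # placeholder video path
    landmarks = []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        landmarks.append((results.face_landmarks, results.pose_landmarks,
                          results.left_hand_landmarks, results.right_hand_landmarks))
    cap.release()
    holistic.close()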
ABSTRACT
This paper describes the analysis of a deep neural network for the classification of epileptic EEG signals. The deep learning architecture is made up of two convolutional layers for feature extraction and three fully connected layers for classification. We evaluated several EEG signal transforms for generating the inputs to the deep neural network: the Fourier transform, the wavelet transform, and empirical mode decomposition. This analysis was carried out using two public datasets (the Bern-Barcelona EEG and Epileptic Seizure Recognition datasets), obtaining significant improvements in accuracy. For the Bern-Barcelona EEG dataset, accuracy increased from 92.3% to 98.9% when classifying between focal and non-focal signals using empirical mode decomposition. For the Epileptic Seizure Recognition dataset, we evaluated several seizure detection scenarios, obtaining the best results with the Fourier transform. Accuracy increased from 99.0% to 99.5% for classifying non-seizure vs. seizure recordings, from 91.7% to 96.5% when differentiating between healthy, non-focal, and seizure recordings, and from 89.0% to 95.7% when considering healthy, focal, and seizure recordings.
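The three input transforms can be sketched with standard Python libraries; the wavelet family, decomposition level, and number of modes below are placeholders, not the paper's exact parameters.

    import numpy as np
    import pywt                      # PyWavelets
    from PyEMD import EMD            # PyPI package: EMD-signal

    def fourier_input(x):
        """Magnitude spectrum of an EEG window."""
        return np.abs(np.fft.rfft(x))

    def wavelet_input(x, wavelet="db4", level=4):
        """Concatenated wavelet decomposition coefficients."""
        return np.concatenate(pywt.wavedec(x, wavelet, level=level))

    def emd_input(x, max_imfs=4):
        """First intrinsic mode functions from empirical mode decomposition."""
        return EMD().emd(x, max_imf=max_imfs)[:max_imfs]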