Search | BVS Regional Portal

Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets.

Noh, Kyoung Ju; Jeong, Chi Yoon; Lim, Jiyoun; Chung, Seungeun; Kim, Gague; Lim, Jeong Mook; Jeong, Hyuntae.

Sensors (Basel); 21(5), 2021 Feb 24.

Article in English | MEDLINE | ID: mdl-33668254

ABSTRACT

Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. Deploying SER models in real-world applications requires overcoming key challenges, such as the lack of datasets tagged with emotion labels and the weak generalization of SER models to unseen target domains. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory (BiLSTM)-based temporal feature generator and a feature extractor transferred from the pre-trained VGG-like audio classification model (VGGish). It is trained simultaneously on multiple losses, which are defined according to how emotion labels in the discrete and dimensional models are associated. To evaluate MPGLN SER on multi-cultural domain datasets, the authors constructed the Korean Emotional Speech Database (KESD), comprising KESDy18 and KESDy19, and also used the English-language Interactive Emotional Dyadic Motion Capture database (IEMOCAP). They compared MPGLN SER with a baseline SER model that uses a temporal feature generator. MPGLN SER improved the F1 score by 3.7% in the multi-domain adaptation evaluation and by 3.5% in the domain generalization evaluation. These results show that MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
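The abstract describes training on several losses at once, tied to both discrete and dimensional emotion labels. Below is a minimal Python sketch of such a joint objective under stated assumptions. It assumes a cross-entropy term on the discrete emotion label and a mean-squared-error term on dimensional labels (for example, valence and arousal), combined as a weighted sum. The function names and the weights `w_class` and `w_dim` are hypothetical. This is not the MPGLN group loss as defined in the paper.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target_index: int) -> float:
    """Cross-entropy for a single discrete emotion label (e.g., 'happy')."""
    return -math.log(softmax(logits)[target_index])


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error for dimensional labels (assumed: valence, arousal)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    class_logits: Sequence[float],
    class_target: int,
    dim_pred: Sequence[float],
    dim_target: Sequence[float],
    w_class: float = 1.0,  # hypothetical weight for the discrete-label loss
    w_dim: float = 1.0,    # hypothetical weight for the dimensional-label loss
) -> float:
    """Hypothetical weighted sum of a discrete and a dimensional loss."""
    return (w_class * cross_entropy(class_logits, class_target)
            + w_dim * mse(dim_pred, dim_target))


if __name__ == "__main__":
    # Two emotion classes with equal logits, plus a 2-D dimensional target.
    print(joint_loss([0.0, 0.0], 1, [0.0, 1.0], [1.0, 1.0]))
```

Summing the terms lets one backbone receive gradients from both label schemes at once. The relative weights would, in practice, need tuning per dataset.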

Subject(s)

Databases, Factual, Emotions/classification, Machine Learning, Pattern Recognition, Automated, Speech, Humans

Sensor Data Acquisition and Multimodal Sensor Fusion for Human Activity Recognition Using Deep Learning.

Chung, Seungeun; Lim, Jiyoun; Noh, Kyoung Ju; Kim, Gague; Jeong, Hyuntae.

Sensors (Basel); 19(7), 2019 Apr 10.

Article in English | MEDLINE | ID: mdl-30974845

ABSTRACT

In this paper, we perform a systematic study of on-body sensor positioning and data acquisition details for Human Activity Recognition (HAR) systems. We build a testbed that consists of eight body-worn Inertial Measurement Unit (IMU) sensors and an Android mobile device for activity data collection. We develop a Long Short-Term Memory (LSTM) network framework to support training of a deep learning model on human activity data acquired in both real-world and controlled environments. The experimental results show that activity data from four sensors is sufficient for recognizing Activities of Daily Living (ADLs), including eating and driving. The four sensors are placed on both wrists, the right ankle, and the waist, and a sampling rate as low as 10 Hz suffices. We adopt a two-level ensemble model to combine the class probabilities of multiple sensor modalities, and demonstrate that classifier-level sensor fusion can improve classification performance. By analyzing the accuracy of each sensor on different types of activity, we derive custom weights for multimodal sensor fusion that reflect the characteristics of individual activities.
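The classifier-level fusion step can be pictured with a minimal Python sketch under several assumptions. Each per-sensor classifier is assumed to already output a class-probability vector. The custom weights are assumed to be per-sensor, per-activity scores, for example each sensor's accuracy on each activity. The sensor names, weight values, and function names are hypothetical, and the paper's actual two-level ensemble may combine probabilities differently.

```python
from typing import Dict, List


def fuse_probabilities(
    sensor_probs: Dict[str, List[float]],
    weights: Dict[str, List[float]],
) -> List[float]:
    """Weighted late fusion of per-sensor class probabilities.

    Assumption: weights[sensor][c] reflects how reliable that sensor is for
    activity class c (e.g., its per-class accuracy on validation data).
    """
    num_classes = len(next(iter(sensor_probs.values())))
    fused = [0.0] * num_classes
    for sensor, probs in sensor_probs.items():
        w = weights[sensor]
        for c in range(num_classes):
            fused[c] += w[c] * probs[c]
    total = sum(fused)
    # Renormalize so the fused scores form a probability distribution.
    return [f / total for f in fused] if total > 0 else fused


def predict(
    sensor_probs: Dict[str, List[float]],
    weights: Dict[str, List[float]],
) -> int:
    """Return the index of the activity class with the highest fused score."""
    fused = fuse_probabilities(sensor_probs, weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical classes: 0 = eating, 1 = driving, 2 = walking.
    probs = {"wrist": [0.6, 0.3, 0.1], "waist": [0.2, 0.2, 0.6]}
    uniform = {"wrist": [1.0, 1.0, 1.0], "waist": [1.0, 1.0, 1.0]}
    # Down-weight the wrist sensor for 'eating' to see the decision flip.
    custom = {"wrist": [0.2, 1.0, 1.0], "waist": [1.0, 1.0, 1.0]}
    print(predict(probs, uniform), predict(probs, custom))
```

With uniform weights, the fused scores are [0.4, 0.25, 0.35], so class 0 wins. Lowering the wrist sensor's weight for class 0 shifts the decision to class 2. This shows how activity-specific weights can change the fused prediction.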

Subject(s)

Biosensing Techniques, Human Activities, Monitoring, Physiologic/instrumentation, Wearable Electronic Devices, Activities of Daily Living, Algorithms, Automobile Driving, Deep Learning, Humans, Multimodal Imaging/methods, Standing Position, Walking/physiology
