Search | VHL Regional Portal

Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS.

Toyoshima, Itsuki; Okada, Yoshifumi; Ishimaru, Momoko; Uchiyama, Ryunosuke; Tada, Mayu.

Sensors (Basel); 23(3), 2023 Feb 03.

Article in English | MEDLINE | ID: mdl-36772782

ABSTRACT

The existing research on emotion recognition commonly uses mel spectrogram (MelSpec) and Geneva minimalistic acoustic parameter set (GeMAPS) as acoustic parameters to learn the audio features. MelSpec can represent the time-series variations of each frequency but cannot manage multiple types of audio features. On the other hand, GeMAPS can handle multiple audio features but fails to provide information on their time-series variations. Thus, this study proposes a speech emotion recognition model based on a multi-input deep neural network that simultaneously learns these two audio features. The proposed model comprises three parts, specifically, for learning MelSpec in image format, learning GeMAPS in vector format, and integrating them to predict the emotion. Additionally, a focal loss function is introduced to address the imbalanced data problem among the emotion classes. The results of the recognition experiments demonstrate weighted and unweighted accuracies of 0.6657 and 0.6149, respectively, which are higher than or comparable to those of the existing state-of-the-art methods. Overall, the proposed model significantly improves the recognition accuracy of the emotion "happiness", which has been difficult to identify in previous studies owing to limited data. Therefore, the proposed model can effectively recognize emotions from speech and can be applied for practical purposes with future development.
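The focal loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard per-sample form, -alpha * (1 - p_t)^gamma * log(p_t); the `gamma` and `alpha` values and the example probabilities below are illustrative assumptions, not settings reported in the abstract.

```python
import math

def cross_entropy(probs, target, eps=1e-12):
    """Plain cross-entropy for one sample: -log(p_t)."""
    p_t = max(probs[target], eps)  # clamp to avoid log(0)
    return -math.log(p_t)

def focal_loss(probs, target, gamma=2.0, alpha=1.0, eps=1e-12):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are hypothetical defaults chosen for illustration.
    With gamma = 0 and alpha = 1 this reduces to cross-entropy.
    """
    p_t = max(probs[target], eps)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

# Predicted class probabilities for two samples whose true class is index 0.
easy = [0.90, 0.05, 0.03, 0.02]  # confidently correct prediction
hard = [0.20, 0.50, 0.20, 0.10]  # misclassified sample

for name, probs in (("easy", easy), ("hard", hard)):
    print(name, cross_entropy(probs, 0), focal_loss(probs, 0))
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified samples much more than that of misclassified ones, so training is driven more by hard examples, which are often the ones from under-represented classes.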

Subjects

Emotions; Speech; Neural Networks, Computer; Perception; Acoustics

A New Regression Model for Depression Severity Prediction Based on Correlation among Audio Features Using a Graph Convolutional Neural Network.

Ishimaru, Momoko; Okada, Yoshifumi; Uchiyama, Ryunosuke; Horiguchi, Ryo; Toyoshima, Itsuki.

Diagnostics (Basel); 13(4), 2023 Feb 14.

Article in English | MEDLINE | ID: mdl-36832211

ABSTRACT

Recent studies have revealed mutually correlated audio features in the voices of depressed patients. Thus, the voices of these patients can be characterized based on the combinatorial relationships among the audio features. To date, many deep learning-based methods have been proposed to predict depression severity using audio data. However, existing methods have assumed that the individual audio features are independent. Hence, in this paper, we propose a new deep learning-based regression model that allows for the prediction of depression severity on the basis of the correlation among audio features. The proposed model was developed using a graph convolutional neural network. This model learns voice characteristics using graph-structured data generated to express the correlation among audio features. We conducted prediction experiments on depression severity using the DAIC-WOZ dataset employed in several previous studies. The experimental results showed that the proposed model achieved a root mean square error (RMSE) of 2.15, a mean absolute error (MAE) of 1.25, and a symmetric mean absolute percentage error of 50.96%. Notably, the RMSE and MAE were significantly better than those of existing state-of-the-art prediction methods. From these results, we conclude that the proposed model can be a promising tool for depression diagnosis.
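For readers unfamiliar with the three error measures quoted above, the following sketch shows one common way to compute them. The abstract does not state which variant of the symmetric mean absolute percentage error was used, so the form below (absolute error divided by the mean of the absolute true and predicted values) is an assumption, and the severity scores are made-up illustration data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant, see lead-in).

    Each term is |t - p| / ((|t| + |p|) / 2); a term with a zero
    denominator (t == p == 0) is counted as zero error.
    """
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        terms.append(0.0 if denom == 0 else abs(t - p) / denom)
    return 100.0 * sum(terms) / len(terms)

# Hypothetical severity scores, not data from the study.
y_true = [2, 4, 10]
y_pred = [3, 4, 8]
print(rmse(y_true, y_pred), mae(y_true, y_pred), smape(y_true, y_pred))
```

Because the percentage error divides by the size of the scores, it can be large even when RMSE and MAE are small, particularly when many true scores are close to zero.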

Classification of Depression and Its Severity Based on Multiple Audio Features Using a Graphical Convolutional Neural Network.

Ishimaru, Momoko; Okada, Yoshifumi; Uchiyama, Ryunosuke; Horiguchi, Ryo; Toyoshima, Itsuki.

Int J Environ Res Public Health; 20(2), 2023 Jan 15.

Article in English | MEDLINE | ID: mdl-36674342

ABSTRACT

Audio features are physical features that reflect single or complex coordinated movements in the vocal organs. Hence, in speech-based automatic depression classification, it is critical to consider the relationship among audio features. Here, we propose a deep learning-based classification model for discriminating depression and its severity using correlation among audio features. This model represents the correlation between audio features as graph structures and learns speech characteristics using a graph convolutional neural network. We conducted classification experiments in which the same subjects were allowed to be included in both the training and test data (Setting 1) and the subjects in the training and test data were completely separated (Setting 2). The results showed that the classification accuracy in Setting 1 significantly outperformed existing state-of-the-art methods, whereas that in Setting 2, which has not been presented in existing studies, was much lower than in Setting 1. We conclude that the proposed model is an effective tool for discriminating recurring patients and their severities, but it is difficult to detect new depressed patients. For practical application of the model, depression-specific speech regions appearing locally rather than the entire speech of depressed patients should be detected and assigned the appropriate class labels.
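The idea of representing correlation between audio features as a graph can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how edges are defined, so the sketch connects two features when the absolute Pearson correlation of their per-frame values reaches a hypothetical `threshold`. The feature names and values are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def correlation_graph(features, threshold=0.5):
    """Build an undirected graph as {feature name: set of neighbour names}.

    An edge joins two features when |Pearson r| >= threshold.
    The threshold rule is an assumption made for this sketch.
    """
    names = list(features)
    adjacency = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) >= threshold:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency

# Invented per-frame values for three hypothetical audio features.
features = {
    "f0": [1.0, 2.0, 3.0, 4.0],
    "loudness": [2.0, 4.0, 6.0, 8.5],
    "jitter": [1.0, -1.0, 1.0, -1.0],
}
graph = correlation_graph(features)
print(graph)
```

A graph convolutional network would then take such an adjacency structure, together with node features, as its input; that part is outside the scope of this sketch.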

Subjects

Depression; Neural Networks, Computer; Humans; Depression/diagnosis; Speech

End-to-End Convolutional Neural Network Model to Detect and Localize Myocardial Infarction Using 12-Lead ECG Images without Preprocessing.

Uchiyama, Ryunosuke; Okada, Yoshifumi; Kakizaki, Ryuya; Tomioka, Sekito.

Bioengineering (Basel); 9(9), 2022 Sep 01.

Article in English | MEDLINE | ID: mdl-36134976

ABSTRACT

In recent years, many studies have proposed automatic detection and localization techniques for myocardial infarction (MI) using the 12-lead electrocardiogram (ECG). Most of them applied preprocessing to the ECG signals, e.g., noise removal, trend removal, beat segmentation, and feature selection, followed by model construction and classification based on machine-learning algorithms. The selection and implementation of preprocessing methods require specialized knowledge and experience to handle ECG data. In this paper, we propose an end-to-end convolutional neural network model that detects and localizes MI without such complicated multistep preprocessing. The proposed model executes comprehensive learning for the waveform features of unpreprocessed raw ECG images captured from 12-lead ECG signals. We evaluated the classification performance of the proposed model in two experimental settings: ten-fold cross-validation where ECG images were split randomly, and two-fold cross-validation where ECG images were split into one patient and the other patients. The experimental results demonstrate that the proposed model obtained MI detection accuracies of 99.82% and 93.93% and MI localization accuracies of 99.28% and 69.27% in the first and second settings, respectively. The performance of the proposed method is higher than or comparable to that of existing state-of-the-art methods. Thus, the proposed model is expected to be an effective MI diagnosis tool that can be used in intensive care units and as wearable technology.
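The gap between the two experimental settings is easier to see with a small sketch that contrasts a random image-level split with a patient-level split. It is not the evaluation protocol of the paper: the record layout, the `test_fraction` parameter, and the helper names are hypothetical, and the sketch only shows that a random split can place images of the same patient on both sides while a patient-level split cannot.

```python
import random

def random_split(records, test_fraction=0.5, seed=0):
    """Split individual images at random, ignoring which patient they come from."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def patient_wise_split(records, test_fraction=0.5, seed=0):
    """Split by patient so that no patient appears in both train and test."""
    rng = random.Random(seed)
    patients = sorted({r["patient"] for r in records})
    rng.shuffle(patients)
    n_test = max(1, int(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r["patient"] not in test_patients]
    test = [r for r in records if r["patient"] in test_patients]
    return train, test

def shared_patients(train, test):
    """Patients whose images occur on both sides of a split."""
    return {r["patient"] for r in train} & {r["patient"] for r in test}

# Hypothetical records: four patients with three ECG images each.
records = [{"patient": p, "image": f"{p}_{i}"} for p in "ABCD" for i in range(3)]

train_r, test_r = random_split(records)
train_p, test_p = patient_wise_split(records)
print("random split shares patients:", sorted(shared_patients(train_r, test_r)))
print("patient-wise split shares patients:", sorted(shared_patients(train_p, test_p)))
```

When the same patient contributes images to both training and test data, a model can score well by recognizing patient-specific waveform traits, which is one plausible reading of why accuracy drops in the patient-separated setting.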
