Emotional Speech Recognition Using Deep Neural Networks.
Sensors (Basel); 22(4), 2022 Feb 12.
Article
| MEDLINE
| ID: mdl-35214316
ABSTRACT
The expression of emotion plays a very important role in human communication, carrying information that needs to be conveyed to the partner. Humans express emotions in many rich forms: body language, facial expressions, eye contact, laughter, and tone of voice. Although the world's languages differ, even without understanding the language being spoken, people can often grasp part of the message the other partner wants to convey through these emotional expressions. Among the forms of human emotional expression, the expression of emotion through the voice is perhaps the most studied. This article presents our research on speech emotion recognition using deep neural networks such as CNN, CRNN, and GRU. We used the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus with four emotions: anger, happiness, sadness, and neutrality. The feature parameters used for recognition include the Mel spectral coefficients and other parameters related to the spectrum and intensity of the speech signal. Data augmentation was performed by changing the voice and adding white noise. The results show that the GRU model gave the highest average recognition accuracy, 97.47%. This result is superior to those of existing studies on speech emotion recognition with the IEMOCAP corpus.
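The white-noise augmentation mentioned in the abstract can be sketched roughly as below; the function name, the SNR parameterization, and the use of NumPy are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise added at the
    requested signal-to-noise ratio (in dB). `signal` is a 1-D float array
    of audio samples. (Illustrative sketch, not the authors' exact code.)"""
    rng = np.random.default_rng(0) if rng is None else rng
    signal_power = np.mean(signal ** 2)
    # Scale noise power so that 10*log10(signal_power / noise_power) == snr_db.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

In practice such a routine is applied to each training utterance at one or more SNR levels to enlarge the training set; the "changing the voice" augmentation (e.g. pitch or speed modification) would be a separate transform.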
Keywords:
Full text: 1
Database: MEDLINE
Main subject: Speech Perception / Voice
Limit: Humans
Language: En
Journal: Sensors (Basel)
Publication year: 2022
Document type: Article
Country of affiliation: Vietnam