Results 1 - 3 of 3
1.
Sensors (Basel); 22(6), 2022 Mar 19.
Article in English | MEDLINE | ID: mdl-35336548

ABSTRACT

Recognizing human emotions by machine is a complex task, and deep learning models attempt to automate it by giving machines the capacity to learn. Identifying human emotions from speech with good performance nevertheless remains challenging. Deep learning algorithms have recently been applied to this problem, but most past work relied on a single feature-extraction method for training. This research explores two different feature-extraction methods for effective speech emotion recognition. First, a two-way feature extraction scheme is proposed that uses super convergence to derive two sets of potential features from the speech data: principal component analysis (PCA) is applied to obtain the first feature set, which is then fed to a deep neural network (DNN) with dense and dropout layers. In the second approach, mel-spectrogram images are extracted from the audio files, and the 2D images are given as input to a pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature-extraction methods, across multiple algorithms and two datasets, are performed in this work. On the RAVDESS dataset, the spectrogram-based approach provided significantly better accuracy than using numeric features with a DNN.
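To make the two pipelines concrete, here is a minimal Keras sketch of both approaches; the feature dimensionality, PCA component count, layer widths, dropout rates, and optimizer are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the two feature pipelines described in the abstract.
# Assumptions (not from the paper): feature dimensions, layer sizes,
# dropout rates, and all hyperparameters are placeholders.
import numpy as np
from sklearn.decomposition import PCA
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 8  # assumption: eight emotion labels, as in RAVDESS

# --- Approach 1: numeric features -> PCA -> dense/dropout DNN ---
X = np.random.rand(1000, 180)            # placeholder feature matrix
X_pca = PCA(n_components=64).fit_transform(X)

dnn = models.Sequential([
    layers.Input(shape=(64,)),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
dnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])

# --- Approach 2: mel-spectrogram images -> pre-trained VGG-16 ---
base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False   # use VGG-16 as a frozen feature extractor
vgg = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
vgg.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```

Freezing the pre-trained backbone, as sketched here, is one common transfer-learning choice; the paper may instead fine-tune some or all of the VGG-16 layers.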


Subjects
Deep Learning , Speech , Algorithms , Emotions , Humans , Neural Networks, Computer
2.
J Healthc Eng; 2022: 6005446, 2022.
Article in English | MEDLINE | ID: mdl-35388315

ABSTRACT

Human-computer interaction (HCI) has seen a paradigm shift from textual or display-based control toward more intuitive modalities such as voice, gesture, and mimicry. Speech in particular carries a great deal of information about the speaker's inner state, aims, and desires. While analysis of the words lets the speaker's request be understood, other speech features disclose the speaker's mood, purpose, and motive. As a result, emotion recognition from speech has become critical in current HCI systems. The findings of the several disciplines involved in emotion recognition are, moreover, difficult to combine. Many sound analysis methods were developed in the past, but they could not provide emotional analysis of live speech. Today, advances in artificial intelligence and the high performance of deep learning methods bring studies on live data to the fore. This study aims to detect emotions in the human voice using artificial intelligence methods, for which data is one of the most important requirements. The open-source Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) was used: it contains more than 2000 recordings of speech and song by 24 actors, collected for eight different moods. The goal was to detect eight emotion classes: neutral, calm, happy, sad, angry, fearful, disgusted, and surprised. The multilayer perceptron (MLP) classifier, a widely used supervised learning algorithm, was chosen for classification. The proposed model's performance was compared with that of similar studies, and the results were evaluated. An overall accuracy of 81% was obtained for classifying the eight emotions with the proposed model on the RAVDESS dataset.
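A minimal sketch of such an MLP pipeline is shown below, assuming mean-MFCC features extracted with librosa; the feature choice and the classifier hyperparameters are assumptions, since the abstract does not specify them.

```python
# Sketch of an MLP emotion classifier in the spirit of the study.
# Assumptions: mean-MFCC features via librosa and the hyperparameters
# shown here are illustrative, not taken from the paper.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgusted", "surprised"]

def extract_features(path):
    """Mean MFCC vector for one audio file (hypothetical feature choice)."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    return mfcc.mean(axis=1)

# X: one feature vector per recording, y: emotion index 0..7.
# Loading the ~2000 RAVDESS files is omitted; placeholders stand in.
X = np.random.rand(2000, 40)
y = np.random.randint(0, 8, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=42)
clf = MLPClassifier(hidden_layer_sizes=(300,), alpha=0.01,
                    batch_size=256, max_iter=500)
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.2%}")
```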


Subjects
Artificial Intelligence , Speech , Computers , Emotions , Female , Humans , Male , Neural Networks, Computer
3.
Comput Intell Neurosci; 2022: 7463091, 2022.
Article in English | MEDLINE | ID: mdl-35401731

ABSTRACT

Emotions play an essential role in human relationships, and many real-time applications rely on interpreting the speaker's emotion from their words. Speech emotion recognition (SER) modules aid human-computer interface (HCI) applications, but they are challenging to implement because of the lack of balanced training data and of clarity about which features suffice for categorization. This research discusses the impact of the classification approach, of identifying the most appropriate combination of features, and of data augmentation on speech emotion detection accuracy. Selecting the right combination of handcrafted features for the classifier is integral to reducing computational complexity. The proposed classification model, a 1D convolutional neural network (1D CNN), outperforms traditional machine learning approaches. Unlike most earlier studies, which examined emotions primarily through the lens of a single language, our analysis covers data sets in several languages. With the most discriminating features and data augmentation, the technique achieves 97.09%, 96.44%, and 83.33% accuracy on the BAVED, ANAD, and SAVEE data sets, respectively.
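A minimal sketch of a 1D CNN classifier with a simple noise-based augmentation step appears below; the 162-dimensional feature vector, the class count, the augmentation scheme, and all layer sizes are illustrative assumptions rather than the paper's configuration.

```python
# Sketch of a 1D CNN for SER with simple data augmentation.
# Assumptions: feature length, class count (e.g., SAVEE's 7 emotions),
# Gaussian-noise augmentation, and all layer sizes are placeholders.
import numpy as np
from tensorflow.keras import layers, models

NUM_FEATURES, NUM_CLASSES = 162, 7

def augment_with_noise(X, y, scale=0.01):
    """Double the training set by adding low-amplitude Gaussian noise."""
    noisy = X + scale * np.random.randn(*X.shape)
    return np.concatenate([X, noisy]), np.concatenate([y, y])

# Usage before training (X_train shaped (n, NUM_FEATURES, 1)):
#   X_train, y_train = augment_with_noise(X_train, y_train)

model = models.Sequential([
    layers.Input(shape=(NUM_FEATURES, 1)),  # feature vector as 1D sequence
    layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=5, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```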


Subjects
Neural Networks, Computer , Speech , Computers , Emotions , Humans , Language