Context-Aware Emotion Recognition in the Wild Using Spatio-Temporal and Temporal-Pyramid Models.
Sensors (Basel)
; 21(7)2021 Mar 27.
Article
em En
| MEDLINE
| ID: mdl-33801739
Emotion recognition plays an important role in human-computer interactions. Recent studies have focused on video emotion recognition in the wild and have run into difficulties related to occlusion, illumination, complex behavior over time, and auditory cues. State-of-the-art methods use multiple modalities, such as frame-level, spatiotemporal, and audio approaches. However, such methods have difficulties in exploiting long-term dependencies in temporal information, capturing contextual information, and integrating multi-modal information. In this paper, we introduce a multi-modal flexible system for video-based emotion recognition in the wild. Our system tracks and votes on significant faces corresponding to persons of interest in a video to classify seven basic emotions. The key contribution of this study is that it proposes the use of face feature extraction with context-aware and statistical information for emotion recognition. We also build two model architectures to effectively exploit long-term dependencies in temporal information with a temporal-pyramid model and a spatiotemporal model with "Conv2D+LSTM+3DCNN+Classify" architecture. Finally, we propose the best selection ensemble to improve the accuracy of multi-modal fusion. The best selection ensemble selects the best combination from spatiotemporal and temporal-pyramid models to achieve the best accuracy for classifying the seven basic emotions. In our experiment, we take benchmark measurement on the AFEW dataset with high accuracy.
Palavras-chave
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Assunto principal:
Conscientização
/
Emoções
Limite:
Humans
Idioma:
En
Revista:
Sensors (Basel)
Ano de publicação:
2021
Tipo de documento:
Article