Search | BVS Regional Portal

Action Unit Models of Facial Expression of Emotion in the Presence of Speech.

Shah, Miraj; Cooper, David G; Cao, Houwei; Gur, Ruben C; Nenkova, Ani; Verma, Ragini.

Int Conf Affect Comput Intell Interact Workshops ; 2013: 49-54, 2013 Sep.

Article in English | MEDLINE | ID: mdl-25525561

ABSTRACT

Automatic recognition of emotion from facial expressions in the presence of speech poses a unique challenge: talking reveals clues about the affective state of the speaker but distorts the canonical expression of emotion on the face. We introduce a corpus of acted emotion expression in which speech is either present (talking) or absent (silent). The corpus is uniquely suited to analyzing the interplay between the two conditions. We use a multimodal decision-level fusion classifier to combine models of emotion from talking and silent faces, as well as from audio, to recognize five basic emotions: anger, disgust, fear, happy and sad. Our results strongly indicate that emotion prediction from action unit facial features is less accurate when the person is talking. Modeling talking and silent expressions separately and fusing the two models greatly improves prediction accuracy in the talking setting. The advantages are most pronounced when the silent and talking face models are fused with predictions from audio features. In this multimodal prediction, both the combination of modalities and the separate models of talking and silent facial expression of emotion contribute to the improvement.
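The abstract does not spell out the fusion rule. As a minimal sketch, assuming each model (talking face, silent face and audio) outputs a probability for each of the five emotions, decision-level fusion can be a weighted average of those posteriors followed by an argmax. The model names, the equal weights and the example scores below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# The five basic emotions named in the abstract, in a fixed order.
EMOTIONS = ["anger", "disgust", "fear", "happy", "sad"]


def fuse_decisions(
    model_probs: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-model class probabilities (decision-level fusion).

    model_probs maps a (hypothetical) model name to its scores for each
    emotion in EMOTIONS order; models missing from weights get weight 0.
    """
    fused = [0.0] * len(EMOTIONS)
    total_weight = 0.0
    for name, probs in model_probs.items():
        if len(probs) != len(EMOTIONS):
            raise ValueError(f"{name}: expected {len(EMOTIONS)} scores, got {len(probs)}")
        w = weights.get(name, 0.0)
        total_weight += w
        for i, p in enumerate(probs):
            fused[i] += w * p
    if total_weight <= 0.0:
        raise ValueError("at least one model needs a positive weight")
    return [f / total_weight for f in fused]


def predict(model_probs: Dict[str, Sequence[float]], weights: Dict[str, float]) -> str:
    """Return the emotion with the highest fused score."""
    fused = fuse_decisions(model_probs, weights)
    return EMOTIONS[max(range(len(EMOTIONS)), key=lambda i: fused[i])]


if __name__ == "__main__":
    # Made-up posteriors for one talking clip: both face models and the audio
    # model are applied to the same clip, then their decisions are fused.
    scores = {
        "talking_face": [0.30, 0.10, 0.20, 0.25, 0.15],
        "silent_face": [0.50, 0.10, 0.10, 0.20, 0.10],
        "audio": [0.40, 0.05, 0.15, 0.10, 0.30],
    }
    equal = {"talking_face": 1.0, "silent_face": 1.0, "audio": 1.0}
    print(predict(scores, equal))  # anger
```

Because fusion happens on the models' outputs rather than on their features, each model can be trained on its own data (for example, the silent-face model only on silent clips), and the weights could be tuned on held-out data.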

Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering.

Savran, Arman; Cao, Houwei; Shah, Miraj; Nenkova, Ani; Verma, Ragini.

Proc ACM Int Conf Multimodal Interact ; 2012: 485-492, 2012.

Article in English | MEDLINE | ID: mdl-25300451

ABSTRACT

We present experiments on fusing facial video, audio and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression-based affect estimators for each modality. The single-modality regressors are then combined using particle filtering: their independent regression outputs are treated as measurements of the affect state in a Bayesian filtering framework, in which previous observations predict the current state through learned affect dynamics. Tested on the Audio-Visual Emotion Recognition Challenge dataset, our single-modality estimators achieve substantially higher scores than the official baseline method for every dimension of affect. Our filtering-based multimodal fusion achieves a correlation of 0.344 (baseline: 0.136) and 0.280 (baseline: 0.096) for the fully continuous and word-level sub-challenges, respectively.
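A bootstrap particle filter illustrates the fusion idea described in this abstract. This is a sketch under stated assumptions, not the authors' implementation: the affect dimension is a single scalar, the learned affect dynamics are replaced by a fixed first-order autoregressive model (`ar_coef`, `process_std`), each modality's regression output is treated as an independent Gaussian measurement with an assumed noise level (`meas_std`), and correlation is computed as Pearson's r. All function names and parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def particle_filter_fuse(
    measurements: Sequence[Sequence[float]],
    meas_std: Sequence[float],
    ar_coef: float = 0.95,
    process_std: float = 0.05,
    n_particles: int = 500,
    seed: int = 0,
) -> List[float]:
    """Fuse per-modality regression outputs into one affect trajectory.

    measurements[t][m] is modality m's regression output at time t;
    meas_std[m] is that modality's assumed measurement noise (std. dev.).
    """
    rng = random.Random(seed)
    # Initial belief: broad Gaussian around a neutral affect value of 0.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates: List[float] = []
    for z_t in measurements:
        # Predict: propagate particles through assumed AR(1) dynamics
        # (a stand-in for the paper's learned affect dynamics).
        particles = [ar_coef * p + rng.gauss(0.0, process_std) for p in particles]
        # Update: modalities are conditionally independent measurements of
        # the same latent state, so their Gaussian log-likelihoods add.
        log_w = [
            sum(-0.5 * ((z - p) / s) ** 2 for z, s in zip(z_t, meas_std))
            for p in particles
        ]
        peak = max(log_w)  # subtract the max for numerical stability
        w = [math.exp(lw - peak) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # Posterior-mean estimate of the affect value at time t.
        estimates.append(sum(wi * p for wi, p in zip(w, particles)))
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return estimates


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Synthetic demo: a slowly varying "true" affect signal observed by
    # three noisy regressors (stand-ins for video, audio and lexical).
    noise = random.Random(1)
    truth = [math.sin(t / 10.0) for t in range(200)]
    stds = [0.3, 0.4, 0.5]
    obs = [[x + noise.gauss(0.0, s) for s in stds] for x in truth]
    fused = particle_filter_fuse(obs, stds)
    for m in range(len(stds)):
        print(f"modality {m}: r = {pearson(truth, [o[m] for o in obs]):.3f}")
    print(f"fused:      r = {pearson(truth, fused):.3f}")
```

Modalities with a smaller assumed noise level pull the estimate harder, and the dynamics smooth the sequence over time; in the paper the dynamics are learned from data rather than fixed by hand as they are here.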
