A novel transformer autoencoder for multi-modal emotion recognition with incomplete data.
Cheng, Cheng; Liu, Wenzhe; Fan, Zhaoxin; Feng, Lin; Jia, Ziyu.
Affiliation
  • Cheng C; Department of Computer Science and Technology, Dalian University of Technology, Dalian, China.
  • Liu W; School of Information Engineering, Huzhou University, Huzhou, China.
  • Fan Z; Renmin University of China, Psyche AI Inc, Beijing, China.
  • Feng L; Department of Computer Science and Technology, Dalian University of Technology, Dalian, China. Electronic address: fenglin@dlut.edu.cn.
  • Jia Z; Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China.
Neural Netw ; 172: 106111, 2024 Apr.
Article in En | MEDLINE | ID: mdl-38237444
ABSTRACT
Multi-modal signals have become essential data for emotion recognition, since they represent emotions more comprehensively than any single modality. In real-world environments, however, complete multi-modal data are often impossible to acquire, and missing modalities cause severe performance degradation in emotion recognition. This paper therefore represents the first attempt to use a transformer-based architecture to impute modality-incomplete data from partially observed data for multi-modal emotion recognition (MER). Concretely, it proposes a novel unified model called the transformer autoencoder (TAE), comprising a modality-specific hybrid transformer encoder, an inter-modality transformer encoder, and a convolutional decoder. The modality-specific hybrid transformer encoder bridges a convolutional encoder and a transformer encoder, allowing the encoder to learn both local and global context information within each modality. The inter-modality transformer encoder builds and aligns global cross-modal correlations and models long-range contextual information across modalities. The convolutional decoder decodes the encoded features to produce more precise recognition. In addition, a regularization term is introduced into the convolutional decoder to force the decoder to fully leverage both complete and incomplete data for emotion recognition with missing modalities. Accuracies of 96.33%, 95.64%, and 92.69% are attained on the complete data of the DEAP and SEED-IV datasets, and accuracies of 93.25%, 92.23%, and 81.76% on the incomplete data. Notably, the model achieves a 5.61% advantage at a 70% missing rate, demonstrating that it outperforms some state-of-the-art approaches in incomplete multi-modal learning.
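The abstract's core training idea — reconstructing missing modalities while a regularization term ties the reconstruction from incomplete inputs to the one from complete inputs — can be sketched as a masked reconstruction objective. The sketch below is an illustration under our own assumptions (the names `masked_mse` and `tae_style_loss`, plain Python lists standing in for modality features, and a simple squared-difference regularizer), not the authors' implementation.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over observed entries only (mask[i] == 1)."""
    num = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    den = sum(mask)
    return num / den if den else 0.0

def tae_style_loss(pred_incomplete, target, mask, pred_complete, lam=0.5):
    """Reconstruction loss on observed entries, plus a regularizer that
    pulls the incomplete-input reconstruction toward the complete-input
    one, so the decoder leverages both complete and incomplete data."""
    rec = masked_mse(pred_incomplete, target, mask)
    reg = sum((p - q) ** 2
              for p, q in zip(pred_incomplete, pred_complete)) / len(pred_incomplete)
    return rec + lam * reg
```

For example, with `mask = [1, 1, 0]` the third (missing) entry contributes nothing to the reconstruction term, while the regularizer still constrains all entries; the weight `lam` trades the two off, analogous in spirit to the regularization term the paper adds to its convolutional decoder.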
Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Emotions / Learning Language: En Journal: Neural Netw Journal subject: NEUROLOGY Year of publication: 2024 Document type: Article Country of affiliation: China