Multimodal Sensing for Depression Risk Detection: Integrating Audio, Video, and Text Data.

Zhang, Zhenwei; Zhang, Shengming; Ni, Dong; Wei, Zhaoguo; Yang, Kongjun; Jin, Shan; Huang, Gan; Liang, Zhen; Zhang, Li; Li, Linling; Ding, Huijun; Zhang, Zhiguo; Wang, Jianhong

Affiliation

Zhang Z; School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China.
Zhang S; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China.
Ni D; Affiliated Mental Health Center, Southern University of Science and Technology, Shenzhen 518055, China.
Wei Z; School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China.
Yang K; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China.
Jin S; Shenzhen Kangning Hospital, Shenzhen 518020, China.
Huang G; Shenzhen Mental Health Center, Shenzhen 518020, China.
Liang Z; Shenzhen Kangning Hospital, Shenzhen 518020, China.
Zhang L; Shenzhen Mental Health Center, Shenzhen 518020, China.
Li L; Shenzhen Kangning Hospital, Shenzhen 518020, China.
Ding H; Shenzhen Mental Health Center, Shenzhen 518020, China.
Zhang Z; School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China.
Wang J; Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, Shenzhen 518060, China.

Sensors (Basel); 24(12), 2024 Jun 07.

Article in En | MEDLINE | ID: mdl-38931497

ABSTRACT

Depression is a major psychological disorder with a growing impact worldwide. Traditional methods for detecting the risk of depression rely predominantly on psychiatric evaluations and self-assessment questionnaires, and are often criticized for their inefficiency and lack of objectivity. Advances in deep learning have paved the way for depression risk detection methods that fuse multimodal data. This paper introduces a novel framework, the Audio, Video, and Text Fusion-Three Branch Network (AVTF-TBN), designed to combine auditory, visual, and textual cues for a comprehensive analysis of depression risk. Our approach comprises three dedicated branches (Audio Branch, Video Branch, and Text Branch), each responsible for extracting salient features from the corresponding modality. These features are then fused by a multimodal fusion (MMF) module, yielding a robust feature vector that feeds into a predictive modeling layer. To support this research, we devised an emotion elicitation paradigm based on two distinct tasks (reading and interviewing) and used it to gather a rich, sensor-based depression risk detection dataset. Sensing equipment, such as cameras, captures the subtle facial expressions and vocal characteristics essential for our analysis. The study investigates the data generated under varying emotional stimuli and evaluates the contribution of each task to emotion evocation. In our experiments, the AVTF-TBN model performs best when data from the two tasks are used together for detection, reaching an F1 score of 0.78, a precision of 0.76, and a recall of 0.81. Our experimental results confirm the validity of the paradigm and demonstrate the efficacy of the AVTF-TBN model in detecting depression risk, showcasing the crucial role of sensor-based data in mental health detection.
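The three reported metrics can be cross-checked against one another, since the F1 score is conventionally the harmonic mean of precision and recall. The sketch below is only an illustration under that assumption: the abstract does not say how the scores were averaged (for example per class or across folds), and the function name `f1_from_precision_recall` is hypothetical rather than taken from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the usual F1 definition)."""
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the two-task setting.
reported_precision = 0.76
reported_recall = 0.81

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")
```

Under this definition the harmonic mean of 0.76 and 0.81 rounds to 0.78, which agrees with the reported F1 score.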

Subject(s)

Depression; Multimodal Imaging; Depression/diagnosis; Multimodal Imaging/instrumentation; Multimodal Imaging/methods; Risk Factors; Text Messaging; Video Recording; Sound Recordings; Humans; Male; Female; Young Adult; Adult; Middle Aged; Datasets as Topic; Emotions; Facial Expression

Keywords

depression risk detection; emotion elicitation paradigm; multimodal data

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Depression / Multimodal Imaging Limits: Adult / Female / Humans / Male / Middle aged Language: En Journal: Sensors (Basel) Year: 2024 Document type: Article Affiliation country: China
