Multimodal Attention Network for Trauma Activity Recognition from Spoken Language and Environmental Sound.
Gu, Yue; Zhang, Ruiyu; Zhao, Xinwei; Chen, Shuhong; Abdulbaqi, Jalal; Marsic, Ivan; Cheng, Megan; Burd, Randall S.
Affiliation
  • Gu Y; Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA.
  • Zhang R; Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA.
  • Zhao X; Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA.
  • Chen S; Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA.
  • Abdulbaqi J; Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA.
  • Marsic I; Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA.
  • Cheng M; Trauma and Burn Surgery, Children's National Medical Center, Washington, DC, USA.
  • Burd RS; Trauma and Burn Surgery, Children's National Medical Center, Washington, DC, USA.
Article in English | MEDLINE | ID: mdl-32201857
ABSTRACT
Trauma activity recognition aims to detect, recognize, and predict the activities (or tasks) performed during trauma resuscitation. Previous work has mainly focused on generating the trauma event log from various sensor data, including images, RFID, and vital signs. However, spoken language and environmental sound, which carry the rich communication and contextual information needed for trauma team cooperation, have been largely ignored. In this paper, we propose a multimodal attention network (MAN) that takes both verbal transcripts and the environmental audio stream as input; the model extracts textual and acoustic features using a multi-level multi-head attention module and combines them into a shared representation for trauma activity classification. We evaluated the proposed architecture on 75 actual trauma resuscitation cases collected from a hospital, achieving 72.4% accuracy and a 0.705 F1 score, which demonstrates the effectiveness and efficiency of the proposed architecture. These results also show that using spoken language and environmental audio helps identify hard-to-recognize activities compared with previous approaches. We also provide a detailed analysis of the performance and generalization of the proposed multimodal attention network.
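To make the fusion idea in the abstract concrete, the following is a minimal PyTorch sketch of a two-branch attention model, not the authors' actual MAN implementation. It assumes pre-extracted word embeddings and acoustic frame features as inputs, and the feature dimensions, layer sizes, and class count (300-dim text, 40-dim audio, 10 activity classes) are illustrative assumptions only; the abstract does not specify them.

    # Illustrative sketch (not the authors' code): each modality is encoded with
    # multi-head self-attention, then an attention weight over the two pooled
    # modality vectors forms a shared representation for classification.
    import torch
    import torch.nn as nn

    class ModalityEncoder(nn.Module):
        """Encodes one modality with stacked multi-head self-attention layers."""
        def __init__(self, input_dim, model_dim=128, num_heads=4, num_layers=2):
            super().__init__()
            self.proj = nn.Linear(input_dim, model_dim)
            layer = nn.TransformerEncoderLayer(
                d_model=model_dim, nhead=num_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

        def forward(self, x):                  # x: (batch, seq_len, input_dim)
            h = self.encoder(self.proj(x))     # (batch, seq_len, model_dim)
            return h.mean(dim=1)               # pooled modality representation

    class MultimodalAttentionNet(nn.Module):
        """Fuses text and audio representations with learned attention weights."""
        def __init__(self, text_dim=300, audio_dim=40, model_dim=128, num_classes=10):
            super().__init__()
            self.text_enc = ModalityEncoder(text_dim, model_dim)
            self.audio_enc = ModalityEncoder(audio_dim, model_dim)
            self.attn = nn.Linear(model_dim, 1)          # scores each modality vector
            self.classifier = nn.Linear(model_dim, num_classes)

        def forward(self, text, audio):
            reps = torch.stack(
                [self.text_enc(text), self.audio_enc(audio)], dim=1)  # (B, 2, D)
            weights = torch.softmax(self.attn(reps), dim=1)           # (B, 2, 1)
            shared = (weights * reps).sum(dim=1)                      # fused (B, D)
            return self.classifier(shared)

    # Toy usage with random features: 8 utterances, 20 words x 300-dim embeddings,
    # 100 audio frames x 40-dim filterbank features (all assumed dimensions).
    model = MultimodalAttentionNet()
    logits = model(torch.randn(8, 20, 300), torch.randn(8, 100, 40))
    print(logits.shape)  # torch.Size([8, 10])

The design choice mirrored here is late fusion with modality-level attention: each stream is summarized independently, and the attention weights let the classifier lean on whichever modality is more informative for a given activity.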
Keywords

Full text: 1 Database: MEDLINE Study type: Prognostic_studies Language: English Journal: IEEE Int Conf Healthc Inform Year: 2019 Document type: Article
