Multimodal fall detection for solitary individuals based on audio-video decision fusion processing.

Jiao, Shiqin; Li, Guoqi; Zhang, Guiyang; Zhou, Jiahao; Li, Jihong

Affiliation

Jiao S; School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China.
Li G; School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China.
Zhang G; School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China.
Zhou J; Jinan Thomas School, Jinan, Shandong 250102, China.
Li J; School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China.

Heliyon ; 10(8): e29596, 2024 Apr 30.

Article in English | MEDLINE | ID: mdl-38681632

ABSTRACT

Falls often pose significant safety risks to solitary individuals, especially the elderly. Implementing a fast and efficient fall detection system is an effective strategy to address this hidden danger. We propose a multimodal method based on audio and video. Using only non-intrusive equipment, it reduces the false negatives that the most commonly used video-based methods may produce, for example under insufficient lighting or when the subject is outside the monitoring range. Therefore, in the foreseeable future, methods based on audio-video fusion are expected to become the best solution for fall detection. Specifically, this article outlines the following methodology: the video-based model uses YOLOv7-Pose to extract key skeleton joints, which are then fed into a two-stream Spatial Temporal Graph Convolutional Network (ST-GCN) for classification. Meanwhile, the audio-based model uses log-scaled mel spectrograms to capture features, which are processed by the MobileNetV2 architecture for detection. The two results are then fused at the decision level through linear weighting and Dempster-Shafer (D-S) theory. In evaluation, our multimodal fall detection method significantly outperforms the single-modality method; in particular, sensitivity increased from 81.67% with the video modality alone to 96.67% (linear weighting) and 97.50% (D-S theory), which underscores the effectiveness of integrating video and audio data to achieve more robust and reliable fall detection in complex and diverse daily-life environments.
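The two decision-fusion rules named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the weights, the mass assignment, or the decision threshold, so the function names, the `reliability` discount factor, and the example probabilities below are all hypothetical. The sketch assumes each single-modality model outputs a fall probability, and that the D-S masses are built by discounting that probability and assigning the remainder to the full frame {fall, no_fall} as uncertainty.

```python
def linear_fusion(p_video, p_audio, w_video=0.5):
    """Weighted average of two fall probabilities (w_video is an assumed weight)."""
    return w_video * p_video + (1.0 - w_video) * p_audio


def to_mass(p_fall, reliability):
    """Build a mass function over {fall}, {no_fall}, and the full frame (theta).

    Assumption: the model output is discounted by `reliability`; the
    remaining mass expresses ignorance and goes to theta.
    """
    return {
        "fall": reliability * p_fall,
        "no_fall": reliability * (1.0 - p_fall),
        "theta": 1.0 - reliability,
    }


def dempster_combine(m1, m2):
    """Dempster's rule of combination for the two-hypothesis frame."""
    # Conflict: mass the two sources assign to contradictory hypotheses.
    conflict = m1["fall"] * m2["no_fall"] + m1["no_fall"] * m2["fall"]
    norm = 1.0 - conflict
    if norm <= 0.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    fall = (m1["fall"] * m2["fall"]
            + m1["fall"] * m2["theta"]
            + m1["theta"] * m2["fall"]) / norm
    no_fall = (m1["no_fall"] * m2["no_fall"]
               + m1["no_fall"] * m2["theta"]
               + m1["theta"] * m2["no_fall"]) / norm
    theta = m1["theta"] * m2["theta"] / norm
    return {"fall": fall, "no_fall": no_fall, "theta": theta}


# Hypothetical case: video is unsure (e.g. poor lighting), audio is confident.
p_video, p_audio = 0.4, 0.9
fused_linear = linear_fusion(p_video, p_audio, w_video=0.5)
fused_ds = dempster_combine(to_mass(p_video, 0.8), to_mass(p_audio, 0.8))
is_fall_ds = fused_ds["fall"] > fused_ds["no_fall"]
```

In this hypothetical case the video model alone would miss the fall at a 0.5 threshold, while both fusion rules favor the fall hypothesis, which is the kind of false negative the abstract says fusion is meant to reduce.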

Keywords

Audio-video fusion; Fall detection; Multimodal analysis; Solitary individuals

Full text: 1 Database: MEDLINE Language: English Publication year: 2024 Document type: Article
