A self-supervised spatio-temporal attention network for video-based 3D infant pose estimation.

Yin, Wang; Chen, Linxi; Huang, Xinrui; Huang, Chunling; Wang, Zhaohong; Bian, Yang; Wan, You; Zhou, Yuan; Han, Tongyan; Yi, Ming

Affiliation

Yin W; Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing 100191, China; Neuroscience Research Institute, Peking University and Key Laboratory for Neuroscience, Ministry of Education/National Health Commission, Beijing 100083, China.
Chen L; Peking University Cancer Hospital & Institute, Beijing 100142, China.
Huang X; Department of Biochemistry and Biophysics, School of Basic Medical Sciences, Peking University, Beijing 100191, China.
Huang C; Peking University Third Hospital, Beijing 100191, China.
Wang Z; Peking University Third Hospital, Beijing 100191, China.
Bian Y; Peking University First Hospital, Beijing 100034, China.
Wan Y; Neuroscience Research Institute, Peking University and Key Laboratory for Neuroscience, Ministry of Education/National Health Commission, Beijing 100083, China.
Zhou Y; Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing 100191, China.
Han T; Department of Pediatrics, Peking University Third Hospital, Beijing 100191, China. Electronic address: tongyanhan@bjmu.edu.cn.
Yi M; Neuroscience Research Institute, Peking University and Key Laboratory for Neuroscience, Ministry of Education/National Health Commission, Beijing 100083, China. Electronic address: mingyi@hsc.pku.edu.cn.

Med Image Anal; 96: 103208, 2024 Aug.

Article in En | MEDLINE | ID: mdl-38788327

ABSTRACT

General movement and pose assessment of infants is crucial for the early detection of cerebral palsy (CP). Nevertheless, most human pose estimation methods, in 2D or 3D, focus on adults due to the lack of large datasets and pose annotations for infants. To address these problems, we present YOLO-infantPose, a fine-tuned model for 2D infant pose estimation. We further propose a self-supervised model called STAPose3D for video-based 3D infant pose estimation. To compensate for the absence of 3D pose annotations, we use multi-view video data during training. STAPose3D combines temporal convolution, temporal attention, and graph attention to jointly learn spatio-temporal features of infant pose. Our method has two stages: applying YOLO-infantPose to the input videos, then lifting the resulting 2D poses, together with the confidence of every joint, to 3D. Using the best-performing 2D detector in the first stage significantly improves the precision of 3D pose estimation. We show that the fine-tuned YOLO-infantPose outperforms the other models tested on our clinical dataset as well as on two public datasets, MINI-RGBD and YouTube-Infant. Results on our infant movement video dataset demonstrate that STAPose3D effectively learns spatio-temporal features across different views and significantly improves the performance of 3D infant pose estimation in videos. Finally, we explore the clinical application of our method to general movement assessment (GMA) in a clinical dataset annotated as normal writhing movements or abnormal monotonic movements according to the GMA standards. We show that the 3D pose estimation results produced by STAPose3D significantly improve GMA prediction performance compared with 2D pose estimation results. Our code is available at github.com/wwYinYin/STAPose3D.
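
The abstract states that the second stage lifts 2D poses to 3D together with a confidence for every joint, and that multi-view video stands in for missing 3D annotations. The following is a minimal Python sketch of how such an input could be packed and how detector confidences might weight a 2D reprojection error. It is not taken from the STAPose3D code: the function names `build_lifting_input` and `confidence_weighted_error`, the (x, y, c) layout, and the use of a confidence-weighted reprojection error as a self-supervision signal are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Joint2D = Tuple[float, float, float]  # (x, y, confidence); hypothetical layout


def build_lifting_input(
    keypoints: Sequence[Sequence[Point2D]],
    confidences: Sequence[Sequence[float]],
) -> List[List[Joint2D]]:
    """Pack per-frame 2D joints and their confidences into (x, y, c) triples.

    keypoints[t][j] is the 2D location of joint j in frame t, and
    confidences[t][j] is the detector confidence for that joint.
    """
    if len(keypoints) != len(confidences):
        raise ValueError("keypoints and confidences must cover the same frames")
    frames: List[List[Joint2D]] = []
    for frame_kps, frame_conf in zip(keypoints, confidences):
        if len(frame_kps) != len(frame_conf):
            raise ValueError("each frame needs one confidence per joint")
        frames.append([(x, y, c) for (x, y), c in zip(frame_kps, frame_conf)])
    return frames


def confidence_weighted_error(
    projected: Sequence[Point2D],
    detected: Sequence[Joint2D],
) -> float:
    """Confidence-weighted mean 2D distance for one frame in one view.

    `projected` holds joints reprojected from an estimated 3D pose into a
    camera view; `detected` holds that view's 2D detections. Low-confidence
    detections contribute less. This weighting is an illustrative assumption.
    """
    total = 0.0
    weight = 0.0
    for (px, py), (dx, dy, c) in zip(projected, detected):
        total += c * math.hypot(px - dx, py - dy)
        weight += c
    return total / weight if weight > 0 else 0.0
```

In a multi-view setting, a loss of this kind would presumably be summed over the available camera views, so that a single estimated 3D pose has to agree with the 2D detections in each of them; the abstract does not specify the actual training objective.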

Subject(s)

Imaging, Three-Dimensional; Posture; Video Recording; Humans; Infant; Imaging, Three-Dimensional/methods; Posture/physiology; Cerebral Palsy/diagnostic imaging; Cerebral Palsy/physiopathology; Algorithms; Supervised Machine Learning

Key words

General movement assessment; Infant pose estimation; Multi-view videos; Self-supervision

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Posture / Video Recording / Imaging, Three-Dimensional
Limits: Humans / Infant
Language: En
Journal: Med Image Anal
Journal subject: Diagnostic Imaging
Year: 2024
Type: Article
Affiliation country: China