
Multimodal Distillation Pre-training Model for Ultrasound Dynamic Images Annotation.

Chen, Xiaojun; Ke, Jia; Zhang, Yaning; Gou, Jianping; Shen, Anna; Wan, Shaohua.

IEEE J Biomed Health Inform; PP, 2024 Aug 05.

Article in English | MEDLINE | ID: mdl-39102331

ABSTRACT

With the development of medical technology, ultrasonography has become an important diagnostic method in clinical practice. However, static medical images such as CT and MRI have a much larger research base. Ultrasonography, by contrast, produces dynamic, video-like images captured in real time by a moving probe. Processing such medical video data, and extracting the textual semantics it carries across modalities, is therefore a difficult problem that still needs research. To address it, this paper proposes a pre-training model based on multimodal distillation and fusion encoding that models the semantic relationship between ultrasound dynamic images and text. First, a fusion encoder combines three kinds of features: the visual geometric features of tissues and organs in ultrasound dynamic images, descriptive features of their overall visual appearance, and named-entity linguistic features. Together these form a unified visual-linguistic feature, which gives the model richer aggregation and alignment of visual and linguistic cues. Then, the pre-training model is augmented with multimodal knowledge distillation to improve its learning ability. Experimental results on multiple datasets show that the multimodal distillation pre-training model generally improves the fusion of the various feature types in ultrasound dynamic images and achieves automated, accurate annotation of ultrasound dynamic images.
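The abstract names two mechanisms, a fusion encoder and multimodal knowledge distillation, but does not specify either one. The sketch below illustrates the general ideas under explicit assumptions and is not the authors' method. Fusion is assumed here to be a concatenation of the three feature vectors followed by a linear projection and `tanh`. Distillation is assumed to be the common Hinton-style temperature-softened KL divergence between teacher and student outputs. The functions `fuse` and `distillation_loss`, the toy feature values, and the dimensions are all hypothetical.

```python
import math
import random

def linear(x, weights, bias):
    # y = W x + b for a single input vector x (weights: one row per output unit)
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def fuse(geometric, appearance, entity, weights, bias):
    # Hypothetical fusion encoder: concatenate geometric, appearance and
    # named-entity features, then project to a joint visual-linguistic vector.
    joint = geometric + appearance + entity
    return [math.tanh(v) for v in linear(joint, weights, bias)]

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Assumed Hinton-style KD: KL(teacher || student) on softened
    # distributions, scaled by T^2 to keep gradient magnitudes comparable.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return kl * temperature ** 2

# Toy usage with made-up features: 2 geometric + 3 appearance + 2 entity dims
random.seed(0)
geo, app, ent = [0.2, -0.1], [0.5, 0.3, 0.0], [1.0, -0.4]
dim_in, dim_out = 7, 4
W = [[random.uniform(-0.5, 0.5) for _ in range(dim_in)] for _ in range(dim_out)]
b = [0.0] * dim_out
fused = fuse(geo, app, ent, W, b)
print(len(fused))  # 4
print(distillation_loss([1.0, 0.0, -1.0], [1.0, 0.0, -1.0]))  # 0.0
```

Concatenation followed by projection is the simplest way to produce one joint vector. A real fusion encoder would more likely use cross-attention over token and region sequences. The distillation term reaches zero only when the student's softened distribution matches the teacher's.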
