J Digit Imaging; 35(3): 564-580, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35217942

ABSTRACT

Medical image captioning has recently attracted the attention of the medical community, and generating captions for images involving multiple organs is an even more challenging task, which makes any attempt toward such medical image captioning timely. In recent years, rapid developments in deep learning have made it an effective option for analyzing medical images and generating reports automatically. However, medical image datasets are often scarce and limited, which makes them hard to analyze even with machine learning approaches. Transfer learning can be employed in such applications that suffer from insufficient training data. This paper presents an approach to developing a medical image captioning model based on a deep recurrent architecture that combines a Multi-Level Transfer Learning (MLTL) framework with a Long Short-Term Memory (LSTM) model. A basic MLTL framework with three models is designed to detect and classify very limited datasets using knowledge acquired from easily available ones. The first model, for the source domain, uses abundantly available non-medical images and learns generalized features. The acquired knowledge is then transferred to the second model, for an intermediate, auxiliary domain related to the target domain. This information is finally used for the target domain, which consists of medical datasets that are very limited in nature. Knowledge learned from a non-medical source domain is thus transferred to improve learning in a target domain that deals with medical images. A novel LSTM model, of the kind used for sequence generation and machine translation, is then proposed to generate captions for a given medical image from the MLTL framework. To further improve captioning of the target sentence, an enhanced multi-input Convolutional Neural Network (CNN) model along with feature extraction techniques is proposed. This enhanced multi-input CNN model extracts the most important features of an image, which help generate a more precise and detailed caption of the medical image. Experimental results show that the proposed model performs well even with very limited datasets, achieving an accuracy of 96.90% and a BLEU score of 76.9% when compared to work reported in the literature.
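The abstract does not give implementation details, but the core three-stage transfer idea (source domain → intermediate/auxiliary domain → target domain, reusing learned generic features while re-initializing the task head) can be illustrated with a minimal numpy sketch. All names, layer shapes, and dimensions below are hypothetical and are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(in_dim, hid_dim, out_dim, rng):
    """A tiny two-layer network: shared feature layer + task-specific head."""
    return {
        "feat": rng.normal(size=(in_dim, hid_dim)),   # generic feature extractor
        "head": rng.normal(size=(hid_dim, out_dim)),  # task-specific classifier
    }

def transfer(src, out_dim, rng):
    """Transfer step: copy the learned feature layer, re-initialize the head
    for the new domain's label space."""
    return {
        "feat": src["feat"].copy(),
        "head": rng.normal(size=(src["feat"].shape[1], out_dim)),
    }

def forward(model, x):
    h = np.maximum(0.0, x @ model["feat"])  # ReLU features
    return h @ model["head"]

# Stage 1: source domain (abundant non-medical images; hypothetical 10 classes).
source = init_model(in_dim=64, hid_dim=32, out_dim=10, rng=rng)
# Stage 2: intermediate/auxiliary domain related to the target (5 classes).
intermediate = transfer(source, out_dim=5, rng=rng)
# Stage 3: target domain (scarce medical images; 3 classes).
target = transfer(intermediate, out_dim=3, rng=rng)

x = rng.normal(size=(4, 64))          # a batch of 4 dummy inputs
print(forward(target, x).shape)       # (4, 3)
```

In a real pipeline each stage would also fine-tune its weights on that domain's data before transferring; the sketch only shows the structural point that the feature layer flows unchanged from source to intermediate to target, while each head is specific to its own domain.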


Subjects
Machine Learning; Neural Networks, Computer; Humans; Language; Tomography, X-Ray Computed