Distilling Knowledge From an Ensemble of Vision Transformers for Improved Classification of Breast Ultrasound.
Zhou, George; Mosadegh, Bobak.
Affiliation
  • Zhou G; Weill Cornell Medicine, New York, NY 10021. Electronic address: gez4001@med.cornell.edu.
  • Mosadegh B; Dalio Institute of Cardiovascular Imaging, Department of Radiology, Weill Cornell Medicine, New York, New York.
Acad Radiol ; 31(1): 104-120, 2024 Jan.
Article in En | MEDLINE | ID: mdl-37666747
ABSTRACT
RATIONALE AND OBJECTIVES:

To develop a deep learning model for the automated classification of breast ultrasound images as benign or malignant. More specifically, the application of vision transformers, ensemble learning, and knowledge distillation to breast ultrasound classification is explored.

MATERIALS AND METHODS:

Single-view, B-mode ultrasound images were curated from the publicly available Breast Ultrasound Image (BUSI) dataset, which has categorical ground-truth labels (benign vs malignant) assigned by radiologists, with malignant cases confirmed by biopsy. The performance of vision transformers (ViT) is compared to that of convolutional neural networks (CNN), followed by a comparison among supervised, self-supervised, and randomly initialized ViTs. Subsequently, an ensemble of 10 independently trained ViTs, in which the ensemble prediction is the unweighted average of the outputs of the individual models, is compared to the performance of each ViT alone. Finally, a single ViT is trained to emulate the ensemble using knowledge distillation.
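The two core operations described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the ensemble prediction is the unweighted average of each member's softmax output, and the distillation objective shown here is a standard KL divergence between the ensemble's soft labels and a temperature-softened student, with the temperature value being an assumption.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_predict(member_logits):
    """Unweighted average of each member's softmax output.

    member_logits: array of shape (n_models, n_samples, n_classes).
    Returns averaged class probabilities of shape (n_samples, n_classes).
    """
    return softmax(np.asarray(member_logits)).mean(axis=0)

def distillation_loss(student_logits, teacher_probs, temperature=3.0):
    """KL divergence from the teacher's (ensemble's) soft labels to the
    temperature-softened student distribution, averaged over samples."""
    s = np.clip(softmax(np.asarray(student_logits) / temperature), 1e-12, 1.0)
    t = np.clip(np.asarray(teacher_probs), 1e-12, 1.0)
    return float(np.mean(np.sum(t * (np.log(t) - np.log(s)), axis=-1)))

# Hypothetical usage: 10 ensemble members, 4 images, 2 classes (benign/malignant).
rng = np.random.default_rng(0)
member_logits = rng.normal(size=(10, 4, 2))
soft_labels = ensemble_predict(member_logits)          # teacher targets
student_logits = rng.normal(size=(4, 2))
loss = distillation_loss(student_logits, soft_labels)  # minimized during training
```

In a full training loop, this loss is typically combined with the ordinary cross-entropy against the hard benign/malignant labels; the weighting between the two terms is a tunable hyperparameter.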

RESULTS:

On this dataset, with models trained using five-fold cross-validation, ViTs outperform CNNs, while self-supervised ViTs outperform supervised and randomly initialized ViTs. The ensemble model achieves an area under the receiver operating characteristic curve (AuROC) and area under the precision-recall curve (AuPRC) of 0.977 and 0.965 on the test set, outperforming the average AuROC and AuPRC of the independently trained ViTs (0.958 ± 0.05 and 0.931 ± 0.016). The distilled ViT achieves an AuROC and AuPRC of 0.972 and 0.960.
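For a binary task such as benign-vs-malignant classification, AuROC has a simple rank interpretation: the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. A short pure-Python sketch of that definition (an illustration of the metric itself, not the paper's evaluation code) is:

```python
def auroc(labels, scores):
    """AuROC as the fraction of positive/negative pairs ranked correctly,
    counting ties as half. labels are 0 (benign) or 1 (malignant);
    scores are the model's predicted probabilities of malignancy."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation of malignant (1) from benign (0) gives 1.0.
print(auroc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # → 1.0
```

The quadratic pairwise loop is fine for small test sets; production metric libraries compute the same quantity from sorted ranks.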

CONCLUSION:

Transfer learning and ensemble learning each offer increased performance independently, and they can be combined sequentially to further improve the final model. Furthermore, a single vision transformer can be trained to match the performance of an ensemble of vision transformers using knowledge distillation.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Breast Ultrasonography / Neural Networks, Computer Study type: Prognostic_studies Limits: Female / Humans Language: En Journal: Acad Radiol Journal subject: RADIOLOGY Publication year: 2024 Document type: Article
