Masked pre-training of transformers for histology image analysis.
Jiang, Shuai; Hondelink, Liesbeth; Suriawinata, Arief A; Hassanpour, Saeed.
Affiliation
  • Jiang S; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.
  • Hondelink L; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.
  • Suriawinata AA; Department of Pathology and Laboratory Medicine, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA.
  • Hassanpour S; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.
J Pathol Inform; 15: 100386, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39006998
ABSTRACT
In digital pathology, whole-slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction. Vision transformer (ViT) models have recently emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches. However, due to the large number of model parameters and limited labeled data, applying transformer models to WSIs remains challenging. In this study, we propose a pretext task to train the transformer model in a self-supervised manner. Our model, MaskHIT, uses the transformer output to reconstruct masked patches, measured by a contrastive loss. We pre-trained the MaskHIT model using over 7000 WSIs from TCGA and extensively evaluated its performance across multiple experiments covering survival prediction, cancer subtype classification, and grade prediction tasks. Our experiments demonstrate that the pre-training procedure enables context-aware understanding of WSIs, facilitates the learning of representative histological features based on patch positions and visual patterns, and is essential for the ViT model to achieve optimal results on WSI-level tasks. The pre-trained MaskHIT surpasses various multiple instance learning approaches by 3% and 2% on the survival prediction and cancer subtype classification tasks, respectively, and also outperforms recent state-of-the-art transformer-based methods. Finally, a comparison between the attention maps generated by the MaskHIT model and pathologists' annotations indicates that the model can accurately identify clinically relevant histological structures on the whole slide for each task.
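To make the pretext task concrete, below is a minimal, hypothetical PyTorch sketch of the masked-patch pre-training idea the abstract describes: a subset of patch embeddings is replaced with a learnable mask token, a transformer encodes the region, and the outputs at masked positions are trained to match the original patch embeddings via an InfoNCE-style contrastive loss. All module names, dimensions, masking ratio, and the choice of InfoNCE are illustrative assumptions, not the authors' actual MaskHIT implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedPatchPretrainer(nn.Module):
    """Toy masked-patch pretext model (illustrative, not the published code)."""
    def __init__(self, dim=384, depth=4, heads=6, num_patches=196):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.proj = nn.Linear(dim, dim)  # projection head for the contrastive loss

    def forward(self, patch_emb, mask):
        # patch_emb: (B, N, dim) pre-extracted patch features
        # mask: (B, N) boolean, True where a patch is masked out
        x = torch.where(mask.unsqueeze(-1),
                        self.mask_token.expand_as(patch_emb), patch_emb)
        x = self.encoder(x + self.pos_embed)  # position-aware context encoding
        return self.proj(x)

def contrastive_loss(pred, target, mask, temperature=0.1):
    # InfoNCE over masked positions: each reconstructed embedding should
    # match its own original patch against all other masked patches.
    p = F.normalize(pred[mask], dim=-1)    # (M, dim)
    t = F.normalize(target[mask], dim=-1)  # (M, dim)
    logits = p @ t.t() / temperature       # (M, M) similarity matrix
    labels = torch.arange(p.size(0), device=p.device)
    return F.cross_entropy(logits, labels)

# Toy usage with random features standing in for real WSI patch embeddings.
model = MaskedPatchPretrainer()
feats = torch.randn(2, 196, 384)
mask = torch.rand(2, 196) < 0.5  # mask roughly half the patches
loss = contrastive_loss(model(feats, mask), feats, mask)
loss.backward()
```

Because the targets are the patch embeddings themselves, the contrastive objective forces the transformer to infer a masked patch's appearance from its spatial neighbors, which is consistent with the context-aware behavior the abstract reports.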
Full text: 1 Database: MEDLINE Language: English Year of publication: 2024 Document type: Article Country of affiliation: United States