ABSTRACT
While machine learning is currently transforming the field of histopathology, the domain lacks a comprehensive evaluation of state-of-the-art models based on essential but complementary quality requirements beyond a mere classification accuracy. In order to fill this gap, we developed a new methodology to extensively evaluate a wide range of classification models, including recent vision transformers, and convolutional neural networks such as: ConvNeXt, ResNet (BiT), Inception, ViT and Swin transformer, with and without supervised or self-supervised pretraining. We thoroughly tested the models on five widely used histopathology datasets containing whole slide images of breast, gastric, and colorectal cancer and developed a novel approach using an image-to-image translation model to assess the robustness of a cancer classification model against stain variations. Further, we extended existing interpretability methods to previously unstudied models and systematically reveal insights of the models' classification strategies that allow for plausibility checks and systematic comparisons. The study resulted in specific model recommendations for practitioners as well as putting forward a general methodology to quantify a model's quality according to complementary requirements that can be transferred to future model architectures.
Subject(s)
Deep Learning , Humans , Neural Networks, Computer , Machine Learning , BreastABSTRACT
Machine learning is transforming the field of histopathology. Especially in classification related tasks, there have been many successful applications of deep learning already. Yet, in tasks that rely on regression and many niche applications, the domain lacks cohesive procedures that are adapted to the learning processes of neural networks. In this work, we investigate cell damage in whole slide images of the epidermis. A common way for pathologists to annotate a score, characterizing the degree of damage for these samples, is the ratio between healthy and unhealthy nuclei. The annotation procedure of these scores, however, is expensive and prone to be noisy among pathologists. We propose a new measure of damage, that is the total area of damage, relative to the total area of the epidermis. In this work, we present results of regression and segmentation models, predicting both scores on a curated and public dataset. We have acquired the dataset in collaborative efforts with medical professionals. Our study resulted in a comprehensive evaluation of the proposed damage metrics in the epidermis, with recommendations, emphasizing practical relevance for real world applications.