ABSTRACT
Batch effects (BEs) refer to systematic technical differences in data collection unrelated to biological variations whose noise is shown to negatively impact machine learning (ML) model generalizability. Here we release CohortFinder (http://cohortfinder.com), an open-source tool aimed at mitigating BEs via data-driven cohort partitioning. We demonstrate CohortFinder improves ML model performance in downstream digital pathology and medical image processing tasks. CohortFinder is freely available for download at cohortfinder.com.
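The idea of partitioning a cohort so that batch effects are spread across splits can be sketched as follows. This is a minimal illustration, not CohortFinder's implementation: it assumes each sample already carries a batch-effect group label (how such groups are derived is not described in the abstract), and the function name `partition_by_group` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def partition_by_group(sample_ids, group_labels, test_fraction=0.2, seed=0):
    """Split samples into train/test so every batch-effect group appears in both.

    group_labels are assumed to be given (hypothetical input); each group is
    shuffled and divided in roughly the requested proportion.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in zip(sample_ids, group_labels):
        by_group[group].append(sample_id)

    train, test = [], []
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        # Keep at least one sample on each side when the group has 2+ members.
        if len(members) >= 2:
            n_test = min(max(n_test, 1), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy cohort: 20 slides spread over 4 hypothetical batch-effect groups.
ids = list(range(20))
labels = [i % 4 for i in ids]
train_ids, test_ids = partition_by_group(ids, labels, test_fraction=0.2)
print(len(train_ids), len(test_ids))  # 16 4
```

Splitting within each group, rather than across the pooled cohort, is what keeps any single batch-effect group from ending up entirely in the training or the test set.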
ABSTRACT
Predicting distant recurrence of endometrial cancer (EC) is crucial for personalized adjuvant treatment. The current gold standard of combined pathological and molecular profiling is costly, hampering implementation. Here we developed HECTOR (histopathology-based endometrial cancer tailored outcome risk), a multimodal deep learning prognostic model using hematoxylin and eosin-stained, whole-slide images and tumor stage as input, on 2,072 patients from eight EC cohorts including the PORTEC-1/-2/-3 randomized trials. HECTOR demonstrated C-indices in internal (n = 353) and two external (n = 160 and n = 151) test sets of 0.789, 0.828 and 0.815, respectively, outperforming the current gold standard, and identified patients with markedly different outcomes (10-year distant recurrence-free probabilities of 97.0%, 77.7% and 58.1% for HECTOR low-, intermediate- and high-risk groups, respectively, by Kaplan-Meier analysis). HECTOR also predicted adjuvant chemotherapy benefit better than current methods. Morphological and genomic feature extraction identified correlates of HECTOR risk groups, some with therapeutic potential. HECTOR improves on the current gold standard and may help delivery of personalized treatment in EC.
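The C-index reported above measures how often a model ranks pairs of patients in the right order. A minimal sketch of Harrell's concordance index is given below, assuming right-censored follow-up data; the toy values are invented for illustration and pairs with tied times are simply skipped, which is a simplification rather than the study's evaluation code.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    times  -- follow-up time per patient
    events -- 1 if recurrence was observed, 0 if censored
    risks  -- model risk score (higher means earlier expected recurrence)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if times differ and the earlier one is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: four patients, the third one censored.
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.789 to 0.828 should be read.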
Subjects
Deep Learning , Endometrial Neoplasms , Neoplasm Recurrence, Local , Humans , Female , Endometrial Neoplasms/pathology , Endometrial Neoplasms/genetics , Neoplasm Recurrence, Local/pathology , Neoplasm Recurrence, Local/genetics , Prognosis , Middle Aged , Chemotherapy, Adjuvant , Aged , Kaplan-Meier Estimate , Risk Factors , Neoplasm Staging
ABSTRACT
The development of deep learning (DL) models to predict the consensus molecular subtypes (CMS) from histopathology images (imCMS) is a promising and cost-effective strategy to support patient stratification. Here, we investigate whether imCMS calls generated from whole slide histopathology images (WSIs) of rectal cancer (RC) pre-treatment biopsies are associated with pathological complete response (pCR) to neoadjuvant long course chemoradiotherapy (LCRT) with single agent fluoropyrimidine. DL models were trained to classify WSIs of colorectal cancers stained with hematoxylin and eosin into one of the four CMS classes using a multi-centric dataset of resection and biopsy specimens (n = 1057 WSIs) with paired transcriptional data. Classifiers were tested on a held out RC biopsy cohort (ARISTOTLE) and correlated with pCR to LCRT in an independent dataset merging two RC cohorts (ARISTOTLE, n = 114 and SALZBURG, n = 55 patients). DL models predicted CMS with high classification performance in multiple comparative analyses. In the independent cohorts (ARISTOTLE, SALZBURG), cases with WSIs classified as imCMS1 had a significantly higher likelihood of achieving pCR (OR = 2.69, 95% CI 1.01-7.17, p = 0.048). Conversely, imCMS4 was associated with lack of pCR (OR = 0.25, 95% CI 0.07-0.88, p = 0.031). Classification maps demonstrated pathologist-interpretable associations with high stromal content in imCMS4 cases, associated with poor outcome. No significant association was found in imCMS2 or imCMS3. imCMS classification of pre-treatment biopsies is a fast and inexpensive solution to identify patient groups that could benefit from neoadjuvant LCRT. The significant associations between imCMS1/imCMS4 with pCR suggest the existence of predictive morphological features that could enhance standard pathological assessment.
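An odds ratio with a 95% confidence interval, as quoted above for imCMS1 and imCMS4, can be computed from a 2x2 table. The sketch below uses the Wald interval on the log scale; the abstract does not state which method the authors used, so this choice is an assumption, and the counts are invented purely for illustration.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a -- class-positive cases with pCR      b -- class-positive cases without pCR
    c -- class-negative cases with pCR      d -- class-negative cases without pCR
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_wald_ci(10, 20, 20, 100)
print(or_value)  # 2.5
```

An interval that excludes 1, as in both associations reported above, is what makes the corresponding p-value fall below 0.05.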
ABSTRACT
Recognition of mitotic figures in histologic tumor specimens is highly relevant to patient outcome assessment. This task is challenging for algorithms and human experts alike, with deterioration of algorithmic performance under shifts in image representations. Considerable covariate shifts occur when assessment is performed on different tumor types, images are acquired using different digitization devices, or specimens are produced in different laboratories. This observation motivated the inception of the 2022 challenge on MItosis Domain Generalization (MIDOG 2022). The challenge provided annotated histologic tumor images from six different domains and evaluated the algorithmic approaches for mitotic figure detection provided by nine challenge participants on ten independent domains. Ground truth for mitotic figure detection was established in two ways: a three-expert majority vote and an independent, immunohistochemistry-assisted set of labels. This work represents an overview of the challenge tasks, the algorithmic strategies employed by the participants, and potential factors contributing to their success. With an F1 score of 0.764 for the top-performing team, we summarize that domain generalization across various tumor domains is possible with today's deep learning-based recognition pipelines. However, we also found that domain characteristics not present in the training set (feline as new species, spindle cell shape as new morphology and a new scanner) led to small but significant decreases in performance. When assessed against the immunohistochemistry-assisted reference standard, all methods resulted in reduced recall scores, with only minor changes in the order of participants in the ranking.
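The F1 score used to rank detection algorithms combines precision and recall over matched detections. The following sketch shows one way to score point detections against annotated mitotic figures; the greedy nearest-neighbour matching and the `max_dist` radius are assumptions made for illustration, not the challenge's official evaluation code, and the coordinates are invented.

```python
import math


def match_detections(predicted, ground_truth, max_dist):
    """Greedily match each predicted point to the nearest unmatched annotation."""
    unmatched = list(ground_truth)
    tp = 0
    for px, py in predicted:
        best_idx, best_dist = None, max_dist
        for idx, (gx, gy) in enumerate(unmatched):
            dist = math.hypot(px - gx, py - gy)
            if dist <= best_dist:
                best_idx, best_dist = idx, dist
        if best_idx is not None:
            unmatched.pop(best_idx)  # each annotation can be matched only once
            tp += 1
    fp = len(predicted) - tp  # detections with no annotation nearby
    fn = len(unmatched)       # annotations that were never detected
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented coordinates in pixels.
truth = [(10, 10), (50, 50), (90, 90)]
preds = [(12, 11), (49, 52), (200, 200), (300, 300)]
tp, fp, fn = match_detections(preds, truth, max_dist=5)
print(tp, fp, fn, round(f1_score(tp, fp, fn), 3))  # 2 2 1 0.571
```

Because recall is TP / (TP + FN), a stricter reference standard that adds annotations the detectors miss raises FN and lowers recall, in line with the drop reported against the immunohistochemistry-assisted labels.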
Subjects
Laboratories , Mitosis , Humans , Animals , Cats , Algorithms , Image Processing, Computer-Assisted/methods , Reference Standards
ABSTRACT
Molecular stratification using gene-level transcriptional data has identified subtypes with distinctive genotypic and phenotypic traits, as exemplified by the consensus molecular subtypes (CMS) in colorectal cancer (CRC). Here, rather than gene-level data, we make use of gene ontology and biological activation state information for initial molecular class discovery. In doing so, we defined three pathway-derived subtypes (PDS) in CRC: PDS1 tumors, which are canonical/LGR5+ stem-rich, highly proliferative and display good prognosis; PDS2 tumors, which are regenerative/ANXA1+ stem-rich, with elevated stromal and immune tumor microenvironmental lineages; and PDS3 tumors, which represent a previously overlooked slow-cycling subset of tumors within CMS2 with reduced stem populations and increased differentiated lineages, particularly enterocytes and enteroendocrine cells, yet display the worst prognosis in locally advanced disease. These PDS3 phenotypic traits are evident across numerous bulk and single-cell datasets, and demark a series of subtle biological states that are currently under-represented in pre-clinical models and are not identified using existing subtyping classifiers.
Subjects
Colorectal Neoplasms , Humans , Colorectal Neoplasms/pathology , Prognosis , Cell Differentiation/genetics , Phenotype , Biomarkers, Tumor/genetics , Gene Expression Profiling
ABSTRACT
Both lymph node metastases (LNMs) and tumour deposits (TDs) are included in colorectal cancer (CRC) staging, although knowledge regarding their biological background is lacking. This study aimed to compare the biology of these prognostic features, which is essential for a better understanding of their role in CRC spread. Spatially resolved transcriptomic analysis using digital spatial profiling was performed on TDs and LNMs from 10 CRC patients using 1,388 RNA targets, for the tumour cells and tumour microenvironment. Shotgun proteomics identified 5,578 proteins in 12 different patients. Differences in RNA and protein expression were analysed, and spatial deconvolution was performed. Image-based consensus molecular subtype (imCMS) analysis was performed on all TDs and LNMs included in the study. Transcriptome and proteome profiles identified distinct clusters for TDs and LNMs in both the tumour and tumour microenvironment segment, with upregulation of matrix remodelling, cell adhesion/motility, and epithelial-mesenchymal transition (EMT) in TDs (all p < 0.05). Spatial deconvolution showed a significantly increased abundance of fibroblasts, macrophages, and regulatory T-cells (p < 0.05) in TDs. Consistent with a higher fibroblast and EMT component, imCMS classified 62% of TDs as poor prognosis subtype CMS4 compared to 36% of LNMs (p < 0.05). Compared to LNMs, TDs have a more invasive state involving a distinct tumour microenvironment and upregulation of EMT, which are reflected in a more frequent histological classification of TDs as CMS4. These results emphasise the heterogeneity of locoregional spread and the fact that TDs should merit more attention both in future research and during staging. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
Subjects
Colorectal Neoplasms , Transcriptome , Humans , Lymphatic Metastasis , Extranodal Extension , Proteomics , Prognosis , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , RNA , Tumor Microenvironment
ABSTRACT
Introduction: Breast cancer (BC) prognosis is largely influenced by histopathological grade, assessed according to the Nottingham modification of Bloom-Richardson (BR). Mitotic count (MC) is a component of histopathological grading but is prone to subjectivity. This study investigated whether mitoses counting in BC using digital whole slide images (WSI) compares better to light microscopy (LM) when assisted by artificial intelligence (AI), and to what extent differences in digital MC (AI assisted or not) result in BR grade variations. Methods: Fifty BC patients with paired core biopsies and resections were randomly selected. Component scores for BR grade were extracted from pathology reports. MC was assessed using LM, WSI, and AI. Different modalities (LM-MC, WSI-MC, and AI-MC) were analyzed for correlation with scatterplots and linear regression, and for agreement in final BR with Cohen's κ. Results: MC modalities strongly correlated in both biopsies and resections: LM-MC and WSI-MC (R² 0.85 and 0.83, respectively), LM-MC and AI-MC (R² 0.85 and 0.95), and WSI-MC and AI-MC (R² 0.77 and 0.83). Agreement in BR between modalities was high in both biopsies and resections: LM-MC and WSI-MC (κ 0.93 and 0.83, respectively), LM-MC and AI-MC (κ 0.89 and 0.83), and WSI-MC and AI-MC (κ 0.96 and 0.73). Conclusion: This first validation study shows that WSI-MC may compare better to LM-MC when using AI. Agreement between BR grade based on the different mitoses counting modalities was high. These results suggest that mitoses counting on WSI is feasible, and validate the presented AI algorithm for pathologist-supervised use in daily practice. Further research is required to advance our knowledge of AI-MC, but it appears at least non-inferior to LM-MC.
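Cohen's κ, used above to compare grades across counting modalities, corrects the observed agreement for the agreement expected by chance. The sketch below computes the unweighted form; whether the study used a weighted variant is not stated in the abstract, and the grade lists are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of cases where both modalities give the same grade.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal grade frequencies of each modality.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented BR grades for ten cases under two counting modalities.
grades_lm = [1, 1, 2, 2, 3, 3, 2, 1, 3, 2]
grades_wsi = [1, 1, 2, 2, 3, 3, 2, 2, 3, 2]
print(round(cohens_kappa(grades_lm, grades_wsi), 3))  # 0.846
```

In this toy example nine of ten grades agree (90%), yet κ is lower at about 0.85 because some of that agreement would be expected by chance alone.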
ABSTRACT
The density of mitotic figures (MF) within tumor tissue is known to be highly correlated with tumor proliferation and thus is an important marker in tumor grading. Recognition of MF by pathologists is subject to a strong inter-rater bias, limiting its prognostic value. State-of-the-art deep learning methods can support experts but have been observed to strongly deteriorate when applied in a different clinical environment. The variability caused by using different whole slide scanners has been identified as one decisive component in the underlying domain shift. The goal of the MICCAI MIDOG 2021 challenge was the creation of scanner-agnostic MF detection algorithms. The challenge used a training set of 200 cases, split across four scanning systems. As a test set, an additional 100 cases split across four scanning systems, including two previously unseen scanners, were provided. In this paper, we evaluate and compare the approaches that were submitted to the challenge and identify methodological factors contributing to better performance. The winning algorithm yielded an F1 score of 0.748 (CI95: 0.704-0.781), exceeding the performance of six experts on the same task.
Subjects
Algorithms , Mitosis , Humans , Neoplasm Grading , Prognosis
ABSTRACT
Rotation-invariance is a desired property of machine-learning models for medical image analysis and in particular for computational pathology applications. We propose a framework to encode the geometric structure of the special Euclidean motion group SE(2) in convolutional networks to yield translation and rotation equivariance via the introduction of SE(2)-group convolution layers. This structure enables models to learn feature representations with a discretized orientation dimension that guarantees that their outputs are invariant under a discrete set of rotations. Conventional approaches for rotation invariance rely mostly on data augmentation, but this does not guarantee the robustness of the output when the input is rotated. Moreover, trained conventional CNNs may require test-time rotation augmentation to reach their full capability. This study is focused on histopathology image analysis applications for which it is desirable that the arbitrary global orientation information of the imaged tissues is not captured by the machine learning models. The proposed framework is evaluated on three different histopathology image analysis tasks (mitosis detection, nuclei segmentation and tumor detection). We present a comparative analysis for each problem and show that a consistent increase in performance can be achieved when using the proposed framework.
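The guarantee of invariance under a discrete set of rotations can be illustrated with a toy example. The sketch below is a deliberate simplification and not the authors' SE(2) layers: it assumes only the four 90-degree rotations (exact on a pixel grid), a fixed rather than learned kernel, and a global maximum over orientations and positions; all function names are hypothetical.

```python
def rot90(m):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[c][n - 1 - r] for c in range(n)] for r in range(n)]


def correlate_valid(image, kernel):
    """Plain 'valid' cross-correlation of a square image with a square kernel."""
    n, k = len(image), len(kernel)
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(n - k + 1)
        ]
        for r in range(n - k + 1)
    ]


def c4_invariant_response(image, kernel):
    """Correlate with all four kernel orientations, then pool over orientation and space."""
    best = float("-inf")
    oriented = kernel
    for _ in range(4):
        response = correlate_valid(image, oriented)  # one orientation channel
        best = max(best, max(max(row) for row in response))
        oriented = rot90(oriented)
    return best


image = [[1, 2, 0, 3], [0, 5, 1, 0], [2, 0, 0, 4], [1, 1, 7, 0]]
kernel = [[1, 0], [2, -1]]
# Rotating the input only permutes the orientation channels, so the pooled
# output is unchanged without any data augmentation.
print(c4_invariant_response(image, kernel) == c4_invariant_response(rot90(image), kernel))  # True
```

The stack of four oriented responses plays the role of the discretized orientation dimension: before pooling it is equivariant (it rotates and shifts channels with the input), and pooling over it yields the invariant output.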
Subjects
Image Processing, Computer-Assisted , Neural Networks, Computer , Humans , Machine Learning
ABSTRACT
Deep learning-based methods for deformable image registration are attractive alternatives to conventional registration methods because of their short registration times. However, these methods often fail to estimate larger displacements in complex deformation fields, for which a multi-resolution strategy is required. In this article, we propose to train neural networks progressively to address this problem. Instead of training a large convolutional neural network on the registration task all at once, we initially train smaller versions of the network on lower resolution versions of the images and deformation fields. During training, we progressively expand the network with additional layers that are trained on higher resolution data. We show that this way of training allows a network to learn larger displacements without sacrificing registration accuracy and that the resulting network is less sensitive to large misregistrations compared to training the full network all at once. We generate a large number of ground truth example data by applying random synthetic transformations to a training set of images, and test the network on the problem of intrapatient lung CT registration. We analyze the learned representations in the progressively growing network to assess how the progressive learning strategy influences training. Finally, we show that a progressive training procedure leads to improved registration accuracy when learning large and complex deformations.
Subjects
Neural Networks, Computer , Tomography, X-Ray Computed , Humans
ABSTRACT
Histological images present high appearance variability due to inconsistent latent parameters related to the preparation and scanning procedure of histological slides, as well as the inherent biological variability of tissues. Machine-learning models are trained with images from a limited set of domains, and are expected to generalize to images from unseen domains. Methodological design choices have to be made in order to yield domain invariance and proper generalization. In digital pathology, standard approaches focus either on ad-hoc normalization of the latent parameters based on prior knowledge, such as staining normalization, or aim at anticipating new variations of these parameters via data augmentation. Since every histological image originates from a unique data distribution, we propose to consider every histological slide of the training data as a domain and investigate the alternative approach of domain-adversarial training to learn features that are invariant to this available domain information. We carried out a comparative analysis with staining normalization and data augmentation on two different tasks: generalization to images acquired in unseen pathology labs for mitosis detection and generalization to unseen organs for nuclei segmentation. We report that the utility of each method depends on the type of task and type of data variability present at training and test time. The proposed framework for domain-adversarial training is able to improve generalization performance on top of conventional methods.
ABSTRACT
We propose a novel variational autoencoder (VAE) framework for learning representations of cell images for the domain of image-based profiling, important for new therapeutic discovery. Previously, generative adversarial network-based (GAN) approaches were proposed to enable biologists to visualize structural variations in cells that drive differences in populations. However, while the images were realistic, they did not provide direct reconstructions from representations, and their performance in downstream analysis was poor. We address these limitations in our approach by adding an adversarial-driven similarity constraint applied to the standard VAE framework, and a progressive training procedure that allows higher quality reconstructions than standard VAEs. The proposed models improve classification accuracy by 22% (to 90%) compared to the best reported GAN model, making them competitive with other models that have higher quality representations, but lack the ability to synthesize images. This provides researchers with a new tool to match cellular phenotypes effectively, and also to gain better insight into cellular structure variations that are driving differences between populations of cells.