|

1.

Patients' and dermatologists' preferences in artificial intelligence-driven skin cancer diagnostics: A prospective multicentric survey study.

Haggenmüller, Sarah; Maron, Roman C; Hekler, Achim; Krieghoff-Henning, Eva; Utikal, Jochen S; Gaiser, Maria; Müller, Verena; Fabian, Sascha; Meier, Friedegund; Hobelsberger, Sarah; Gellrich, Frank F; Sergon, Mildred; Hauschild, Axel; Weichenthal, Michael; French, Lars E; Heinzerling, Lucie; Schlager, Justin G; Ghoreschi, Kamran; Schlaak, Max; Hilke, Franz J; Poch, Gabriela; Korsing, Sören; Berking, Carola; Heppt, Markus V; Erdmann, Michael; Haferkamp, Sebastian; Drexler, Konstantin; Schadendorf, Dirk; Sondermann, Wiebke; Goebeler, Matthias; Schilling, Bastian; Kather, Jakob N; Fröhling, Stefan; Kaminski, Katharina; Doppler, Astrid; Bucher, Tabea; Brinker, Titus J.

J Am Acad Dermatol ; 2024 Apr 24.

Article En | MEDLINE | ID: mdl-38670313

2.

Federated Learning for Decentralized Artificial Intelligence in Melanoma Diagnostics.

Haggenmüller, Sarah; Schmitt, Max; Krieghoff-Henning, Eva; Hekler, Achim; Maron, Roman C; Wies, Christoph; Utikal, Jochen S; Meier, Friedegund; Hobelsberger, Sarah; Gellrich, Frank F; Sergon, Mildred; Hauschild, Axel; French, Lars E; Heinzerling, Lucie; Schlager, Justin G; Ghoreschi, Kamran; Schlaak, Max; Hilke, Franz J; Poch, Gabriela; Korsing, Sören; Berking, Carola; Heppt, Markus V; Erdmann, Michael; Haferkamp, Sebastian; Drexler, Konstantin; Schadendorf, Dirk; Sondermann, Wiebke; Goebeler, Matthias; Schilling, Bastian; Kather, Jakob N; Fröhling, Stefan; Brinker, Titus J.

JAMA Dermatol ; 160(3): 303-311, 2024 Mar 01.

Article En | MEDLINE | ID: mdl-38324293

Importance: The development of artificial intelligence (AI)-based melanoma classifiers typically calls for large, centralized datasets, requiring hospitals to give away their patient data, which raises serious privacy concerns. To address this concern, decentralized federated learning has been proposed, where classifier development is distributed across hospitals. Objective: To investigate whether a more privacy-preserving federated learning approach can achieve comparable diagnostic performance to a classical centralized (ie, single-model) and ensemble learning approach for AI-based melanoma diagnostics. Design, Setting, and Participants: This multicentric, single-arm diagnostic study developed a federated model for melanoma-nevus classification using histopathological whole-slide images prospectively acquired at 6 German university hospitals between April 2021 and February 2023 and benchmarked it using both a holdout and an external test dataset. Data analysis was performed from February to April 2023. Exposures: All whole-slide images were retrospectively analyzed by an AI-based classifier without influencing routine clinical care. Main Outcomes and Measures: The area under the receiver operating characteristic curve (AUROC) served as the primary end point for evaluating the diagnostic performance. Secondary end points included balanced accuracy, sensitivity, and specificity. Results: The study included 1025 whole-slide images of clinically melanoma-suspicious skin lesions from 923 patients, consisting of 388 histopathologically confirmed invasive melanomas and 637 nevi. The median (range) age at diagnosis was 58 (18-95) years for the training set, 57 (18-93) years for the holdout test dataset, and 61 (18-95) years for the external test dataset; the median (range) Breslow thickness was 0.70 (0.10-34.00) mm, 0.70 (0.20-14.40) mm, and 0.80 (0.30-20.00) mm, respectively. The federated approach (0.8579; 95% CI, 0.7693-0.9299) performed significantly worse than the classical centralized approach (0.9024; 95% CI, 0.8379-0.9565) in terms of AUROC on a holdout test dataset (pairwise Wilcoxon signed-rank, P < .001) but performed significantly better (0.9126; 95% CI, 0.8810-0.9412) than the classical centralized approach (0.9045; 95% CI, 0.8701-0.9331) on an external test dataset (pairwise Wilcoxon signed-rank, P < .001). Notably, the federated approach performed significantly worse than the ensemble approach on both the holdout (0.8867; 95% CI, 0.8103-0.9481) and external test dataset (0.9227; 95% CI, 0.8941-0.9479). Conclusions and Relevance: The findings of this diagnostic study suggest that federated learning is a viable approach for the binary classification of invasive melanomas and nevi on a clinically representative distributed dataset. Federated learning can improve privacy protection in AI-based melanoma diagnostics while simultaneously promoting collaboration across institutions and countries. Moreover, it may have the potential to be extended to other image classification tasks in digital cancer histopathology and beyond.

Dermatology , Melanoma , Nevus , Skin Neoplasms , Humans , Melanoma/diagnosis , Artificial Intelligence , Retrospective Studies , Skin Neoplasms/diagnosis , Nevus/diagnosis

3.

Evaluating deep learning-based melanoma classification using immunohistochemistry and routine histology: A three center study.

Wies, Christoph; Schneider, Lucas; Haggenmüller, Sarah; Bucher, Tabea-Clara; Hobelsberger, Sarah; Heppt, Markus V; Ferrara, Gerardo; Krieghoff-Henning, Eva I; Brinker, Titus J.

PLoS One ; 19(1): e0297146, 2024.

Article En | MEDLINE | ID: mdl-38241314

Pathologists routinely use immunohistochemical (IHC)-stained tissue slides against MelanA in addition to hematoxylin and eosin (H&E)-stained slides to improve their accuracy in diagnosing melanomas. The use of diagnostic Deep Learning (DL)-based support systems for automated examination of tissue morphology and cellular composition has been well studied in standard H&E-stained tissue slides. In contrast, there are few studies that analyze IHC slides using DL. Therefore, we investigated the separate and joint performance of ResNets trained on MelanA and corresponding H&E-stained slides. The MelanA classifier achieved an area under receiver operating characteristics curve (AUROC) of 0.82 and 0.74 on out of distribution (OOD)-datasets, similar to the H&E-based benchmark classification of 0.81 and 0.75, respectively. A combined classifier using MelanA and H&E achieved AUROCs of 0.85 and 0.81 on the OOD datasets. DL MelanA-based assistance systems show the same performance as the benchmark H&E classification and may be improved by multi stain classification to assist pathologists in their clinical routine.

Deep Learning , Melanoma , Humans , Melanoma/diagnosis , Immunohistochemistry , MART-1 Antigen , ROC Curve

4.

Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma.

Chanda, Tirtha; Hauser, Katja; Hobelsberger, Sarah; Bucher, Tabea-Clara; Garcia, Carina Nogueira; Wies, Christoph; Kittler, Harald; Tschandl, Philipp; Navarrete-Dechent, Cristian; Podlipnik, Sebastian; Chousakos, Emmanouil; Crnaric, Iva; Majstorovic, Jovana; Alhajwan, Linda; Foreman, Tanya; Peternel, Sandra; Sarap, Sergei; Özdemir, Irem; Barnhill, Raymond L; Llamas-Velasco, Mar; Poch, Gabriela; Korsing, Sören; Sondermann, Wiebke; Gellrich, Frank Friedrich; Heppt, Markus V; Erdmann, Michael; Haferkamp, Sebastian; Drexler, Konstantin; Goebeler, Matthias; Schilling, Bastian; Utikal, Jochen S; Ghoreschi, Kamran; Fröhling, Stefan; Krieghoff-Henning, Eva; Brinker, Titus J.

Nat Commun ; 15(1): 524, 2024 Jan 15.

Article En | MEDLINE | ID: mdl-38225244

Artificial intelligence (AI) systems have been shown to help dermatologists diagnose melanoma more accurately, however they lack transparency, hindering user acceptance. Explainable AI (XAI) methods can help to increase transparency, yet often lack precise, domain-specific explanations. Moreover, the impact of XAI methods on dermatologists' decisions has not yet been evaluated. Building upon previous research, we introduce an XAI system that provides precise and domain-specific explanations alongside its differential diagnoses of melanomas and nevi. Through a three-phase study, we assess its impact on dermatologists' diagnostic accuracy, diagnostic confidence, and trust in the XAI-support. Our results show strong alignment between XAI and dermatologist explanations. We also show that dermatologists' confidence in their diagnoses, and their trust in the support system significantly increase with XAI compared to conventional AI. This study highlights dermatologists' willingness to adopt such XAI systems, promoting future use in the clinic.

Melanoma , Trust , Humans , Artificial Intelligence , Dermatologists , Melanoma/diagnosis , Diagnosis, Differential

5.

Using multiple real-world dermoscopic photographs of one lesion improves melanoma classification via deep learning.

Hekler, Achim; Maron, Roman C; Haggenmüller, Sarah; Schmitt, Max; Wies, Christoph; Utikal, Jochen S; Meier, Friedegund; Hobelsberger, Sarah; Gellrich, Frank F; Sergon, Mildred; Hauschild, Axel; French, Lars E; Heinzerling, Lucie; Schlager, Justin G; Ghoreschi, Kamran; Schlaak, Max; Hilke, Franz J; Poch, Gabriela; Korsing, Sören; Berking, Carola; Heppt, Markus V; Erdmann, Michael; Haferkamp, Sebastian; Drexler, Konstantin; Schadendorf, Dirk; Sondermann, Wiebke; Goebeler, Matthias; Schilling, Bastian; Kather, Jakob N; Krieghoff-Henning, Eva; Brinker, Titus J.

J Am Acad Dermatol ; 90(5): 1028-1031, 2024 May.

Article En | MEDLINE | ID: mdl-38199280

Deep Learning , Melanoma , Skin Neoplasms , Humans , Melanoma/diagnostic imaging , Melanoma/pathology , Skin Neoplasms/diagnostic imaging , Skin Neoplasms/pathology , Algorithms , Dermoscopy

6.

Deep learning to predict breast cancer sentinel lymph node status on INSEMA histological images.

Marmé, Frederik; Krieghoff-Henning, Eva; Gerber, Bernd; Schmitt, Max; Zahm, Dirk-Michael; Bauerschlag, Dirk; Forstbauer, Helmut; Hildebrandt, Guido; Ataseven, Beyhan; Brodkorb, Tobias; Denkert, Carsten; Stachs, Angrit; Krug, David; Heil, Jörg; Golatta, Michael; Kühn, Thorsten; Nekljudova, Valentina; Gaiser, Timo; Schönmehl, Rebecca; Brochhausen, Christoph; Loibl, Sibylle; Reimer, Toralf; Brinker, Titus J.

Eur J Cancer ; 195: 113390, 2023 12.

Article En | MEDLINE | ID: mdl-37890350

BACKGROUND: Sentinel lymph node (SLN) status is a clinically important prognostic biomarker in breast cancer and is used to guide therapy, especially for hormone receptor-positive, HER2-negative cases. However, invasive lymph node staging is increasingly omitted before therapy, and studies such as the randomised Intergroup Sentinel Mamma (INSEMA) trial address the potential for further de-escalation of axillary surgery. Therefore, it would be helpful to accurately predict the pretherapeutic sentinel status using medical images. METHODS: Using a ResNet 50 architecture pretrained on ImageNet and a previously successful strategy, we trained deep learning (DL)-based image analysis algorithms to predict sentinel status on hematoxylin/eosin-stained images of predominantly luminal, primary breast tumours from the INSEMA trial and three additional, independent cohorts (The Cancer Genome Atlas (TCGA) and cohorts from the University hospitals of Mannheim and Regensburg), and compared their performance with that of a logistic regression using clinical data only. Performance on an INSEMA hold-out set was investigated in a blinded manner. RESULTS: None of the generated image analysis algorithms yielded significantly better than random areas under the receiver operating characteristic curves on the test sets, including the hold-out test set from INSEMA. In contrast, the logistic regression fitted on the Mannheim cohort retained a better than random performance on INSEMA and Regensburg. Including the image analysis model output in the logistic regression did not improve performance further on INSEMA. CONCLUSIONS: Employing DL-based image analysis on histological slides, we could not predict SLN status for unseen cases in the INSEMA trial and other predominantly luminal cohorts.

Breast Neoplasms , Deep Learning , Lymphadenopathy , Sentinel Lymph Node , Female , Humans , Axilla/pathology , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/surgery , Breast Neoplasms/genetics , Lymph Node Excision/methods , Lymph Nodes/pathology , Lymphatic Metastasis/pathology , Sentinel Lymph Node/pathology , Sentinel Lymph Node Biopsy/methods

7.

Colorectal cancer risk stratification on histological slides based on survival curves predicted by deep learning.

Höhn, Julia; Krieghoff-Henning, Eva; Wies, Christoph; Kiehl, Lennard; Hetz, Martin J; Bucher, Tabea-Clara; Jonnagaddala, Jitendra; Zatloukal, Kurt; Müller, Heimo; Plass, Markus; Jungwirth, Emilian; Gaiser, Timo; Steeg, Matthias; Holland-Letz, Tim; Brenner, Hermann; Hoffmeister, Michael; Brinker, Titus J.

NPJ Precis Oncol ; 7(1): 98, 2023 Sep 26.

Article En | MEDLINE | ID: mdl-37752266

Studies have shown that colorectal cancer prognosis can be predicted by deep learning-based analysis of histological tissue sections of the primary tumor. So far, this has been achieved using a binary prediction. Survival curves might contain more detailed information and thus enable a more fine-grained risk prediction. Therefore, we established survival curve-based CRC survival predictors and benchmarked them against standard binary survival predictors, comparing their performance extensively on the clinical high and low risk subsets of one internal and three external cohorts. Survival curve-based risk prediction achieved a very similar risk stratification to binary risk prediction for this task. Exchanging other components of the pipeline, namely input tissue and feature extractor, had largely identical effects on model performance independently of the type of risk prediction. An ensemble of all survival curve-based models exhibited a more robust performance, as did a similar ensemble based on binary risk prediction. Patients could be further stratified within clinical risk groups. However, performance still varied across cohorts, indicating limited generalization of all investigated image analysis pipelines, whereas models using clinical data performed robustly on all cohorts.

8.

A self-supervised vision transformer to predict survival from histopathology in renal cell carcinoma.

Wessels, Frederik; Schmitt, Max; Krieghoff-Henning, Eva; Nientiedt, Malin; Waldbillig, Frank; Neuberger, Manuel; Kriegmair, Maximilian C; Kowalewski, Karl-Friedrich; Worst, Thomas S; Steeg, Matthias; Popovic, Zoran V; Gaiser, Timo; von Kalle, Christof; Utikal, Jochen S; Fröhling, Stefan; Michel, Maurice S; Nuhn, Philipp; Brinker, Titus J.

World J Urol ; 41(8): 2233-2241, 2023 Aug.

Article En | MEDLINE | ID: mdl-37382622

PURPOSE: To develop and validate an interpretable deep learning model to predict overall and disease-specific survival (OS/DSS) in clear cell renal cell carcinoma (ccRCC). METHODS: Digitised haematoxylin and eosin-stained slides from The Cancer Genome Atlas were used as a training set for a vision transformer (ViT) to extract image features with a self-supervised model called DINO (self-distillation with no labels). Extracted features were used in Cox regression models to prognosticate OS and DSS. Kaplan-Meier for univariable evaluation and Cox regression analyses for multivariable evaluation of the DINO-ViT risk groups were performed for prediction of OS and DSS. For validation, a cohort from a tertiary care centre was used. RESULTS: A significant risk stratification was achieved in univariable analysis for OS and DSS in the training (n = 443, log rank test, p < 0.01) and validation set (n = 266, p < 0.01). In multivariable analysis, including age, metastatic status, tumour size and grading, the DINO-ViT risk stratification was a significant predictor for OS (hazard ratio [HR] 3.03; 95%-confidence interval [95%-CI] 2.11-4.35; p < 0.01) and DSS (HR 4.90; 95%-CI 2.78-8.64; p < 0.01) in the training set but only for DSS in the validation set (HR 2.31; 95%-CI 1.15-4.65; p = 0.02). DINO-ViT visualisation showed that features were mainly extracted from nuclei, cytoplasm, and peritumoural stroma, demonstrating good interpretability. CONCLUSION: The DINO-ViT can identify high-risk patients using histological images of ccRCC. This model might improve individual risk-adapted renal cancer therapy in the future.

Carcinoma, Renal Cell , Kidney Neoplasms , Humans , Carcinoma, Renal Cell/pathology , Kidney Neoplasms/pathology , Proportional Hazards Models , Risk Factors , Endoscopy , Prognosis

9.

Multimodal integration of image, epigenetic and clinical data to predict BRAF mutation status in melanoma.

Schneider, Lucas; Wies, Christoph; Krieghoff-Henning, Eva I; Bucher, Tabea-Clara; Utikal, Jochen S; Schadendorf, Dirk; Brinker, Titus J.

Eur J Cancer ; 183: 131-138, 2023 04.

Article En | MEDLINE | ID: mdl-36854237

BACKGROUND: In machine learning, multimodal classifiers can provide more generalised performance than unimodal classifiers. In clinical practice, physicians usually also rely on a range of information from different examinations for diagnosis. In this study, we used BRAF mutation status prediction in melanoma as a model system to analyse the contribution of different data types in a combined classifier because BRAF status can be determined accurately by sequencing as the current gold standard, thus nearly eliminating label noise. METHODS: We trained a deep learning-based classifier by combining individually trained random forests of image, clinical and methylation data to predict BRAF-V600 mutation status in primary and metastatic melanomas of The Cancer Genome Atlas cohort. RESULTS: With our multimodal approach, we achieved an area under the receiver operating characteristic curve of 0.80, whereas the individual classifiers yielded areas under the receiver operating characteristic curve of 0.63 (histopathologic image data), 0.66 (clinical data) and 0.66 (methylation data) on an independent data set. CONCLUSIONS: Our combined approach can predict BRAF status to some extent by identifying BRAF-V600 specific patterns at the histologic, clinical and epigenetic levels. The multimodal classifiers have improved generalisability in predicting BRAF mutation status.

Melanoma , Skin Neoplasms , Humans , Proto-Oncogene Proteins B-raf/genetics , Melanoma/pathology , Skin Neoplasms/pathology , Mutation , Epigenesis, Genetic

10.

[The Rise of Artificial Intelligence - High Prediction Accuracy in Early Detection of Pigmented Melanoma]. / Künstliche Intelligenz auf dem Vormarsch Hohe Vorhersage-Genauigkeit bei der Früherkennung pigmentierter Melanome.

Jutzi, Tanja; Krieghoff-Henning, Eva I; Brinker, Titus J.

Laryngorhinootologie ; 102(7): 496-503, 2023 07.

Article De | MEDLINE | ID: mdl-36580975

The incidence of malignant melanoma is increasing worldwide. If detected early, melanoma is highly treatable, so early detection is vital.Skin cancer early detection has improved significantly in recent decades, for example by the introduction of screening in 2008 and dermoscopy. Nevertheless, in particular visual detection of early melanomas remains challenging because they show many morphological overlaps with nevi. Hence, there continues to be a high medical need to further develop methods for early skin cancer detection in order to be able to reliably diagnosemelanomas at a very early stage.Routine diagnostics for melanoma detection include visual whole body inspection, often supplemented by dermoscopy, which can significantly increase the diagnostic accuracy of experienced dermatologists. A procedure that is additionally offered in some practices and clinics is wholebody photography combined with digital dermoscopy for the early detection of malignant melanoma, especially for monitoring high-risk patients.In recent decades, numerous noninvasive adjunctive diagnostic techniques were developed for the examination of suspicious pigmented moles, that may have the potential to allow improved and, in some cases, automated evaluation of these lesions. First, confocal laser microscopy should be mentioned here, as well as electrical impedance spectroscopy, multiphoton laser tomography, multispectral analysis, Raman spectroscopy or optical coherence tomography. These diagnostic techniques usually focus on high sensitivity to avoid malignant melanoma being overlooked. However, this usually implies lower specificity, which may lead to unnecessary excision of benign lesions in screening. Also, some of the procedures are time-consuming and costly, which also limits their applicability in skin cancer screening. In the near future, the use of artificial intelligence might change skin cancer diagnostics in many ways. The most promising approach may be the analysis of routine macroscopic and dermoscopic images by artificial intelligence.For the classification of pigmented skin lesions based on macroscopic and dermoscopic images, artificial intelligence, especially in form of neural networks, has achieved comparable diagnostic accuracies to dermatologists under experimental conditions in numerous studies. In particular, it achieved high accuracies in the binary melanoma/nevus classification task, but it also performed comparably well to dermatologists in multiclass differentiation of various skin diseases. However, proof of the basic applicability and utility of such systems in clinical practice is still pending. Prerequisites that remain to be established to enable translation of such diagnostic systems into dermatological routine are means that allow users to comprehend the system's decisions as well as a uniformly high performance of the algorithms on image data from other hospitals and practices.At present, hints are accumulating that computer-aided diagnosis systems could provide their greatest benefit as assistance systems, since studies indicate that a combination of human and machine achieves the best results. Diagnostic systems based on artificial intelligence are capable of detecting morphological characteristics quickly, quantitatively, objectively and reproducibly, and could thus provide a more objective analytical basis - in addition to medical experience.

Melanoma , Skin Neoplasms , Humans , Artificial Intelligence , Skin Neoplasms/diagnosis , Skin Neoplasms/pathology , Melanoma/diagnosis , Algorithms , Dermoscopy/methods , Sensitivity and Specificity , Melanoma, Cutaneous Malignant

11.

Uncertainty Estimation in Medical Image Classification: Systematic Review.

Kurz, Alexander; Hauser, Katja; Mehrtens, Hendrik Alexander; Krieghoff-Henning, Eva; Hekler, Achim; Kather, Jakob Nikolas; Fröhling, Stefan; von Kalle, Christof; Brinker, Titus Josef.

JMIR Med Inform ; 10(8): e36427, 2022 Aug 02.

Article En | MEDLINE | ID: mdl-35916701

BACKGROUND: Deep neural networks are showing impressive results in different medical image classification tasks. However, for real-world applications, there is a need to estimate the network's uncertainty together with its prediction. OBJECTIVE: In this review, we investigate in what form uncertainty estimation has been applied to the task of medical image classification. We also investigate which metrics are used to describe the effectiveness of the applied uncertainty estimation. METHODS: Google Scholar, PubMed, IEEE Xplore, and ScienceDirect were screened for peer-reviewed studies, published between 2016 and 2021, that deal with uncertainty estimation in medical image classification. The search terms "uncertainty," "uncertainty estimation," "network calibration," and "out-of-distribution detection" were used in combination with the terms "medical images," "medical image analysis," and "medical image classification." RESULTS: A total of 22 papers were chosen for detailed analysis through the systematic review process. This paper provides a table for a systematic comparison of the included works with respect to the applied method for estimating the uncertainty. CONCLUSIONS: The applied methods for estimating uncertainties are diverse, but the sampling-based methods Monte-Carlo Dropout and Deep Ensembles are used most frequently. We concluded that future works can investigate the benefits of uncertainty estimation in collaborative settings of artificial intelligence systems and human experts. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/11936.

12.

Model soups improve performance of dermoscopic skin cancer classifiers.

Maron, Roman C; Hekler, Achim; Haggenmüller, Sarah; von Kalle, Christof; Utikal, Jochen S; Müller, Verena; Gaiser, Maria; Meier, Friedegund; Hobelsberger, Sarah; Gellrich, Frank F; Sergon, Mildred; Hauschild, Axel; French, Lars E; Heinzerling, Lucie; Schlager, Justin G; Ghoreschi, Kamran; Schlaak, Max; Hilke, Franz J; Poch, Gabriela; Korsing, Sören; Berking, Carola; Heppt, Markus V; Erdmann, Michael; Haferkamp, Sebastian; Schadendorf, Dirk; Sondermann, Wiebke; Goebeler, Matthias; Schilling, Bastian; Kather, Jakob N; Fröhling, Stefan; Lipka, Daniel B; Krieghoff-Henning, Eva; Brinker, Titus J.

Eur J Cancer ; 173: 307-316, 2022 09.

Article En | MEDLINE | ID: mdl-35973360

BACKGROUND: Image-based cancer classifiers suffer from a variety of problems which negatively affect their performance. For example, variation in image brightness or different cameras can already suffice to diminish performance. Ensemble solutions, where multiple model predictions are combined into one, can improve these problems. However, ensembles are computationally intensive and less transparent to practitioners than single model solutions. Constructing model soups, by averaging the weights of multiple models into a single model, could circumvent these limitations while still improving performance. OBJECTIVE: To investigate the performance of model soups for a dermoscopic melanoma-nevus skin cancer classification task with respect to (1) generalisation to images from other clinics, (2) robustness against small image changes and (3) calibration such that the confidences correspond closely to the actual predictive uncertainties. METHODS: We construct model soups by fine-tuning pre-trained models on seven different image resolutions and subsequently averaging their weights. Performance is evaluated on a multi-source dataset including holdout and external components. RESULTS: We find that model soups improve generalisation and calibration on the external component while maintaining performance on the holdout component. For robustness, we observe performance improvements for pertubated test images, while the performance on corrupted test images remains on par. CONCLUSIONS: Overall, souping for skin cancer classifiers has a positive effect on generalisation, robustness and calibration. It is easy for practitioners to implement and by combining multiple models into a single model, complexity is reduced. This could be an important factor in achieving clinical applicability, as less complexity generally means more transparency.

Melanoma , Skin Neoplasms , Dermoscopy/methods , Humans , Melanoma/diagnostic imaging , Sensitivity and Specificity , Skin Neoplasms/diagnostic imaging , Melanoma, Cutaneous Malignant

13.

Deep learning can predict survival directly from histology in clear cell renal cell carcinoma.

Wessels, Frederik; Schmitt, Max; Krieghoff-Henning, Eva; Kather, Jakob N; Nientiedt, Malin; Kriegmair, Maximilian C; Worst, Thomas S; Neuberger, Manuel; Steeg, Matthias; Popovic, Zoran V; Gaiser, Timo; von Kalle, Christof; Utikal, Jochen S; Fröhling, Stefan; Michel, Maurice S; Nuhn, Philipp; Brinker, Titus J.

PLoS One ; 17(8): e0272656, 2022.

Article En | MEDLINE | ID: mdl-35976907

For clear cell renal cell carcinoma (ccRCC) risk-dependent diagnostic and therapeutic algorithms are routinely implemented in clinical practice. Artificial intelligence-based image analysis has the potential to improve outcome prediction and thereby risk stratification. Thus, we investigated whether a convolutional neural network (CNN) can extract relevant image features from a representative hematoxylin and eosin-stained slide to predict 5-year overall survival (5y-OS) in ccRCC. The CNN was trained to predict 5y-OS in a binary manner using slides from TCGA and validated using an independent in-house cohort. Multivariable logistic regression was used to combine of the CNNs prediction and clinicopathological parameters. A mean balanced accuracy of 72.0% (standard deviation [SD] = 7.9%), sensitivity of 72.4% (SD = 10.6%), specificity of 71.7% (SD = 11.9%) and area under receiver operating characteristics curve (AUROC) of 0.75 (SD = 0.07) was achieved on the TCGA training set (n = 254 patients / WSIs) using 10-fold cross-validation. On the external validation cohort (n = 99 patients / WSIs), mean accuracy, sensitivity, specificity and AUROC were 65.5% (95%-confidence interval [CI]: 62.9-68.1%), 86.2% (95%-CI: 81.8-90.5%), 44.9% (95%-CI: 40.2-49.6%), and 0.70 (95%-CI: 0.69-0.71). A multivariable model including age, tumor stage and metastasis yielded an AUROC of 0.75 on the TCGA cohort. The inclusion of the CNN-based classification (Odds ratio = 4.86, 95%-CI: 2.70-8.75, p < 0.01) raised the AUROC to 0.81. On the validation cohort, both models showed an AUROC of 0.88. In univariable Cox regression, the CNN showed a hazard ratio of 3.69 (95%-CI: 2.60-5.23, p < 0.01) on TCGA and 2.13 (95%-CI: 0.92-4.94, p = 0.08) on external validation. The results demonstrate that the CNN's image-based prediction of survival is promising and thus this widely applicable technique should be further investigated with the aim of improving existing risk stratification in ccRCC.

Carcinoma, Renal Cell , Deep Learning , Kidney Neoplasms , Artificial Intelligence , Carcinoma, Renal Cell/diagnosis , Carcinoma, Renal Cell/genetics , Humans , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Neural Networks, Computer , Retrospective Studies

14.

Response to letter entitled: Re: Integration of deep learning-based image analysis and genomic data in cancer pathology: A systematic review.

Schneider, Lucas; Krieghoff-Henning, Eva; Laiouar-Pedari, Sara; Kuntz, Sara; Hekler, Achim; Kather, Jakob N; Gaiser, Timo; Fröhling, Stefan; Brinker, Titus J.

Eur J Cancer ; 172: 403-404, 2022 09.

Article En | MEDLINE | ID: mdl-35781181

Deep Learning , Neoplasms , Genomics , Humans , Image Processing, Computer-Assisted , Neoplasms/genetics

15.

Explainable artificial intelligence in skin cancer recognition: A systematic review.

Hauser, Katja; Kurz, Alexander; Haggenmüller, Sarah; Maron, Roman C; von Kalle, Christof; Utikal, Jochen S; Meier, Friedegund; Hobelsberger, Sarah; Gellrich, Frank F; Sergon, Mildred; Hauschild, Axel; French, Lars E; Heinzerling, Lucie; Schlager, Justin G; Ghoreschi, Kamran; Schlaak, Max; Hilke, Franz J; Poch, Gabriela; Kutzner, Heinz; Berking, Carola; Heppt, Markus V; Erdmann, Michael; Haferkamp, Sebastian; Schadendorf, Dirk; Sondermann, Wiebke; Goebeler, Matthias; Schilling, Bastian; Kather, Jakob N; Fröhling, Stefan; Lipka, Daniel B; Hekler, Achim; Krieghoff-Henning, Eva; Brinker, Titus J.

Eur J Cancer ; 167: 54-69, 2022 05.

Article En | MEDLINE | ID: mdl-35390650

BACKGROUND: Due to their ability to solve complex problems, deep neural networks (DNNs) are becoming increasingly popular in medical applications. However, decision-making by such algorithms is essentially a black-box process that renders it difficult for physicians to judge whether the decisions are reliable. The use of explainable artificial intelligence (XAI) is often suggested as a solution to this problem. We investigate how XAI is used for skin cancer detection: how is it used during the development of new DNNs? What kinds of visualisations are commonly used? Are there systematic evaluations of XAI with dermatologists or dermatopathologists? METHODS: Google Scholar, PubMed, IEEE Explore, Science Direct and Scopus were searched for peer-reviewed studies published between January 2017 and October 2021 applying XAI to dermatological images: the search terms histopathological image, whole-slide image, clinical image, dermoscopic image, skin, dermatology, explainable, interpretable and XAI were used in various combinations. Only studies concerned with skin cancer were included. RESULTS: 37 publications fulfilled our inclusion criteria. Most studies (19/37) simply applied existing XAI methods to their classifier to interpret its decision-making. Some studies (4/37) proposed new XAI methods or improved upon existing techniques. 14/37 studies addressed specific questions such as bias detection and impact of XAI on man-machine-interactions. However, only three of them evaluated the performance and confidence of humans using CAD systems with XAI. CONCLUSION: XAI is commonly applied during the development of DNNs for skin cancer detection. However, a systematic and rigorous evaluation of its usefulness in this scenario is lacking.

Artificial Intelligence , Skin Neoplasms , Algorithms , Humans , Neural Networks, Computer , Skin Neoplasms/diagnosis

16.

Artificial intelligence to predict oncological outcome directly from hematoxylin and eosin-stained slides in urology.

Wessels, Frederik; Kuntz, Sara; Krieghoff-Henning, Eva; Schmitt, Max; Braun, Volker; Worst, Thomas S; Neuberger, Manuel; Steeg, Matthias; Gaiser, Timo; Fröhling, Stefan; Michel, Maurice-Stephan; Nuhn, Philipp; Brinker, Titus J.

Minerva Urol Nephrol ; 74(5): 538-550, 2022 Oct.

Article En | MEDLINE | ID: mdl-35274903

INTRODUCTION: Artificial intelligence (AI) has been successfully applied for automatic tumor detection and grading in histopathological image analysis in urologic oncology. The aim of this review was to assess the applicability of these approaches in image-based oncological outcome prediction. EVIDENCE ACQUISITION: A systematic literature search was conducted using the databases MEDLINE through PubMed and Web of Science up to April 20, 2021. Studies investigating AI approaches to determine the risk of recurrence, metastasis, or survival directly from H&E-stained tissue sections in prostate, renal cell or urothelial carcinoma were included. Characteristics of the AI approach and performance metrics were extracted and summarized. Risk of bias (RoB) was assessed using the PROBAST tool. EVIDENCE SYNTHESIS: 16 studies yielding a total of 6658 patients and reporting on 17 outcome predictions were included. Six studies focused on renal cell, six on prostate and three on urothelial carcinoma while one study investigated renal cell and urothelial carcinoma. Handcrafted feature extraction was used in five, a convolutional neural network (CNN) in six and a deep feature extraction in four studies. One study compared a CNN with handcrafted feature extraction. In seven outcome predictions, a multivariable comparison with clinicopathological parameters was reported. Five of them showed statistically significant hazard ratios for the AI's model's-prediction. However, RoB was high in 15 outcome predictions and unclear in two. CONCLUSIONS: The included studies are promising but predominantly early pilot studies, therefore primarily highlighting the potential of AI approaches. Additional well-designed studies are needed to assess the actual clinical applicability.

Carcinoma, Transitional Cell , Urinary Bladder Neoplasms , Urology , Artificial Intelligence , Eosine Yellowish-(YS) , Hematoxylin , Humans , Male

17.

Diagnostic performance of artificial intelligence for histologic melanoma recognition compared to 18 international expert pathologists.

Brinker, Titus J; Schmitt, Max; Krieghoff-Henning, Eva I; Barnhill, Raymond; Beltraminelli, Helmut; Braun, Stephan A; Carr, Richard; Fernandez-Figueras, Maria-Teresa; Ferrara, Gerardo; Fraitag, Sylvie; Gianotti, Raffaele; Llamas-Velasco, Mar; Müller, Cornelia S L; Perasole, Antonio; Requena, Luis; Sangueza, Omar P; Santonja, Carlos; Starz, Hans; Vale, Esmeralda; Weyers, Wolfgang; Hekler, Achim; Kather, Jakob N; Fröhling, Stefan; Krahl, Dieter; Holland-Letz, Tim; Utikal, Jochen S; Saggini, Andrea; Kutzner, Heinz.

J Am Acad Dermatol ; 86(3): 640-642, 2022 03.

Article En | MEDLINE | ID: mdl-33581189

Melanoma , Skin Neoplasms , Artificial Intelligence , Humans , Melanoma/diagnosis , Melanoma/pathology , Pathologists , Skin Neoplasms/diagnosis , Skin Neoplasms/pathology

18.

Integration of deep learning-based image analysis and genomic data in cancer pathology: A systematic review.

Schneider, Lucas; Laiouar-Pedari, Sara; Kuntz, Sara; Krieghoff-Henning, Eva; Hekler, Achim; Kather, Jakob N; Gaiser, Timo; Fröhling, Stefan; Brinker, Titus J.

Eur J Cancer ; 160: 80-91, 2022 01.

Article En | MEDLINE | ID: mdl-34810047

BACKGROUND: Over the past decade, the development of molecular high-throughput methods (omics) increased rapidly and provided new insights for cancer research. In parallel, deep learning approaches revealed the enormous potential for medical image analysis, especially in digital pathology. Combining image and omics data with deep learning tools may enable the discovery of new cancer biomarkers and a more precise prediction of patient prognosis. This systematic review addresses different multimodal fusion methods of convolutional neural network-based image analyses with omics data, focussing on the impact of data combination on the classification performance. METHODS: PubMed was screened for peer-reviewed articles published in English between January 2015 and June 2021 by two independent researchers. Search terms related to deep learning, digital pathology, omics, and multimodal fusion were combined. RESULTS: We identified a total of 11 studies meeting the inclusion criteria, namely studies that used convolutional neural networks for haematoxylin and eosin image analysis of patients with cancer in combination with integrated omics data. Publications were categorised according to their endpoints: 7 studies focused on survival analysis and 4 studies on prediction of cancer subtypes, malignancy or microsatellite instability with spatial analysis. CONCLUSIONS: Image-based classifiers already show high performances in prognostic and predictive cancer diagnostics. The integration of omics data led to improved performance in all studies described here. However, these are very early studies that still require external validation to demonstrate their generalisability and robustness. Further and more comprehensive studies with larger sample sizes are needed to evaluate performance and determine clinical benefits.

Deep Learning/standards , Genomics/methods , Image Processing, Computer-Assisted/methods , Neoplasms/genetics , Humans , Neoplasms/pathology

19.

Deep learning can predict lymph node status directly from histology in colorectal cancer.

Kiehl, Lennard; Kuntz, Sara; Höhn, Julia; Jutzi, Tanja; Krieghoff-Henning, Eva; Kather, Jakob N; Holland-Letz, Tim; Kopp-Schneider, Annette; Chang-Claude, Jenny; Brobeil, Alexander; von Kalle, Christof; Fröhling, Stefan; Alwers, Elizabeth; Brenner, Hermann; Hoffmeister, Michael; Brinker, Titus J.

Eur J Cancer ; 157: 464-473, 2021 11.

Article En | MEDLINE | ID: mdl-34649117

BACKGROUND: Lymph node status is a prognostic marker and strongly influences therapeutic decisions in colorectal cancer (CRC). OBJECTIVES: The objective of the study is to investigate whether image features extracted by a deep learning model from routine histological slides and/or clinical data can be used to predict CRC lymph node metastasis (LNM). METHODS: Using histological whole slide images (WSIs) of primary tumours of 2431 patients in the DACHS cohort, we trained a convolutional neural network to predict LNM. In parallel, we used clinical data derived from the same cases in logistic regression analyses. Subsequently, the slide-based artificial intelligence predictor (SBAIP) score was included in the regression. WSIs and data from 582 patients of the TCGA cohort were used as the external test set. RESULTS: On the internal test set, the SBAIP achieved an area under receiver operating characteristic (AUROC) of 71.0%, the clinical classifier achieved an AUROC of 67.0% and a combination of the two classifiers yielded an improvement to 74.1%. Whereas the clinical classifier's performance remained stable on the TCGA set, performance of the SBAIP dropped to an AUROC of 61.2%. Performance of the clinical classifier depended strongly on the T stage. CONCLUSION: Deep learning-based image analysis may help predict LNM of patients with CRC using routine histological slides. Combination with clinical data such as T stage might be useful. Strategies to increase performance of the SBAIP on external images should be investigated.

Colorectal Neoplasms/pathology , Deep Learning , Image Processing, Computer-Assisted/methods , Lymphatic Metastasis/diagnosis , Aged , Aged, 80 and over , Case-Control Studies , Cohort Studies , Colon/pathology , Colon/surgery , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/surgery , Female , Humans , Lymph Nodes/pathology , Male , Middle Aged , Neoplasm Staging , Prognosis , ROC Curve , Rectum/pathology , Rectum/surgery

20.

Digital Natives' Preferences on Mobile Artificial Intelligence Apps for Skin Cancer Diagnostics: Survey Study.

Haggenmüller, Sarah; Krieghoff-Henning, Eva; Jutzi, Tanja; Trapp, Nicole; Kiehl, Lennard; Utikal, Jochen Sven; Fabian, Sascha; Brinker, Titus Josef.

JMIR Mhealth Uhealth ; 9(8): e22909, 2021 08 27.

Article En | MEDLINE | ID: mdl-34448722

BACKGROUND: Artificial intelligence (AI) has shown potential to improve diagnostics of various diseases, especially for early detection of skin cancer. Studies have yet to investigate the clear application of AI technology in clinical practice or determine the added value for younger user groups. Translation of AI-based diagnostic tools can only be successful if they are accepted by potential users. Young adults as digital natives may offer the greatest potential for successful implementation of AI into clinical practice, while at the same time, representing the future generation of skin cancer screening participants. OBJECTIVE: We conducted an anonymous online survey to examine how and to what extent individuals are willing to accept AI-based mobile apps for skin cancer diagnostics. We evaluated preferences and relative influences of concerns, with a focus on younger age groups. METHODS: We recruited participants below 35 years of age using three social media channels-Facebook, LinkedIn, and Xing. Descriptive analysis and statistical tests were performed to evaluate participants' attitudes toward mobile apps for skin examination. We integrated an adaptive choice-based conjoint to assess participants' preferences. We evaluated potential concerns using maximum difference scaling. RESULTS: We included 728 participants in the analysis. The majority of participants (66.5%, 484/728; 95% CI 0.631-0.699) expressed a positive attitude toward the use of AI-based apps. In particular, participants residing in big cities or small towns (P=.02) and individuals that were familiar with the use of health or fitness apps (P=.02) were significantly more open to mobile diagnostic systems. Hierarchical Bayes estimation of the preferences of participants with a positive attitude (n=484) revealed that the use of mobile apps as an assistance system was preferred. Participants ruled out app versions with an accuracy of ≤65%, apps using data storage without encryption, and systems that did not provide background information about the decision-making process. However, participants did not mind their data being used anonymously for research purposes, nor did they object to the inclusion of clinical patient information in the decision-making process. Maximum difference scaling analysis for the negative-minded participant group (n=244) showed that data security, insufficient trust in the app, and lack of personal interaction represented the dominant concerns with respect to app use. CONCLUSIONS: The majority of potential future users below 35 years of age were ready to accept AI-based diagnostic solutions for early detection of skin cancer. However, for translation into clinical practice, the participants' demands for increased transparency and explainability of AI-based tools seem to be critical. Altogether, digital natives between 18 and 24 years and between 25 and 34 years of age expressed similar preferences and concerns when compared both to each other and to results obtained by previous studies that included other age groups.

Mobile Applications , Skin Neoplasms , Artificial Intelligence , Bayes Theorem , Exercise , Humans , Skin Neoplasms/diagnosis , Young Adult