2.
JAMA Dermatol ; 160(3): 303-311, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38324293

ABSTRACT

Importance: The development of artificial intelligence (AI)-based melanoma classifiers typically calls for large, centralized datasets, requiring hospitals to give away their patient data, which raises serious privacy concerns. To address this concern, decentralized federated learning has been proposed, where classifier development is distributed across hospitals. Objective: To investigate whether a more privacy-preserving federated learning approach can achieve comparable diagnostic performance to a classical centralized (ie, single-model) and ensemble learning approach for AI-based melanoma diagnostics. Design, Setting, and Participants: This multicentric, single-arm diagnostic study developed a federated model for melanoma-nevus classification using histopathological whole-slide images prospectively acquired at 6 German university hospitals between April 2021 and February 2023 and benchmarked it using both a holdout and an external test dataset. Data analysis was performed from February to April 2023. Exposures: All whole-slide images were retrospectively analyzed by an AI-based classifier without influencing routine clinical care. Main Outcomes and Measures: The area under the receiver operating characteristic curve (AUROC) served as the primary end point for evaluating the diagnostic performance. Secondary end points included balanced accuracy, sensitivity, and specificity. Results: The study included 1025 whole-slide images of clinically melanoma-suspicious skin lesions from 923 patients, consisting of 388 histopathologically confirmed invasive melanomas and 637 nevi. The median (range) age at diagnosis was 58 (18-95) years for the training set, 57 (18-93) years for the holdout test dataset, and 61 (18-95) years for the external test dataset; the median (range) Breslow thickness was 0.70 (0.10-34.00) mm, 0.70 (0.20-14.40) mm, and 0.80 (0.30-20.00) mm, respectively. 
The federated approach (0.8579; 95% CI, 0.7693-0.9299) performed significantly worse than the classical centralized approach (0.9024; 95% CI, 0.8379-0.9565) in terms of AUROC on a holdout test dataset (pairwise Wilcoxon signed-rank, P < .001) but performed significantly better (0.9126; 95% CI, 0.8810-0.9412) than the classical centralized approach (0.9045; 95% CI, 0.8701-0.9331) on an external test dataset (pairwise Wilcoxon signed-rank, P < .001). Notably, the federated approach performed significantly worse than the ensemble approach on both the holdout (0.8867; 95% CI, 0.8103-0.9481) and external test dataset (0.9227; 95% CI, 0.8941-0.9479). Conclusions and Relevance: The findings of this diagnostic study suggest that federated learning is a viable approach for the binary classification of invasive melanomas and nevi on a clinically representative distributed dataset. Federated learning can improve privacy protection in AI-based melanoma diagnostics while simultaneously promoting collaboration across institutions and countries. Moreover, it may have the potential to be extended to other image classification tasks in digital cancer histopathology and beyond.
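The end points named in this abstract (AUROC, sensitivity, specificity, balanced accuracy) can be illustrated with a minimal sketch. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration only; the abstract does not describe the study's evaluation code.

```python
from typing import List, Tuple


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    labels: List[int], scores: List[float], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, balanced accuracy) at a fixed threshold.

    The default threshold of 0.5 is an illustrative assumption.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy data (hypothetical): 1 = melanoma, 0 = nevus.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
toy_sens, toy_spec, toy_bal = threshold_metrics(toy_labels, toy_scores)
```

Balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive than plain accuracy to the class imbalance in a data set such as this one (388 melanomas versus 637 nevi).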


Subject(s)
Dermatology , Melanoma , Nevus , Skin Neoplasms , Humans , Melanoma/diagnosis , Artificial Intelligence , Retrospective Studies , Skin Neoplasms/diagnosis , Nevus/diagnosis
4.
PLoS One ; 19(1): e0297146, 2024.
Article in English | MEDLINE | ID: mdl-38241314

ABSTRACT

Pathologists routinely use immunohistochemical (IHC) tissue slides stained for MelanA in addition to hematoxylin and eosin (H&E)-stained slides to improve their accuracy in diagnosing melanomas. The use of diagnostic deep learning (DL)-based support systems for automated examination of tissue morphology and cellular composition has been well studied in standard H&E-stained tissue slides. In contrast, few studies analyze IHC slides using DL. Therefore, we investigated the separate and joint performance of ResNets trained on MelanA and corresponding H&E-stained slides. The MelanA classifier achieved an area under the receiver operating characteristic curve (AUROC) of 0.82 and 0.74 on the out-of-distribution (OOD) datasets, similar to the H&E-based benchmark classification of 0.81 and 0.75, respectively. A combined classifier using MelanA and H&E achieved AUROCs of 0.85 and 0.81 on the OOD datasets. DL MelanA-based assistance systems show performance similar to the benchmark H&E classification and may be improved by multi-stain classification to assist pathologists in their clinical routine.


Subject(s)
Deep Learning , Melanoma , Humans , Melanoma/diagnosis , Immunohistochemistry , MART-1 Antigen , ROC Curve
5.
Eur J Cancer ; 173: 307-316, 2022 09.
Article in English | MEDLINE | ID: mdl-35973360

ABSTRACT

BACKGROUND: Image-based cancer classifiers suffer from a variety of problems which negatively affect their performance. For example, variation in image brightness or the use of different cameras can suffice to diminish performance. Ensemble solutions, where multiple model predictions are combined into one, can mitigate these problems. However, ensembles are computationally intensive and less transparent to practitioners than single-model solutions. Constructing model soups, by averaging the weights of multiple models into a single model, could circumvent these limitations while still improving performance. OBJECTIVE: To investigate the performance of model soups for a dermoscopic melanoma-nevus skin cancer classification task with respect to (1) generalisation to images from other clinics, (2) robustness against small image changes and (3) calibration such that the confidences correspond closely to the actual predictive uncertainties. METHODS: We construct model soups by fine-tuning pre-trained models on seven different image resolutions and subsequently averaging their weights. Performance is evaluated on a multi-source dataset including holdout and external components. RESULTS: We find that model soups improve generalisation and calibration on the external component while maintaining performance on the holdout component. For robustness, we observe performance improvements for perturbed test images, while the performance on corrupted test images remains on par. CONCLUSIONS: Overall, souping for skin cancer classifiers has a positive effect on generalisation, robustness and calibration. It is easy for practitioners to implement, and by combining multiple models into a single model, complexity is reduced. This could be an important factor in achieving clinical applicability, as less complexity generally means more transparency.
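The weight averaging described in this abstract can be sketched as follows. This is a minimal illustration assuming a uniform (unweighted) average over models with identical architectures; the `uniform_soup` name and the dictionary-of-lists weight format are hypothetical stand-ins, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's parameters: layer name -> flat list of weights.
StateDict = Dict[str, List[float]]


def uniform_soup(models: List[StateDict]) -> StateDict:
    """Average the weights of several fine-tuned models into one model.

    Assumes every model has the same layers with the same shapes, which holds
    when all models are fine-tuned from the same pre-trained architecture.
    """
    if not models:
        raise ValueError("need at least one model")
    reference = models[0]
    for m in models[1:]:
        if set(m) != set(reference):
            raise ValueError("models must share the same layer names")
        for name in reference:
            if len(m[name]) != len(reference[name]):
                raise ValueError(f"shape mismatch in layer {name!r}")
    soup: StateDict = {}
    for name in reference:
        # zip(*...) walks the same weight position across all models.
        columns = zip(*(m[name] for m in models))
        soup[name] = [sum(values) / len(models) for values in columns]
    return soup


# Two toy "models" with one layer each (made-up numbers).
souped = uniform_soup([{"fc": [1.0, 3.0]}, {"fc": [3.0, 5.0]}])
```

Unlike an ensemble, the result is a single set of weights, so inference costs the same as for one model.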


Subject(s)
Melanoma , Skin Neoplasms , Dermoscopy/methods , Humans , Melanoma/diagnostic imaging , Sensitivity and Specificity , Skin Neoplasms/diagnostic imaging , Melanoma, Cutaneous Malignant
6.
Eur J Cancer ; 167: 54-69, 2022 05.
Article in English | MEDLINE | ID: mdl-35390650

ABSTRACT

BACKGROUND: Due to their ability to solve complex problems, deep neural networks (DNNs) are becoming increasingly popular in medical applications. However, decision-making by such algorithms is essentially a black-box process that renders it difficult for physicians to judge whether the decisions are reliable. The use of explainable artificial intelligence (XAI) is often suggested as a solution to this problem. We investigate how XAI is used for skin cancer detection: how is it used during the development of new DNNs? What kinds of visualisations are commonly used? Are there systematic evaluations of XAI with dermatologists or dermatopathologists? METHODS: Google Scholar, PubMed, IEEE Xplore, ScienceDirect and Scopus were searched for peer-reviewed studies published between January 2017 and October 2021 applying XAI to dermatological images: the search terms histopathological image, whole-slide image, clinical image, dermoscopic image, skin, dermatology, explainable, interpretable and XAI were used in various combinations. Only studies concerned with skin cancer were included. RESULTS: 37 publications fulfilled our inclusion criteria. Most studies (19/37) simply applied existing XAI methods to their classifier to interpret its decision-making. Some studies (4/37) proposed new XAI methods or improved upon existing techniques. 14/37 studies addressed specific questions such as bias detection and the impact of XAI on man-machine interactions. However, only three of them evaluated the performance and confidence of humans using CAD systems with XAI. CONCLUSION: XAI is commonly applied during the development of DNNs for skin cancer detection. However, a systematic and rigorous evaluation of its usefulness in this scenario is lacking.


Subject(s)
Artificial Intelligence , Skin Neoplasms , Algorithms , Humans , Neural Networks, Computer , Skin Neoplasms/diagnosis
7.
Eur J Cancer ; 156: 202-216, 2021 10.
Article in English | MEDLINE | ID: mdl-34509059

ABSTRACT

BACKGROUND: Multiple studies have compared the performance of artificial intelligence (AI)-based models for automated skin cancer classification to human experts, thus setting the cornerstone for a successful translation of AI-based tools into clinicopathological practice. OBJECTIVE: The objective of the study was to systematically analyse the current state of research on reader studies involving melanoma and to assess their potential clinical relevance by evaluating three main aspects: test set characteristics (holdout/out-of-distribution data set, composition), test setting (experimental/clinical, inclusion of metadata) and representativeness of participating clinicians. METHODS: PubMed, Medline and ScienceDirect were screened for peer-reviewed studies published between 2017 and 2021 and dealing with AI-based skin cancer classification involving melanoma. The search terms skin cancer classification, deep learning, convolutional neural network (CNN), melanoma (detection), digital biomarkers, histopathology and whole slide imaging were combined. Based on the search results, only studies that considered direct comparison of AI results with clinicians and had a diagnostic classification as their main objective were included. RESULTS: A total of 19 reader studies fulfilled the inclusion criteria. Of these, 11 CNN-based approaches addressed the classification of dermoscopic images; 6 concentrated on the classification of clinical images, whereas 2 dermatopathological studies utilised digitised histopathological whole slide images. CONCLUSIONS: All 19 included studies demonstrated superior or at least equivalent performance of CNN-based classifiers compared with clinicians. However, almost all studies were conducted in highly artificial settings based exclusively on single images of the suspicious lesions. 
Moreover, test sets mainly consisted of holdout images and did not represent the full range of patient populations and melanoma subtypes encountered in clinical practice.


Subject(s)
Dermatologists , Dermoscopy , Diagnosis, Computer-Assisted , Image Interpretation, Computer-Assisted , Melanoma/pathology , Microscopy , Neural Networks, Computer , Pathologists , Skin Neoplasms/pathology , Automation , Biopsy , Clinical Competence , Deep Learning , Humans , Melanoma/classification , Predictive Value of Tests , Reproducibility of Results , Skin Neoplasms/classification
8.
JMIR Mhealth Uhealth ; 9(8): e22909, 2021 08 27.
Article in English | MEDLINE | ID: mdl-34448722

ABSTRACT

BACKGROUND: Artificial intelligence (AI) has shown potential to improve diagnostics of various diseases, especially for early detection of skin cancer. Studies have yet to investigate the clear application of AI technology in clinical practice or determine the added value for younger user groups. Translation of AI-based diagnostic tools can only be successful if they are accepted by potential users. Young adults as digital natives may offer the greatest potential for successful implementation of AI into clinical practice, while at the same time representing the future generation of skin cancer screening participants. OBJECTIVE: We conducted an anonymous online survey to examine how and to what extent individuals are willing to accept AI-based mobile apps for skin cancer diagnostics. We evaluated preferences and relative influences of concerns, with a focus on younger age groups. METHODS: We recruited participants below 35 years of age using three social media channels: Facebook, LinkedIn, and Xing. Descriptive analysis and statistical tests were performed to evaluate participants' attitudes toward mobile apps for skin examination. We integrated an adaptive choice-based conjoint analysis to assess participants' preferences. We evaluated potential concerns using maximum difference scaling. RESULTS: We included 728 participants in the analysis. The majority of participants (66.5%, 484/728; 95% CI 0.631-0.699) expressed a positive attitude toward the use of AI-based apps. In particular, participants residing in big cities or small towns (P=.02) and individuals who were familiar with the use of health or fitness apps (P=.02) were significantly more open to mobile diagnostic systems. Hierarchical Bayes estimation of the preferences of participants with a positive attitude (n=484) revealed that the use of mobile apps as an assistance system was preferred. 
Participants ruled out app versions with an accuracy of ≤65%, apps using data storage without encryption, and systems that did not provide background information about the decision-making process. However, participants did not mind their data being used anonymously for research purposes, nor did they object to the inclusion of clinical patient information in the decision-making process. Maximum difference scaling analysis for the negative-minded participant group (n=244) showed that data security, insufficient trust in the app, and lack of personal interaction represented the dominant concerns with respect to app use. CONCLUSIONS: The majority of potential future users below 35 years of age were ready to accept AI-based diagnostic solutions for early detection of skin cancer. However, for translation into clinical practice, the participants' demands for increased transparency and explainability of AI-based tools seem to be critical. Altogether, digital natives between 18 and 24 years and between 25 and 34 years of age expressed similar preferences and concerns when compared both to each other and to results obtained by previous studies that included other age groups.
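The reported interval for the share of positive attitudes (484/728; 95% CI 0.631-0.699) can be checked with a short calculation. The sketch below assumes a normal-approximation (Wald) interval; the abstract does not state which method the authors used, so this is only one method that happens to reproduce the reported bounds.

```python
import math
from typing import Tuple


def wald_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.959964 corresponds to a two-sided 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


low, high = wald_interval(484, 728)
print(f"{484 / 728:.3f} ({low:.3f}-{high:.3f})")  # prints "0.665 (0.631-0.699)"
```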


Subject(s)
Mobile Applications , Skin Neoplasms , Artificial Intelligence , Bayes Theorem , Exercise , Humans , Skin Neoplasms/diagnosis , Young Adult
9.
Eur J Cancer ; 155: 191-199, 2021 09.
Article in English | MEDLINE | ID: mdl-34388516

ABSTRACT

BACKGROUND: One prominent application for deep learning-based classifiers is skin cancer classification on dermoscopic images. However, classifier evaluation is often limited to holdout data, which can mask common shortcomings such as susceptibility to confounding factors. To increase clinical applicability, it is necessary to thoroughly evaluate such classifiers on out-of-distribution (OOD) data. OBJECTIVE: The objective of the study was to establish a dermoscopic skin cancer benchmark in which classifier robustness to OOD data can be measured. METHODS: Using a proprietary dermoscopic image database and a set of image transformations, we create an OOD robustness benchmark and evaluate the robustness of four different convolutional neural network (CNN) architectures on it. RESULTS: The benchmark contains three data sets, Skin Archive Munich (SAM), SAM-corrupted (SAM-C) and SAM-perturbed (SAM-P), and is publicly available for download. To maintain the benchmark's OOD status, ground truth labels are not provided and test results should be sent to us for assessment. The SAM data set contains 319 unmodified and biopsy-verified dermoscopic melanoma (n = 194) and nevus (n = 125) images. SAM-C and SAM-P contain images from SAM which were artificially modified to test a classifier against low-quality inputs and to measure its prediction stability over small image changes, respectively. All four CNNs showed susceptibility to corruptions and perturbations. CONCLUSIONS: This benchmark provides three data sets which allow for OOD testing of binary skin cancer classifiers. Our classifier performance confirms the shortcomings of CNNs and provides a frame of reference. Altogether, this benchmark should facilitate a more thorough evaluation process and thereby enable the development of more robust skin cancer classifiers.


Subject(s)
Benchmarking/standards , Neural Networks, Computer , Skin Neoplasms/classification , Humans
10.
Eur J Cancer ; 154: 227-234, 2021 09.
Article in English | MEDLINE | ID: mdl-34298373

ABSTRACT

AIM: Sentinel lymph node status is a central prognostic factor for melanomas. However, the surgical excision involves some risks for affected patients. In this study, we therefore aimed to develop a digital biomarker that can predict lymph node metastasis non-invasively from digitised H&E slides of primary melanoma tumours. METHODS: A total of 415 H&E slides from primary melanoma tumours with known sentinel node (SN) status from three German university hospitals and one private pathological practice were digitised (150 SN positive/265 SN negative). Two hundred ninety-one slides were used to train artificial neural networks (ANNs). The remaining 124 slides were used to test the ability of the ANNs to predict sentinel status. ANNs were trained and/or tested on data sets that were matched or not matched between SN-positive and SN-negative cases for patient age, ulceration, and tumour thickness, factors that are known to correlate with lymph node status. RESULTS: The best performance was achieved by an ANN that was trained and tested on unmatched cases, with an area under the receiver operating characteristic curve (AUROC) of 61.8% ± 0.2%. In contrast, ANNs that were trained and/or tested on matched cases achieved an AUROC of 55.0% ± 3.5% or less. CONCLUSION: Our results indicate that the image classifier can predict lymph node status to some, albeit so far not clinically relevant, extent. It may do so by mostly detecting equivalents of factors on histological slides that are already known to correlate with lymph node status. Our results provide a basis for future research with larger data cohorts.


Subject(s)
Deep Learning , Melanoma/pathology , Sentinel Lymph Node/pathology , Adult , Aged , Humans , Lymphatic Metastasis , Middle Aged
11.
Eur J Cancer ; 149: 94-101, 2021 05.
Article in English | MEDLINE | ID: mdl-33838393

ABSTRACT

BACKGROUND: Clinicians and pathologists traditionally use patient data in addition to clinical examination to support their diagnoses. OBJECTIVES: We investigated whether a combination of histologic whole-slide image (WSI) analysis based on convolutional neural networks (CNNs) and commonly available patient data (age, sex and anatomical site of the lesion) in a binary melanoma/nevus classification task could increase the performance compared with CNNs alone. METHODS: We used 431 WSIs from two different laboratories and analysed the performance of classifiers that used the image or patient data individually or one of three common fusion techniques. Furthermore, we tested a naive combination of patient data and an image classifier: for cases interpreted as 'uncertain' (CNN output score <0.7), the decision of the CNN was replaced by the decision of the patient data classifier. RESULTS: The CNN on its own achieved the best performance (mean ± standard deviation of five individual runs) with an AUROC of 92.30% ± 0.23% and a balanced accuracy of 83.17% ± 0.38%. While the classification performance was not significantly improved in general by any of the tested fusions, the naive strategy of replacing the image classifier with the patient data classifier on slides with low output scores improved balanced accuracy to 86.72% ± 0.36%. CONCLUSION: In most cases, the CNN on its own was so accurate that patient data integration did not provide any benefit. However, incorporating patient data for lesions that were classified by the CNN with low 'confidence' improved balanced accuracy.
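The naive combination described in this abstract reduces to a single rule, sketched below. It assumes the CNN output score is the score of the CNN's predicted class and that a score of exactly 0.7 still counts as confident; the function name and label strings are hypothetical and not taken from the study.

```python
def fuse_naive(
    cnn_label: str,
    cnn_score: float,
    patient_label: str,
    threshold: float = 0.7,
) -> str:
    """Fall back to the patient-data classifier when the CNN is 'uncertain'.

    cnn_score is assumed to be the CNN's output score for its own predicted
    class. Scores strictly below the threshold are treated as uncertain,
    following the "<0.7" wording of the abstract.
    """
    if not 0.0 <= cnn_score <= 1.0:
        raise ValueError("cnn_score must lie in [0, 1]")
    return patient_label if cnn_score < threshold else cnn_label


# Low CNN score: the patient-data decision replaces the CNN decision.
uncertain_case = fuse_naive("melanoma", 0.65, "nevus")
# Score at or above the threshold: the CNN decision is kept.
confident_case = fuse_naive("melanoma", 0.92, "nevus")
```

This differs from feature-level fusion in that the two classifiers are never combined; exactly one of them decides each case.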


Subject(s)
Image Interpretation, Computer-Assisted , Melanoma/pathology , Microscopy , Neural Networks, Computer , Nevus/pathology , Skin Neoplasms/pathology , Adult , Age Factors , Aged , Databases, Factual , Female , Germany , Humans , Male , Melanoma/classification , Middle Aged , Nevus/classification , Predictive Value of Tests , Reproducibility of Results , Retrospective Studies , Sex Factors , Skin Neoplasms/classification
12.
Eur J Cancer ; 145: 81-91, 2021 03.
Article in English | MEDLINE | ID: mdl-33423009

ABSTRACT

BACKGROUND: A basic requirement for artificial intelligence (AI)-based image analysis systems that are to be integrated into clinical practice is high robustness. Minor changes in how those images are acquired, for example, during routine skin cancer screening, should not change the diagnosis of such assistance systems. OBJECTIVE: To quantify to what extent minor image perturbations affect the convolutional neural network (CNN)-mediated skin lesion classification and to evaluate three possible solutions for this problem (additional data augmentation, test-time augmentation, anti-aliasing). METHODS: We trained three commonly used CNN architectures to differentiate between dermoscopic melanoma and nevus images. Subsequently, their performance and susceptibility to minor changes ('brittleness') were tested on two distinct test sets with multiple images per lesion. For the first set, image changes, such as rotations or zooms, were generated artificially. The second set contained natural changes that stemmed from multiple photographs taken of the same lesions. RESULTS: All architectures exhibited brittleness on the artificial and natural test sets. The three reviewed methods were able to decrease brittleness to varying degrees while still maintaining performance. The observed improvement was greater for the artificial than for the natural test set, where enhancements were minor. CONCLUSIONS: Minor image changes, relatively inconspicuous for humans, can have an effect on the robustness of CNNs differentiating skin lesions. By the methods tested here, this effect can be reduced, but not fully eliminated. Thus, further research to sustain the performance of AI classifiers is needed to facilitate the translation of such systems into the clinic.
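Of the three countermeasures named in this abstract, test-time augmentation is the simplest to sketch: the classifier scores several transformed copies of the same image and the scores are averaged. The transforms chosen here (identity, horizontal flip, 90-degree rotation), the nested-list image format, and the toy model are assumptions for illustration; the abstract does not specify the augmentations used.

```python
from typing import Callable, List, Sequence

# Hypothetical image format: a 2D grid of pixel intensities.
Image = List[List[float]]
Transform = Callable[[Image], Image]
Model = Callable[[Image], float]


def identity(img: Image) -> Image:
    return [row[:] for row in img]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def predict_with_tta(model: Model, image: Image, transforms: Sequence[Transform]) -> float:
    """Average the model's score over transformed copies of one image."""
    if not transforms:
        raise ValueError("need at least one transform")
    scores = [model(t(image)) for t in transforms]
    return sum(scores) / len(scores)


def brittle_model(img: Image) -> float:
    """Toy stand-in for a classifier: it only looks at the top-left pixel."""
    return img[0][0]


toy_image = [[1.0, 0.0], [0.0, 0.0]]
single_score = brittle_model(toy_image)
tta_score = predict_with_tta(brittle_model, toy_image, [identity, hflip, rot90])
```

The toy model's score depends entirely on image orientation, which is an exaggerated version of the brittleness the abstract describes; averaging over transforms dampens that dependence without retraining.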


Subject(s)
Dermoscopy , Diagnosis, Computer-Assisted , Image Interpretation, Computer-Assisted , Melanoma/pathology , Neural Networks, Computer , Nevus/pathology , Skin Neoplasms/pathology , Diagnosis, Differential , Humans , Predictive Value of Tests , Reproducibility of Results