Results 1-18 of 18
2.
JAMA Dermatol ; 160(3): 303-311, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38324293

ABSTRACT

Importance: The development of artificial intelligence (AI)-based melanoma classifiers typically calls for large, centralized datasets, requiring hospitals to give away their patient data, which raises serious privacy concerns. To address this concern, decentralized federated learning has been proposed, where classifier development is distributed across hospitals. Objective: To investigate whether a more privacy-preserving federated learning approach can achieve comparable diagnostic performance to a classical centralized (ie, single-model) and ensemble learning approach for AI-based melanoma diagnostics. Design, Setting, and Participants: This multicentric, single-arm diagnostic study developed a federated model for melanoma-nevus classification using histopathological whole-slide images prospectively acquired at 6 German university hospitals between April 2021 and February 2023 and benchmarked it using both a holdout and an external test dataset. Data analysis was performed from February to April 2023. Exposures: All whole-slide images were retrospectively analyzed by an AI-based classifier without influencing routine clinical care. Main Outcomes and Measures: The area under the receiver operating characteristic curve (AUROC) served as the primary end point for evaluating the diagnostic performance. Secondary end points included balanced accuracy, sensitivity, and specificity. Results: The study included 1025 whole-slide images of clinically melanoma-suspicious skin lesions from 923 patients, consisting of 388 histopathologically confirmed invasive melanomas and 637 nevi. The median (range) age at diagnosis was 58 (18-95) years for the training set, 57 (18-93) years for the holdout test dataset, and 61 (18-95) years for the external test dataset; the median (range) Breslow thickness was 0.70 (0.10-34.00) mm, 0.70 (0.20-14.40) mm, and 0.80 (0.30-20.00) mm, respectively. 
The federated approach (0.8579; 95% CI, 0.7693-0.9299) performed significantly worse than the classical centralized approach (0.9024; 95% CI, 0.8379-0.9565) in terms of AUROC on a holdout test dataset (pairwise Wilcoxon signed-rank, P < .001) but performed significantly better (0.9126; 95% CI, 0.8810-0.9412) than the classical centralized approach (0.9045; 95% CI, 0.8701-0.9331) on an external test dataset (pairwise Wilcoxon signed-rank, P < .001). Notably, the federated approach performed significantly worse than the ensemble approach on both the holdout (ensemble AUROC, 0.8867; 95% CI, 0.8103-0.9481) and the external test dataset (ensemble AUROC, 0.9227; 95% CI, 0.8941-0.9479). Conclusions and Relevance: The findings of this diagnostic study suggest that federated learning is a viable approach for the binary classification of invasive melanomas and nevi on a clinically representative distributed dataset. Federated learning can improve privacy protection in AI-based melanoma diagnostics while simultaneously promoting collaboration across institutions and countries. Moreover, it may have the potential to be extended to other image classification tasks in digital cancer histopathology and beyond.
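The abstract does not name the aggregation rule used to merge the hospital models. As an illustration only, the sketch below shows federated averaging (FedAvg), a common choice in which each site trains locally and a coordinator averages the parameters, weighting each site by its number of training samples. The function and parameter names are hypothetical, parameters are plain lists of floats, and local training is omitted. In contrast to this, the ensemble approach in the abstract combines model predictions rather than weights.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """One FedAvg aggregation round: sample-weighted mean of client parameters.

    client_updates: (locally trained weights, number of local training samples)
    for each participating hospital. No raw images leave a site; only weights do.
    """
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("at least one client must contribute samples")
    reference = client_updates[0][0]
    averaged: Weights = {}
    for key, values in reference.items():
        averaged[key] = [
            # each client's contribution is proportional to its share of the data
            sum(weights[key][i] * n for weights, n in client_updates) / total
            for i in range(len(values))
        ]
    return averaged


if __name__ == "__main__":
    hospital_a = ({"w": [1.0, 2.0]}, 100)
    hospital_b = ({"w": [3.0, 4.0]}, 300)
    print(federated_average([hospital_a, hospital_b]))
```

With 100 and 300 local samples, the second hospital contributes three quarters of the averaged weights.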


Subject(s)
Dermatology , Melanoma , Nevus , Skin Neoplasms , Humans , Melanoma/diagnosis , Artificial Intelligence , Retrospective Studies , Skin Neoplasms/diagnosis , Nevus/diagnosis
4.
N Biotechnol ; 76: 106-117, 2023 Sep 25.
Article in English | MEDLINE | ID: mdl-37146681

ABSTRACT

The limited ability of convolutional neural networks (CNNs) to generalize to images from previously unseen domains is a major limitation, particularly for safety-critical clinical tasks such as dermoscopic skin cancer classification. In order to translate CNN-based applications into the clinic, it is essential that they be able to adapt to domain shifts. Such new conditions can arise through the use of different image acquisition systems or varying lighting conditions. In dermoscopy, shifts can also occur as a change in patient age or the occurrence of rare lesion localizations (e.g. palms). These are not prominently represented in most training datasets and can therefore lead to a decrease in performance. In order to verify the generalizability of classification models in real-world clinical settings, it is crucial to have access to data that mimic such domain shifts. To our knowledge, no dermoscopic image dataset exists in which such domain shifts are properly described and quantified. We therefore grouped publicly available images from the ISIC archive based on their metadata (e.g. acquisition location, lesion localization, patient age) to generate meaningful domains. To verify that these domains are in fact distinct, we used multiple quantification measures to estimate the presence and intensity of domain shifts. Additionally, we analyzed the performance on these domains with and without an unsupervised domain adaptation technique. We observed that domain shifts do exist in most of our grouped domains. Based on our results, we believe these datasets will be helpful for testing the generalization capabilities of dermoscopic skin cancer classifiers.
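The abstract refers to "multiple quantification measures" without naming them. One widely used measure of distribution shift is the maximum mean discrepancy (MMD). The sketch below computes a biased estimate of squared MMD with an RBF kernel between two sets of feature vectors, for example CNN embeddings of images from two metadata-defined domains. Both the choice of MMD and the `gamma` bandwidth are assumptions made for illustration, not the measures used in the study.

```python
import math
from typing import List, Sequence


def rbf(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs: List[Sequence[float]], ys: List[Sequence[float]], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two samples of feature vectors.

    Values near 0 suggest the two domains have similar feature distributions;
    larger values indicate a stronger domain shift.
    """
    def mean_kernel(a: List[Sequence[float]], b: List[Sequence[float]]) -> float:
        return sum(rbf(p, q, gamma) for p in a for q in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    domain_a = [[0.0], [1.0]]  # toy 1-D "embeddings"
    domain_b = [[5.0], [6.0]]
    print(mmd2(domain_a, domain_a), mmd2(domain_a, domain_b))
```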


Subject(s)
Dermoscopy , Skin Neoplasms , Humans , Dermoscopy/methods , Skin Neoplasms/pathology , Neural Networks, Computer
5.
Eur J Cancer ; 173: 307-316, 2022 09.
Article in English | MEDLINE | ID: mdl-35973360

ABSTRACT

BACKGROUND: Image-based cancer classifiers suffer from a variety of problems that negatively affect their performance. For example, variation in image brightness or the use of different cameras can suffice to diminish performance. Ensemble solutions, in which multiple model predictions are combined into one, can mitigate these problems. However, ensembles are computationally intensive and less transparent to practitioners than single-model solutions. Constructing model soups, by averaging the weights of multiple models into a single model, could circumvent these limitations while still improving performance. OBJECTIVE: To investigate the performance of model soups for a dermoscopic melanoma-nevus skin cancer classification task with respect to (1) generalisation to images from other clinics, (2) robustness against small image changes and (3) calibration such that the confidences correspond closely to the actual predictive uncertainties. METHODS: We construct model soups by fine-tuning pre-trained models on seven different image resolutions and subsequently averaging their weights. Performance is evaluated on a multi-source dataset including holdout and external components. RESULTS: We find that model soups improve generalisation and calibration on the external component while maintaining performance on the holdout component. For robustness, we observe performance improvements for perturbed test images, while the performance on corrupted test images remains on par. CONCLUSIONS: Overall, souping for skin cancer classifiers has a positive effect on generalisation, robustness and calibration. It is easy for practitioners to implement, and by combining multiple models into a single model, complexity is reduced. This could be an important factor in achieving clinical applicability, as less complexity generally means more transparency.
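The core of a uniform model soup is an element-wise mean of the parameters of several fine-tuned models that share one architecture and one pre-trained starting point. The abstract describes this setup. The sketch below is a simplified illustration that assumes each model's parameters are available as a dictionary of flat float lists, a hypothetical stand-in for a framework's state dict. The soup is a single model, so inference needs one forward pass, whereas an ensemble needs one forward pass per member.

```python
from typing import Dict, List

# Hypothetical stand-in for a framework state dict: parameter name -> flat values.
StateDict = Dict[str, List[float]]


def uniform_soup(models: List[StateDict]) -> StateDict:
    """Average the weights of fine-tuned models with identical architectures.

    Assumes all models were fine-tuned from the same pre-trained weights
    (e.g. at different input resolutions), which is what makes averaging sensible.
    """
    if not models:
        raise ValueError("need at least one model")
    keys = models[0].keys()
    for model in models[1:]:
        if model.keys() != keys:
            raise ValueError("all models must have identical parameter names")
    return {
        # zip(*...) walks the same parameter position across all models
        key: [sum(values) / len(models) for values in zip(*(m[key] for m in models))]
        for key in keys
    }


if __name__ == "__main__":
    soup = uniform_soup([{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}])
    print(soup)
```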


Subject(s)
Melanoma , Skin Neoplasms , Dermoscopy/methods , Humans , Melanoma/diagnostic imaging , Sensitivity and Specificity , Skin Neoplasms/diagnostic imaging , Melanoma, Cutaneous Malignant
6.
Eur J Cancer ; 167: 54-69, 2022 05.
Article in English | MEDLINE | ID: mdl-35390650

ABSTRACT

BACKGROUND: Due to their ability to solve complex problems, deep neural networks (DNNs) are becoming increasingly popular in medical applications. However, decision-making by such algorithms is essentially a black-box process that makes it difficult for physicians to judge whether the decisions are reliable. The use of explainable artificial intelligence (XAI) is often suggested as a solution to this problem. We investigate how XAI is used for skin cancer detection: how is it used during the development of new DNNs? What kinds of visualisations are commonly used? Are there systematic evaluations of XAI with dermatologists or dermatopathologists? METHODS: Google Scholar, PubMed, IEEE Xplore, Science Direct and Scopus were searched for peer-reviewed studies published between January 2017 and October 2021 applying XAI to dermatological images: the search terms histopathological image, whole-slide image, clinical image, dermoscopic image, skin, dermatology, explainable, interpretable and XAI were used in various combinations. Only studies concerned with skin cancer were included. RESULTS: 37 publications fulfilled our inclusion criteria. Most studies (19/37) simply applied existing XAI methods to their classifier to interpret its decision-making. Some studies (4/37) proposed new XAI methods or improved upon existing techniques. Fourteen studies (14/37) addressed specific questions such as bias detection and the impact of XAI on man-machine interaction. However, only three of them evaluated the performance and confidence of humans using CAD systems with XAI. CONCLUSION: XAI is commonly applied during the development of DNNs for skin cancer detection. However, a systematic and rigorous evaluation of its usefulness in this scenario is lacking.


Subject(s)
Artificial Intelligence , Skin Neoplasms , Algorithms , Humans , Neural Networks, Computer , Skin Neoplasms/diagnosis
7.
Eur J Cancer ; 156: 202-216, 2021 10.
Article in English | MEDLINE | ID: mdl-34509059

ABSTRACT

BACKGROUND: Multiple studies have compared the performance of artificial intelligence (AI)-based models for automated skin cancer classification to human experts, thus setting the cornerstone for a successful translation of AI-based tools into clinicopathological practice. OBJECTIVE: The objective of the study was to systematically analyse the current state of research on reader studies involving melanoma and to assess their potential clinical relevance by evaluating three main aspects: test set characteristics (holdout/out-of-distribution data set, composition), test setting (experimental/clinical, inclusion of metadata) and representativeness of participating clinicians. METHODS: PubMed, Medline and ScienceDirect were screened for peer-reviewed studies published between 2017 and 2021 and dealing with AI-based skin cancer classification involving melanoma. The search terms skin cancer classification, deep learning, convolutional neural network (CNN), melanoma (detection), digital biomarkers, histopathology and whole slide imaging were combined. Based on the search results, only studies that considered direct comparison of AI results with clinicians and had a diagnostic classification as their main objective were included. RESULTS: A total of 19 reader studies fulfilled the inclusion criteria. Of these, 11 CNN-based approaches addressed the classification of dermoscopic images; 6 concentrated on the classification of clinical images, whereas 2 dermatopathological studies utilised digitised histopathological whole slide images. CONCLUSIONS: All 19 included studies demonstrated superior or at least equivalent performance of CNN-based classifiers compared with clinicians. However, almost all studies were conducted in highly artificial settings based exclusively on single images of the suspicious lesions. 
Moreover, test sets mainly consisted of holdout images and did not represent the full range of patient populations and melanoma subtypes encountered in clinical practice.


Subject(s)
Dermatologists , Dermoscopy , Diagnosis, Computer-Assisted , Image Interpretation, Computer-Assisted , Melanoma/pathology , Microscopy , Neural Networks, Computer , Pathologists , Skin Neoplasms/pathology , Automation , Biopsy , Clinical Competence , Deep Learning , Humans , Melanoma/classification , Predictive Value of Tests , Reproducibility of Results , Skin Neoplasms/classification
8.
Eur J Cancer ; 155: 191-199, 2021 09.
Article in English | MEDLINE | ID: mdl-34388516

ABSTRACT

BACKGROUND: One prominent application for deep learning-based classifiers is skin cancer classification on dermoscopic images. However, classifier evaluation is often limited to holdout data, which can mask common shortcomings such as susceptibility to confounding factors. To increase clinical applicability, it is necessary to thoroughly evaluate such classifiers on out-of-distribution (OOD) data. OBJECTIVE: The objective of the study was to establish a dermoscopic skin cancer benchmark in which classifier robustness to OOD data can be measured. METHODS: Using a proprietary dermoscopic image database and a set of image transformations, we create an OOD robustness benchmark and evaluate the robustness of four different convolutional neural network (CNN) architectures on it. RESULTS: The benchmark contains three data sets, Skin Archive Munich (SAM), SAM-corrupted (SAM-C) and SAM-perturbed (SAM-P), and is publicly available for download. To maintain the benchmark's OOD status, ground truth labels are not provided and test results should be sent to us for assessment. The SAM data set contains 319 unmodified and biopsy-verified dermoscopic melanoma (n = 194) and nevus (n = 125) images. SAM-C and SAM-P contain images from SAM which were artificially modified to test a classifier against low-quality inputs and to measure its prediction stability over small image changes, respectively. All four CNNs showed susceptibility to corruptions and perturbations. CONCLUSIONS: This benchmark provides three data sets which allow for OOD testing of binary skin cancer classifiers. Our classifier performance confirms the shortcomings of CNNs and provides a frame of reference. Altogether, this benchmark should facilitate a more thorough evaluation process and thereby enable the development of more robust skin cancer classifiers.


Subject(s)
Benchmarking/standards , Neural Networks, Computer , Skin Neoplasms/classification , Humans
9.
J Med Internet Res ; 23(7): e20708, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34255646

ABSTRACT

BACKGROUND: Recent years have witnessed a substantial improvement in the accuracy of skin cancer classification using convolutional neural networks (CNNs). CNNs perform on par with or better than dermatologists in classification tasks on single images. However, in clinical practice, dermatologists also use patient data beyond the visual aspects present in a digitized image, further increasing their diagnostic accuracy. Several pilot studies have recently investigated the effects of integrating different subtypes of patient data into CNN-based skin cancer classifiers. OBJECTIVE: This systematic review focuses on current research investigating the impact of merging information from image features and patient data on the performance of CNN-based skin cancer image classification. This study aims to explore the potential of this field of research by evaluating the types of patient data used, the ways in which the nonimage data are encoded and merged with the image features, and the impact of the integration on classifier performance. METHODS: Google Scholar, PubMed, MEDLINE, and ScienceDirect were screened for peer-reviewed studies published in English that dealt with the integration of patient data within a CNN-based skin cancer classification. The search terms skin cancer classification, convolutional neural network(s), deep learning, lesions, melanoma, metadata, clinical information, and patient data were combined. RESULTS: A total of 11 publications fulfilled the inclusion criteria. All of them reported an overall improvement in different skin lesion classification tasks with patient data integration. The most commonly used patient data were age, sex, and lesion location. The patient data were mostly one-hot encoded. The studies differed in the complexity of the deep learning methods used to process the encoded patient data before and after fusing them with the image features into a combined classifier.
CONCLUSIONS: This study indicates the potential benefits of integrating patient data into CNN-based diagnostic algorithms. However, exactly how individual patient data enhance classification performance, especially in multiclass classification problems, is still unclear. Moreover, a substantial fraction of the patient data used by dermatologists remains to be analyzed in the context of CNN-based skin cancer classification. Further exploratory analyses in this promising field may optimize patient data integration into CNN-based skin cancer diagnostics for the benefit of patients.
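The review reports that age, sex, and lesion location were the most common patient data and were mostly one-hot encoded before being merged with image features. The sketch below shows one simple way to do this: scale age, one-hot encode the categorical fields, and concatenate the result with an image feature vector (late fusion). The category vocabularies, the age scaling, and the function names are all hypothetical. Age could equally be binned and one-hot encoded, and the reviewed studies differ in how much processing happens before and after fusion.

```python
from typing import List, Sequence

# Hypothetical category vocabularies; real studies define their own.
SEXES = ["female", "male"]
SITES = ["head/neck", "trunk", "upper extremity", "lower extremity", "palms/soles"]


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Return a vector with 1.0 at the position of `value` and 0.0 elsewhere."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if v == value else 0.0 for v in vocabulary]


def encode_patient(age: float, sex: str, site: str, max_age: float = 100.0) -> List[float]:
    """Scale age to [0, 1] (an assumption) and one-hot encode sex and lesion site."""
    return [min(age, max_age) / max_age] + one_hot(sex, SEXES) + one_hot(site, SITES)


def fuse(image_features: Sequence[float], patient_vector: Sequence[float]) -> List[float]:
    """Late fusion by concatenation; a classifier head would consume the result."""
    return list(image_features) + list(patient_vector)


if __name__ == "__main__":
    patient = encode_patient(50, "male", "trunk")
    print(patient)
    print(fuse([0.1, 0.2], patient))
```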


Subject(s)
Melanoma , Skin Neoplasms , Dermoscopy , Humans , Melanoma/diagnosis , Neural Networks, Computer , Skin Neoplasms/diagnosis
10.
J Med Internet Res ; 23(3): e21695, 2021 03 25.
Article in English | MEDLINE | ID: mdl-33764307

ABSTRACT

BACKGROUND: Studies have shown that artificial intelligence achieves similar or better performance than dermatologists in specific dermoscopic image classification tasks. However, artificial intelligence is susceptible to the influence of confounding factors within images (eg, skin markings), which can lead to false diagnoses of cancerous skin lesions. Image segmentation can remove lesion-adjacent confounding factors but greatly changes the image representation. OBJECTIVE: The aim of this study was to compare the performance of 2 image classification workflows where images were either segmented or left unprocessed before the subsequent training and evaluation of a binary skin lesion classifier. METHODS: Separate binary skin lesion classifiers (nevus vs melanoma) were trained and evaluated on segmented and unsegmented dermoscopic images. For a more informative result, separate classifiers were trained on 2 distinct training data sets (human against machine [HAM] and International Skin Imaging Collaboration [ISIC]). Each training run was repeated 5 times. The mean performance of the 5 runs was evaluated on a multi-source test set (n=688) consisting of a holdout and an external component. RESULTS: Our findings showed that when trained on HAM, the segmented classifiers showed a higher overall balanced accuracy (75.6% [SD 1.1%]) than the unsegmented classifiers (66.7% [SD 3.2%]), which was significant in 4 out of 5 runs (P<.001). The overall balanced accuracy was numerically higher for the unsegmented ISIC classifiers (78.3% [SD 1.8%]) than for the segmented ISIC classifiers (77.4% [SD 1.5%]), which was significantly different in 1 out of 5 runs (P=.004). CONCLUSIONS: Image segmentation does not result in an overall performance decrease, and it has the benefit of removing lesion-adjacent confounding factors. Thus, it is a viable option to address the negative impact that confounding factors have on deep learning models in dermatology.
However, the segmentation step might introduce new pitfalls, which require further investigation.
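Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity. Unlike plain accuracy, it is not inflated by whichever class dominates the test set. A minimal sketch, assuming melanoma is coded as 1 and nevus as 0 (the coding and function name are illustrative):

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (melanoma = 1) and specificity (nevus = 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # 3 of 4 melanomas and 1 of 2 nevi classified correctly
    print(balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
```

In the example, sensitivity is 0.75 and specificity is 0.5, giving a balanced accuracy of 0.625. Plain accuracy would be 4/6.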


Subject(s)
Melanoma , Skin Neoplasms , Algorithms , Artificial Intelligence , Dermoscopy , Humans , Neural Networks, Computer , Skin Neoplasms/diagnostic imaging
11.
BJU Int ; 128(3): 352-360, 2021 09.
Article in English | MEDLINE | ID: mdl-33706408

ABSTRACT

OBJECTIVE: To develop a new digital biomarker based on the analysis of primary tumour tissue by a convolutional neural network (CNN) to predict lymph node metastasis (LNM) in a cohort matched for already established risk factors. PATIENTS AND METHODS: Haematoxylin and eosin (H&E) stained primary tumour slides from 218 patients (102 N+; 116 N0), matched for Gleason score, tumour size, venous invasion, perineural invasion and age, who underwent radical prostatectomy were selected to train a CNN and evaluate its ability to predict LN status. RESULTS: With 10 models trained with the same data, a mean area under the receiver operating characteristic curve (AUROC) of 0.68 (95% confidence interval [CI] 0.678-0.682) and a mean balanced accuracy of 61.37% (95% CI 60.05-62.69%) were achieved. The mean sensitivity and specificity were 53.09% (95% CI 49.77-56.41%) and 69.65% (95% CI 68.21-71.1%), respectively. These results were confirmed via cross-validation. The probability score for LNM prediction was significantly higher on image sections from N+ samples (mean [SD] N+ probability score 0.58 [0.17] vs 0.47 [0.15] N0 probability score, P = 0.002). In multivariable analysis, the probability score of the CNN (odds ratio [OR] 1.04 per percentage probability, 95% CI 1.02-1.08; P = 0.04) and lymphovascular invasion (OR 11.73, 95% CI 3.96-35.7; P < 0.001) proved to be independent predictors for LNM. CONCLUSION: In our present study, CNN-based image analyses showed promising results as a potential novel low-cost method to extract relevant prognostic information directly from H&E histology to predict the LN status of patients with prostate cancer. Our ubiquitously available technique might contribute to an improved LN status prediction.
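AUROC, the primary metric here and in several other entries, equals the probability that a randomly chosen positive case (here N+) receives a higher score than a randomly chosen negative case (N0), with ties counting one half. The sketch below computes it directly from that definition in O(n²), which is fine for illustration. It is not the pipeline used in the study. Averaging per-model AUROCs, as done for the 10 models above, is then a plain mean, for example with `statistics.mean`.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # hypothetical N+ (1) / N0 (0) labels and CNN probability scores
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```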


Subject(s)
Deep Learning , Lymphatic Metastasis , Neural Networks, Computer , Prostatic Neoplasms/pathology , Aged , Humans , Male , Middle Aged , Neoplasm Grading , Prognosis , Retrospective Studies
12.
Eur J Cancer ; 145: 81-91, 2021 03.
Article in English | MEDLINE | ID: mdl-33423009

ABSTRACT

BACKGROUND: A basic requirement for artificial intelligence (AI)-based image analysis systems that are to be integrated into clinical practice is high robustness. Minor changes in how those images are acquired, for example, during routine skin cancer screening, should not change the diagnosis of such assistance systems. OBJECTIVE: To quantify to what extent minor image perturbations affect convolutional neural network (CNN)-mediated skin lesion classification and to evaluate three possible solutions for this problem (additional data augmentation, test-time augmentation, anti-aliasing). METHODS: We trained three commonly used CNN architectures to differentiate between dermoscopic melanoma and nevus images. Subsequently, their performance and susceptibility to minor changes ('brittleness') were tested on two distinct test sets with multiple images per lesion. For the first set, image changes, such as rotations or zooms, were generated artificially. The second set contained natural changes that stemmed from multiple photographs taken of the same lesions. RESULTS: All architectures exhibited brittleness on the artificial and natural test sets. The three reviewed methods were able to decrease brittleness to varying degrees while still maintaining performance. The observed improvement was greater for the artificial than for the natural test set, where enhancements were minor. CONCLUSIONS: Minor image changes, relatively inconspicuous to humans, can affect the robustness of CNNs differentiating skin lesions. The methods tested here can reduce this effect but not fully eliminate it. Thus, further research to sustain the performance of AI classifiers is needed to facilitate the translation of such systems into the clinic.
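Test-time augmentation, one of the three methods evaluated above, classifies several transformed views of the same image and averages the predictions, so that a single unlucky orientation has less influence. The abstract does not list the transformations used. The sketch below assumes flips and a 90-degree rotation, represents the image as a toy grayscale grid, and treats `model` as a hypothetical callable that returns a melanoma probability.

```python
from statistics import mean
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image: rows of pixel values


def hflip(img: Image) -> Image:
    """Mirror left-right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror top-bottom."""
    return img[::-1]


def rot90(img: Image) -> Image:
    """Rotate 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def tta_predict(model: Callable[[Image], float], img: Image) -> float:
    """Average the model's melanoma probability over simple geometric views."""
    views = [img, hflip(img), vflip(img), rot90(img)]
    return mean(model(view) for view in views)


if __name__ == "__main__":
    toy_model = lambda im: im[0][0]  # hypothetical "model": reads the top-left pixel
    print(tta_predict(toy_model, [[1.0, 2.0], [3.0, 4.0]]))
```

The toy model deliberately depends on orientation. The averaged output therefore differs from any single view, which is the smoothing effect TTA relies on.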


Subject(s)
Dermoscopy , Diagnosis, Computer-Assisted , Image Interpretation, Computer-Assisted , Melanoma/pathology , Neural Networks, Computer , Nevus/pathology , Skin Neoplasms/pathology , Diagnosis, Differential , Humans , Predictive Value of Tests , Reproducibility of Results
14.
J Med Internet Res ; 22(9): e18091, 2020 09 11.
Article in English | MEDLINE | ID: mdl-32915161

ABSTRACT

BACKGROUND: Early detection of melanoma can be lifesaving, but it remains a challenge. Recent diagnostic studies have revealed the superiority of artificial intelligence (AI) in classifying dermoscopic images of melanoma and nevi, concluding that these algorithms should assist a dermatologist's diagnoses. OBJECTIVE: The aim of this study was to investigate whether AI support improves the accuracy and overall diagnostic performance of dermatologists in the dichotomous image-based discrimination between melanoma and nevus. METHODS: Twelve board-certified dermatologists were presented with disjoint sets of 100 unique dermoscopic images of melanomas and nevi (total of 1200 unique images), and they had to classify the images based on personal experience alone (part I) and with the support of a trained convolutional neural network (CNN, part II). Additionally, dermatologists were asked to rate their confidence in their final decision for each image. RESULTS: While the mean specificity of the dermatologists based on personal experience alone remained almost unchanged (70.6% vs 72.4%; P=.54) with AI support, the mean sensitivity and mean accuracy increased significantly (59.4% vs 74.6%; P=.003 and 65.0% vs 73.6%; P=.002, respectively) with AI support. Out of the 10% (10/94; 95% CI 8.4%-11.8%) of cases where dermatologists were correct and AI was incorrect, dermatologists on average changed to the incorrect answer for 39% (4/10; 95% CI 23.2%-55.6%) of cases. When dermatologists were incorrect and AI was correct (25/94, 27%; 95% CI 24.0%-30.1%), dermatologists changed their answers to the correct answer for 46% (11/25; 95% CI 33.1%-58.4%) of cases. Additionally, the dermatologists' average confidence in their decisions increased when the CNN confirmed their decision and decreased when the CNN disagreed, even when the dermatologists were correct. Reported values are based on the mean of all participants.
Whenever absolute values are shown, the denominator and numerator are approximations as every dermatologist ended up rating a varying number of images due to a quality control step. CONCLUSIONS: The findings of our study show that AI support can improve the overall accuracy of the dermatologists in the dichotomous image-based discrimination between melanoma and nevus. This supports the argument for AI-based tools to aid clinicians in skin lesion classification and provides a rationale for studies of such classifiers in real-life settings, wherein clinicians can integrate additional information such as patient age and medical history into their decisions.


Subject(s)
Artificial Intelligence/standards , Dermatologists/standards , Dermoscopy/methods , Diagnostic Imaging/classification , Melanoma/diagnostic imaging , Skin Neoplasms/diagnostic imaging , Humans , Internet , Melanoma/diagnosis , Skin Neoplasms/diagnosis , Surveys and Questionnaires
15.
J Dtsch Dermatol Ges ; 18(11): 1236-1243, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32841508

ABSTRACT

Malignant melanoma is the skin tumor that causes the most deaths in Germany. At an early stage, melanoma is readily treatable, so early detection is essential. However, the skin cancer screening program in Germany has been criticized because, although melanomas have been diagnosed more frequently since the introduction of the program, mortality from malignant melanoma has not decreased. This suggests that the observed increase in melanoma diagnoses may be due to overdiagnosis, i.e. the detection of lesions that would never have caused serious health problems for the patients. One of the reasons is the challenging distinction between some benign and malignant lesions. In addition, there may be lesions that are biologically equivocal, and other lesions that are classified as malignant according to current criteria but grow so slowly that they would never have posed a threat to the patient's life. So far, these "indolent" melanomas cannot be identified reliably due to a lack of biomarkers. Moreover, the likelihood that an in-situ melanoma will progress to an invasive tumor still cannot be determined with any certainty. When benign lesions are diagnosed as melanoma, the consequences are unnecessary psychological and physical stress for the affected patients as well as avoidable therapy costs. Conversely, underdiagnosis, i.e. overlooked melanomas, can adversely affect patients' prognoses and may necessitate more intensive therapies. Novel diagnostic options could reduce the number of over- and underdiagnoses and contribute to more objective diagnoses in borderline cases. One strategy that has yielded promising results in pilot studies is the use of artificial intelligence-based diagnostic tools. However, these applications still await translation into clinical and pathological routine.


Subject(s)
Melanoma , Skin Neoplasms , Artificial Intelligence , Germany , Humans , Medical Overuse
16.
Front Med (Lausanne) ; 7: 233, 2020.
Article in English | MEDLINE | ID: mdl-32671078

ABSTRACT

Background: Artificial intelligence (AI) has shown promise in numerous experimental studies, particularly in skin cancer diagnostics. Translation of these findings into the clinic is the logical next step. This translation can only be successful if patients' concerns and questions are addressed suitably. We therefore conducted a survey to evaluate patients' views of artificial intelligence in melanoma diagnostics in Germany, with a particular focus on patients with a history of melanoma. Participants and Methods: A web-based questionnaire was designed using LimeSurvey, sent by e-mail to university hospitals and melanoma support groups and advertised on social media. The anonymous questionnaire evaluated patients' expectations and concerns toward artificial intelligence in general as well as their attitudes toward different application scenarios. Descriptive analysis was performed with expression of categorical variables as percentages and 95% confidence intervals. Statistical tests were performed to investigate associations between sociodemographic data and selected items of the questionnaire. Results: 298 individuals (154 with a melanoma diagnosis, 143 without) responded to the questionnaire. About 94% [95% CI = 91%-97%] of respondents supported the use of artificial intelligence in medical approaches. 88% [95% CI = 85%-92%] would even make their own health data anonymously available for the further development of AI-based applications in medicine. Only 41% [95% CI = 35%-46%] of respondents were amenable to the use of artificial intelligence as a stand-alone system, whereas 94% [95% CI = 92%-97%] were amenable to its use as an assistance system for physicians. In sub-group analyses, only minor differences were detectable. Respondents with a previous history of melanoma were more amenable to the use of AI applications for early detection, even at home. They would prefer an application scenario in which physician and AI classify the lesions independently.
With respect to AI-based applications in medicine, patients were concerned about insufficient data protection, impersonality and susceptibility to errors, but expected faster, more precise and unbiased diagnostics, fewer diagnostic errors and support for physicians. Conclusions: The vast majority of participants exhibited a positive attitude toward the use of artificial intelligence in melanoma diagnostics, especially as an assistance system.

17.
Eur J Cancer ; 120: 114-121, 2019 10.
Article in English | MEDLINE | ID: mdl-31518967

ABSTRACT

BACKGROUND: In recent studies, convolutional neural networks (CNNs) outperformed dermatologists in distinguishing dermoscopic images of melanoma and nevi. In these studies, dermatologists and artificial intelligence were considered as opponents. However, the combination of classifiers frequently yields superior results, both in machine learning and among humans. In this study, we investigated the potential benefit of combining human and artificial intelligence for skin cancer classification. METHODS: Using 11,444 dermoscopic images, which were divided into five diagnostic categories, novel deep learning techniques were used to train a single CNN. Then, both 112 dermatologists from 13 German university hospitals and the trained CNN independently classified a set of 300 biopsy-verified skin lesions into those five classes. Taking into account the certainty of the decisions, the two independently determined diagnoses were combined into a new classifier with the help of a gradient boosting method. The primary end-point of the study was the correct classification of the images into five designated categories, whereas the secondary end-point was the correct classification of lesions as either benign or malignant (binary classification). FINDINGS: Regarding the multiclass task, the combination of man and machine achieved an accuracy of 82.95%. This was 1.36 percentage points higher than the best of the two individual classifiers (81.59% achieved by the CNN). Owing to the class imbalance in the binary problem, sensitivity, but not accuracy, was examined and demonstrated to be superior (89%) to the best individual classifier (CNN with 86.1%). The specificity in the combined classifier decreased from 89.2% to 84%. However, at an equal sensitivity of 89%, the CNN achieved a specificity of only 81.5%. INTERPRETATION: Our findings indicate that the combination of human and artificial intelligence achieves superior results over the independent results of both of these systems.


Subject(s)
Algorithms , Deep Learning , Dermatologists/statistics & numerical data , Dermoscopy/methods , Skin Neoplasms/classification , Skin Neoplasms/diagnosis , Humans , Image Interpretation, Computer-Assisted , Neural Networks, Computer , Observer Variation , Prognosis
18.
Eur J Cancer ; 119: 57-65, 2019 09.
Article in English | MEDLINE | ID: mdl-31419752

ABSTRACT

BACKGROUND: Recently, convolutional neural networks (CNNs) systematically outperformed dermatologists in distinguishing dermoscopic melanoma and nevi images. However, such a binary classification does not reflect the clinical reality of skin cancer screenings, in which multiple diagnoses need to be taken into account. METHODS: Using 11,444 dermoscopic images, which covered the majority of pigmented skin lesions commonly encountered in skin cancer screenings, a CNN was trained through novel deep learning techniques. A test set of 300 biopsy-verified images was used to compare the classifier's performance with that of 112 dermatologists from 13 German university hospitals. The primary end-point was the correct classification of the different lesions into benign and malignant. The secondary end-point was the correct classification of the images into one of the five diagnostic categories. FINDINGS: Sensitivity and specificity of dermatologists for the primary end-point were 74.4% (95% confidence interval [CI]: 67.0-81.8%) and 59.8% (95% CI: 49.8-69.8%), respectively. At equal sensitivity, the algorithm achieved a specificity of 91.3% (95% CI: 85.5-97.1%). For the secondary end-point, the mean sensitivity and specificity of the dermatologists were 56.5% (95% CI: 42.8-70.2%) and 89.2% (95% CI: 85.0-93.3%), respectively. At equal sensitivity, the algorithm achieved a specificity of 98.8%. Two-sided McNemar tests revealed significance for the primary end-point (p < 0.001). For the secondary end-point, outperformance (p < 0.001) was achieved except for basal cell carcinoma (on-par performance). INTERPRETATION: Our findings show that automated classification of dermoscopic melanoma and nevi images is extendable to a multiclass classification problem, thus better reflecting clinical differential diagnoses, while still outperforming dermatologists at a significant level (p < 0.001).
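Comparisons "at equal sensitivity", as in this entry and the previous one, fix the classifier's operating point so that its sensitivity matches the dermatologists' and then read off its specificity. The sketch below shows one straightforward way to do this: scan the decision thresholds, keep those whose sensitivity reaches the target, and report the best specificity among them. The exact procedure in the study may differ (for example, interpolating along the ROC curve). The function name and data are hypothetical.

```python
from typing import Optional, Sequence


def specificity_at_sensitivity(
    labels: Sequence[int], scores: Sequence[float], target_sensitivity: float
) -> float:
    """Best specificity among thresholds whose sensitivity reaches the target.

    labels: 1 = malignant, 0 = benign; scores: classifier probability of malignancy.
    A case is called malignant when its score is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both malignant and benign cases")
    best: Optional[float] = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        if sensitivity >= target_sensitivity:
            best = specificity if best is None else max(best, specificity)
    if best is None:
        raise ValueError("target sensitivity is unreachable")
    return best


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
    # e.g. target_sensitivity=0.744 to match the dermatologists' mean sensitivity
    print(specificity_at_sensitivity(labels, scores, target_sensitivity=0.6))
```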


Subject(s)
Dermatologists/statistics & numerical data , Dermoscopy/methods , Melanoma/diagnostic imaging , Neural Networks, Computer , Nevus/diagnostic imaging , Skin Neoplasms/diagnostic imaging , Algorithms , Biopsy , Diagnosis, Differential , Female , Hospitals, University , Humans , Male , Melanoma/pathology , Nevus/pathology , Sensitivity and Specificity , Skin Neoplasms/classification , Skin Neoplasms/pathology , Surveys and Questionnaires