2.
JAMA Dermatol ; 160(3): 303-311, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38324293

ABSTRACT

Importance: The development of artificial intelligence (AI)-based melanoma classifiers typically calls for large, centralized datasets, requiring hospitals to give away their patient data, which raises serious privacy concerns. To address this concern, decentralized federated learning has been proposed, where classifier development is distributed across hospitals. Objective: To investigate whether a more privacy-preserving federated learning approach can achieve comparable diagnostic performance to a classical centralized (ie, single-model) and ensemble learning approach for AI-based melanoma diagnostics. Design, Setting, and Participants: This multicentric, single-arm diagnostic study developed a federated model for melanoma-nevus classification using histopathological whole-slide images prospectively acquired at 6 German university hospitals between April 2021 and February 2023 and benchmarked it using both a holdout and an external test dataset. Data analysis was performed from February to April 2023. Exposures: All whole-slide images were retrospectively analyzed by an AI-based classifier without influencing routine clinical care. Main Outcomes and Measures: The area under the receiver operating characteristic curve (AUROC) served as the primary end point for evaluating the diagnostic performance. Secondary end points included balanced accuracy, sensitivity, and specificity. Results: The study included 1025 whole-slide images of clinically melanoma-suspicious skin lesions from 923 patients, consisting of 388 histopathologically confirmed invasive melanomas and 637 nevi. The median (range) age at diagnosis was 58 (18-95) years for the training set, 57 (18-93) years for the holdout test dataset, and 61 (18-95) years for the external test dataset; the median (range) Breslow thickness was 0.70 (0.10-34.00) mm, 0.70 (0.20-14.40) mm, and 0.80 (0.30-20.00) mm, respectively. 
The federated approach (0.8579; 95% CI, 0.7693-0.9299) performed significantly worse than the classical centralized approach (0.9024; 95% CI, 0.8379-0.9565) in terms of AUROC on a holdout test dataset (pairwise Wilcoxon signed-rank, P < .001) but performed significantly better (0.9126; 95% CI, 0.8810-0.9412) than the classical centralized approach (0.9045; 95% CI, 0.8701-0.9331) on an external test dataset (pairwise Wilcoxon signed-rank, P < .001). Notably, the federated approach performed significantly worse than the ensemble approach on both the holdout (0.8867; 95% CI, 0.8103-0.9481) and external test dataset (0.9227; 95% CI, 0.8941-0.9479). Conclusions and Relevance: The findings of this diagnostic study suggest that federated learning is a viable approach for the binary classification of invasive melanomas and nevi on a clinically representative distributed dataset. Federated learning can improve privacy protection in AI-based melanoma diagnostics while simultaneously promoting collaboration across institutions and countries. Moreover, it may have the potential to be extended to other image classification tasks in digital cancer histopathology and beyond.
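The federated setup described above can be sketched in a few lines: each hospital updates the shared model locally, and only model weights are aggregated, never whole-slide images. This is a minimal FedAvg-style illustration; the toy gradient step, parameter names and cohort sizes are assumptions, not the study's implementation.

```python
# Minimal sketch of federated weight averaging (FedAvg-style): hospitals
# train locally and share only model weights, never patient images.
# The toy gradient step, parameter names and cohort sizes are assumptions.

def local_update(weights, local_gradient, lr=0.1):
    """One simplified local training step at a single hospital."""
    return {k: w - lr * local_gradient[k] for k, w in weights.items()}

def federated_average(weight_sets, sizes):
    """Aggregate client weights, weighted by local dataset size."""
    total = sum(sizes)
    return {
        k: sum(ws[k] * n for ws, n in zip(weight_sets, sizes)) / total
        for k in weight_sets[0]
    }

global_w = {"w0": 0.0, "w1": 1.0}            # shared starting model
grads = [{"w0": 1.0, "w1": 0.0},             # toy local gradients at 3 sites
         {"w0": 0.0, "w1": 1.0},
         {"w0": 1.0, "w1": 1.0}]
sizes = [100, 200, 100]                      # local dataset sizes
locals_ = [local_update(global_w, g) for g in grads]
new_global = federated_average(locals_, sizes)
```

The size-weighted average is what lets larger centres contribute proportionally more without ever exposing their images.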


Subjects
Dermatology, Melanoma, Nevus, Skin Neoplasms, Humans, Melanoma/diagnosis, Artificial Intelligence, Retrospective Studies, Skin Neoplasms/diagnosis, Nevus/diagnosis
4.
Eur J Cancer ; 173: 307-316, 2022 09.
Article in English | MEDLINE | ID: mdl-35973360

ABSTRACT

BACKGROUND: Image-based cancer classifiers suffer from a variety of problems that negatively affect their performance. For example, variation in image brightness or the use of different cameras can already suffice to diminish performance. Ensemble solutions, where multiple model predictions are combined into one, can mitigate these problems. However, ensembles are computationally intensive and less transparent to practitioners than single-model solutions. Constructing model soups, by averaging the weights of multiple models into a single model, could circumvent these limitations while still improving performance. OBJECTIVE: To investigate the performance of model soups for a dermoscopic melanoma-nevus skin cancer classification task with respect to (1) generalisation to images from other clinics, (2) robustness against small image changes and (3) calibration, such that the confidences correspond closely to the actual predictive uncertainties. METHODS: We construct model soups by fine-tuning pre-trained models on seven different image resolutions and subsequently averaging their weights. Performance is evaluated on a multi-source dataset including holdout and external components. RESULTS: We find that model soups improve generalisation and calibration on the external component while maintaining performance on the holdout component. For robustness, we observe performance improvements for perturbed test images, while the performance on corrupted test images remains on par. CONCLUSIONS: Overall, souping for skin cancer classifiers has a positive effect on generalisation, robustness and calibration. It is easy for practitioners to implement, and by combining multiple models into a single model, complexity is reduced. This could be an important factor in achieving clinical applicability, as less complexity generally means more transparency.
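The "soup" construction described above is literally a weight average: several fine-tuned models collapse into one, so inference needs a single forward pass, unlike an ensemble. A minimal sketch, assuming each model is represented as a dict of parameters (names and values below are toy data, not the study's weights):

```python
# Hypothetical sketch of a uniform "model soup": average the weights of
# several fine-tuned models into a single model. Parameter names and
# values are illustrative toy data.

def make_soup(models):
    """models: list of dicts mapping parameter name -> value."""
    n = len(models)
    return {k: sum(m[k] for m in models) / n for k in models[0]}

# e.g. the same architecture fine-tuned at three image resolutions
m224 = {"conv.w": 0.90, "head.b": -0.30}
m384 = {"conv.w": 1.10, "head.b": -0.10}
m512 = {"conv.w": 1.00, "head.b": -0.20}
soup = make_soup([m224, m384, m512])  # one model, averaged weights
```

Weight averaging only makes sense when the models share one architecture and a common pre-trained starting point, as in the fine-tuning setup the abstract describes.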


Subjects
Melanoma, Skin Neoplasms, Dermoscopy/methods, Humans, Melanoma/diagnostic imaging, Sensitivity and Specificity, Skin Neoplasms/diagnostic imaging, Cutaneous Malignant Melanoma
6.
Eur J Cancer ; 167: 54-69, 2022 05.
Article in English | MEDLINE | ID: mdl-35390650

ABSTRACT

BACKGROUND: Due to their ability to solve complex problems, deep neural networks (DNNs) are becoming increasingly popular in medical applications. However, decision-making by such algorithms is essentially a black-box process that renders it difficult for physicians to judge whether the decisions are reliable. The use of explainable artificial intelligence (XAI) is often suggested as a solution to this problem. We investigate how XAI is used for skin cancer detection: how is it used during the development of new DNNs? What kinds of visualisations are commonly used? Are there systematic evaluations of XAI with dermatologists or dermatopathologists? METHODS: Google Scholar, PubMed, IEEE Xplore, ScienceDirect and Scopus were searched for peer-reviewed studies published between January 2017 and October 2021 applying XAI to dermatological images: the search terms histopathological image, whole-slide image, clinical image, dermoscopic image, skin, dermatology, explainable, interpretable and XAI were used in various combinations. Only studies concerned with skin cancer were included. RESULTS: 37 publications fulfilled our inclusion criteria. Most studies (19/37) simply applied existing XAI methods to their classifier to interpret its decision-making. Some studies (4/37) proposed new XAI methods or improved upon existing techniques. 14/37 studies addressed specific questions such as bias detection and the impact of XAI on human-machine interactions. However, only three of them evaluated the performance and confidence of humans using CAD systems with XAI. CONCLUSION: XAI is commonly applied during the development of DNNs for skin cancer detection. However, a systematic and rigorous evaluation of its usefulness in this scenario is lacking.
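Most of the visualisations the review describes attribute a classifier's decision to input regions. As a toy illustration of that attribution idea only (not any specific method from the reviewed papers), a finite-difference sensitivity map for a scalar predictor:

```python
# Toy illustration of input attribution: how sensitive is the prediction
# to each input element? The linear "classifier" and inputs are
# assumptions for demonstration, not a real XAI method or model.

def saliency(predict, image, eps=1e-4):
    """Absolute finite-difference sensitivity per input element."""
    base = predict(image)
    sal = []
    for i in range(len(image)):
        bumped = list(image)
        bumped[i] += eps
        sal.append(abs(predict(bumped) - base) / eps)
    return sal

predict = lambda img: 0.8 * img[0] + 0.1 * img[1]  # toy linear model
sal = saliency(predict, [0.5, 0.5, 0.5])           # first element dominates
```

Real XAI methods for CNNs (gradient maps, CAM variants) are more involved, but they answer the same question this sketch poses element by element.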


Assuntos
Inteligência Artificial , Neoplasias Cutâneas , Algoritmos , Humanos , Redes Neurais de Computação , Neoplasias Cutâneas/diagnóstico
8.
Eur J Cancer ; 160: 80-91, 2022 01.
Article in English | MEDLINE | ID: mdl-34810047

ABSTRACT

BACKGROUND: Over the past decade, the development of molecular high-throughput methods (omics) increased rapidly and provided new insights for cancer research. In parallel, deep learning approaches revealed the enormous potential for medical image analysis, especially in digital pathology. Combining image and omics data with deep learning tools may enable the discovery of new cancer biomarkers and a more precise prediction of patient prognosis. This systematic review addresses different multimodal fusion methods of convolutional neural network-based image analyses with omics data, focussing on the impact of data combination on the classification performance. METHODS: PubMed was screened for peer-reviewed articles published in English between January 2015 and June 2021 by two independent researchers. Search terms related to deep learning, digital pathology, omics, and multimodal fusion were combined. RESULTS: We identified a total of 11 studies meeting the inclusion criteria, namely studies that used convolutional neural networks for haematoxylin and eosin image analysis of patients with cancer in combination with integrated omics data. Publications were categorised according to their endpoints: 7 studies focused on survival analysis and 4 studies on prediction of cancer subtypes, malignancy or microsatellite instability with spatial analysis. CONCLUSIONS: Image-based classifiers already show high performances in prognostic and predictive cancer diagnostics. The integration of omics data led to improved performance in all studies described here. However, these are very early studies that still require external validation to demonstrate their generalisability and robustness. Further and more comprehensive studies with larger sample sizes are needed to evaluate performance and determine clinical benefits.


Assuntos
Aprendizado Profundo/normas , Genômica/métodos , Processamento de Imagem Assistida por Computador/métodos , Neoplasias/genética , Humanos , Neoplasias/patologia
9.
Eur J Cancer ; 156: 202-216, 2021 10.
Article in English | MEDLINE | ID: mdl-34509059

ABSTRACT

BACKGROUND: Multiple studies have compared the performance of artificial intelligence (AI)-based models for automated skin cancer classification to human experts, thus setting the cornerstone for a successful translation of AI-based tools into clinicopathological practice. OBJECTIVE: The objective of the study was to systematically analyse the current state of research on reader studies involving melanoma and to assess their potential clinical relevance by evaluating three main aspects: test set characteristics (holdout/out-of-distribution data set, composition), test setting (experimental/clinical, inclusion of metadata) and representativeness of participating clinicians. METHODS: PubMed, Medline and ScienceDirect were screened for peer-reviewed studies published between 2017 and 2021 and dealing with AI-based skin cancer classification involving melanoma. The search terms skin cancer classification, deep learning, convolutional neural network (CNN), melanoma (detection), digital biomarkers, histopathology and whole slide imaging were combined. Based on the search results, only studies that considered direct comparison of AI results with clinicians and had a diagnostic classification as their main objective were included. RESULTS: A total of 19 reader studies fulfilled the inclusion criteria. Of these, 11 CNN-based approaches addressed the classification of dermoscopic images; 6 concentrated on the classification of clinical images, whereas 2 dermatopathological studies utilised digitised histopathological whole slide images. CONCLUSIONS: All 19 included studies demonstrated superior or at least equivalent performance of CNN-based classifiers compared with clinicians. However, almost all studies were conducted in highly artificial settings based exclusively on single images of the suspicious lesions. 
Moreover, test sets mainly consisted of holdout images and did not represent the full range of patient populations and melanoma subtypes encountered in clinical practice.


Assuntos
Dermatologistas , Dermoscopia , Diagnóstico por Computador , Interpretação de Imagem Assistida por Computador , Melanoma/patologia , Microscopia , Redes Neurais de Computação , Patologistas , Neoplasias Cutâneas/patologia , Automação , Biópsia , Competência Clínica , Aprendizado Profundo , Humanos , Melanoma/classificação , Valor Preditivo dos Testes , Reprodutibilidade dos Testes , Neoplasias Cutâneas/classificação
10.
Eur J Cancer ; 155: 200-215, 2021 09.
Article in English | MEDLINE | ID: mdl-34391053

ABSTRACT

BACKGROUND: Gastrointestinal cancers account for approximately 20% of all cancer diagnoses and are responsible for 22.5% of cancer deaths worldwide. Artificial intelligence-based diagnostic support systems, in particular convolutional neural network (CNN)-based image analysis tools, have shown great potential in medical computer vision. In this systematic review, we summarise recent studies reporting CNN-based approaches for digital biomarkers for characterization and prognostication of gastrointestinal cancer pathology. METHODS: PubMed and MEDLINE were screened for peer-reviewed papers dealing with CNN-based gastrointestinal cancer analyses from histological slides, published between 2015 and 2020. Seven hundred and ninety titles and abstracts were screened, and 58 full-text articles were assessed for eligibility. RESULTS: Sixteen publications fulfilled our inclusion criteria dealing with tumor or precursor lesion characterization or prognostic and predictive biomarkers: 14 studies on colorectal or rectal cancer, three studies on gastric cancer and none on esophageal cancer. These studies were categorised according to their end-points: polyp characterization, tumor characterization and patient outcome. Regarding the translation into clinical practice, we identified several studies demonstrating generalization of the classifier with external tests and comparisons with pathologists, but none presenting clinical implementation. CONCLUSIONS: Results of recent studies on CNN-based image analysis in gastrointestinal cancer pathology are promising, but studies were conducted in observational and retrospective settings. Large-scale trials are needed to assess performance and predict clinical usefulness. Furthermore, large-scale trials are required for approval of CNN-based prediction models as medical devices.


Assuntos
Aprendizado Profundo/normas , Neoplasias Gastrointestinais/classificação , Neoplasias Gastrointestinais/patologia , Humanos , Resultado do Tratamento
11.
Eur J Cancer ; 155: 191-199, 2021 09.
Article in English | MEDLINE | ID: mdl-34388516

ABSTRACT

BACKGROUND: One prominent application for deep learning-based classifiers is skin cancer classification on dermoscopic images. However, classifier evaluation is often limited to holdout data which can mask common shortcomings such as susceptibility to confounding factors. To increase clinical applicability, it is necessary to thoroughly evaluate such classifiers on out-of-distribution (OOD) data. OBJECTIVE: The objective of the study was to establish a dermoscopic skin cancer benchmark in which classifier robustness to OOD data can be measured. METHODS: Using a proprietary dermoscopic image database and a set of image transformations, we create an OOD robustness benchmark and evaluate the robustness of four different convolutional neural network (CNN) architectures on it. RESULTS: The benchmark contains three data sets, Skin Archive Munich (SAM), SAM-corrupted (SAM-C) and SAM-perturbed (SAM-P), and is publicly available for download. To maintain the benchmark's OOD status, ground truth labels are not provided and test results should be sent to us for assessment. The SAM data set contains 319 unmodified and biopsy-verified dermoscopic melanoma (n = 194) and nevus (n = 125) images. SAM-C and SAM-P contain images from SAM which were artificially modified to test a classifier against low-quality inputs and to measure its prediction stability over small image changes, respectively. All four CNNs showed susceptibility to corruptions and perturbations. CONCLUSIONS: This benchmark provides three data sets which allow for OOD testing of binary skin cancer classifiers. Our classifier performance confirms the shortcomings of CNNs and provides a frame of reference. Altogether, this benchmark should facilitate a more thorough evaluation process and thereby enable the development of more robust skin cancer classifiers.
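The stability aspect that SAM-P probes can be made concrete as a decision flip rate: how often does the predicted class change across small perturbed copies of the same image? This is a hypothetical sketch, not the benchmark's official metric; the scalar "images" and identity predictor are stand-ins:

```python
# Hypothetical prediction-stability check in the spirit of SAM-P:
# fraction of perturbed copies whose predicted class differs from the
# prediction on the corresponding original image.

def flip_rate(predict, originals, perturbed_sets):
    """originals[i] pairs with the list perturbed_sets[i]."""
    flips, total = 0, 0
    for img, variants in zip(originals, perturbed_sets):
        base = predict(img) >= 0.5          # class on the original
        for v in variants:
            flips += int((predict(v) >= 0.5) != base)
            total += 1
    return flips / total

predict = lambda x: x  # toy: the "image" is already a melanoma score
rate = flip_rate(predict, [0.6, 0.4], [[0.55, 0.45], [0.35, 0.52]])
```

A perfectly robust classifier would score 0.0 on such a metric; any flips near the decision boundary reveal the brittleness the abstract reports.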


Assuntos
Benchmarking/normas , Redes Neurais de Computação , Neoplasias Cutâneas/classificação , Humanos
12.
J Med Internet Res ; 23(7): e20708, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34255646

ABSTRACT

BACKGROUND: Recent years have witnessed a substantial improvement in the accuracy of skin cancer classification using convolutional neural networks (CNNs). CNNs perform on par with or better than dermatologists with respect to the classification tasks of single images. However, in clinical practice, dermatologists also use other patient data beyond the visual aspects present in a digitized image, further increasing their diagnostic accuracy. Several pilot studies have recently investigated the effects of integrating different subtypes of patient data into CNN-based skin cancer classifiers. OBJECTIVE: This systematic review focuses on the current research investigating the impact of merging information from image features and patient data on the performance of CNN-based skin cancer image classification. This study aims to explore the potential in this field of research by evaluating the types of patient data used, the ways in which the nonimage data are encoded and merged with the image features, and the impact of the integration on the classifier performance. METHODS: Google Scholar, PubMed, MEDLINE, and ScienceDirect were screened for peer-reviewed studies published in English that dealt with the integration of patient data within a CNN-based skin cancer classification. The search terms skin cancer classification, convolutional neural network(s), deep learning, lesions, melanoma, metadata, clinical information, and patient data were combined. RESULTS: A total of 11 publications fulfilled the inclusion criteria. All of them reported an overall improvement in different skin lesion classification tasks with patient data integration. The most commonly used patient data were age, sex, and lesion location. The patient data were mostly one-hot encoded. The studies differed in how extensively the encoded patient data were processed with deep learning methods before and after being fused with the image features for a combined classifier.
CONCLUSIONS: This study indicates the potential benefits of integrating patient data into CNN-based diagnostic algorithms. However, how exactly the individual patient data enhance classification performance, especially in the case of multiclass classification problems, is still unclear. Moreover, a substantial fraction of patient data used by dermatologists remains to be analyzed in the context of CNN-based skin cancer classification. Further exploratory analyses in this promising field may optimize patient data integration into CNN-based skin cancer diagnostics for patients' benefits.
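The fusion pattern most commonly reported above, one-hot encoded metadata concatenated with the image features before a combined classifier head, can be sketched as follows. The category lists, age scaling and toy feature vector are illustrative assumptions, not any specific study's encoding:

```python
# Sketch of late feature fusion: one-hot encode patient data (age, sex,
# lesion site) and concatenate it with the CNN's image features before a
# combined classifier head. All categories and values are assumptions.

SEXES = ["female", "male"]
SITES = ["trunk", "limbs", "head_neck"]

def one_hot(value, categories):
    return [1.0 if value == c else 0.0 for c in categories]

def fuse(image_features, age_years, sex, site):
    """Build the concatenated input vector for a combined classifier."""
    metadata = [age_years / 100.0] + one_hot(sex, SEXES) + one_hot(site, SITES)
    return image_features + metadata

fused = fuse([0.12, 0.87], age_years=58, sex="male", site="trunk")
```

The review notes that studies differ mainly in how much processing such a metadata vector receives before and after this concatenation point.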


Assuntos
Melanoma , Neoplasias Cutâneas , Dermoscopia , Humanos , Melanoma/diagnóstico , Redes Neurais de Computação , Neoplasias Cutâneas/diagnóstico
13.
Eur J Cancer ; 154: 227-234, 2021 09.
Article in English | MEDLINE | ID: mdl-34298373

ABSTRACT

AIM: Sentinel lymph node status is a central prognostic factor for melanomas. However, the surgical excision involves some risks for affected patients. In this study, we therefore aimed to develop a digital biomarker that can predict lymph node metastasis non-invasively from digitised H&E slides of primary melanoma tumours. METHODS: A total of 415 H&E slides from primary melanoma tumours with known sentinel node (SN) status from three German university hospitals and one private pathological practice were digitised (150 SN positive/265 SN negative). Two hundred ninety-one slides were used to train artificial neural networks (ANNs). The remaining 124 slides were used to test the ability of the ANNs to predict sentinel status. ANNs were trained and/or tested on data sets that were matched or not matched between SN-positive and SN-negative cases for patient age, ulceration, and tumour thickness, factors that are known to correlate with lymph node status. RESULTS: The best performance was achieved by an ANN that was trained and tested on unmatched cases, reaching an area under the receiver operating characteristic curve (AUROC) of 61.8% ± 0.2%. In contrast, ANNs that were trained and/or tested on matched cases achieved an AUROC of 55.0% ± 3.5% or less. CONCLUSION: Our results indicate that the image classifier can predict lymph node status to some, albeit so far not clinically relevant, extent. It may do so by mostly detecting equivalents of factors on histological slides that are already known to correlate with lymph node status. Our results provide a basis for future research with larger data cohorts.


Assuntos
Aprendizado Profundo , Melanoma/patologia , Linfonodo Sentinela/patologia , Adulto , Idoso , Humanos , Metástase Linfática , Pessoa de Meia-Idade
14.
Eur J Cancer ; 149: 94-101, 2021 05.
Article in English | MEDLINE | ID: mdl-33838393

ABSTRACT

BACKGROUND: Clinicians and pathologists traditionally use patient data in addition to clinical examination to support their diagnoses. OBJECTIVES: We investigated whether a combination of histologic whole slide image (WSI) analysis based on convolutional neural networks (CNNs) and commonly available patient data (age, sex and anatomical site of the lesion) in a binary melanoma/nevus classification task could increase the performance compared with CNNs alone. METHODS: We used 431 WSIs from two different laboratories and analysed the performance of classifiers that used the image or patient data individually, or combined them via three common fusion techniques. Furthermore, we tested a naive combination of patient data and an image classifier: for cases interpreted as 'uncertain' (CNN output score <0.7), the decision of the CNN was replaced by the decision of the patient data classifier. RESULTS: The CNN on its own achieved the best performance (mean ± standard deviation of five individual runs) with AUROC of 92.30% ± 0.23% and balanced accuracy of 83.17% ± 0.38%. While the classification performance was not significantly improved in general by any of the tested fusions, the naive strategy of replacing the image classifier with the patient data classifier on slides with low output scores improved balanced accuracy to 86.72% ± 0.36%. CONCLUSION: In most cases, the CNN on its own was so accurate that patient data integration did not provide any benefit. However, incorporating patient data for lesions that were classified by the CNN with low 'confidence' improved balanced accuracy.
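The naive combination tested above reduces to a simple rule: keep the CNN's decision unless its output score falls below 0.7, in which case defer to the patient-data classifier. A sketch under the assumption that both classifiers emit a melanoma probability (function and variable names are illustrative):

```python
# Sketch of the "naive" combination described above: trust the CNN when
# it is confident (output score >= 0.7), otherwise defer to the
# patient-data classifier. Names and probability inputs are assumptions.

UNCERTAINTY_THRESHOLD = 0.7

def combined_decision(p_melanoma_cnn, p_melanoma_meta):
    """Return True for 'melanoma', False for 'nevus'."""
    confidence = max(p_melanoma_cnn, 1.0 - p_melanoma_cnn)
    if confidence >= UNCERTAINTY_THRESHOLD:
        return p_melanoma_cnn >= 0.5    # confident CNN: keep its call
    return p_melanoma_meta >= 0.5       # uncertain CNN: use patient data

decision = combined_decision(0.60, 0.20)  # CNN unsure, metadata says nevus
```

Note that the rule never mixes the two scores; it switches between classifiers, which is what makes it "naive" compared with the learned fusion techniques.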


Assuntos
Interpretação de Imagem Assistida por Computador , Melanoma/patologia , Microscopia , Redes Neurais de Computação , Nevo/patologia , Neoplasias Cutâneas/patologia , Adulto , Fatores Etários , Idoso , Bases de Dados Factuais , Feminino , Alemanha , Humanos , Masculino , Melanoma/classificação , Pessoa de Meia-Idade , Nevo/classificação , Valor Preditivo dos Testes , Reprodutibilidade dos Testes , Estudos Retrospectivos , Fatores Sexuais , Neoplasias Cutâneas/classificação
15.
J Med Internet Res ; 23(3): e21695, 2021 03 25.
Article in English | MEDLINE | ID: mdl-33764307

ABSTRACT

BACKGROUND: Studies have shown that artificial intelligence achieves similar or better performance than dermatologists in specific dermoscopic image classification tasks. However, artificial intelligence is susceptible to the influence of confounding factors within images (eg, skin markings), which can lead to false diagnoses of cancerous skin lesions. Image segmentation can remove lesion-adjacent confounding factors but greatly changes the image representation. OBJECTIVE: The aim of this study was to compare the performance of 2 image classification workflows where images were either segmented or left unprocessed before the subsequent training and evaluation of a binary skin lesion classifier. METHODS: Separate binary skin lesion classifiers (nevus vs melanoma) were trained and evaluated on segmented and unsegmented dermoscopic images. For a more informative result, separate classifiers were trained on 2 distinct training data sets (human against machine [HAM] and International Skin Imaging Collaboration [ISIC]). Each training run was repeated 5 times. The mean performance of the 5 runs was evaluated on a multi-source test set (n=688) consisting of a holdout and an external component. RESULTS: Our findings showed that when trained on HAM, the segmented classifiers showed a higher overall balanced accuracy (75.6% [SD 1.1%]) than the unsegmented classifiers (66.7% [SD 3.2%]), which was significant in 4 out of 5 runs (P<.001). The overall balanced accuracy was numerically higher for the unsegmented ISIC classifiers (78.3% [SD 1.8%]) than for the segmented ISIC classifiers (77.4% [SD 1.5%]), which was significantly different in 1 out of 5 runs (P=.004). CONCLUSIONS: Image segmentation does not result in an overall performance decrease, while enabling the beneficial removal of lesion-adjacent confounding factors. Thus, it is a viable option to address the negative impact that confounding factors have on deep learning models in dermatology.
However, the segmentation step might introduce new pitfalls, which require further investigations.
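The segmentation workflow compared above amounts to masking out everything around the lesion before classification. A toy one-dimensional sketch, assuming the mask is already available (real workflows derive a 2-D mask from a dedicated segmentation model):

```python
# Sketch of the segmentation step compared above: zero out everything
# outside the lesion mask so lesion-adjacent confounders (e.g. skin
# markings) cannot reach the classifier. A 1-D "image" stands in for a
# 2-D dermoscopic image; the mask values are assumptions.

def apply_mask(image, mask):
    """Keep pixels where mask is truthy; zero out the rest."""
    return [pixel if keep else 0.0 for pixel, keep in zip(image, mask)]

segmented = apply_mask([0.9, 0.7, 0.3, 0.8], [0, 1, 1, 0])
```

The trade-off the abstract describes follows directly: the zeroed border removes confounders but also changes the image statistics the classifier sees.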


Assuntos
Melanoma , Neoplasias Cutâneas , Algoritmos , Inteligência Artificial , Dermoscopia , Humanos , Redes Neurais de Computação , Neoplasias Cutâneas/diagnóstico por imagem
16.
BJU Int ; 128(3): 352-360, 2021 09.
Article in English | MEDLINE | ID: mdl-33706408

ABSTRACT

OBJECTIVE: To develop a new digital biomarker based on the analysis of primary tumour tissue by a convolutional neural network (CNN) to predict lymph node metastasis (LNM) in a cohort matched for already established risk factors. PATIENTS AND METHODS: Haematoxylin and eosin (H&E) stained primary tumour slides from 218 patients (102 N+; 116 N0), matched for Gleason score, tumour size, venous invasion, perineural invasion and age, who underwent radical prostatectomy were selected to train a CNN and evaluate its ability to predict LN status. RESULTS: With 10 models trained with the same data, a mean area under the receiver operating characteristic curve (AUROC) of 0.68 (95% confidence interval [CI] 0.678-0.682) and a mean balanced accuracy of 61.37% (95% CI 60.05-62.69%) were achieved. The mean sensitivity and specificity were 53.09% (95% CI 49.77-56.41%) and 69.65% (95% CI 68.21-71.1%), respectively. These results were confirmed via cross-validation. The probability score for LNM prediction was significantly higher on image sections from N+ samples (mean [SD] N+ probability score 0.58 [0.17] vs 0.47 [0.15] N0 probability score, P = 0.002). In multivariable analysis, the probability score of the CNN (odds ratio [OR] 1.04 per percentage probability, 95% CI 1.02-1.08; P = 0.04) and lymphovascular invasion (OR 11.73, 95% CI 3.96-35.7; P < 0.001) proved to be independent predictors for LNM. CONCLUSION: In our present study, CNN-based image analyses showed promising results as a potential novel low-cost method to extract relevant prognostic information directly from H&E histology to predict the LN status of patients with prostate cancer. Our ubiquitously available technique might contribute to an improved LN status prediction.


Assuntos
Aprendizado Profundo , Metástase Linfática , Redes Neurais de Computação , Neoplasias da Próstata/patologia , Idoso , Humanos , Masculino , Pessoa de Meia-Idade , Gradação de Tumores , Prognóstico , Estudos Retrospectivos
17.
J Med Internet Res ; 23(2): e23436, 2021 02 02.
Article in English | MEDLINE | ID: mdl-33528370

ABSTRACT

BACKGROUND: An increasing number of studies within digital pathology show the potential of artificial intelligence (AI) to diagnose cancer using histological whole slide images, which requires large and diverse data sets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks are able to distinguish/learn hidden variables, these variables can introduce batch effects that compromise the accuracy of classification systems. OBJECTIVE: The objective of the study was to analyze the learnability of an exemplary selection of hidden variables (patient age, slide preparation date, slide origin, and scanner type) that are commonly found in whole slide image data sets in digital pathology and could create batch effects. METHODS: We trained four separate convolutional neural networks (CNNs) to learn four variables using a data set of digitized whole slide melanoma images from five different institutes. For robustness, each CNN training and evaluation run was repeated multiple times, and a variable was only considered learnable if the lower bound of the 95% confidence interval of its mean balanced accuracy was above 50.0%. RESULTS: A mean balanced accuracy above 50.0% was achieved for all four tasks, even when considering the lower bound of the 95% confidence interval. Performance between tasks showed wide variation, ranging from 56.1% (slide preparation date) to 100% (slide origin). CONCLUSIONS: Because all of the analyzed hidden variables are learnable, they have the potential to create batch effects in dermatopathology data sets, which negatively affect AI-based classification systems. Practitioners should be aware of these and similar pitfalls when developing and evaluating such systems and address these and potentially other batch effect variables in their data sets through sufficient data set stratification.
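The learnability criterion used above can be computed directly from the repeated runs: a hidden variable counts as learnable only if the lower bound of the 95% CI of the mean balanced accuracy exceeds chance (50%). A sketch using a normal approximation; the exact CI method and the example accuracies are assumptions, not the study's data:

```python
# Sketch of the learnability criterion: a hidden variable is "learnable"
# only if the lower bound of the 95% CI of the mean balanced accuracy
# across repeated runs stays above chance (50%). Normal approximation
# and example scores are assumptions.

from math import sqrt
from statistics import mean, stdev

def ci95_lower(scores):
    """Lower bound of a normal-approximation 95% CI of the mean."""
    return mean(scores) - 1.96 * stdev(scores) / sqrt(len(scores))

def is_learnable(scores, chance=0.50):
    return ci95_lower(scores) > chance

runs = [0.58, 0.56, 0.57, 0.55, 0.59]  # hypothetical repeated CNN runs
learnable = is_learnable(runs)
```

Requiring the CI's lower bound (rather than the mean) to clear chance is what makes the criterion robust to run-to-run noise.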


Assuntos
Inteligência Artificial/normas , Aprendizado Profundo/normas , Redes Neurais de Computação , Patologia/métodos , Humanos
18.
Eur J Cancer ; 145: 81-91, 2021 03.
Article in English | MEDLINE | ID: mdl-33423009

ABSTRACT

BACKGROUND: A basic requirement for artificial intelligence (AI)-based image analysis systems, which are to be integrated into clinical practice, is high robustness. Minor changes in how those images are acquired, for example, during routine skin cancer screening, should not change the diagnosis of such assistance systems. OBJECTIVE: To quantify to what extent minor image perturbations affect the convolutional neural network (CNN)-mediated skin lesion classification and to evaluate three possible solutions for this problem (additional data augmentation, test-time augmentation, anti-aliasing). METHODS: We trained three commonly used CNN architectures to differentiate between dermoscopic melanoma and nevus images. Subsequently, their performance and susceptibility to minor changes ('brittleness') was tested on two distinct test sets with multiple images per lesion. For the first set, image changes, such as rotations or zooms, were generated artificially. The second set contained natural changes that stemmed from multiple photographs taken of the same lesions. RESULTS: All architectures exhibited brittleness on the artificial and natural test set. The three reviewed methods were able to decrease brittleness to varying degrees while still maintaining performance. The observed improvement was greater for the artificial than for the natural test set, where enhancements were minor. CONCLUSIONS: Minor image changes, relatively inconspicuous for humans, can have an effect on the robustness of CNNs differentiating skin lesions. By the methods tested here, this effect can be reduced, but not fully eliminated. Thus, further research to sustain the performance of AI classifiers is needed to facilitate the translation of such systems into the clinic.
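Of the three mitigations evaluated, test-time augmentation is the easiest to sketch: average the classifier's output over several transformed copies of the input, so no single small change dominates the prediction. The toy image, brittle predictor and transform list below are illustrative assumptions:

```python
# Sketch of test-time augmentation (TTA): average the classifier's
# output over transformed copies of the input. The toy "image" (a list
# of pixel values), predictor and transforms are assumptions.

def tta_predict(predict, image, transforms):
    preds = [predict(t(image)) for t in transforms]
    return sum(preds) / len(preds)

image = [0.2, 0.4, 0.6]
predict = lambda img: img[0]  # brittle toy model: looks at one pixel
transforms = [
    lambda img: img,                                # identity
    lambda img: list(reversed(img)),                # horizontal flip
    lambda img: [min(1.0, p + 0.1) for p in img],   # brightness shift
]
smoothed = tta_predict(predict, image, transforms)
```

Averaging over transforms trades extra inference cost for stability, which matches the abstract's finding that brittleness is reduced but not eliminated.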


Assuntos
Dermoscopia , Diagnóstico por Computador , Interpretação de Imagem Assistida por Computador , Melanoma/patologia , Redes Neurais de Computação , Nevo/patologia , Neoplasias Cutâneas/patologia , Diagnóstico Diferencial , Humanos , Valor Preditivo dos Testes , Reprodutibilidade dos Testes
19.
J Med Internet Res ; 22(9): e18091, 2020 09 11.
Article in English | MEDLINE | ID: mdl-32915161

ABSTRACT

BACKGROUND: Early detection of melanoma can be lifesaving, but this remains a challenge. Recent diagnostic studies have revealed the superiority of artificial intelligence (AI) in classifying dermoscopic images of melanoma and nevi, concluding that these algorithms should assist a dermatologist's diagnoses. OBJECTIVE: The aim of this study was to investigate whether AI support improves the accuracy and overall diagnostic performance of dermatologists in the dichotomous image-based discrimination between melanoma and nevus. METHODS: Twelve board-certified dermatologists were presented with disjoint sets of 100 unique dermoscopic images of melanomas and nevi (total of 1200 unique images), and they had to classify the images based on personal experience alone (part I) and with the support of a trained convolutional neural network (CNN, part II). Additionally, dermatologists were asked to rate their confidence in their final decision for each image. RESULTS: While the mean specificity of the dermatologists remained almost unchanged with AI support (70.6% vs 72.4%; P=.54), the mean sensitivity and mean accuracy increased significantly (59.4% vs 74.6%; P=.003 and 65.0% vs 73.6%; P=.002, respectively). Out of the 10% (10/94; 95% CI 8.4%-11.8%) of cases where dermatologists were correct and AI was incorrect, dermatologists on average changed to the incorrect answer for 39% (4/10; 95% CI 23.2%-55.6%) of cases. When dermatologists were incorrect and AI was correct (25/94, 27%; 95% CI 24.0%-30.1%), dermatologists changed their answers to the correct answer for 46% (11/25; 95% CI 33.1%-58.4%) of cases. Additionally, the dermatologists' average confidence in their decisions increased when the CNN confirmed their decision and decreased when the CNN disagreed, even when the dermatologists were correct. Reported values are based on the mean of all participants. Whenever absolute values are shown, the denominator and numerator are approximations, as every dermatologist ended up rating a varying number of images due to a quality control step. CONCLUSIONS: The findings of our study show that AI support can improve the overall accuracy of dermatologists in the dichotomous image-based discrimination between melanoma and nevus. This supports the argument for AI-based tools to aid clinicians in skin lesion classification and provides a rationale for studies of such classifiers in real-life settings, wherein clinicians can integrate additional information such as patient age and medical history into their decisions.
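The end points reported above (sensitivity, specificity, accuracy) follow directly from confusion-matrix counts. A brief sketch with illustrative counts (not values taken from the study):

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts:
    tp/fn count melanomas correctly/incorrectly classified,
    tn/fp count nevi correctly/incorrectly classified."""
    sensitivity = tp / (tp + fn)            # melanomas correctly flagged
    specificity = tn / (tn + fp)            # nevi correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical reader ratings: 100 melanomas and 100 nevi.
sens, spec, acc = diagnostic_metrics(tp=75, fn=25, tn=72, fp=28)
```

Note that with balanced classes, accuracy is simply the mean of sensitivity and specificity; with the study's unbalanced melanoma-to-nevus ratio, the two can diverge, which is why all three are reported separately.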


Subjects
Artificial Intelligence/standards , Dermatologists/standards , Dermoscopy/methods , Diagnostic Imaging/classification , Melanoma/diagnostic imaging , Skin Neoplasms/diagnostic imaging , Humans , Internet , Melanoma/diagnosis , Skin Neoplasms/diagnosis , Surveys and Questionnaires
20.
J Dtsch Dermatol Ges ; 18(11): 1236-1243, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32841508

ABSTRACT

Malignant melanoma is the skin tumor that causes the most deaths in Germany. At an early stage, melanoma is well treatable, so early detection is essential. However, the skin cancer screening program in Germany has been criticized because, although melanomas have been diagnosed more frequently since the introduction of the program, mortality from malignant melanoma has not decreased. This indicates that the observed increase in melanoma diagnoses may be due to overdiagnosis, i.e. to the detection of lesions that would never have created serious health problems for the patients. One of the reasons is the challenging distinction between some benign and malignant lesions. In addition, there may be lesions that are biologically equivocal, and other lesions that are classified as malignant according to current criteria but that grow so slowly that they would never have posed a threat to the patient's life. So far, these "indolent" melanomas cannot be identified reliably due to a lack of biomarkers. Moreover, the likelihood that an in-situ melanoma will progress to an invasive tumor still cannot be determined with any certainty. When benign lesions are diagnosed as melanoma, the consequences are unnecessary psychological and physical stress for the affected patients and avoidable therapy costs. Vice versa, underdiagnoses in the sense of overlooked melanomas can adversely affect patients' prognoses and may necessitate more intensive therapies. Novel diagnostic options could reduce the number of over- and underdiagnoses and contribute to more objective diagnoses in borderline cases. One strategy that has yielded promising results in pilot studies is the use of artificial intelligence-based diagnostic tools. However, these applications still await translation into clinical and pathological routine.


Subjects
Melanoma , Skin Neoplasms , Artificial Intelligence , Germany , Humans , Medical Overuse