ABSTRACT
OBJECTIVE: To develop a new digital biomarker based on the analysis of primary tumour tissue by a convolutional neural network (CNN) to predict lymph node metastasis (LNM) in a cohort matched for already established risk factors. PATIENTS AND METHODS: Haematoxylin and eosin (H&E) stained primary tumour slides from 218 patients (102 N+; 116 N0), matched for Gleason score, tumour size, venous invasion, perineural invasion and age, who underwent radical prostatectomy were selected to train a CNN and evaluate its ability to predict LN status. RESULTS: With 10 models trained on the same data, a mean area under the receiver operating characteristic curve (AUROC) of 0.68 (95% confidence interval [CI] 0.678-0.682) and a mean balanced accuracy of 61.37% (95% CI 60.05-62.69%) were achieved. The mean sensitivity and specificity were 53.09% (95% CI 49.77-56.41%) and 69.65% (95% CI 68.21-71.1%), respectively. These results were confirmed via cross-validation. The probability score for LNM prediction was significantly higher on image sections from N+ samples (mean [SD] N+ probability score 0.58 [0.17] vs 0.47 [0.15] N0 probability score, P = 0.002). In multivariable analysis, the probability score of the CNN (odds ratio [OR] 1.04 per percentage probability, 95% CI 1.02-1.08; P = 0.04) and lymphovascular invasion (OR 11.73, 95% CI 3.96-35.7; P < 0.001) proved to be independent predictors of LNM. CONCLUSION: In our present study, CNN-based image analysis showed promising results as a potential novel low-cost method to extract relevant prognostic information directly from H&E histology to predict the LN status of patients with prostate cancer. Our ubiquitously available technique might contribute to an improved LN status prediction.
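The balanced accuracy reported above is the mean of sensitivity and specificity; a minimal sketch of how all three metrics follow from confusion-matrix counts (the counts below are hypothetical, chosen only to land near the reported values):

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and balanced accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true-positive rate (N+ correctly detected)
    specificity = tn / (tn + fp)          # true-negative rate (N0 correctly detected)
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy

# Hypothetical counts for illustration only (not the study's data)
sens, spec, bacc = binary_metrics(tp=54, fn=48, tn=81, fp=35)
```

Balanced accuracy is preferred over plain accuracy here because the N+/N0 classes are of unequal size.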
Subjects
Deep Learning, Lymphatic Metastasis, Neural Networks (Computer), Prostatic Neoplasms/pathology, Aged, Humans, Male, Middle Aged, Neoplasm Grading, Prognosis, Retrospective Studies
ABSTRACT
BACKGROUND: Studies have shown that artificial intelligence achieves similar or better performance than dermatologists in specific dermoscopic image classification tasks. However, artificial intelligence is susceptible to the influence of confounding factors within images (eg, skin markings), which can lead to false diagnoses of cancerous skin lesions. Image segmentation can remove lesion-adjacent confounding factors, but it greatly changes the image representation. OBJECTIVE: The aim of this study was to compare the performance of 2 image classification workflows where images were either segmented or left unprocessed before the subsequent training and evaluation of a binary skin lesion classifier. METHODS: Separate binary skin lesion classifiers (nevus vs melanoma) were trained and evaluated on segmented and unsegmented dermoscopic images. For a more informative result, separate classifiers were trained on 2 distinct training data sets (human against machine [HAM] and International Skin Imaging Collaboration [ISIC]). Each training run was repeated 5 times. The mean performance of the 5 runs was evaluated on a multi-source test set (n=688) consisting of a holdout and an external component. RESULTS: Our findings showed that when trained on HAM, the segmented classifiers showed a higher overall balanced accuracy (75.6% [SD 1.1%]) than the unsegmented classifiers (66.7% [SD 3.2%]), which was significant in 4 out of 5 runs (P<.001). The overall balanced accuracy was numerically higher for the unsegmented ISIC classifiers (78.3% [SD 1.8%]) than for the segmented ISIC classifiers (77.4% [SD 1.5%]), which was significantly different in 1 out of 5 runs (P=.004). CONCLUSIONS: Image segmentation does not decrease overall performance, and it beneficially removes lesion-adjacent confounding factors. Thus, it is a viable option to address the negative impact that confounding factors have on deep learning models in dermatology. However, the segmentation step might introduce new pitfalls, which require further investigation.
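The segmentation workflow compared above can be reduced to masking out every pixel outside the lesion before classification; a minimal single-channel sketch (toy image and mask, not the study's pipeline):

```python
def apply_segmentation(image, mask):
    """Blank out everything outside the lesion mask so that lesion-adjacent
    confounders (e.g. skin markings) cannot reach the classifier."""
    return [[px if keep else 0 for px, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

# Toy 3x3 grayscale image; 1 marks a lesion pixel in the mask
image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
mask = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
segmented = apply_segmentation(image, mask)
```

The cost, as the abstract notes, is that the classifier sees a strongly altered image representation.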
Subjects
Melanoma, Skin Neoplasms, Algorithms, Artificial Intelligence, Dermoscopy, Humans, Neural Networks (Computer), Skin Neoplasms/diagnostic imaging
ABSTRACT
BACKGROUND: An increasing number of studies within digital pathology show the potential of artificial intelligence (AI) to diagnose cancer using histological whole slide images, which requires large and diverse data sets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks are able to distinguish/learn hidden variables, these variables can introduce batch effects that compromise the accuracy of classification systems. OBJECTIVE: The objective of the study was to analyze the learnability of an exemplary selection of hidden variables (patient age, slide preparation date, slide origin, and scanner type) that are commonly found in whole slide image data sets in digital pathology and could create batch effects. METHODS: We trained four separate convolutional neural networks (CNNs) to learn four variables using a data set of digitized whole slide melanoma images from five different institutes. For robustness, each CNN training and evaluation run was repeated multiple times, and a variable was only considered learnable if the lower bound of the 95% confidence interval of its mean balanced accuracy was above 50.0%. RESULTS: A mean balanced accuracy above 50.0% was achieved for all four tasks, even when considering the lower bound of the 95% confidence interval. Performance between tasks showed wide variation, ranging from 56.1% (slide preparation date) to 100% (slide origin). CONCLUSIONS: Because all of the analyzed hidden variables are learnable, they have the potential to create batch effects in dermatopathology data sets, which negatively affect AI-based classification systems. Practitioners should be aware of these and similar pitfalls when developing and evaluating such systems and address these and potentially other batch effect variables in their data sets through sufficient data set stratification.
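The learnability criterion used above can be sketched with a normal-approximation confidence interval over repeated runs (the study's exact CI method may differ, and the run values below are hypothetical):

```python
import math
import statistics

def is_learnable(balanced_accuracies, chance_level=0.5, z=1.96):
    """A hidden variable counts as learnable if the lower bound of the ~95% CI
    of the mean balanced accuracy over repeated runs lies above chance."""
    mean = statistics.mean(balanced_accuracies)
    sem = statistics.stdev(balanced_accuracies) / math.sqrt(len(balanced_accuracies))
    return mean - z * sem > chance_level

# Hypothetical repeated-run results for two candidate hidden variables
learnable_a = is_learnable([0.57, 0.55, 0.58, 0.56, 0.54])  # consistently above chance
learnable_b = is_learnable([0.52, 0.48, 0.51, 0.49, 0.50])  # straddles chance
```

Requiring the CI lower bound (rather than just the mean) to clear 50% guards against declaring a variable learnable on run-to-run noise alone.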
Subjects
Artificial Intelligence/standards, Deep Learning/standards, Neural Networks (Computer), Pathology/methods, Humans
ABSTRACT
BACKGROUND: Recent years have witnessed a substantial improvement in the accuracy of skin cancer classification using convolutional neural networks (CNNs). CNNs perform on par with or better than dermatologists with respect to the classification tasks of single images. However, in clinical practice, dermatologists also use other patient data beyond the visual aspects present in a digitized image, further increasing their diagnostic accuracy. Several pilot studies have recently investigated the effects of integrating different subtypes of patient data into CNN-based skin cancer classifiers. OBJECTIVE: This systematic review focuses on the current research investigating the impact of merging information from image features and patient data on the performance of CNN-based skin cancer image classification. This study aims to explore the potential in this field of research by evaluating the types of patient data used, the ways in which the nonimage data are encoded and merged with the image features, and the impact of the integration on the classifier performance. METHODS: Google Scholar, PubMed, MEDLINE, and ScienceDirect were screened for peer-reviewed studies published in English that dealt with the integration of patient data within a CNN-based skin cancer classification. The search terms skin cancer classification, convolutional neural network(s), deep learning, lesions, melanoma, metadata, clinical information, and patient data were combined. RESULTS: A total of 11 publications fulfilled the inclusion criteria. All of them reported an overall improvement in different skin lesion classification tasks with patient data integration. The most commonly used patient data were age, sex, and lesion location. The patient data were mostly one-hot encoded. The studies differed in the complexity of the deep learning methods used to process the encoded patient data before and after fusion with the image features for a combined classifier.
CONCLUSIONS: This study indicates the potential benefits of integrating patient data into CNN-based diagnostic algorithms. However, how exactly the individual patient data enhance classification performance, especially in the case of multiclass classification problems, is still unclear. Moreover, a substantial fraction of patient data used by dermatologists remains to be analyzed in the context of CNN-based skin cancer classification. Further exploratory analyses in this promising field may optimize patient data integration into CNN-based skin cancer diagnostics for patients' benefits.
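A common form of the integration described above is one-hot encoding the patient data and concatenating it with the CNN's image features (late fusion); a minimal sketch with hypothetical categories and a made-up image embedding:

```python
def one_hot(value, categories):
    """Encode a categorical value as a one-hot vector."""
    return [1.0 if value == c else 0.0 for c in categories]

def fuse(image_features, age_group, sex, location):
    """Late fusion: concatenate image features with one-hot patient data
    before the combined classification head (categories are illustrative)."""
    meta = (one_hot(age_group, ["<40", "40-60", ">60"])
            + one_hot(sex, ["female", "male"])
            + one_hot(location, ["trunk", "extremity", "head/neck"]))
    return image_features + meta

# Hypothetical 4-dimensional image embedding for illustration
vec = fuse([0.12, 0.83, 0.05, 0.44], age_group=">60", sex="male", location="trunk")
```

The reviewed studies differ mainly in how much additional processing the metadata vector receives before and after this concatenation step.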
Subjects
Melanoma, Skin Neoplasms, Dermoscopy, Humans, Melanoma/diagnosis, Neural Networks (Computer), Skin Neoplasms/diagnosis
ABSTRACT
BACKGROUND: Early detection of melanoma can be lifesaving but this remains a challenge. Recent diagnostic studies have revealed the superiority of artificial intelligence (AI) in classifying dermoscopic images of melanoma and nevi, concluding that these algorithms should assist a dermatologist's diagnoses. OBJECTIVE: The aim of this study was to investigate whether AI support improves the accuracy and overall diagnostic performance of dermatologists in the dichotomous image-based discrimination between melanoma and nevus. METHODS: Twelve board-certified dermatologists were presented with disjoint sets of 100 unique dermoscopic images of melanomas and nevi (total of 1200 unique images), and they had to classify the images based on personal experience alone (part I) and with the support of a trained convolutional neural network (CNN, part II). Additionally, dermatologists were asked to rate their confidence in their final decision for each image. RESULTS: While the mean specificity of the dermatologists remained almost unchanged with AI support (70.6% vs 72.4%; P=.54), the mean sensitivity and mean accuracy increased significantly (59.4% vs 74.6%; P=.003 and 65.0% vs 73.6%; P=.002, respectively). Out of the 10% (10/94; 95% CI 8.4%-11.8%) of cases where dermatologists were correct and AI was incorrect, dermatologists on average changed to the incorrect answer for 39% (4/10; 95% CI 23.2%-55.6%) of cases. When dermatologists were incorrect and AI was correct (25/94, 27%; 95% CI 24.0%-30.1%), dermatologists changed their answers to the correct answer for 46% (11/25; 95% CI 33.1%-58.4%) of cases. Additionally, the dermatologists' average confidence in their decisions increased when the CNN confirmed their decision and decreased when the CNN disagreed, even when the dermatologists were correct. Reported values are based on the mean of all participants.
Whenever absolute values are shown, the denominator and numerator are approximations as every dermatologist ended up rating a varying number of images due to a quality control step. CONCLUSIONS: The findings of our study show that AI support can improve the overall accuracy of the dermatologists in the dichotomous image-based discrimination between melanoma and nevus. This supports the argument for AI-based tools to aid clinicians in skin lesion classification and provides a rationale for studies of such classifiers in real-life settings, wherein clinicians can integrate additional information such as patient age and medical history into their decisions.
Subjects
Artificial Intelligence/standards, Dermatologists/standards, Dermoscopy/methods, Diagnostic Imaging/classification, Melanoma/diagnostic imaging, Skin Neoplasms/diagnostic imaging, Humans, Internet, Melanoma/diagnosis, Skin Neoplasms/diagnosis, Surveys and Questionnaires
ABSTRACT
Malignant melanoma is the skin tumor that causes the most deaths in Germany. At an early stage, melanoma is well treatable, so early detection is essential. However, the skin cancer screening program in Germany has been criticized because although melanomas have been diagnosed more frequently since the introduction of the program, mortality from malignant melanoma has not decreased. This indicates that the observed increase in melanoma diagnoses may be due to overdiagnosis, i.e. the detection of lesions that would never have created serious health problems for the patients. One of the reasons is the challenging distinction between some benign and malignant lesions. In addition, there may be lesions that are biologically equivocal, and other lesions that are classified as malignant according to current criteria but grow so slowly that they would never have posed a threat to the patient's life. So far, these "indolent" melanomas cannot be identified reliably due to a lack of biomarkers. Moreover, the likelihood that an in-situ melanoma will progress to an invasive tumor still cannot be determined with any certainty. When benign lesions are diagnosed as melanoma, the consequences are unnecessary psychological and physical stress for the affected patients and avoidable therapy costs. Conversely, underdiagnosis in the sense of overlooked melanomas can adversely affect patients' prognoses and may necessitate more intensive therapies. Novel diagnostic options could reduce the number of over- and underdiagnoses and contribute to more objective diagnoses in borderline cases. One strategy that has yielded promising results in pilot studies is the use of artificial intelligence-based diagnostic tools. However, these applications still await translation into clinical and pathological routine.
Subjects
Melanoma, Skin Neoplasms, Artificial Intelligence, Germany, Humans, Medical Overuse
Subjects
Artificial Intelligence, Dermatologists, Patient Preference, Skin Neoplasms, Adult, Aged, Female, Humans, Male, Middle Aged, Dermatologists/statistics & numerical data, Dermatologists/psychology, Dermatology/methods, Patient Preference/statistics & numerical data, Prospective Studies, Skin Neoplasms/diagnosis, Surveys and Questionnaires/statistics & numerical data
ABSTRACT
BACKGROUND: State-of-the-art classifiers based on convolutional neural networks (CNNs) were shown to classify images of skin cancer on par with dermatologists and could enable lifesaving and fast diagnoses, even outside the hospital via installation of apps on mobile devices. To our knowledge, at present there is no review of the current work in this research area. OBJECTIVE: This study presents the first systematic review of the state-of-the-art research on classifying skin lesions with CNNs. We limit our review to skin lesion classifiers. In particular, methods that apply a CNN only for segmentation or for the classification of dermoscopic patterns are not considered here. Furthermore, this study discusses why the comparability of the presented procedures is very difficult and which challenges must be addressed in the future. METHODS: We searched the Google Scholar, PubMed, MEDLINE, ScienceDirect, and Web of Science databases for systematic reviews and original research articles published in English. Only papers that described their scientific procedures in sufficient detail are included in this review. RESULTS: We found 13 papers that classified skin lesions using CNNs. In principle, classification methods can be differentiated according to three principles. Approaches that take a CNN pretrained on another large dataset and then fine-tune its parameters for skin lesion classification are the most common and display the best performance with the currently available limited datasets. CONCLUSIONS: CNNs display a high performance as state-of-the-art skin lesion classifiers. Unfortunately, it is difficult to compare different classification methods because some approaches use nonpublic datasets for training and/or testing, thereby making reproducibility difficult. Future publications should use publicly available benchmarks and fully disclose methods used for training to allow comparability.
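The most common approach identified above, fine-tuning a pretrained CNN, can be caricatured as freezing the backbone and optimising only a small classification head; a pure-Python logistic-regression sketch on hypothetical backbone features (not any reviewed study's actual setup):

```python
import math

def fine_tune_head(features, labels, lr=0.5, epochs=200):
    """Transfer-learning sketch: the pretrained backbone is frozen (represented
    here by precomputed feature vectors) and only a logistic-regression head
    is optimised on the skin-lesion labels via SGD on the log-loss."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                                   # dLoss/dz for log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Hypothetical backbone features for four lesions (label 1 = melanoma)
feats = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
labels = [1, 1, 0, 0]
w, b = fine_tune_head(feats, labels)
```

Because only the head is trained, far less labelled data is needed than when training the full network from scratch, which is why this strategy dominates with the currently limited datasets.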
Subjects
Neural Networks (Computer), Skin Neoplasms/classification, Humans, Reproducibility of Results
ABSTRACT
A decreasing number of dermatologists and an increasing number of patients in Western countries have led to a relative lack of clinicians providing expert dermatologic care. This, in turn, has prolonged wait times for patients to be examined, putting them at risk. Store-and-forward teledermatology improves patient access to dermatologists through asynchronous consultations, reducing wait times to obtain a consultation. However, live video conferencing as a synchronous service is also frequently used by practitioners because it allows immediate interaction between patient and physician. This raises the question of which of the two approaches is superior in terms of quality of care and convenience. There are pros and cons for each in terms of technical requirements and features. This viewpoint compares the two techniques based on a literature review and a clinical perspective to help dermatologists assess the value of teledermatology and determine which techniques would be valuable in their practice.
Subjects
Dermatology/methods, Remote Consultation/methods, Skin Diseases/diagnosis, Telemedicine/methods, Videoconferencing/standards, Humans, Skin Diseases/pathology
ABSTRACT
BACKGROUND: Early detection of melanoma, a potentially lethal type of skin cancer with high prevalence worldwide, improves patient prognosis. In retrospective studies, artificial intelligence (AI) has proven to be helpful for enhancing melanoma detection. However, there are few prospective studies confirming these promising results. Existing studies are limited by small sample sizes, overly homogeneous datasets, or lack of inclusion of rare melanoma subtypes, preventing a fair and thorough evaluation of AI and its generalizability, a crucial aspect for its application in the clinical setting. METHODS: Therefore, we assessed "All Data are Ext" (ADAE), an established open-source ensemble algorithm for detecting melanomas, by comparing its diagnostic accuracy to that of dermatologists on a prospectively collected, external, heterogeneous test set comprising eight distinct hospitals, four different camera setups, rare melanoma subtypes, and special anatomical sites. We advanced the algorithm with real test-time augmentation (R-TTA, i.e., providing real photographs of lesions taken from multiple angles and averaging the predictions), and evaluated its generalization capabilities. RESULTS: Overall, the AI shows higher balanced accuracy than dermatologists (0.798, 95% confidence interval (CI) 0.779-0.814 vs. 0.781, 95% CI 0.760-0.802; p = 4.0e-145), obtaining a higher sensitivity (0.921, 95% CI 0.900-0.942 vs. 0.734, 95% CI 0.701-0.770; p = 3.3e-165) at the cost of a lower specificity (0.673, 95% CI 0.641-0.702 vs. 0.828, 95% CI 0.804-0.852; p = 3.3e-165). CONCLUSION: As the algorithm exhibits a significant performance advantage on our heterogeneous dataset exclusively comprising melanoma-suspicious lesions, AI may offer the potential to support dermatologists, particularly in diagnosing challenging cases.
Melanoma is a type of skin cancer that can spread to other parts of the body, often resulting in death. Early detection improves survival rates. Computational tools that use artificial intelligence (AI) can be used to detect melanoma. However, few studies have checked how well the AI works on real-world data obtained from patients. We tested a previously developed AI tool on data obtained from eight different hospitals that used different types of cameras, which also included images taken of rare melanoma types and from a range of different parts of the body. The AI tool was more likely to correctly identify melanoma than dermatologists. This AI tool could be used to help dermatologists diagnose melanoma, particularly those that are difficult for dermatologists to diagnose.
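The real test-time augmentation (R-TTA) described above amounts to averaging the classifier's output over several photographs of the same lesion; a minimal sketch with a toy stand-in for the model:

```python
def tta_predict(model, views):
    """Real test-time augmentation: average the melanoma probability
    over several real photographs (views) of the same lesion."""
    preds = [model(v) for v in views]
    return sum(preds) / len(preds)

# Toy stand-in for the classifier: reads a precomputed score per view
toy_model = lambda view: view["score"]
views = [{"score": 0.90}, {"score": 0.70}, {"score": 0.80}]  # three angles
p = tta_predict(toy_model, views)  # averages to roughly 0.80
```

Averaging over real views smooths out angle- and lighting-dependent fluctuations in the single-image predictions.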
ABSTRACT
Importance: The development of artificial intelligence (AI)-based melanoma classifiers typically calls for large, centralized datasets, requiring hospitals to give away their patient data, which raises serious privacy concerns. To address this concern, decentralized federated learning has been proposed, where classifier development is distributed across hospitals. Objective: To investigate whether a more privacy-preserving federated learning approach can achieve comparable diagnostic performance to a classical centralized (ie, single-model) and ensemble learning approach for AI-based melanoma diagnostics. Design, Setting, and Participants: This multicentric, single-arm diagnostic study developed a federated model for melanoma-nevus classification using histopathological whole-slide images prospectively acquired at 6 German university hospitals between April 2021 and February 2023 and benchmarked it using both a holdout and an external test dataset. Data analysis was performed from February to April 2023. Exposures: All whole-slide images were retrospectively analyzed by an AI-based classifier without influencing routine clinical care. Main Outcomes and Measures: The area under the receiver operating characteristic curve (AUROC) served as the primary end point for evaluating the diagnostic performance. Secondary end points included balanced accuracy, sensitivity, and specificity. Results: The study included 1025 whole-slide images of clinically melanoma-suspicious skin lesions from 923 patients, consisting of 388 histopathologically confirmed invasive melanomas and 637 nevi. The median (range) age at diagnosis was 58 (18-95) years for the training set, 57 (18-93) years for the holdout test dataset, and 61 (18-95) years for the external test dataset; the median (range) Breslow thickness was 0.70 (0.10-34.00) mm, 0.70 (0.20-14.40) mm, and 0.80 (0.30-20.00) mm, respectively. 
The federated approach (0.8579; 95% CI, 0.7693-0.9299) performed significantly worse than the classical centralized approach (0.9024; 95% CI, 0.8379-0.9565) in terms of AUROC on a holdout test dataset (pairwise Wilcoxon signed-rank, P < .001) but performed significantly better (0.9126; 95% CI, 0.8810-0.9412) than the classical centralized approach (0.9045; 95% CI, 0.8701-0.9331) on an external test dataset (pairwise Wilcoxon signed-rank, P < .001). Notably, the federated approach performed significantly worse than the ensemble approach on both the holdout (0.8867; 95% CI, 0.8103-0.9481) and external test dataset (0.9227; 95% CI, 0.8941-0.9479). Conclusions and Relevance: The findings of this diagnostic study suggest that federated learning is a viable approach for the binary classification of invasive melanomas and nevi on a clinically representative distributed dataset. Federated learning can improve privacy protection in AI-based melanoma diagnostics while simultaneously promoting collaboration across institutions and countries. Moreover, it may have the potential to be extended to other image classification tasks in digital cancer histopathology and beyond.
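The federated approach evaluated above rests on aggregating locally trained models rather than pooling patient data; a FedAvg-style sketch with hypothetical three-parameter models from three hospitals (the study's actual protocol may differ):

```python
def federated_average(client_weights, client_sizes):
    """One FedAvg-style aggregation round: a size-weighted average of the
    model parameters trained locally at each hospital. Only weights leave
    the hospitals; the whole-slide images stay local."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]

# Hypothetical 3-parameter local models and local dataset sizes
global_w = federated_average(
    client_weights=[[0.2, 0.4, 0.6], [0.4, 0.2, 0.8], [0.3, 0.3, 0.7]],
    client_sizes=[100, 200, 100],
)
```

In training, this aggregation is repeated over many rounds, with the averaged model redistributed to the sites between rounds.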
Subjects
Dermatology, Melanoma, Nevus, Skin Neoplasms, Humans, Melanoma/diagnosis, Artificial Intelligence, Retrospective Studies, Skin Neoplasms/diagnosis, Nevus/diagnosis
ABSTRACT
BACKGROUND: Over the past decade, the development of molecular high-throughput methods (omics) increased rapidly and provided new insights for cancer research. In parallel, deep learning approaches revealed the enormous potential for medical image analysis, especially in digital pathology. Combining image and omics data with deep learning tools may enable the discovery of new cancer biomarkers and a more precise prediction of patient prognosis. This systematic review addresses different multimodal fusion methods of convolutional neural network-based image analyses with omics data, focussing on the impact of data combination on the classification performance. METHODS: PubMed was screened for peer-reviewed articles published in English between January 2015 and June 2021 by two independent researchers. Search terms related to deep learning, digital pathology, omics, and multimodal fusion were combined. RESULTS: We identified a total of 11 studies meeting the inclusion criteria, namely studies that used convolutional neural networks for haematoxylin and eosin image analysis of patients with cancer in combination with integrated omics data. Publications were categorised according to their endpoints: 7 studies focused on survival analysis and 4 studies on prediction of cancer subtypes, malignancy or microsatellite instability with spatial analysis. CONCLUSIONS: Image-based classifiers already show high performances in prognostic and predictive cancer diagnostics. The integration of omics data led to improved performance in all studies described here. However, these are very early studies that still require external validation to demonstrate their generalisability and robustness. Further and more comprehensive studies with larger sample sizes are needed to evaluate performance and determine clinical benefits.
Subjects
Deep Learning/standards, Genomics/methods, Image Processing, Computer-Assisted/methods, Neoplasms/genetics, Humans, Neoplasms/pathology
ABSTRACT
BACKGROUND: Deep neural networks are showing impressive results in different medical image classification tasks. However, for real-world applications, there is a need to estimate the network's uncertainty together with its prediction. OBJECTIVE: In this review, we investigate in what form uncertainty estimation has been applied to the task of medical image classification. We also investigate which metrics are used to describe the effectiveness of the applied uncertainty estimation. METHODS: Google Scholar, PubMed, IEEE Xplore, and ScienceDirect were screened for peer-reviewed studies, published between 2016 and 2021, that deal with uncertainty estimation in medical image classification. The search terms "uncertainty," "uncertainty estimation," "network calibration," and "out-of-distribution detection" were used in combination with the terms "medical images," "medical image analysis," and "medical image classification." RESULTS: A total of 22 papers were chosen for detailed analysis through the systematic review process. This paper provides a table for a systematic comparison of the included works with respect to the applied method for estimating the uncertainty. CONCLUSIONS: The applied methods for estimating uncertainties are diverse, but the sampling-based methods Monte-Carlo Dropout and Deep Ensembles are used most frequently. We concluded that future works can investigate the benefits of uncertainty estimation in collaborative settings of artificial intelligence systems and human experts. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/11936.
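Monte-Carlo Dropout, the most frequently used method identified above, keeps dropout active at inference and treats the spread of repeated stochastic forward passes as the uncertainty estimate; a toy sketch (the two-feature "network" below is purely illustrative):

```python
import random

def mc_dropout_predict(forward, x, n_samples=100, seed=0):
    """Monte-Carlo Dropout sketch: run several stochastic forward passes
    with dropout left on, then report the mean prediction and the
    variance across passes as an uncertainty estimate."""
    rng = random.Random(seed)
    preds = [forward(x, rng) for _ in range(n_samples)]
    mean = sum(preds) / n_samples
    var = sum((p - mean) ** 2 for p in preds) / n_samples
    return mean, var

# Toy stochastic "network": each input feature is dropped with p=0.5,
# with inverted-dropout rescaling (x2) so the expectation is preserved
def toy_forward(x, rng):
    kept = [v * 2 * (rng.random() < 0.5) for v in x]
    return sum(kept) / len(kept)

mean, var = mc_dropout_predict(toy_forward, [0.6, 0.2])
```

A large variance across passes flags inputs on which the network's prediction should not be trusted, which is exactly the signal needed in collaborative human-AI settings.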
ABSTRACT
BACKGROUND: Image-based cancer classifiers suffer from a variety of problems which negatively affect their performance. For example, variation in image brightness or different cameras can already suffice to diminish performance. Ensemble solutions, where multiple model predictions are combined into one, can mitigate these problems. However, ensembles are computationally intensive and less transparent to practitioners than single model solutions. Constructing model soups, by averaging the weights of multiple models into a single model, could circumvent these limitations while still improving performance. OBJECTIVE: To investigate the performance of model soups for a dermoscopic melanoma-nevus skin cancer classification task with respect to (1) generalisation to images from other clinics, (2) robustness against small image changes and (3) calibration such that the confidences correspond closely to the actual predictive uncertainties. METHODS: We construct model soups by fine-tuning pre-trained models on seven different image resolutions and subsequently averaging their weights. Performance is evaluated on a multi-source dataset including holdout and external components. RESULTS: We find that model soups improve generalisation and calibration on the external component while maintaining performance on the holdout component. For robustness, we observe performance improvements for perturbed test images, while the performance on corrupted test images remains on par. CONCLUSIONS: Overall, souping for skin cancer classifiers has a positive effect on generalisation, robustness and calibration. It is easy for practitioners to implement and by combining multiple models into a single model, complexity is reduced. This could be an important factor in achieving clinical applicability, as less complexity generally means more transparency.
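The model soup construction described above is a uniform average of the fine-tuned models' weights; a minimal sketch with flat weight lists (real models average each parameter tensor, but the idea is the same):

```python
def model_soup(models):
    """Uniform model soup: average the weights of several fine-tuned
    models, parameter by parameter, into a single model."""
    n = len(models)
    return [sum(weights) / n for weights in zip(*models)]

# Hypothetical flat weight vectors of three models
# fine-tuned at different image resolutions
soup = model_soup([
    [0.10, 0.50, 0.30],
    [0.20, 0.40, 0.30],
    [0.30, 0.60, 0.60],
])
```

Unlike an ensemble, the soup costs a single forward pass at inference, which is where the computational advantage over prediction averaging comes from.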
Subjects
Melanoma, Skin Neoplasms, Dermoscopy/methods, Humans, Melanoma/diagnostic imaging, Sensitivity and Specificity, Skin Neoplasms/diagnostic imaging, Cutaneous Malignant Melanoma
ABSTRACT
BACKGROUND: Due to their ability to solve complex problems, deep neural networks (DNNs) are becoming increasingly popular in medical applications. However, decision-making by such algorithms is essentially a black-box process that renders it difficult for physicians to judge whether the decisions are reliable. The use of explainable artificial intelligence (XAI) is often suggested as a solution to this problem. We investigate how XAI is used for skin cancer detection: how is it used during the development of new DNNs? What kinds of visualisations are commonly used? Are there systematic evaluations of XAI with dermatologists or dermatopathologists? METHODS: Google Scholar, PubMed, IEEE Xplore, ScienceDirect and Scopus were searched for peer-reviewed studies published between January 2017 and October 2021 applying XAI to dermatological images: the search terms histopathological image, whole-slide image, clinical image, dermoscopic image, skin, dermatology, explainable, interpretable and XAI were used in various combinations. Only studies concerned with skin cancer were included. RESULTS: 37 publications fulfilled our inclusion criteria. Most studies (19/37) simply applied existing XAI methods to their classifier to interpret its decision-making. Some studies (4/37) proposed new XAI methods or improved upon existing techniques. 14/37 studies addressed specific questions such as bias detection and the impact of XAI on man-machine interactions. However, only three of them evaluated the performance and confidence of humans using computer-aided diagnosis (CAD) systems with XAI. CONCLUSION: XAI is commonly applied during the development of DNNs for skin cancer detection. However, a systematic and rigorous evaluation of its usefulness in this scenario is lacking.
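Occlusion sensitivity, one of the simple XAI visualisations of the kind applied in the reviewed studies, blanks out image regions one at a time and records how much the prediction drops; a toy single-pixel sketch:

```python
def occlusion_map(model, image, baseline=0.0):
    """Occlusion sensitivity: replace one pixel at a time with a baseline
    value and record the prediction drop. Regions with large drops are
    the ones the classifier relied on."""
    base_score = model(image)
    heat = []
    for i, row in enumerate(image):
        heat_row = []
        for j, _ in enumerate(row):
            occluded = [r[:] for r in image]   # copy, then blank one pixel
            occluded[i][j] = baseline
            heat_row.append(base_score - model(occluded))
        heat.append(heat_row)
    return heat

# Toy classifier that mostly relies on the top-left pixel
toy_model = lambda img: 0.9 * img[0][0] + 0.1 * img[1][1]
heat = occlusion_map(toy_model, [[1.0, 1.0], [1.0, 1.0]])
```

Real implementations occlude patches rather than single pixels, but the resulting heat map is read the same way.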
Subjects
Artificial Intelligence, Skin Neoplasms, Algorithms, Humans, Neural Networks (Computer), Skin Neoplasms/diagnosis
ABSTRACT
BACKGROUND: Gastrointestinal cancers account for approximately 20% of all cancer diagnoses and are responsible for 22.5% of cancer deaths worldwide. Artificial intelligence-based diagnostic support systems, in particular convolutional neural network (CNN)-based image analysis tools, have shown great potential in medical computer vision. In this systematic review, we summarise recent studies reporting CNN-based approaches for digital biomarkers for characterization and prognostication of gastrointestinal cancer pathology. METHODS: PubMed and MEDLINE were screened for peer-reviewed papers dealing with CNN-based gastrointestinal cancer analyses from histological slides, published between 2015 and 2020. Seven hundred and ninety titles and abstracts were screened, and 58 full-text articles were assessed for eligibility. RESULTS: Sixteen publications fulfilled our inclusion criteria dealing with tumor or precursor lesion characterization or prognostic and predictive biomarkers: 14 studies on colorectal or rectal cancer, three studies on gastric cancer and none on esophageal cancer. These studies were categorised according to their end-points: polyp characterization, tumor characterization and patient outcome. Regarding the translation into clinical practice, we identified several studies demonstrating generalization of the classifier with external tests and comparisons with pathologists, but none presenting clinical implementation. CONCLUSIONS: Results of recent studies on CNN-based image analysis in gastrointestinal cancer pathology are promising, but studies were conducted in observational and retrospective settings. Large-scale trials are needed to assess performance and predict clinical usefulness. Furthermore, large-scale trials are required for approval of CNN-based prediction models as medical devices.
Subjects
Deep Learning/standards, Gastrointestinal Neoplasms/classification, Gastrointestinal Neoplasms/pathology, Humans, Treatment Outcome
ABSTRACT
AIM: Sentinel lymph node status is a central prognostic factor for melanomas. However, the surgical excision involves some risks for affected patients. In this study, we therefore aimed to develop a digital biomarker that can predict lymph node metastasis non-invasively from digitised H&E slides of primary melanoma tumours. METHODS: A total of 415 H&E slides from primary melanoma tumours with known sentinel node (SN) status from three German university hospitals and one private pathological practice were digitised (150 SN positive/265 SN negative). Two hundred ninety-one slides were used to train artificial neural networks (ANNs). The remaining 124 slides were used to test the ability of the ANNs to predict sentinel status. ANNs were trained and/or tested on data sets that were matched or not matched between SN-positive and SN-negative cases for patient age, ulceration, and tumour thickness, factors that are known to correlate with lymph node status. RESULTS: The best performance was achieved by an ANN that was trained and tested on unmatched cases, with an area under the receiver operating characteristic curve (AUROC) of 61.8% ± 0.2%. In contrast, ANNs that were trained and/or tested on matched cases achieved an AUROC of 55.0% ± 3.5% or less. CONCLUSION: Our results indicate that the image classifier can predict lymph node status to some, albeit so far not clinically relevant, extent. It may do so by mostly detecting equivalents of factors on histological slides that are already known to correlate with lymph node status. Our results provide a basis for future research with larger data cohorts.
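The case matching described above can be sketched as a greedy 1:1 pairing of SN-positive and SN-negative cases on the known risk factors (an illustrative procedure with hypothetical cases and tolerances, not the study's actual matching):

```python
def greedy_match(positives, negatives, tol):
    """Greedy 1:1 matching of SN-positive to SN-negative cases on patient
    age, ulceration and tumour thickness, so that these known risk factors
    cannot drive the classifier's prediction."""
    pairs, used = [], set()
    for p in positives:
        for i, n in enumerate(negatives):
            if (i not in used
                    and abs(p["age"] - n["age"]) <= tol["age"]
                    and p["ulceration"] == n["ulceration"]
                    and abs(p["thickness"] - n["thickness"]) <= tol["thickness"]):
                pairs.append((p, n))
                used.add(i)
                break
    return pairs

# Hypothetical cases for illustration
pos = [{"age": 64, "ulceration": True, "thickness": 2.1}]
neg = [{"age": 70, "ulceration": False, "thickness": 2.0},
       {"age": 62, "ulceration": True, "thickness": 1.9}]
pairs = greedy_match(pos, neg, tol={"age": 5, "thickness": 0.5})
```

If performance collapses on matched data, as reported above, the classifier was likely exploiting histological correlates of the matched factors rather than independent signal.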