Results 1 - 10 of 10
2.
JAMA Dermatol ; 160(3): 303-311, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38324293

ABSTRACT

Importance: The development of artificial intelligence (AI)-based melanoma classifiers typically calls for large, centralized datasets, requiring hospitals to give away their patient data, which raises serious privacy concerns. To address this concern, decentralized federated learning has been proposed, where classifier development is distributed across hospitals. Objective: To investigate whether a more privacy-preserving federated learning approach can achieve comparable diagnostic performance to a classical centralized (ie, single-model) and ensemble learning approach for AI-based melanoma diagnostics. Design, Setting, and Participants: This multicentric, single-arm diagnostic study developed a federated model for melanoma-nevus classification using histopathological whole-slide images prospectively acquired at 6 German university hospitals between April 2021 and February 2023 and benchmarked it using both a holdout and an external test dataset. Data analysis was performed from February to April 2023. Exposures: All whole-slide images were retrospectively analyzed by an AI-based classifier without influencing routine clinical care. Main Outcomes and Measures: The area under the receiver operating characteristic curve (AUROC) served as the primary end point for evaluating the diagnostic performance. Secondary end points included balanced accuracy, sensitivity, and specificity. Results: The study included 1025 whole-slide images of clinically melanoma-suspicious skin lesions from 923 patients, consisting of 388 histopathologically confirmed invasive melanomas and 637 nevi. The median (range) age at diagnosis was 58 (18-95) years for the training set, 57 (18-93) years for the holdout test dataset, and 61 (18-95) years for the external test dataset; the median (range) Breslow thickness was 0.70 (0.10-34.00) mm, 0.70 (0.20-14.40) mm, and 0.80 (0.30-20.00) mm, respectively. 
The federated approach (0.8579; 95% CI, 0.7693-0.9299) performed significantly worse than the classical centralized approach (0.9024; 95% CI, 0.8379-0.9565) in terms of AUROC on a holdout test dataset (pairwise Wilcoxon signed-rank, P < .001) but performed significantly better (0.9126; 95% CI, 0.8810-0.9412) than the classical centralized approach (0.9045; 95% CI, 0.8701-0.9331) on an external test dataset (pairwise Wilcoxon signed-rank, P < .001). Notably, the federated approach performed significantly worse than the ensemble approach on both the holdout (0.8867; 95% CI, 0.8103-0.9481) and external test dataset (0.9227; 95% CI, 0.8941-0.9479). Conclusions and Relevance: The findings of this diagnostic study suggest that federated learning is a viable approach for the binary classification of invasive melanomas and nevi on a clinically representative distributed dataset. Federated learning can improve privacy protection in AI-based melanoma diagnostics while simultaneously promoting collaboration across institutions and countries. Moreover, it may have the potential to be extended to other image classification tasks in digital cancer histopathology and beyond.
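The core mechanism behind the federated approach above can be sketched in a few lines: each hospital updates a local copy of the model on its own data, and only the weights, never the images, are sent to a coordinator that averages them. The toy one-step "training", the function names, and the two-parameter model below are illustrative assumptions, not the study's actual deep-learning pipeline.

```python
# Minimal sketch of federated averaging (FedAvg-style aggregation).
# Assumption: models are flat lists of parameters; each "site" performs
# one gradient step locally before the coordinator averages the results.

def local_update(weights, site_gradient, lr=0.1):
    """One simulated local training step at a single hospital."""
    return [w - lr * g for w, g in zip(weights, site_gradient)]

def federated_average(site_weights, site_sizes):
    """Aggregate local models, weighting each site by its dataset size."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(sw[i] * n / total for sw, n in zip(site_weights, site_sizes))
        for i in range(n_params)
    ]

# Toy round with three "hospitals" sharing a 2-parameter model.
global_w = [0.0, 0.0]
gradients = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]   # per-site gradients
sizes = [100, 300, 100]                              # per-site dataset sizes

locals_ = [local_update(global_w, g) for g in gradients]
global_w = federated_average(locals_, sizes)
```

Because only `locals_` (weights) leave each site, the raw whole-slide images stay within hospital boundaries, which is the privacy argument the abstract makes.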


Subject(s)
Dermatology , Melanoma , Nevus , Skin Neoplasms , Humans , Melanoma/diagnosis , Artificial Intelligence , Retrospective Studies , Skin Neoplasms/diagnosis , Nevus/diagnosis
4.
Eur J Cancer ; 173: 307-316, 2022 09.
Article in English | MEDLINE | ID: mdl-35973360

ABSTRACT

BACKGROUND: Image-based cancer classifiers suffer from a variety of problems which negatively affect their performance. For example, variation in image brightness or different cameras can already suffice to diminish performance. Ensemble solutions, where multiple model predictions are combined into one, can mitigate these problems. However, ensembles are computationally intensive and less transparent to practitioners than single-model solutions. Constructing model soups, by averaging the weights of multiple models into a single model, could circumvent these limitations while still improving performance. OBJECTIVE: To investigate the performance of model soups for a dermoscopic melanoma-nevus skin cancer classification task with respect to (1) generalisation to images from other clinics, (2) robustness against small image changes and (3) calibration such that the confidences correspond closely to the actual predictive uncertainties. METHODS: We construct model soups by fine-tuning pre-trained models on seven different image resolutions and subsequently averaging their weights. Performance is evaluated on a multi-source dataset including holdout and external components. RESULTS: We find that model soups improve generalisation and calibration on the external component while maintaining performance on the holdout component. For robustness, we observe performance improvements for perturbed test images, while the performance on corrupted test images remains on par. CONCLUSIONS: Overall, souping for skin cancer classifiers has a positive effect on generalisation, robustness and calibration. It is easy for practitioners to implement and, by combining multiple models into a single model, complexity is reduced. This could be an important factor in achieving clinical applicability, as less complexity generally means more transparency.
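The souping step described above is simply a uniform average of the weights of the fine-tuned models, so inference cost stays that of a single model rather than an ensemble. The flat-parameter representation and the three hypothetical fine-tuned models below are simplifying assumptions for illustration.

```python
# Minimal sketch of a "model soup": average the parameter vectors of
# several models fine-tuned from the same pre-trained initialisation.

def make_soup(models):
    """Uniformly average the parameters of fine-tuned models."""
    n = len(models)
    return [sum(params) / n for params in zip(*models)]

# Three hypothetical models fine-tuned at different image resolutions,
# mirroring the paper's setup of varying resolutions (values invented).
finetuned = [
    [0.9, 1.2, -0.3],   # e.g. fine-tuned at a low resolution
    [1.1, 1.0, -0.1],   # e.g. fine-tuned at a medium resolution
    [1.0, 1.1, -0.2],   # e.g. fine-tuned at a high resolution
]
soup = make_soup(finetuned)   # one averaged model, used like any other
```

Unlike an ensemble, which would run all three models per image and combine their predictions, `soup` is deployed as one ordinary model.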


Subject(s)
Melanoma , Skin Neoplasms , Dermoscopy/methods , Humans , Melanoma/diagnostic imaging , Sensitivity and Specificity , Skin Neoplasms/diagnostic imaging , Melanoma, Cutaneous Malignant
5.
Eur J Cancer ; 167: 54-69, 2022 05.
Article in English | MEDLINE | ID: mdl-35390650

ABSTRACT

BACKGROUND: Due to their ability to solve complex problems, deep neural networks (DNNs) are becoming increasingly popular in medical applications. However, decision-making by such algorithms is essentially a black-box process that renders it difficult for physicians to judge whether the decisions are reliable. The use of explainable artificial intelligence (XAI) is often suggested as a solution to this problem. We investigate how XAI is used for skin cancer detection: how is it used during the development of new DNNs? What kinds of visualisations are commonly used? Are there systematic evaluations of XAI with dermatologists or dermatopathologists? METHODS: Google Scholar, PubMed, IEEE Xplore, ScienceDirect and Scopus were searched for peer-reviewed studies published between January 2017 and October 2021 applying XAI to dermatological images: the search terms histopathological image, whole-slide image, clinical image, dermoscopic image, skin, dermatology, explainable, interpretable and XAI were used in various combinations. Only studies concerned with skin cancer were included. RESULTS: 37 publications fulfilled our inclusion criteria. Most studies (19/37) simply applied existing XAI methods to their classifier to interpret its decision-making. Some studies (4/37) proposed new XAI methods or improved upon existing techniques. 14/37 studies addressed specific questions such as bias detection and the impact of XAI on man-machine interactions. However, only three of them evaluated the performance and confidence of humans using CAD systems with XAI. CONCLUSION: XAI is commonly applied during the development of DNNs for skin cancer detection. However, a systematic and rigorous evaluation of its usefulness in this scenario is lacking.
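One simple member of the XAI family the review surveys is occlusion sensitivity: patches of the input image are masked out and the resulting drop in the classifier's output score is recorded as a coarse saliency map. The toy 4x4 "image" and the mean-intensity stand-in classifier below are illustrative assumptions, not any surveyed study's method.

```python
# Minimal sketch of occlusion-based saliency for a melanoma classifier.
# Assumption: the "classifier" is a trivial stand-in (mean intensity of
# a 4x4 grayscale image) so the mechanics stay visible.

def score(image):
    """Stand-in classifier: melanoma score = mean pixel intensity (4x4)."""
    return sum(sum(row) for row in image) / 16.0

def occlusion_map(image, patch=2):
    """Score drop when each patch x patch region is zeroed out."""
    base = score(image)
    saliency = []
    for r in range(0, 4, patch):
        row = []
        for c in range(0, 4, patch):
            occluded = [list(line) for line in image]
            for i in range(r, r + patch):
                for j in range(c, c + patch):
                    occluded[i][j] = 0.0
            row.append(base - score(occluded))   # larger drop = more salient
        saliency.append(row)
    return saliency

img = [[1.0] * 4 for _ in range(4)]
img[0][0] = 4.0   # a "lesion" pixel the classifier relies on
heat = occlusion_map(img)
```

The region containing the influential pixel produces the largest score drop, which is exactly the kind of visualisation most of the 19/37 "apply existing XAI" studies present to readers.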


Subject(s)
Artificial Intelligence , Skin Neoplasms , Algorithms , Humans , Neural Networks, Computer , Skin Neoplasms/diagnosis
6.
Eur J Cancer ; 156: 202-216, 2021 10.
Article in English | MEDLINE | ID: mdl-34509059

ABSTRACT

BACKGROUND: Multiple studies have compared the performance of artificial intelligence (AI)-based models for automated skin cancer classification to human experts, thus laying the cornerstone for a successful translation of AI-based tools into clinicopathological practice. OBJECTIVE: The objective of the study was to systematically analyse the current state of research on reader studies involving melanoma and to assess their potential clinical relevance by evaluating three main aspects: test set characteristics (holdout/out-of-distribution data set, composition), test setting (experimental/clinical, inclusion of metadata) and representativeness of participating clinicians. METHODS: PubMed, Medline and ScienceDirect were screened for peer-reviewed studies published between 2017 and 2021 and dealing with AI-based skin cancer classification involving melanoma. The search terms skin cancer classification, deep learning, convolutional neural network (CNN), melanoma (detection), digital biomarkers, histopathology and whole slide imaging were combined. Based on the search results, only studies that considered direct comparison of AI results with clinicians and had a diagnostic classification as their main objective were included. RESULTS: A total of 19 reader studies fulfilled the inclusion criteria. Of these, 11 CNN-based approaches addressed the classification of dermoscopic images; 6 concentrated on the classification of clinical images, whereas 2 dermatopathological studies utilised digitised histopathological whole slide images. CONCLUSIONS: All 19 included studies demonstrated superior or at least equivalent performance of CNN-based classifiers compared with clinicians. However, almost all studies were conducted in highly artificial settings based exclusively on single images of the suspicious lesions.
Moreover, test sets mainly consisted of holdout images and did not represent the full range of patient populations and melanoma subtypes encountered in clinical practice.


Subject(s)
Dermatologists , Dermoscopy , Diagnosis, Computer-Assisted , Image Interpretation, Computer-Assisted , Melanoma/pathology , Microscopy , Neural Networks, Computer , Pathologists , Skin Neoplasms/pathology , Automation , Biopsy , Clinical Competence , Deep Learning , Humans , Melanoma/classification , Predictive Value of Tests , Reproducibility of Results , Skin Neoplasms/classification
7.
Eur J Cancer ; 155: 191-199, 2021 09.
Article in English | MEDLINE | ID: mdl-34388516

ABSTRACT

BACKGROUND: One prominent application for deep learning-based classifiers is skin cancer classification on dermoscopic images. However, classifier evaluation is often limited to holdout data which can mask common shortcomings such as susceptibility to confounding factors. To increase clinical applicability, it is necessary to thoroughly evaluate such classifiers on out-of-distribution (OOD) data. OBJECTIVE: The objective of the study was to establish a dermoscopic skin cancer benchmark in which classifier robustness to OOD data can be measured. METHODS: Using a proprietary dermoscopic image database and a set of image transformations, we create an OOD robustness benchmark and evaluate the robustness of four different convolutional neural network (CNN) architectures on it. RESULTS: The benchmark contains three data sets: Skin Archive Munich (SAM), SAM-corrupted (SAM-C) and SAM-perturbed (SAM-P). It is publicly available for download. To maintain the benchmark's OOD status, ground truth labels are not provided and test results should be sent to us for assessment. The SAM data set contains 319 unmodified and biopsy-verified dermoscopic melanoma (n = 194) and nevus (n = 125) images. SAM-C and SAM-P contain images from SAM which were artificially modified to test a classifier against low-quality inputs and to measure its prediction stability over small image changes, respectively. All four CNNs showed susceptibility to corruptions and perturbations. CONCLUSIONS: This benchmark provides three data sets which allow for OOD testing of binary skin cancer classifiers. Our classifier performance confirms the shortcomings of CNNs and provides a frame of reference. Altogether, this benchmark should facilitate a more thorough evaluation process and thereby enable the development of more robust skin cancer classifiers.
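The corrupted/perturbed split described above can be sketched as two families of transforms applied to clean test images, with robustness read off as the change in a classifier's output. The toy 1-D grayscale "image", the two transforms and the linear stand-in classifier are illustrative assumptions, not the benchmark's actual transformations.

```python
# Minimal sketch of building SAM-C/SAM-P-style test variants and
# measuring robustness on them (all values and transforms invented).

def corrupt_brightness(image, delta=0.3):
    """Corruption (low-quality input): brightness shift, clipped to [0, 1]."""
    return [min(1.0, max(0.0, px + delta)) for px in image]

def perturb_shift(image):
    """Perturbation (small change): one-pixel circular shift."""
    return image[-1:] + image[:-1]

def predict(image):
    """Stand-in classifier: position-weighted melanoma score."""
    weights = [0.7, 0.1, 0.1, 0.1]
    return sum(w * x for w, x in zip(weights, image))

clean = [0.8, 0.2, 0.2, 0.2]

# Robustness read-outs: score change under corruption, and prediction
# stability (ideally near zero) under the small perturbation.
corruption_shift = abs(predict(clean) - predict(corrupt_brightness(clean)))
perturbation_gap = abs(predict(clean) - predict(perturb_shift(clean)))
```

A robust classifier would keep both gaps small; the abstract's finding is that all four tested CNN architectures do not.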


Subject(s)
Benchmarking/standards , Neural Networks, Computer , Skin Neoplasms/classification , Humans
8.
Eur J Cancer ; 149: 94-101, 2021 05.
Article in English | MEDLINE | ID: mdl-33838393

ABSTRACT

BACKGROUND: Clinicians and pathologists traditionally use patient data in addition to clinical examination to support their diagnoses. OBJECTIVES: We investigated whether a combination of histologic whole slide image (WSI) analysis based on convolutional neural networks (CNNs) and commonly available patient data (age, sex and anatomical site of the lesion) in a binary melanoma/nevus classification task could increase the performance compared with CNNs alone. METHODS: We used 431 WSIs from two different laboratories and analysed the performance of classifiers that used the image or patient data individually, or combined both via three common fusion techniques. Furthermore, we tested a naive combination of patient data and an image classifier: for cases interpreted as 'uncertain' (CNN output score <0.7), the decision of the CNN was replaced by the decision of the patient data classifier. RESULTS: The CNN on its own achieved the best performance (mean ± standard deviation of five individual runs) with AUROC of 92.30% ± 0.23% and balanced accuracy of 83.17% ± 0.38%. While the classification performance was not significantly improved in general by any of the tested fusions, the naive strategy of replacing the image classifier with the patient data classifier on slides with low output scores improved balanced accuracy to 86.72% ± 0.36%. CONCLUSION: In most cases, the CNN on its own was so accurate that patient data integration did not provide any benefit. However, incorporating patient data for lesions that were classified by the CNN with low 'confidence' improved balanced accuracy.
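The "naive combination" the abstract describes is a simple confidence-gated fallback. The sketch below assumes "output score < 0.7" refers to the predicted class's probability; that interpretation, and both stand-in classifiers, are assumptions for illustration, not the paper's code.

```python
# Minimal sketch of the confidence-gated fusion: keep the CNN decision
# unless it is "uncertain", in which case the patient-data classifier
# decides instead. The 0.7 threshold comes from the abstract.

def fuse(cnn_score, patient_pred, threshold=0.7):
    """Return (label, source) for one slide.

    cnn_score: CNN melanoma probability in [0, 1].
    patient_pred: label from the metadata classifier (0 = nevus, 1 = melanoma).
    """
    confidence = max(cnn_score, 1.0 - cnn_score)   # predicted-class probability
    if confidence >= threshold:
        return (1 if cnn_score >= 0.5 else 0, "cnn")
    return (patient_pred, "patient_data")

# A confident CNN keeps its decision; an uncertain one defers.
confident_melanoma = fuse(0.95, 0)   # CNN decides
uncertain_case = fuse(0.60, 1)       # patient data decides
confident_nevus = fuse(0.10, 1)      # CNN decides
```

This gating explains the result pattern: most slides are decided by the CNN alone, and only the low-confidence minority benefits from patient data.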


Subject(s)
Image Interpretation, Computer-Assisted , Melanoma/pathology , Microscopy , Neural Networks, Computer , Nevus/pathology , Skin Neoplasms/pathology , Adult , Age Factors , Aged , Databases, Factual , Female , Germany , Humans , Male , Melanoma/classification , Middle Aged , Nevus/classification , Predictive Value of Tests , Reproducibility of Results , Retrospective Studies , Sex Factors , Skin Neoplasms/classification
9.
Eur J Cancer ; 145: 81-91, 2021 03.
Article in English | MEDLINE | ID: mdl-33423009

ABSTRACT

BACKGROUND: A basic requirement for artificial intelligence (AI)-based image analysis systems that are to be integrated into clinical practice is high robustness. Minor changes in how those images are acquired, for example, during routine skin cancer screening, should not change the diagnosis of such assistance systems. OBJECTIVE: To quantify to what extent minor image perturbations affect the convolutional neural network (CNN)-mediated skin lesion classification and to evaluate three possible solutions for this problem (additional data augmentation, test-time augmentation, anti-aliasing). METHODS: We trained three commonly used CNN architectures to differentiate between dermoscopic melanoma and nevus images. Subsequently, their performance and susceptibility to minor changes ('brittleness') were tested on two distinct test sets with multiple images per lesion. For the first set, image changes, such as rotations or zooms, were generated artificially. The second set contained natural changes that stemmed from multiple photographs taken of the same lesions. RESULTS: All architectures exhibited brittleness on the artificial and natural test sets. The three reviewed methods were able to decrease brittleness to varying degrees while still maintaining performance. The observed improvement was greater for the artificial than for the natural test set, where enhancements were minor. CONCLUSIONS: Minor image changes, relatively inconspicuous to humans, can affect the robustness of CNNs differentiating skin lesions. The methods tested here can reduce, but not fully eliminate, this effect. Thus, further research to sustain the performance of AI classifiers is needed to facilitate the translation of such systems into the clinic.
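Of the three remedies evaluated above, test-time augmentation (TTA) is the easiest to sketch: the prediction is averaged over several transformed copies of the input, which damps the effect of small image changes. The toy 1-D "image", the transforms and the deliberately position-sensitive stand-in classifier below are illustrative assumptions.

```python
# Minimal sketch of test-time augmentation for a brittle classifier.

def predict(image):
    """Stand-in classifier whose output depends on pixel position,
    making it brittle against small spatial shifts."""
    weights = [0.1, 0.4, 0.3, 0.2]
    return sum(w * x for w, x in zip(weights, image))

def tta_predict(image):
    """Average the prediction over simple augmentations of the input."""
    augmented = [
        image,
        image[::-1],             # horizontal flip
        image[1:] + image[:1],   # circular shift, a crude crop stand-in
    ]
    return sum(predict(a) for a in augmented) / len(augmented)

original = [1.0, 0.0, 0.0, 0.0]
shifted  = [0.0, 1.0, 0.0, 0.0]   # the same lesion, minimally moved

raw_gap = abs(predict(original) - predict(shifted))          # brittle
tta_gap = abs(tta_predict(original) - tta_predict(shifted))  # damped
```

The averaged prediction changes less between the two near-identical inputs than the single-pass prediction does, mirroring the reduced (but not eliminated) brittleness the study reports.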


Subject(s)
Dermoscopy , Diagnosis, Computer-Assisted , Image Interpretation, Computer-Assisted , Melanoma/pathology , Neural Networks, Computer , Nevus/pathology , Skin Neoplasms/pathology , Diagnosis, Differential , Humans , Predictive Value of Tests , Reproducibility of Results
10.
Front Med (Lausanne) ; 7: 177, 2020.
Article in English | MEDLINE | ID: mdl-32435646

ABSTRACT

Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but its effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNNs) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross-validation, comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. With identical ground truths for training and test labels, high accuracies of 75.03% (95% CI: 74.39-75.66%) for dermatological and 73.80% (95% CI: 73.10-74.51%) for biopsy-verified labels can be achieved. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12-65.94%, p < 0.01) on a non-biopsy-verified and to 64.24% (95% CI: 62.66-65.83%, p < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise, and future work should use biopsy-verified training images to mitigate this problem.
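The systematic error the abstract describes can be made concrete with a tiny simulation: a classifier fitted against one ground truth (dermatologist labels) loses accuracy when scored against another (biopsy labels) by exactly the fraction of labels on which the two ground truths disagree. The memorising "classifier" and the invented label lists are illustrative assumptions.

```python
# Minimal sketch of the train/test ground-truth mismatch effect.

def accuracy(preds, labels):
    """Fraction of predictions matching the given ground truth."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Biopsy-verified truth vs. dermatologist labels disagreeing on 2/10 cases.
biopsy = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
derm   = [1, 0, 1, 0, 1, 1, 1, 0, 0, 0]   # two noisy labels

# An overfitted classifier simply reproduces its training labels.
preds_trained_on_derm = list(derm)

acc_same_truth  = accuracy(preds_trained_on_derm, derm)     # matched truths
acc_cross_truth = accuracy(preds_trained_on_derm, biopsy)   # mismatched truths
```

In the real study the drop is smaller than the raw disagreement rate because the CNNs generalise rather than memorise, but the direction of the effect is the same.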
