Search | VHL Search Portal

1.

Computer algorithms show potential for improving dermatologists' accuracy to diagnose cutaneous melanoma: Results of the International Skin Imaging Collaboration 2017.

Marchetti, Michael A; Liopyris, Konstantinos; Dusza, Stephen W; Codella, Noel C F; Gutman, David A; Helba, Brian; Kalloo, Aadi; Halpern, Allan C.

J Am Acad Dermatol ; 82(3): 622-627, 2020 Mar.

Article in English | MEDLINE | ID: mdl-31306724

ABSTRACT

BACKGROUND: Computer vision has promise in image-based cutaneous melanoma diagnosis but clinical utility is uncertain. OBJECTIVE: To determine if computer algorithms from an international melanoma detection challenge can improve dermatologists' accuracy in diagnosing melanoma. METHODS: In this cross-sectional study, we used 150 dermoscopy images (50 melanomas, 50 nevi, 50 seborrheic keratoses) from the test dataset of a melanoma detection challenge, along with algorithm results from 23 teams. Eight dermatologists and 9 dermatology residents classified dermoscopic lesion images in an online reader study and provided their confidence level. RESULTS: The top-ranked computer algorithm had an area under the receiver operating characteristic curve of 0.87, which was higher than that of the dermatologists (0.74) and residents (0.66) (P < .001 for all comparisons). At the dermatologists' overall sensitivity in classification of 76.0%, the algorithm had a superior specificity (85.0% vs. 72.6%, P = .001). Imputation of computer algorithm classifications into dermatologist evaluations with low confidence ratings (26.6% of evaluations) increased dermatologist sensitivity from 76.0% to 80.8% and specificity from 72.6% to 72.8%. LIMITATIONS: Artificial study setting lacking the full spectrum of skin lesions as well as clinical metadata. CONCLUSION: Accumulating evidence suggests that deep neural networks can classify skin images of melanoma and its benign mimickers with high accuracy and potentially improve human performance.
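
The imputation step described in the abstract above can be illustrated with a minimal sketch: wherever a reader marked an evaluation as low confidence, the algorithm's classification replaces the reader's, and sensitivity and specificity are recomputed. The `Evaluation` record, the function names, and the four toy evaluations are assumptions made for illustration; they are not the study's data or code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Evaluation:
    """Hypothetical record of one reader evaluation of one image."""
    truth: bool           # True if the lesion is a melanoma (reference standard)
    reader_call: bool     # reader's classification (True = melanoma)
    low_confidence: bool  # reader flagged this evaluation as low confidence
    algo_call: bool       # algorithm's classification of the same image


def sens_spec(truths: List[bool], calls: List[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary calls against the truth."""
    tp = sum(1 for t, c in zip(truths, calls) if t and c)
    fn = sum(1 for t, c in zip(truths, calls) if t and not c)
    tn = sum(1 for t, c in zip(truths, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truths, calls) if not t and c)
    return tp / (tp + fn), tn / (tn + fp)


def impute_low_confidence(evals: List[Evaluation]) -> List[bool]:
    """Use the algorithm's call only where the reader reported low confidence."""
    return [e.algo_call if e.low_confidence else e.reader_call for e in evals]


# Toy data (invented for illustration only).
evals = [
    Evaluation(truth=True, reader_call=True, low_confidence=False, algo_call=True),
    Evaluation(truth=True, reader_call=False, low_confidence=True, algo_call=True),
    Evaluation(truth=False, reader_call=False, low_confidence=False, algo_call=False),
    Evaluation(truth=False, reader_call=True, low_confidence=True, algo_call=False),
]
truths = [e.truth for e in evals]
before = sens_spec(truths, [e.reader_call for e in evals])
after = sens_spec(truths, impute_low_confidence(evals))
print("reader only:", before)
print("with imputation:", after)
```

In this toy case both low-confidence calls were wrong and the algorithm was right, so both metrics improve; with real data the change depends on how often the algorithm is correct on exactly those low-confidence cases.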

Subject(s)

Deep Learning , Dermoscopy/methods , Image Interpretation, Computer-Assisted/methods , Melanoma/diagnosis , Skin Neoplasms/diagnosis , Colombia , Cross-Sectional Studies , Dermatologists/statistics & numerical data , Dermoscopy/statistics & numerical data , Diagnosis, Differential , Humans , International Cooperation , Internship and Residency/statistics & numerical data , Israel , Keratosis, Seborrheic/diagnosis , Melanoma/pathology , Nevus/diagnosis , ROC Curve , Skin/diagnostic imaging , Skin/pathology , Skin Neoplasms/pathology , Spain , United States

2.

Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study.

Tschandl, Philipp; Codella, Noel; Akay, Bengü Nisa; Argenziano, Giuseppe; Braun, Ralph P; Cabo, Horacio; Gutman, David; Halpern, Allan; Helba, Brian; Hofmann-Wellenhof, Rainer; Lallas, Aimilios; Lapins, Jan; Longo, Caterina; Malvehy, Josep; Marchetti, Michael A; Marghoob, Ashfaq; Menzies, Scott; Oakley, Amanda; Paoli, John; Puig, Susana; Rinner, Christoph; Rosendahl, Cliff; Scope, Alon; Sinz, Christoph; Soyer, H Peter; Thomas, Luc; Zalaudek, Iris; Kittler, Harald.

Lancet Oncol ; 20(7): 938-947, 2019 07.

Article in English | MEDLINE | ID: mdl-31201137

ABSTRACT

BACKGROUND: Whether machine-learning algorithms can diagnose all pigmented skin lesions as accurately as human experts is unclear. The aim of this study was to compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions. METHODS: For this open, web-based, international, diagnostic study, human readers were asked to diagnose dermatoscopic images selected randomly in 30-image batches from a test set of 1511 images. The diagnoses from human readers were compared with those of 139 algorithms created by 77 machine-learning labs, who participated in the International Skin Imaging Collaboration 2018 challenge and received a training set of 10 015 images in advance. The ground truth of each lesion fell into one of seven predefined disease categories: intraepithelial carcinoma including actinic keratoses and Bowen's disease; basal cell carcinoma; benign keratinocytic lesions including solar lentigo, seborrheic keratosis and lichen planus-like keratosis; dermatofibroma; melanoma; melanocytic nevus; and vascular lesions. The two main outcomes were the differences in the number of correct specific diagnoses per batch between all human readers and the top three algorithms, and between human experts and the top three algorithms. FINDINGS: Between Aug 4, 2018, and Sept 30, 2018, 511 human readers from 63 countries had at least one attempt in the reader study. 283 (55·4%) of 511 human readers were board-certified dermatologists, 118 (23·1%) were dermatology residents, and 83 (16·2%) were general practitioners. When comparing all human readers with all machine-learning algorithms, the algorithms achieved a mean of 2·01 (95% CI 1·97 to 2·04; p<0·0001) more correct diagnoses (17·91 [SD 3·42] vs 19·92 [4·27]). 
27 human experts with more than 10 years of experience achieved a mean of 18·78 (SD 3·15) correct answers, compared with 25·43 (1·95) correct answers for the top three machine algorithms (mean difference 6·65, 95% CI 6·06-7·25; p<0·0001). The difference between human experts and the top three algorithms was significantly lower for images in the test set that were collected from sources not included in the training set (human underperformance of 11·4%, 95% CI 9·9-12·9 vs 3·6%, 0·8-6·3; p<0·0001). INTERPRETATION: State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice. However, a possible limitation of these algorithms is their decreased performance for out-of-distribution images, which should be addressed in future research. FUNDING: None.

Subject(s)

Algorithms , Dermoscopy , Internet , Machine Learning , Pigmentation Disorders/pathology , Skin Neoplasms/pathology , Adult , Female , Humans , Male , Reproducibility of Results , Retrospective Studies

3.

Machine learning derived segmentation of phase velocity encoded cardiovascular magnetic resonance for fully automated aortic flow quantification.

Bratt, Alex; Kim, Jiwon; Pollie, Meridith; Beecy, Ashley N; Tehrani, Nathan H; Codella, Noel; Perez-Johnston, Rocio; Palumbo, Maria Chiara; Alakbarli, Javid; Colizza, Wayne; Drexler, Ian R; Azevedo, Clerio F; Kim, Raymond J; Devereux, Richard B; Weinsaft, Jonathan W.

J Cardiovasc Magn Reson ; 21(1): 1, 2019 01 07.

Article in English | MEDLINE | ID: mdl-30612574

ABSTRACT

BACKGROUND: Phase contrast (PC) cardiovascular magnetic resonance (CMR) is widely employed for flow quantification, but analysis typically requires time consuming manual segmentation which can require human correction. Advances in machine learning have markedly improved automated processing, but have yet to be applied to PC-CMR. This study tested a novel machine learning model for fully automated analysis of PC-CMR aortic flow. METHODS: A machine learning model was designed to track aortic valve borders based on neural network approaches. The model was trained in a derivation cohort encompassing 150 patients who underwent clinical PC-CMR then compared to manual and commercially-available automated segmentation in a prospective validation cohort. Further validation testing was performed in an external cohort acquired from a different site/CMR vendor. RESULTS: Among 190 coronary artery disease patients prospectively undergoing CMR on commercial scanners (84% 1.5T, 16% 3T), machine learning segmentation was uniformly successful, requiring no human intervention: Segmentation time was < 0.01 min/case (1.2 min for entire dataset); manual segmentation required 3.96 ± 0.36 min/case (12.5 h for entire dataset). Correlations between machine learning and manual segmentation-derived flow approached unity (r = 0.99, p < 0.001). Machine learning yielded smaller absolute differences with manual segmentation than did commercial automation (1.85 ± 1.80 vs. 3.33 ± 3.18 mL, p < 0.01): Nearly all (98%) of cases differed by ≤5 mL between machine learning and manual methods. Among patients without advanced mitral regurgitation, machine learning correlated well (r = 0.63, p < 0.001) and yielded small differences with cine-CMR stroke volume (∆ 1.3 ± 17.7 mL, p = 0.36). Among advanced mitral regurgitation patients, machine learning yielded lower stroke volume than did volumetric cine-CMR (∆ 12.6 ± 20.9 mL, p = 0.005), further supporting validity of this method. 
Among the external validation cohort (n = 80) acquired using a different CMR vendor, the algorithm yielded equivalently small differences (∆ 1.39 ± 1.77 mL, p = 0.4) and high correlations (r = 0.99, p < 0.001) with manual segmentation, including similar results in 20 patients with bicuspid or stenotic aortic valve pathology (∆ 1.71 ± 2.25 mL, p = 0.25). CONCLUSION: Fully automated machine learning PC-CMR segmentation performs robustly for aortic flow quantification - yielding rapid segmentation, small differences with manual segmentation, and identification of differential forward/left ventricular volumetric stroke volume in context of concomitant mitral regurgitation. Findings support use of machine learning for analysis of large scale CMR datasets.

Subject(s)

Aorta/diagnostic imaging , Aortic Valve/diagnostic imaging , Heart Diseases/diagnostic imaging , Hemodynamics , Machine Learning , Magnetic Resonance Imaging, Cine , Myocardial Perfusion Imaging/methods , Aged , Aorta/physiopathology , Aortic Valve/physiopathology , Automation , Blood Flow Velocity , Female , Heart Diseases/physiopathology , Humans , Image Interpretation, Computer-Assisted , Male , Middle Aged , Predictive Value of Tests , Proof of Concept Study , Prospective Studies , Reproducibility of Results , Retrospective Studies , United States

4.

Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images.

Marchetti, Michael A; Codella, Noel C F; Dusza, Stephen W; Gutman, David A; Helba, Brian; Kalloo, Aadi; Mishra, Nabin; Carrera, Cristina; Celebi, M Emre; DeFazio, Jennifer L; Jaimes, Natalia; Marghoob, Ashfaq A; Quigley, Elizabeth; Scope, Alon; Yélamos, Oriol; Halpern, Allan C.

J Am Acad Dermatol ; 78(2): 270-277.e1, 2018 02.

Article in English | MEDLINE | ID: mdl-28969863

ABSTRACT

BACKGROUND: Computer vision may aid in melanoma detection. OBJECTIVE: We sought to compare melanoma diagnostic accuracy of computer algorithms to dermatologists using dermoscopic images. METHODS: We conducted a cross-sectional study using 100 randomly selected dermoscopic images (50 melanomas, 44 nevi, and 6 lentigines) from an international computer vision melanoma challenge dataset (n = 379), along with individual algorithm results from 25 teams. We used 5 methods (nonlearned and machine learning) to combine individual automated predictions into "fusion" algorithms. In a companion study, 8 dermatologists classified the lesions in the 100 images as either benign or malignant. RESULTS: The average sensitivity and specificity of dermatologists in classification was 82% and 59%. At 82% sensitivity, dermatologist specificity was similar to the top challenge algorithm (59% vs. 62%, P = .68) but lower than the best-performing fusion algorithm (59% vs. 76%, P = .02). Receiver operating characteristic area of the top fusion algorithm was greater than the mean receiver operating characteristic area of dermatologists (0.86 vs. 0.71, P = .001). LIMITATIONS: The dataset lacked the full spectrum of skin lesions encountered in clinical practice, particularly banal lesions. Readers and algorithms were not provided clinical data (eg, age or lesion history/symptoms). Results obtained using our study design cannot be extrapolated to clinical practice. CONCLUSION: Deep learning computer vision systems classified melanoma dermoscopy images with accuracy that exceeded some but not all dermatologists.
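
As a rough illustration of two ideas in the abstract above, the sketch below combines per-image scores from several algorithms by simple averaging (one possible nonlearned fusion; the abstract does not say which five methods were used) and then reads off specificity at the threshold that first reaches a target sensitivity. The function names and all scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_fusion(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Fuse algorithms by averaging their melanoma scores image by image."""
    n_algorithms = len(score_lists)
    return [sum(column) / n_algorithms for column in zip(*score_lists)]


def specificity_at_sensitivity(
    truths: Sequence[bool], scores: Sequence[float], target_sens: float
) -> Tuple[float, float]:
    """Return (threshold, specificity) at the highest score threshold whose
    sensitivity is at least target_sens. A score >= threshold counts as a
    melanoma call."""
    positives = [s for t, s in zip(truths, scores) if t]
    negatives = [s for t, s in zip(truths, scores) if not t]
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sens:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            return threshold, specificity
    raise ValueError("target sensitivity cannot be reached")


# Toy data (invented): two melanomas followed by two benign lesions.
truths = [True, True, False, False]
team_a = [0.9, 0.4, 0.6, 0.1]
team_b = [0.7, 0.8, 0.2, 0.3]

fused = average_fusion([team_a, team_b])
_, spec_a = specificity_at_sensitivity(truths, team_a, target_sens=1.0)
_, spec_fused = specificity_at_sensitivity(truths, fused, target_sens=1.0)
print("team A specificity:", spec_a)
print("fused specificity:", spec_fused)
```

Matching the operating point on sensitivity before comparing specificity is what makes a statement such as "59% vs. 76% at 82% sensitivity" a like-for-like comparison.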

Subject(s)

Algorithms , Dermatologists , Dermoscopy , Lentigo/diagnostic imaging , Melanoma/diagnosis , Nevus/diagnostic imaging , Skin Neoplasms/diagnostic imaging , Congresses as Topic , Cross-Sectional Studies , Diagnosis, Computer-Assisted , Humans , Machine Learning , Melanoma/pathology , ROC Curve , Skin Neoplasms/pathology

5.

BCN20000: Dermoscopic Lesions in the Wild.

Hernández-Pérez, Carlos; Combalia, Marc; Podlipnik, Sebastian; Codella, Noel C F; Rotemberg, Veronica; Halpern, Allan C; Reiter, Ofer; Carrera, Cristina; Barreiro, Alicia; Helba, Brian; Puig, Susana; Vilaplana, Veronica; Malvehy, Josep.

Sci Data ; 11(1): 641, 2024 Jun 17.

Article in English | MEDLINE | ID: mdl-38886204

ABSTRACT

Advancements in dermatological artificial intelligence research require high-quality and comprehensive datasets that mirror real-world clinical scenarios. We introduce a collection of 18,946 dermoscopic images spanning from 2010 to 2016, collated at the Hospital Clínic in Barcelona, Spain. The BCN20000 dataset aims to address the problem of unconstrained classification of dermoscopic images of skin cancer, including lesions in hard-to-diagnose locations such as those found in nails and mucosa, large lesions which do not fit in the aperture of the dermoscopy device, and hypo-pigmented lesions. Our dataset covers eight key diagnostic categories in dermoscopy, providing a diverse range of lesions for artificial intelligence model training. Furthermore, a ninth out-of-distribution (OOD) class is also present on the test set, comprised of lesions which could not be distinctively classified as any of the others. By providing a comprehensive collection of varied images, BCN20000 helps bridge the gap between the training data for machine learning models and the day-to-day practice of medical practitioners. Additionally, we present a set of baseline classifiers based on state-of-the-art neural networks, which can be extended by other researchers for further experimentation.

Subject(s)

Dermoscopy , Skin Neoplasms , Humans , Skin Neoplasms/diagnostic imaging , Spain , Neural Networks, Computer , Artificial Intelligence , Machine Learning

6.

Expert Agreement on the Presence and Spatial Localization of Melanocytic Features in Dermoscopy.

Liopyris, Konstantinos; Navarrete-Dechent, Cristian; Marchetti, Michael A; Rotemberg, Veronica; Apalla, Zoe; Argenziano, Giuseppe; Blum, Andreas; Braun, Ralph P; Carrera, Cristina; Codella, Noel C F; Combalia, Marc; Dusza, Stephen W; Gutman, David A; Helba, Brian; Hofmann-Wellenhof, Rainer; Jaimes, Natalia; Kittler, Harald; Kose, Kivanc; Lallas, Aimilios; Longo, Caterina; Malvehy, Josep; Menzies, Scott; Nelson, Kelly C; Paoli, John; Puig, Susana; Rabinovitz, Harold S; Rishpon, Ayelet; Russo, Teresa; Scope, Alon; Soyer, H Peter; Stein, Jennifer A; Stolz, Willhelm; Sgouros, Dimitrios; Stratigos, Alexander J; Swanson, David L; Thomas, Luc; Tschandl, Philipp; Zalaudek, Iris; Weber, Jochen; Halpern, Allan C; Marghoob, Ashfaq A.

J Invest Dermatol ; 144(3): 531-539.e13, 2024 Mar.

Article in English | MEDLINE | ID: mdl-37689267

ABSTRACT

Dermoscopy aids in melanoma detection; however, agreement on dermoscopic features, including those of high clinical relevance, remains poor. In this study, we attempted to evaluate agreement among experts on exemplar images not only for the presence of melanocytic-specific features but also for spatial localization. This was a cross-sectional, multicenter, observational study. Dermoscopy images exhibiting at least 1 of 31 melanocytic-specific features were submitted by 25 world experts as exemplars. Using a web-based platform that allows for image markup of specific contrast-defined regions (superpixels), 20 expert readers annotated 248 dermoscopic images in collections of 62 images. Each collection was reviewed by five independent readers. A total of 4,507 feature observations were performed. Good-to-excellent agreement was found for 14 of 31 features (45.2%), with eight achieving excellent agreement (Gwet's AC >0.75) and seven of them being melanoma-specific features. These features were peppering/granularity (0.91), shiny white streaks (0.89), typical pigment network (0.83), blotch irregular (0.82), negative network (0.81), irregular globules (0.78), dotted vessels (0.77), and blue-whitish veil (0.76). By utilizing an exemplar dataset, a good-to-excellent agreement was found for 14 features that have previously been shown useful in discriminating nevi from melanoma. All images are public (www.isic-archive.com) and can be used for education, scientific communication, and machine learning experiments.
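
For readers unfamiliar with the agreement statistic quoted above, the sketch below computes Gwet's AC1 for the simplest case of two raters and a binary present/absent rating. This is a simplification stated as an assumption: the study had five readers per collection, which requires the multi-rater form of the coefficient. The function name and the ratings are hypothetical.

```python
from typing import Sequence


def gwet_ac1_binary(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Gwet's AC1 for two raters and a binary rating (1 = feature present).

    AC1 = (p_a - p_e) / (1 - p_e), where p_a is the observed agreement and
    p_e = 2 * pi * (1 - pi), with pi the mean proportion of "present"
    ratings across the two raters.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty item set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    prevalence = (sum(rater_a) + sum(rater_b)) / (2 * n)
    chance = 2 * prevalence * (1 - prevalence)
    return (observed - chance) / (1 - chance)


# Toy ratings (invented): the raters disagree on one of ten images.
rater_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
rater_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ac1 = gwet_ac1_binary(rater_a, rater_b)
print(round(ac1, 2))
```

Here the observed agreement is 0.9 and the chance term is 0.375, giving AC1 = 0.84, which would fall above the 0.75 cutoff the abstract uses for excellent agreement.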

Subject(s)

Melanoma , Skin Neoplasms , Humans , Melanoma/diagnostic imaging , Skin Neoplasms/diagnostic imaging , Dermoscopy/methods , Cross-Sectional Studies , Melanocytes

7.

A reinforcement learning model for AI-based decision support in skin cancer.

Barata, Catarina; Rotemberg, Veronica; Codella, Noel C F; Tschandl, Philipp; Rinner, Christoph; Akay, Bengu Nisa; Apalla, Zoe; Argenziano, Giuseppe; Halpern, Allan; Lallas, Aimilios; Longo, Caterina; Malvehy, Josep; Puig, Susana; Rosendahl, Cliff; Soyer, H Peter; Zalaudek, Iris; Kittler, Harald.

Nat Med ; 29(8): 1941-1946, 2023 08.

Article in English | MEDLINE | ID: mdl-37501017

ABSTRACT

We investigated whether human preferences hold the potential to improve diagnostic artificial intelligence (AI)-based decision support using skin cancer diagnosis as a use case. We utilized nonuniform rewards and penalties based on expert-generated tables, balancing the benefits and harms of various diagnostic errors, which were applied using reinforcement learning. Compared with supervised learning, the reinforcement learning model improved the sensitivity for melanoma from 61.4% to 79.5% (95% confidence interval (CI): 73.5-85.6%) and for basal cell carcinoma from 79.4% to 87.1% (95% CI: 80.3-93.9%). AI overconfidence was also reduced while simultaneously maintaining accuracy. Reinforcement learning increased the rate of correct diagnoses made by dermatologists by 12.0% (95% CI: 8.8-15.1%) and improved the rate of optimal management decisions from 57.4% to 65.3% (95% CI: 61.7-68.9%). We further demonstrated that the reward-adjusted reinforcement learning model and a threshold-based model outperformed naïve supervised learning in various clinical scenarios. Our findings suggest the potential for incorporating human preferences into image-based diagnostic algorithms.

Subject(s)

Carcinoma, Basal Cell , Melanoma , Skin Neoplasms , Humans , Artificial Intelligence , Algorithms , Skin Neoplasms/diagnosis , Skin Neoplasms/pathology , Melanoma/diagnosis , Melanoma/pathology , Carcinoma, Basal Cell/diagnosis

8.

Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images: the 2019 International Skin Imaging Collaboration Grand Challenge.

Combalia, Marc; Codella, Noel; Rotemberg, Veronica; Carrera, Cristina; Dusza, Stephen; Gutman, David; Helba, Brian; Kittler, Harald; Kurtansky, Nicholas R; Liopyris, Konstantinos; Marchetti, Michael A; Podlipnik, Sebastian; Puig, Susana; Rinner, Christoph; Tschandl, Philipp; Weber, Jochen; Halpern, Allan; Malvehy, Josep.

Lancet Digit Health ; 4(5): e330-e339, 2022 05.

Article in English | MEDLINE | ID: mdl-35461690

ABSTRACT

BACKGROUND: Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy. METHODS: We designed a large dermoscopic image classification challenge to quantify the performance of machine learning algorithms for the task of skin cancer classification from dermoscopic images, and how this performance is affected by shifts in statistical distributions of data, disease categories not represented in training datasets, and imaging or lesion artifacts. Factors that might be beneficial to performance, such as clinical metadata and external training data collected by challenge participants, were also evaluated. 25 331 training images collected from two datasets (in Vienna [HAM10000] and Barcelona [BCN20000]) between Jan 1, 2000, and Dec 31, 2018, across eight skin diseases, were provided to challenge participants to design appropriate algorithms. The trained algorithms were then tested for balanced accuracy against the HAM10000 and BCN20000 test datasets and data from countries not included in the training dataset (Turkey, New Zealand, Sweden, and Argentina). Test datasets contained images of all diagnostic categories available in training plus other diagnoses not included in training data (not trained category). 
We compared the performance of the algorithms against that of 18 dermatologists in a simulated setting that reflected intended clinical use. FINDINGS: 64 teams submitted 129 state-of-the-art algorithm predictions on a test set of 8238 images. The best performing algorithm achieved 58·8% balanced accuracy on the BCN20000 data, which was designed to better reflect realistic clinical scenarios, compared with 82·0% balanced accuracy on HAM10000, which was used in a previously published benchmark. Shifted statistical distributions and disease categories not included in training data contributed to decreases in accuracy. Image artifacts, including hair, pen markings, ulceration, and imaging source institution, decreased accuracy in a complex manner that varied based on the underlying diagnosis. When comparing algorithms to expert dermatologists (2460 ratings on 1269 images), algorithms performed better than experts in most categories, except for actinic keratoses (similar accuracy on average) and images from categories not included in training data (26% correct for experts vs 6% correct for algorithms, p<0·0001). For the top 25 submitted algorithms, 47·1% of the images from categories not included in training data were misclassified as malignant diagnoses, which would lead to a substantial number of unnecessary biopsies if current state-of-the-art AI technologies were clinically deployed. INTERPRETATION: We have identified specific deficiencies and safety issues in AI diagnostic systems for skin cancer that should be addressed in future diagnostic evaluation protocols to improve safety and reliability in clinical practice. FUNDING: Melanoma Research Alliance and La Marató de TV3.
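
Balanced accuracy, the metric reported above, is commonly defined as the unweighted mean of per-class recall, so a rare class counts as much as a common one. The sketch below assumes that definition; the function name, class labels, and predictions are invented for illustration.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy example (invented): 8 nevi, all correct; 2 melanomas, one missed.
y_true = ["nevus"] * 8 + ["melanoma"] * 2
y_pred = ["nevus"] * 8 + ["melanoma", "nevus"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print("plain accuracy:", plain)
print("balanced accuracy:", balanced)
```

Plain accuracy is 0.9 in this toy case, while balanced accuracy is 0.75 because the missed melanoma halves the recall of the minority class. This is why the metric suits imbalanced dermoscopy test sets.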

Subject(s)

Melanoma , Skin Neoplasms , Artificial Intelligence , Dermoscopy/methods , Humans , Melanoma/diagnostic imaging , Melanoma/pathology , Reproducibility of Results , Skin Neoplasms/diagnostic imaging , Skin Neoplasms/pathology

9.

Checklist for Evaluation of Image-Based Artificial Intelligence Reports in Dermatology: CLEAR Derm Consensus Guidelines From the International Skin Imaging Collaboration Artificial Intelligence Working Group.

Daneshjou, Roxana; Barata, Catarina; Betz-Stablein, Brigid; Celebi, M Emre; Codella, Noel; Combalia, Marc; Guitera, Pascale; Gutman, David; Halpern, Allan; Helba, Brian; Kittler, Harald; Kose, Kivanc; Liopyris, Konstantinos; Malvehy, Josep; Seog, Han Seung; Soyer, H Peter; Tkaczyk, Eric R; Tschandl, Philipp; Rotemberg, Veronica.

JAMA Dermatol ; 158(1): 90-96, 2022 Jan 01.

Article in English | MEDLINE | ID: mdl-34851366

ABSTRACT

IMPORTANCE: The use of artificial intelligence (AI) is accelerating in all aspects of medicine and has the potential to transform clinical care and dermatology workflows. However, to develop image-based algorithms for dermatology applications, comprehensive criteria establishing development and performance evaluation standards are required to ensure product fairness, reliability, and safety. OBJECTIVE: To consolidate limited existing literature with expert opinion to guide developers and reviewers of dermatology AI. EVIDENCE REVIEW: In this consensus statement, the 19 members of the International Skin Imaging Collaboration AI working group volunteered to provide a consensus statement. A systematic PubMed search was performed of English-language articles published between December 1, 2008, and August 24, 2021, for "artificial intelligence" and "reporting guidelines," as well as other pertinent studies identified by the expert panel. Factors that were viewed as critical to AI development and performance evaluation were included and underwent 2 rounds of electronic discussion to achieve consensus. FINDINGS: A checklist of items was developed that outlines best practices of image-based AI development and assessment in dermatology. CONCLUSIONS AND RELEVANCE: Clinically effective AI needs to be fair, reliable, and safe; this checklist of best practices will help both developers and reviewers achieve this goal.

Subject(s)

Artificial Intelligence , Dermatology , Checklist , Consensus , Humans , Reproducibility of Results

10.

A radial self-calibrated (RASCAL) generalized autocalibrating partially parallel acquisition (GRAPPA) method using weight interpolation.

Codella, Noel C F; Spincemaille, Pascal; Prince, Martin; Wang, Yi.

NMR Biomed ; 24(7): 844-54, 2011 Aug.

Article in English | MEDLINE | ID: mdl-21834008

ABSTRACT

A generalized autocalibrating partially parallel acquisition (GRAPPA) method for radial k-space sampling is presented that calculates GRAPPA weights without synthesized or acquired calibration data. Instead, GRAPPA weights are fitted to the undersampled data as if they were the calibration data. Because the relative k-space shifts associated with these GRAPPA weights vary for a radial trajectory, new GRAPPA weights can be resampled for arbitrary shifts through interpolation, which are then used to generate missing projections between the acquired projections. The method is demonstrated in phantoms and in abdominal and brain imaging. Image quality is similar to radial GRAPPA using fully sampled calibration data, and improved relative to a previously described self-calibrated radial GRAPPA technique.

Subject(s)

Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Adult , Algorithms , Brain/anatomy & histology , Brain Mapping/methods , Calibration , Female , Humans , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Male , Middle Aged , Phantoms, Imaging , Reproducibility of Results , Sensitivity and Specificity , Young Adult

11.

A patient-centric dataset of images and metadata for identifying melanomas using clinical context.

Rotemberg, Veronica; Kurtansky, Nicholas; Betz-Stablein, Brigid; Caffery, Liam; Chousakos, Emmanouil; Codella, Noel; Combalia, Marc; Dusza, Stephen; Guitera, Pascale; Gutman, David; Halpern, Allan; Helba, Brian; Kittler, Harald; Kose, Kivanc; Langer, Steve; Lioprys, Konstantinos; Malvehy, Josep; Musthaq, Shenara; Nanda, Jabpani; Reiter, Ofer; Shih, George; Stratigos, Alexander; Tschandl, Philipp; Weber, Jochen; Soyer, H Peter.

Sci Data ; 8(1): 34, 2021 01 28.

Article in English | MEDLINE | ID: mdl-33510154

ABSTRACT

Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions from the same patient. Though artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically from multiple lesions on the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier allowing lesions from the same patient to be mapped to one another. This patient-level contextual information is frequently used by clinicians to diagnose melanoma and is especially useful in ruling out false positives in patients with many atypical nevi. The dataset represents 2,056 patients (20.8% with at least one melanoma, 79.2% with zero melanomas) from three continents with an average of 16 lesions per patient, consisting of 33,126 dermoscopic images and 584 (1.8%) histopathologically confirmed melanomas compared with benign melanoma mimickers.

Subject(s)

Melanoma , Skin Neoplasms , Artificial Intelligence , Humans , Melanoma/diagnostic imaging , Melanoma/pathology , Melanoma/physiopathology , Metadata , Skin/pathology , Skin Neoplasms/diagnostic imaging , Skin Neoplasms/pathology , Skin Neoplasms/physiopathology

12.

Respiratory and cardiac self-gated free-breathing cardiac CINE imaging with multiecho 3D hybrid radial SSFP acquisition.

Liu, Jing; Spincemaille, Pascal; Codella, Noel C F; Nguyen, Thanh D; Prince, Martin R; Wang, Yi.

Magn Reson Med ; 63(5): 1230-7, 2010 May.

Article in English | MEDLINE | ID: mdl-20432294

ABSTRACT

A respiratory and cardiac self-gated free-breathing three-dimensional cine steady-state free precession imaging method using multiecho hybrid radial sampling is presented. Cartesian mapping of the k-space center along the slice encoding direction provides intensity-weighted position information, from which both respiratory and cardiac motions are derived. With in-plane radial sampling acquired at every pulse repetition time, no extra scan time is required for sampling the k-space center. Temporal filtering based on density compensation is used for radial reconstruction to achieve high signal-to-noise ratio and contrast-to-noise ratio. High correlation between the self-gating signals and external gating signals is demonstrated. This respiratory and cardiac self-gated, free-breathing, three-dimensional, radial cardiac cine imaging technique provides image quality comparable to that acquired with the multiple breath-hold two-dimensional Cartesian steady-state free precession technique in short-axis, four-chamber, and two-chamber orientations. Functional measurements from the three-dimensional cardiac short axis cine images are found to be comparable to those obtained using the standard two-dimensional technique.

Subject(s)

Algorithms , Cardiac-Gated Imaging Techniques/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Magnetic Resonance Imaging, Cine/methods , Respiratory-Gated Imaging Techniques/methods , Adult , Female , Humans , Male , Reproducibility of Results , Respiratory Mechanics , Sensitivity and Specificity

13.

Rapid and accurate left ventricular chamber quantification using a novel CMR segmentation algorithm: a clinical validation study.

Codella, Noel C F; Cham, Matthew D; Wong, Richard; Chu, Christopher; Min, James K; Prince, Martin R; Wang, Yi; Weinsaft, Jonathan W.

J Magn Reson Imaging ; 31(4): 845-53, 2010 Apr.

Article in English | MEDLINE | ID: mdl-20373428

ABSTRACT

PURPOSE: To evaluate the clinical performance of a novel automated left ventricle (LV) segmentation algorithm (LV-METRIC) that involves no geometric assumptions. MATERIALS AND METHODS: LV-METRIC and manual tracing (MT) were used independently to quantify LV volumes and LVEF (ejection fraction) for 151 consecutive patients who underwent cine-CMR (steady-state free precession). Phase contrast imaging was used to independently measure stroke volume. RESULTS: LV-METRIC was successful in all cases. Mean LVEF was within 1 point of MT (∆ 0.6 ± 2.3%, P < 0.05), with smaller differences among patients with (0.5 ± 2.5%) versus those without (0.9 ± 2.3%; P = 0.01) advanced systolic dysfunction (LVEF

Subject(s)

Heart Ventricles/pathology , Magnetic Resonance Imaging/methods , Myocardium/pathology , Ventricular Function, Left , Adult , Aged , Algorithms , Automation , Female , Heart Ventricles/anatomy & histology , Humans , Image Processing, Computer-Assisted , Male , Middle Aged , Reproducibility of Results

14.

Impact of diastolic dysfunction severity on global left ventricular volumetric filling - assessment by automated segmentation of routine cine cardiovascular magnetic resonance.

Mendoza, Dorinna D; Codella, Noel C F; Wang, Yi; Prince, Martin R; Sethi, Sonia; Manoushagian, Shant J; Kawaji, Keigo; Min, James K; LaBounty, Troy M; Devereux, Richard B; Weinsaft, Jonathan W.

J Cardiovasc Magn Reson ; 12: 46, 2010 Jul 31.

Article in English | MEDLINE | ID: mdl-20673372

ABSTRACT

OBJECTIVES: To examine relationships between severity of echocardiography (echo)-evidenced diastolic dysfunction (DD) and volumetric filling by automated processing of routine cine cardiovascular magnetic resonance (CMR). BACKGROUND: Cine-CMR provides high-resolution assessment of left ventricular (LV) chamber volumes. Automated segmentation (LV-METRIC) yields LV filling curves by segmenting all short-axis images across all temporal phases. This study used cine-CMR to assess filling changes that occur with progressive DD. METHODS: 115 post-MI patients underwent CMR and echo within 1 day. LV-METRIC yielded multiple diastolic indices - E:A ratio, peak filling rate (PFR), time to peak filling rate (TPFR), and diastolic volume recovery (DVR80 - proportion of diastole required to recover 80% stroke volume). Echo was the reference for DD. RESULTS: LV-METRIC successfully generated LV filling curves in all patients. CMR indices were reproducible (≤ 1% inter-reader differences) and required minimal processing time (175 +/- 34 images/exam, 2:09 +/- 0:51 minutes). CMR E:A ratio decreased with grade 1 and increased with grades 2-3 DD. Diastolic filling intervals, measured by DVR80 or TPFR, prolonged with grade 1 and shortened with grade 3 DD, paralleling echo deceleration time (p < 0.001). PFR by CMR increased with DD grade, similar to E/e' (p < 0.001). Prolonged DVR80 identified 71% of patients with echo-evidenced grade 1 but no patients with grade 3 DD, and stroke-volume adjusted PFR identified 67% with grade 3 but none with grade 1 DD (matched specificity = 83%). The combination of DVR80 and PFR identified 53% of patients with grade 2 DD. Prolonged DVR80 was associated with grade 1 (OR 2.79, CI 1.65-4.05, p = 0.001) with a similar trend for grade 2 (OR 1.35, CI 0.98-1.74, p = 0.06), whereas high PFR was associated with grade 3 (OR 1.14, CI 1.02-1.25, p = 0.02) DD. CONCLUSIONS: Automated cine-CMR segmentation can discern LV filling changes that occur with increasing severity of echo-evidenced DD. Impaired relaxation is associated with prolonged filling intervals whereas restrictive filling is characterized by increased filling rates.

Subject(s)

Magnetic Resonance Imaging, Cine , Myocardial Infarction/complications , Ventricular Dysfunction, Left/diagnosis , Ventricular Dysfunction, Left/physiopathology , Aged , Automation , Diastole , Female , Humans , Male , Middle Aged , Myocardial Infarction/physiopathology , Severity of Illness Index , Ventricular Dysfunction, Left/etiology

15.

Human-computer collaboration for skin cancer recognition.

Tschandl, Philipp; Rinner, Christoph; Apalla, Zoe; Argenziano, Giuseppe; Codella, Noel; Halpern, Allan; Janda, Monika; Lallas, Aimilios; Longo, Caterina; Malvehy, Josep; Paoli, John; Puig, Susana; Rosendahl, Cliff; Soyer, H Peter; Zalaudek, Iris; Kittler, Harald.

Nat Med ; 26(8): 1229-1234, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32572267

ABSTRACT

The rapid increase in telemedicine coupled with recent advances in diagnostic artificial intelligence (AI) create the imperative to consider the opportunities and risks of inserting AI-based support into new paradigms of care. Here we build on recent achievements in the accuracy of image-based AI for skin cancer diagnosis to address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows. We find that good quality AI-based support of clinical decision-making improves diagnostic accuracy over that of either AI or physicians alone, and that the least experienced clinicians gain the most from AI-based support. We further find that AI-based multiclass probabilities outperformed content-based image retrieval (CBIR) representations of AI in the mobile technology environment, and AI-based support had utility in simulations of second opinions and of telemedicine triage. In addition to demonstrating the potential benefits associated with good quality AI in the hands of non-expert clinicians, we find that faulty AI can mislead the entire spectrum of clinicians, including experts. Lastly, we show that insights derived from AI class-activation maps can inform improvements in human diagnosis. Together, our approach and findings offer a framework for future studies across the spectrum of image-based diagnostics to improve human-computer collaboration in clinical practice.

Subject(s)

Artificial Intelligence , Skin Neoplasms/diagnostic imaging , Telemedicine , User-Computer Interface , Clinical Decision-Making , Humans , Neural Networks, Computer , Physicians , Skin Neoplasms/pathology

16.

Dermoscopy Image Analysis: Overview and Future Directions.

Celebi, M Emre; Codella, Noel; Halpern, Allan.

IEEE J Biomed Health Inform ; 23(2): 474-478, 2019 Mar.

Article in English | MEDLINE | ID: mdl-30703051

ABSTRACT

Dermoscopy is a non-invasive skin imaging technique that permits visualization of features of pigmented melanocytic neoplasms that are not discernable by examination with the naked eye. While studies on the automated analysis of dermoscopy images date back to the late 1990s, because of various factors (lack of publicly available datasets, open-source software, computational power, etc.), the field progressed rather slowly in its first two decades. With the release of a large public dataset by the International Skin Imaging Collaboration in 2016, development of open-source software for convolutional neural networks, and the availability of inexpensive graphics processing units, dermoscopy image analysis has recently become a very active research field. In this paper, we present a brief overview of this exciting subfield of medical image analysis, primarily focusing on three aspects of it, namely, segmentation, feature extraction, and classification. We then provide future directions for researchers.

Subject(s)

Dermoscopy , Image Interpretation, Computer-Assisted , Humans , Melanoma/diagnostic imaging , Skin Neoplasms/diagnostic imaging

17.

The role of public challenges and data sets towards algorithm development, trust, and use in clinical practice.

Rotemberg, Veronica; Halpern, Allan; Dusza, Steven; Codella, Noel C F.

Semin Cutan Med Surg ; 38(1): E38-E42, 2019 Mar 01.

Article in English | MEDLINE | ID: mdl-31051022

ABSTRACT

In the past decade, machine learning and artificial intelligence have made significant advancements in pattern analysis, including speech and natural language processing, image recognition, object detection, facial recognition, and action categorization. Indeed, in many of these applications, accuracy has reached or exceeded human levels of performance. Subsequently, a multitude of studies have begun to examine the application of these technologies to health care, and in particular, medical image analysis. Perhaps the most difficult subdomain involves skin imaging because of the lack of standards around imaging hardware, technique, color, and lighting conditions. In addition, unlike radiological images, skin image appearance can be significantly affected by skin tone as well as the broad range of diseases. Furthermore, automated algorithm development relies on large high-quality annotated image data sets that incorporate the breadth of this circumstantial and diagnostic variety. These issues, in combination with unique complexities regarding integrating artificial intelligence systems into a clinical workflow, have led to difficulty in using these systems to improve sensitivity and specificity of skin diagnostics in health care networks around the world. In this article, we summarize recent advancements in machine learning, with a focused perspective on the role of public challenges and data sets on the progression of these technologies in skin imaging. In addition, we highlight the remaining hurdles toward effective implementation of technologies to the clinical workflow and discuss how public challenges and data sets can catalyze the development of solutions.

Subject(s)

Algorithms , Artificial Intelligence , Benchmarking , Dermatology , Humans , Machine Learning

18.

Left ventricle: automated segmentation by using myocardial effusion threshold reduction and intravoxel computation at MR imaging.

Codella, Noel C F; Weinsaft, Jonathan W; Cham, Matthew D; Janik, Matthew; Prince, Martin R; Wang, Yi.

Radiology ; 248(3): 1004-12, 2008 Sep.

Article in English | MEDLINE | ID: mdl-18710989

ABSTRACT

UNLABELLED: This retrospective analysis of existing patient data had institutional review board approval and was performed in compliance with HIPAA. No informed consent was required. The purpose of the study was to develop and validate an algorithm for automated segmentation of the left ventricular (LV) cavity that accounts for papillary and/or trabecular muscles and partial voxels in cine magnetic resonance (MR) images, an algorithm called LV Myocardial Effusion Threshold Reduction with Intravoxel Computation (LV-METRIC). The algorithm was validated in biologic phantoms, and its results were compared with those of manual tracing, as well as those of a commercial automated segmentation software (MASS [MR Analytical Software System]), in 38 subjects. LV-METRIC accuracy in vitro was 98.7%. Among the 38 subjects studied, LV-METRIC and MASS ejection fraction estimations were highly correlated with manual tracing (R(2) = 0.97 and R(2) = 0.95, respectively). Ventricular volume estimations were smaller with LV-METRIC and larger with MASS than those calculated by using manual tracing, though all results were well correlated (R(2) = 0.99). LV-METRIC volume measurements without partial voxel interpolation were statistically equivalent to manual tracing results (P > .05). LV-METRIC had reduced intraobserver and interobserver variability compared with other methods. MASS required additional manual intervention in 58% of cases, whereas LV-METRIC required no additional corrections. LV-METRIC reliably and reproducibly measured LV volumes. SUPPLEMENTAL MATERIAL: http://radiology.rsnajnls.org/cgi/content/full/248/3/1004/DC1.

Subject(s)

Heart Ventricles/pathology , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Magnetic Resonance Imaging/methods , Pattern Recognition, Automated/methods , Ventricular Dysfunction, Left/diagnosis , Algorithms , Artificial Intelligence , Female , Humans , Male , Reproducibility of Results , Sensitivity and Specificity

19.

Effects of papillary muscles and trabeculae on left ventricular quantification: increased impact of methodological variability in patients with left ventricular hypertrophy.

Janik, Matthew; Cham, Matthew D; Ross, Michael I; Wang, Yi; Codella, Noel; Min, James K; Prince, Martin R; Manoushagian, Shant; Okin, Peter M; Devereux, Richard B; Weinsaft, Jonathan W.

J Hypertens ; 26(8): 1677-85, 2008 Aug.

Article in English | MEDLINE | ID: mdl-18622248

ABSTRACT

BACKGROUND: Accurate quantification of left ventricular mass and ejection fraction is important for patients with left ventricular hypertrophy. Although cardiac magnetic resonance imaging has been proposed as a standard for these indices, prior studies have variably included papillary muscles and trabeculae in myocardial volume. This study investigated the contribution of papillary muscles and trabeculae to left ventricular quantification in relation to the presence and pattern of hypertrophy. METHODS: Cardiac magnetic resonance quantification was performed on patients with concentric or eccentric hypertrophy and normal controls (20 per group) using two established methods that included papillary muscles and trabeculae in myocardium (method 1) or intracavitary (method 2) volumes. RESULTS: Among all patients, papillary muscles and trabeculae accounted for 10.5% of ventricular mass, with greater contribution with left ventricular hypertrophy than normals (12.6 vs. 6.2%, P < 0.001). Papillary muscles and trabeculae mass correlated with ventricular wall mass (r = 0.53) and end-diastolic volume (r = 0.52; P < 0.001). Papillary muscles and trabeculae inclusion in myocardium (method 1) yielded smaller differences with a standard of mass quantification from linear ventricular measurements than did method 2 (P < 0.001). Method 1 in comparison with method 2 yielded differences in left ventricular mass, ejection fraction and volume in all groups, especially in patients with hypertrophy: the difference in ventricular mass index was three-fold to six-fold greater in hypertrophy than normal groups (P < 0.001). Difference in ejection fraction, greatest in concentric hypertrophy (P < 0.001), was independently related to papillary muscles and trabeculae mass, ventricular wall mass, and smaller ventricular volume (R = 0.56, P < 0.001). CONCLUSION: Established cardiac magnetic resonance methods yield differences in left ventricular quantification due to variable exclusion of papillary muscles and trabeculae from myocardium. The relative impact of papillary muscles and trabeculae exclusion on calculated mass and ejection fraction is increased among patients with hypertrophy-associated left ventricular remodeling.

Subject(s)

Hypertrophy, Left Ventricular/pathology , Magnetic Resonance Imaging/methods , Magnetic Resonance Imaging/standards , Myocardium/pathology , Papillary Muscles/pathology , Aged , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Reproducibility of Results , Ventricular Remodeling

20.

Segmentation of Both Diseased and Healthy Skin From Clinical Photographs in a Primary Care Setting.

Codella, Noel C F; Anderson, Daren; Philips, Tyler; Porto, Anthony; Massey, Kevin; Snowdon, Jane; Feris, Rogerio; Smith, John.

Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 3414-3417, 2018 Jul.

Article in English | MEDLINE | ID: mdl-30441121

ABSTRACT

This work presents the first segmentation study of both diseased and healthy skin in standard camera photographs from a clinical environment. Challenges arise from varied lighting conditions, skin types, backgrounds, and pathological states. For study, 400 clinical photographs (with skin segmentation masks) representing various pathological states of skin are retrospectively collected from a primary care network. 100 images are used for training and fine-tuning, and 300 are used for evaluation. This distribution between training and test partitions is chosen to reflect the difficulty in amassing large quantities of labeled data in this domain. A deep learning approach is used, and 3 public segmentation datasets of healthy skin are collected to study the potential benefits of pretraining. Two variants of U-Net are evaluated: U-Net and Dense Residual U-Net. We find that Dense Residual U-Nets have a 7.8% improvement in Jaccard, compared to classical U-Net architectures (0.55 vs. 0.51 Jaccard), for direct transfer, where fine-tuning data is not utilized. However, U-Net outperforms Dense Residual U-Net for both direct training (0.83 vs. 0.80) and fine-tuning (0.89 vs. 0.88). The stark performance improvement with fine-tuning compared to direct transfer and direct training emphasizes both the need for adequate representative data of diseased skin, and the utility of other publicly available data sources for this task.

Subject(s)

Primary Health Care , Skin , Deep Learning , Image Processing, Computer-Assisted , Retrospective Studies
