ABSTRACT
BACKGROUND: Computer vision shows promise for image-based cutaneous melanoma diagnosis, but its clinical utility is uncertain. OBJECTIVE: To determine whether computer algorithms from an international melanoma detection challenge can improve dermatologists' accuracy in diagnosing melanoma. METHODS: In this cross-sectional study, we used 150 dermoscopy images (50 melanomas, 50 nevi, 50 seborrheic keratoses) from the test dataset of a melanoma detection challenge, along with algorithm results from 23 teams. Eight dermatologists and 9 dermatology residents classified the dermoscopic lesion images in an online reader study and provided their confidence level. RESULTS: The top-ranked computer algorithm had an area under the receiver operating characteristic curve of 0.87, which was higher than that of the dermatologists (0.74) and residents (0.66) (P < .001 for all comparisons). At the dermatologists' overall sensitivity in classification of 76.0%, the algorithm had superior specificity (85.0% vs. 72.6%, P = .001). Imputing computer algorithm classifications into dermatologist evaluations with low confidence ratings (26.6% of evaluations) increased dermatologist sensitivity from 76.0% to 80.8% and specificity from 72.6% to 72.8%. LIMITATIONS: Artificial study setting lacking the full spectrum of skin lesions as well as clinical metadata. CONCLUSION: Accumulating evidence suggests that deep neural networks can classify skin images of melanoma and its benign mimickers with high accuracy and potentially improve human performance.
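The imputation analysis described above (substituting the algorithm's classification wherever the reader reported low confidence) reduces to a few lines of logic. The data, labels, and confidence threshold below are invented for illustration, not study data.

```python
# Sketch of confidence-based imputation: replace a reader's low-confidence
# calls with the algorithm's calls, then re-score sensitivity/specificity.

def impute_low_confidence(reader_calls, confidences, algo_calls, threshold):
    """Use the algorithm's classification whenever the reader's
    self-reported confidence falls below `threshold`."""
    return [a if c < threshold else r
            for r, c, a in zip(reader_calls, confidences, algo_calls)]

def sensitivity_specificity(calls, truth):
    tp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == 1)
    fn = sum(1 for c, t in zip(calls, truth) if c == 0 and t == 1)
    tn = sum(1 for c, t in zip(calls, truth) if c == 0 and t == 0)
    fp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: 1 = melanoma, 0 = benign.
truth      = [1, 1, 1, 0, 0, 0]
reader     = [1, 0, 0, 0, 1, 0]
confidence = [0.9, 0.2, 0.8, 0.7, 0.3, 0.9]
algorithm  = [1, 1, 0, 0, 0, 0]

combined = impute_low_confidence(reader, confidence, algorithm, threshold=0.5)
sens_before, spec_before = sensitivity_specificity(reader, truth)
sens_after, spec_after = sensitivity_specificity(combined, truth)
```

In this toy case the hybrid read improves both sensitivity and specificity over the reader alone, mirroring the direction of the study's result.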
Subject(s)
Deep Learning, Dermoscopy/methods, Image Interpretation, Computer-Assisted/methods, Melanoma/diagnosis, Skin Neoplasms/diagnosis, Colombia, Cross-Sectional Studies, Dermatologists/statistics & numerical data, Dermoscopy/statistics & numerical data, Diagnosis, Differential, Humans, International Cooperation, Internship and Residency/statistics & numerical data, Israel, Keratosis, Seborrheic/diagnosis, Melanoma/pathology, Nevus/diagnosis, ROC Curve, Skin/diagnostic imaging, Skin/pathology, Skin Neoplasms/pathology, Spain, United States
ABSTRACT
BACKGROUND: Whether machine-learning algorithms can diagnose all pigmented skin lesions as accurately as human experts is unclear. The aim of this study was to compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions. METHODS: For this open, web-based, international, diagnostic study, human readers were asked to diagnose dermatoscopic images selected randomly in 30-image batches from a test set of 1511 images. The diagnoses from human readers were compared with those of 139 algorithms created by 77 machine-learning labs that participated in the International Skin Imaging Collaboration 2018 challenge and received a training set of 10 015 images in advance. The ground truth of each lesion fell into one of seven predefined disease categories: intraepithelial carcinoma including actinic keratoses and Bowen's disease; basal cell carcinoma; benign keratinocytic lesions including solar lentigo, seborrheic keratosis and lichen planus-like keratosis; dermatofibroma; melanoma; melanocytic nevus; and vascular lesions. The two main outcomes were the differences in the number of correct specific diagnoses per batch between all human readers and the top three algorithms, and between human experts and the top three algorithms. FINDINGS: Between Aug 4, 2018, and Sept 30, 2018, 511 human readers from 63 countries had at least one attempt in the reader study. 283 (55·4%) of 511 human readers were board-certified dermatologists, 118 (23·1%) were dermatology residents, and 83 (16·2%) were general practitioners. When comparing all human readers with all machine-learning algorithms, the algorithms achieved a mean of 2·01 (95% CI 1·97 to 2·04; p<0·0001) more correct diagnoses (17·91 [SD 3·42] vs 19·92 [4·27]).
27 human experts with more than 10 years of experience achieved a mean of 18·78 (SD 3·15) correct answers, compared with 25·43 (1·95) correct answers for the top three machine algorithms (mean difference 6·65, 95% CI 6·06-7·25; p<0·0001). The difference between human experts and the top three algorithms was significantly lower for images in the test set that were collected from sources not included in the training set (human underperformance of 11·4%, 95% CI 9·9-12·9 vs 3·6%, 0·8-6·3; p<0·0001). INTERPRETATION: State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice. However, a possible limitation of these algorithms is their decreased performance for out-of-distribution images, which should be addressed in future research. FUNDING: None.
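The primary endpoint above, correct specific diagnoses per 30-image batch, is simple to score. The labels below are simulated with a fixed seed, not study data.

```python
# Sketch of per-batch scoring: count correct specific diagnoses in each
# 30-image batch, then average across batches.
import random

def correct_per_batch(predictions, truths, batch_size=30):
    counts = []
    for start in range(0, len(truths), batch_size):
        batch_pred = predictions[start:start + batch_size]
        batch_true = truths[start:start + batch_size]
        counts.append(sum(p == t for p, t in zip(batch_pred, batch_true)))
    return counts

def mean(xs):
    return sum(xs) / len(xs)

# Two simulated 30-image batches over the seven ISIC categories (0-6),
# with a reader who is right about 60% of the time.
random.seed(0)
truths = [random.randrange(7) for _ in range(60)]
reader_preds = [t if random.random() < 0.6 else (t + 1) % 7 for t in truths]
per_batch = correct_per_batch(reader_preds, truths)
avg = mean(per_batch)
```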
Subject(s)
Algorithms, Dermoscopy, Internet, Machine Learning, Pigmentation Disorders/pathology, Skin Neoplasms/pathology, Adult, Female, Humans, Male, Reproducibility of Results, Retrospective Studies
ABSTRACT
BACKGROUND: Phase contrast (PC) cardiovascular magnetic resonance (CMR) is widely employed for flow quantification, but analysis typically requires time-consuming manual segmentation, which can require human correction. Advances in machine learning have markedly improved automated processing but have yet to be applied to PC-CMR. This study tested a novel machine learning model for fully automated analysis of PC-CMR aortic flow. METHODS: A machine learning model was designed to track aortic valve borders based on neural network approaches. The model was trained in a derivation cohort encompassing 150 patients who underwent clinical PC-CMR and then compared with manual and commercially available automated segmentation in a prospective validation cohort. Further validation testing was performed in an external cohort acquired from a different site/CMR vendor. RESULTS: Among 190 coronary artery disease patients prospectively undergoing CMR on commercial scanners (84% 1.5T, 16% 3T), machine learning segmentation was uniformly successful, requiring no human intervention: segmentation time was < 0.01 min/case (1.2 min for the entire dataset); manual segmentation required 3.96 ± 0.36 min/case (12.5 h for the entire dataset). Correlations between machine learning and manual segmentation-derived flow approached unity (r = 0.99, p < 0.001). Machine learning yielded smaller absolute differences with manual segmentation than did commercial automation (1.85 ± 1.80 vs. 3.33 ± 3.18 mL, p < 0.01): nearly all (98%) cases differed by ≤5 mL between machine learning and manual methods. Among patients without advanced mitral regurgitation, machine learning correlated well (r = 0.63, p < 0.001) and yielded small differences with cine-CMR stroke volume (∆ 1.3 ± 17.7 mL, p = 0.36). Among advanced mitral regurgitation patients, machine learning yielded lower stroke volume than did volumetric cine-CMR (∆ 12.6 ± 20.9 mL, p = 0.005), further supporting the validity of this method.
Among the external validation cohort (n = 80) acquired using a different CMR vendor, the algorithm yielded similarly small differences (∆ 1.39 ± 1.77 mL, p = 0.4) and high correlations (r = 0.99, p < 0.001) with manual segmentation, including similar results in 20 patients with bicuspid or stenotic aortic valve pathology (∆ 1.71 ± 2.25 mL, p = 0.25). CONCLUSION: Fully automated machine learning PC-CMR segmentation performs robustly for aortic flow quantification, yielding rapid segmentation, small differences with manual segmentation, and identification of differential forward/left ventricular volumetric stroke volume in the context of concomitant mitral regurgitation. Findings support the use of machine learning for analysis of large-scale CMR datasets.
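The agreement statistics reported above (Pearson correlation and mean paired differences between automated and manual flow) reduce to a few lines of arithmetic; the flow values below are invented, not study data.

```python
# Stdlib-only sketch of the two agreement measures used in the validation:
# Pearson correlation and the mean of the paired differences.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_difference(x, y):
    return sum(a - b for a, b in zip(x, y)) / len(x)

# Toy aortic flow volumes (mL): automated vs. manual segmentation.
auto   = [70.1, 82.5, 64.3, 91.0, 55.8]
manual = [69.0, 81.2, 65.0, 90.1, 54.9]
r = pearson_r(auto, manual)
delta = mean_difference(auto, manual)
```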
Subject(s)
Aorta/diagnostic imaging, Aortic Valve/diagnostic imaging, Heart Diseases/diagnostic imaging, Hemodynamics, Machine Learning, Magnetic Resonance Imaging, Cine, Myocardial Perfusion Imaging/methods, Aged, Aorta/physiopathology, Aortic Valve/physiopathology, Automation, Blood Flow Velocity, Female, Heart Diseases/physiopathology, Humans, Image Interpretation, Computer-Assisted, Male, Middle Aged, Predictive Value of Tests, Proof of Concept Study, Prospective Studies, Reproducibility of Results, Retrospective Studies, United States
ABSTRACT
BACKGROUND: Computer vision may aid in melanoma detection. OBJECTIVE: We sought to compare the melanoma diagnostic accuracy of computer algorithms with that of dermatologists using dermoscopic images. METHODS: We conducted a cross-sectional study using 100 randomly selected dermoscopic images (50 melanomas, 44 nevi, and 6 lentigines) from an international computer vision melanoma challenge dataset (n = 379), along with individual algorithm results from 25 teams. We used 5 methods (nonlearned and machine learning) to combine individual automated predictions into "fusion" algorithms. In a companion study, 8 dermatologists classified the lesions in the 100 images as either benign or malignant. RESULTS: The average sensitivity and specificity of dermatologists in classification were 82% and 59%, respectively. At 82% sensitivity, dermatologist specificity was similar to that of the top challenge algorithm (59% vs. 62%, P = .68) but lower than that of the best-performing fusion algorithm (59% vs. 76%, P = .02). The receiver operating characteristic area of the top fusion algorithm was greater than the mean receiver operating characteristic area of the dermatologists (0.86 vs. 0.71, P = .001). LIMITATIONS: The dataset lacked the full spectrum of skin lesions encountered in clinical practice, particularly banal lesions. Readers and algorithms were not provided clinical data (eg, age or lesion history/symptoms). Results obtained using our study design cannot be extrapolated to clinical practice. CONCLUSION: Deep learning computer vision systems classified melanoma dermoscopy images with accuracy that exceeded some but not all dermatologists.
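One of the non-learned "fusion" strategies mentioned above can be as simple as averaging each algorithm's melanoma probability per lesion; the probabilities and the 0.5 decision threshold below are hypothetical.

```python
# Sketch of mean-probability fusion of multiple algorithms' outputs.

def mean_fusion(prob_matrix):
    """prob_matrix[i][j]: melanoma probability from algorithm i for lesion j."""
    n_algos = len(prob_matrix)
    n_lesions = len(prob_matrix[0])
    return [sum(row[j] for row in prob_matrix) / n_algos
            for j in range(n_lesions)]

probs = [
    [0.9, 0.2, 0.6],   # algorithm A
    [0.8, 0.1, 0.4],   # algorithm B
    [0.7, 0.3, 0.5],   # algorithm C
]
fused = mean_fusion(probs)              # ≈ [0.8, 0.2, 0.5]
calls = [int(p >= 0.5) for p in fused]  # binary malignant/benign calls
```

Averaging tends to cancel individual algorithms' uncorrelated errors, which is one plausible reason a fusion can outperform its best single member.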
Subject(s)
Algorithms, Dermatologists, Dermoscopy, Lentigo/diagnostic imaging, Melanoma/diagnosis, Nevus/diagnostic imaging, Skin Neoplasms/diagnostic imaging, Congresses as Topic, Cross-Sectional Studies, Diagnosis, Computer-Assisted, Humans, Machine Learning, Melanoma/pathology, ROC Curve, Skin Neoplasms/pathology
ABSTRACT
Advancements in dermatological artificial intelligence research require high-quality and comprehensive datasets that mirror real-world clinical scenarios. We introduce a collection of 18,946 dermoscopic images spanning 2010 to 2016, collated at the Hospital Clínic in Barcelona, Spain. The BCN20000 dataset aims to address the problem of unconstrained classification of dermoscopic images of skin cancer, including lesions in hard-to-diagnose locations such as the nails and mucosa, large lesions that do not fit in the aperture of the dermoscopy device, and hypo-pigmented lesions. Our dataset covers eight key diagnostic categories in dermoscopy, providing a diverse range of lesions for artificial intelligence model training. Furthermore, a ninth out-of-distribution (OOD) class is also present in the test set, consisting of lesions that could not be distinctively classified as any of the others. By providing a comprehensive collection of varied images, BCN20000 helps bridge the gap between the training data for machine learning models and the day-to-day practice of medical practitioners. Additionally, we present a set of baseline classifiers based on state-of-the-art neural networks, which can be extended by other researchers for further experimentation.
Subject(s)
Dermoscopy, Skin Neoplasms, Humans, Skin Neoplasms/diagnostic imaging, Spain, Neural Networks, Computer, Artificial Intelligence, Machine Learning
ABSTRACT
Dermoscopy aids in melanoma detection; however, agreement on dermoscopic features, including those of high clinical relevance, remains poor. In this study, we attempted to evaluate agreement among experts on exemplar images not only for the presence of melanocytic-specific features but also for spatial localization. This was a cross-sectional, multicenter, observational study. Dermoscopy images exhibiting at least 1 of 31 melanocytic-specific features were submitted by 25 world experts as exemplars. Using a web-based platform that allows for image markup of specific contrast-defined regions (superpixels), 20 expert readers annotated 248 dermoscopic images in collections of 62 images. Each collection was reviewed by five independent readers. A total of 4,507 feature observations were performed. Good-to-excellent agreement was found for 14 of 31 features (45.2%), with eight achieving excellent agreement (Gwet's AC >0.75) and seven of them being melanoma-specific features. These features were peppering/granularity (0.91), shiny white streaks (0.89), typical pigment network (0.83), blotch irregular (0.82), negative network (0.81), irregular globules (0.78), dotted vessels (0.77), and blue-whitish veil (0.76). By utilizing an exemplar dataset, a good-to-excellent agreement was found for 14 features that have previously been shown useful in discriminating nevi from melanoma. All images are public (www.isic-archive.com) and can be used for education, scientific communication, and machine learning experiments.
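Gwet's AC, the chance-corrected agreement coefficient used above, can be computed for the simplest setting (two raters, one binary present/absent feature); the study itself used five readers per image, and the annotations below are invented.

```python
# Minimal two-rater, binary Gwet's AC1: observed agreement corrected by a
# chance-agreement term based on the mean "present" rate across raters.

def gwet_ac1(r1, r2):
    n = len(r1)
    pa = sum(a == b for a, b in zip(r1, r2)) / n   # observed agreement
    pi = (sum(r1) / n + sum(r2) / n) / 2           # mean "present" rate
    pe = 2 * pi * (1 - pi)                         # chance agreement
    return (pa - pe) / (1 - pe)

# Toy: two readers marking one feature (e.g. shiny white streaks) on 10 images.
reader1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reader2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
ac1 = gwet_ac1(reader1, reader2)   # above the 0.75 "excellent" cutoff used above
```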
Subject(s)
Melanoma, Skin Neoplasms, Humans, Melanoma/diagnostic imaging, Skin Neoplasms/diagnostic imaging, Dermoscopy/methods, Cross-Sectional Studies, Melanocytes
ABSTRACT
We investigated whether human preferences hold the potential to improve diagnostic artificial intelligence (AI)-based decision support, using skin cancer diagnosis as a use case. We applied reinforcement learning with nonuniform rewards and penalties drawn from expert-generated tables that balance the benefits and harms of various diagnostic errors. Compared with supervised learning, the reinforcement learning model improved the sensitivity for melanoma from 61.4% to 79.5% (95% confidence interval (CI): 73.5-85.6%) and for basal cell carcinoma from 79.4% to 87.1% (95% CI: 80.3-93.9%). AI overconfidence was also reduced while simultaneously maintaining accuracy. Reinforcement learning increased the rate of correct diagnoses made by dermatologists by 12.0% (95% CI: 8.8-15.1%) and improved the rate of optimal management decisions from 57.4% to 65.3% (95% CI: 61.7-68.9%). We further demonstrated that the reward-adjusted reinforcement learning model and a threshold-based model outperformed naïve supervised learning in various clinical scenarios. Our findings suggest the potential for incorporating human preferences into image-based diagnostic algorithms.
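The study above applies its reward tables during training via reinforcement learning; a much simpler, related way to see how a nonuniform reward table changes behavior is to pick, at inference time, the diagnosis with the highest expected reward under the model's class probabilities. The two-class table and probabilities below are toy values, not the study's.

```python
# Sketch of expected-reward decision-making with an asymmetric payoff table.

def expected_reward_decision(probs, reward):
    """probs[c]: model probability that the true class is c.
    reward[c][a]: payoff of answering a when the truth is c."""
    n_actions = len(reward[0])
    utilities = [sum(p * reward[c][a] for c, p in enumerate(probs))
                 for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: utilities[a]), utilities

# Classes/actions: 0 = melanoma, 1 = nevus. Missing a melanoma is
# penalized far more heavily than a false alarm.
reward = [
    [ 1.0, -10.0],   # truth melanoma: correct call +1, miss -10
    [-1.0,   1.0],   # truth nevus: false alarm -1, correct +1
]
probs = [0.3, 0.7]   # the model leans benign...
choice, utils = expected_reward_decision(probs, reward)
# ...but the asymmetric penalty flips the decision to melanoma (action 0).
```

This illustrates the direction of the paper's finding: penalizing missed melanomas more heavily raises melanoma sensitivity at the cost of some specificity.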
Subject(s)
Carcinoma, Basal Cell, Melanoma, Skin Neoplasms, Humans, Artificial Intelligence, Algorithms, Skin Neoplasms/diagnosis, Skin Neoplasms/pathology, Melanoma/diagnosis, Melanoma/pathology, Carcinoma, Basal Cell/diagnosis
ABSTRACT
IMPORTANCE: The use of artificial intelligence (AI) is accelerating in all aspects of medicine and has the potential to transform clinical care and dermatology workflows. However, to develop image-based algorithms for dermatology applications, comprehensive criteria establishing development and performance evaluation standards are required to ensure product fairness, reliability, and safety. OBJECTIVE: To consolidate the limited existing literature with expert opinion to guide developers and reviewers of dermatology AI. EVIDENCE REVIEW: The 19 members of the International Skin Imaging Collaboration AI working group volunteered to develop this consensus statement. A systematic PubMed search of English-language articles published between December 1, 2008, and August 24, 2021, was performed for "artificial intelligence" and "reporting guidelines," along with other pertinent studies identified by the expert panel. Factors that were viewed as critical to AI development and performance evaluation were included and underwent 2 rounds of electronic discussion to achieve consensus. FINDINGS: A checklist of items was developed that outlines best practices of image-based AI development and assessment in dermatology. CONCLUSIONS AND RELEVANCE: Clinically effective AI needs to be fair, reliable, and safe; this checklist of best practices will help both developers and reviewers achieve this goal.
Subject(s)
Artificial Intelligence, Dermatology, Checklist, Consensus, Humans, Reproducibility of Results
ABSTRACT
BACKGROUND: Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy. METHODS: We designed a large dermoscopic image classification challenge to quantify the performance of machine learning algorithms for the task of skin cancer classification from dermoscopic images, and how this performance is affected by shifts in statistical distributions of data, disease categories not represented in training datasets, and imaging or lesion artifacts. Factors that might be beneficial to performance, such as clinical metadata and external training data collected by challenge participants, were also evaluated. 25 331 training images collected from two datasets (in Vienna [HAM10000] and Barcelona [BCN20000]) between Jan 1, 2000, and Dec 31, 2018, across eight skin diseases, were provided to challenge participants to design appropriate algorithms. The trained algorithms were then tested for balanced accuracy against the HAM10000 and BCN20000 test datasets and data from countries not included in the training dataset (Turkey, New Zealand, Sweden, and Argentina). Test datasets contained images of all diagnostic categories available in training plus other diagnoses not included in training data (not trained category). 
We compared the performance of the algorithms against that of 18 dermatologists in a simulated setting that reflected intended clinical use. FINDINGS: 64 teams submitted 129 state-of-the-art algorithm predictions on a test set of 8238 images. The best performing algorithm achieved 58·8% balanced accuracy on the BCN20000 data, which was designed to better reflect realistic clinical scenarios, compared with 82·0% balanced accuracy on HAM10000, which was used in a previously published benchmark. Shifted statistical distributions and disease categories not included in training data contributed to decreases in accuracy. Image artifacts, including hair, pen markings, ulceration, and imaging source institution, decreased accuracy in a complex manner that varied based on the underlying diagnosis. When comparing algorithms to expert dermatologists (2460 ratings on 1269 images), algorithms performed better than experts in most categories, except for actinic keratoses (similar accuracy on average) and images from categories not included in training data (26% correct for experts vs 6% correct for algorithms, p<0·0001). For the top 25 submitted algorithms, 47·1% of the images from categories not included in training data were misclassified as malignant diagnoses, which would lead to a substantial number of unnecessary biopsies if current state-of-the-art AI technologies were clinically deployed. INTERPRETATION: We have identified specific deficiencies and safety issues in AI diagnostic systems for skin cancer that should be addressed in future diagnostic evaluation protocols to improve safety and reliability in clinical practice. FUNDING: Melanoma Research Alliance and La Marató de TV3.
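Balanced accuracy, the ranking metric in the challenge above, is the unweighted mean of per-class recalls, so rare diagnoses count as much as common ones. A stdlib sketch with invented labels:

```python
# Balanced accuracy: average the recall of each class, ignoring class size.
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Three of the eight disease categories, deliberately imbalanced.
y_true = ["nevus"] * 6 + ["melanoma"] * 2 + ["bcc"] * 2
y_pred = ["nevus"] * 6 + ["melanoma", "nevus"] + ["bcc", "bcc"]
bacc = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy is 90%, but the missed melanoma pulls balanced accuracy down to about 83%, which is why the metric suits skewed dermoscopy test sets.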
Subject(s)
Melanoma, Skin Neoplasms, Artificial Intelligence, Dermoscopy/methods, Humans, Melanoma/diagnostic imaging, Melanoma/pathology, Reproducibility of Results, Skin Neoplasms/diagnostic imaging, Skin Neoplasms/pathology
ABSTRACT
A generalized autocalibrating partially parallel acquisition (GRAPPA) method for radial k-space sampling is presented that calculates GRAPPA weights without synthesized or acquired calibration data. Instead, GRAPPA weights are fitted to the undersampled data as if they were the calibration data. Because the relative k-space shifts associated with these GRAPPA weights vary along a radial trajectory, new GRAPPA weights for arbitrary shifts can be resampled through interpolation and then used to generate missing projections between the acquired projections. The method is demonstrated in phantoms and in abdominal and brain imaging. Image quality is similar to that of radial GRAPPA using fully sampled calibration data and improved relative to a previously described self-calibrated radial GRAPPA technique.
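The core GRAPPA idea, fitting linear weights that synthesize missing k-space samples from acquired neighbors, can be illustrated in a deliberately oversimplified 1-D, single-coil form; real radial GRAPPA is multi-coil, 2-D, and shift-dependent, as the abstract notes, and the toy signal below is invented.

```python
# Oversimplified self-calibration sketch: least-squares fit of two weights
# so each sample is predicted from its two neighbors, using the data itself
# as calibration (2x2 normal equations, solved in closed form).

def fit_weights(signal):
    """Fit w1, w2 so that signal[i] ~ w1*signal[i-1] + w2*signal[i+1]."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for i in range(1, len(signal) - 1):
        x1, x2, y = signal[i - 1], signal[i + 1], signal[i]
        a11 += x1 * x1; a12 += x1 * x2; a22 += x2 * x2
        b1 += x1 * y;  b2 += x2 * y
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# A smooth toy "projection": the fitted weights synthesize interior samples.
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w1, w2 = fit_weights(signal)
predicted = w1 * signal[2] + w2 * signal[4]  # synthesize the sample between them
```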
Subject(s)
Image Processing, Computer-Assisted/methods, Magnetic Resonance Imaging/methods, Adult, Algorithms, Brain/anatomy & histology, Brain Mapping/methods, Calibration, Female, Humans, Image Enhancement/methods, Image Interpretation, Computer-Assisted/methods, Male, Middle Aged, Phantoms, Imaging, Reproducibility of Results, Sensitivity and Specificity, Young Adult
ABSTRACT
Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions of the same patient. Although artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically on multiple lesions of the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier that allows lesions from the same patient to be mapped to one another. This patient-level contextual information is frequently used by clinicians to diagnose melanoma and is especially useful in ruling out false positives in patients with many atypical nevi. The dataset represents 2,056 patients (20.8% with at least one melanoma, 79.2% with zero melanomas) from three continents with an average of 16 lesions per patient, consisting of 33,126 dermoscopic images and 584 (1.8%) histopathologically confirmed melanomas alongside benign melanoma mimickers.
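Because the dataset above maps each image to a patient identifier, train/test splits should be made at the patient level so that lesions from one patient never appear on both sides of the split. A stdlib sketch with invented IDs:

```python
# Patient-level (grouped) splitting to avoid leakage between lesions of the
# same patient. IDs and the 25% test fraction are illustrative only.
import random

def patient_level_split(image_to_patient, test_fraction=0.25, seed=0):
    patients = sorted(set(image_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, int(len(patients) * test_fraction))
    held_out = set(patients[:n_test])
    train = [img for img, p in image_to_patient.items() if p not in held_out]
    test = [img for img, p in image_to_patient.items() if p in held_out]
    return train, test

# Toy mapping: 16 images from 4 patients, 4 lesions each.
images = {f"img_{i:03d}": f"patient_{i % 4}" for i in range(16)}
train, test = patient_level_split(images)
train_patients = {images[i] for i in train}
test_patients = {images[i] for i in test}
```

Splitting by image instead of by patient would let near-duplicate lesions from one patient leak across the split and inflate apparent accuracy.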
Subject(s)
Melanoma, Skin Neoplasms, Artificial Intelligence, Humans, Melanoma/diagnostic imaging, Melanoma/pathology, Melanoma/physiopathology, Metadata, Skin/pathology, Skin Neoplasms/diagnostic imaging, Skin Neoplasms/pathology, Skin Neoplasms/physiopathology
ABSTRACT
A respiratory and cardiac self-gated free-breathing three-dimensional cine steady-state free precession imaging method using multiecho hybrid radial sampling is presented. Cartesian mapping of the k-space center along the slice encoding direction provides intensity-weighted position information, from which both respiratory and cardiac motions are derived. With in-plane radial sampling acquired at every pulse repetition time, no extra scan time is required for sampling the k-space center. Temporal filtering based on density compensation is used for radial reconstruction to achieve a high signal-to-noise ratio and contrast-to-noise ratio. High correlation between the self-gating signals and external gating signals is demonstrated. This respiratory and cardiac self-gated, free-breathing, three-dimensional, radial cardiac cine imaging technique provides image quality comparable to that acquired with the multiple breath-hold two-dimensional Cartesian steady-state free precession technique in short-axis, four-chamber, and two-chamber orientations. Functional measurements from the three-dimensional cardiac short-axis cine images are found to be comparable to those obtained using the standard two-dimensional technique.
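The self-gating signal described above superimposes slow respiratory motion on faster cardiac motion. A crude way to separate the two is a moving-average low-pass filter about one cardiac period wide; real implementations use proper band-pass filtering, and the sampling rate and frequencies below are toy values.

```python
# Toy separation of a self-gating signal into respiratory and cardiac parts.
import math

def moving_average(x, window):
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

fs = 50.0                       # samples per second (one per TR, toy value)
t = [i / fs for i in range(500)]
resp = [math.sin(2 * math.pi * 0.25 * ti) for ti in t]       # ~0.25 Hz breathing
card = [0.3 * math.sin(2 * math.pi * 1.2 * ti) for ti in t]  # ~1.2 Hz heartbeat
signal = [r + c for r, c in zip(resp, card)]

# A 41-sample window (~0.82 s) spans roughly one cardiac period, so the
# average keeps the respiratory component and suppresses the cardiac one.
resp_est = moving_average(signal, window=41)
card_est = [s - r for s, r in zip(signal, resp_est)]
```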
Subject(s)
Algorithms, Cardiac-Gated Imaging Techniques/methods, Image Enhancement/methods, Image Interpretation, Computer-Assisted/methods, Imaging, Three-Dimensional/methods, Magnetic Resonance Imaging, Cine/methods, Respiratory-Gated Imaging Techniques/methods, Adult, Female, Humans, Male, Reproducibility of Results, Respiratory Mechanics, Sensitivity and Specificity
ABSTRACT
PURPOSE: To evaluate the clinical performance of a novel automated left ventricle (LV) segmentation algorithm (LV-METRIC) that involves no geometric assumptions. MATERIALS AND METHODS: LV-METRIC and manual tracing (MT) were used independently to quantify LV volumes and ejection fraction (LVEF) for 151 consecutive patients who underwent cine-CMR (steady-state free precession). Phase contrast imaging was used to independently measure stroke volume. RESULTS: LV-METRIC was successful in all cases. Mean LVEF was within 1 point of MT (∆ 0.6 ± 2.3%, P < 0.05), with smaller differences among patients with (0.5 ± 2.5%) versus those without (0.9 ± 2.3%; P = 0.01) advanced systolic dysfunction (LVEF
Subject(s)
Heart Ventricles/pathology, Magnetic Resonance Imaging/methods, Myocardium/pathology, Ventricular Function, Left, Adult, Aged, Algorithms, Automation, Female, Heart Ventricles/anatomy & histology, Humans, Image Processing, Computer-Assisted, Male, Middle Aged, Reproducibility of Results
ABSTRACT
OBJECTIVES: To examine relationships between the severity of echocardiography (echo)-evidenced diastolic dysfunction (DD) and volumetric filling by automated processing of routine cine cardiovascular magnetic resonance (CMR). BACKGROUND: Cine-CMR provides high-resolution assessment of left ventricular (LV) chamber volumes. Automated segmentation (LV-METRIC) yields LV filling curves by segmenting all short-axis images across all temporal phases. This study used cine-CMR to assess filling changes that occur with progressive DD. METHODS: 115 post-MI patients underwent CMR and echo within 1 day. LV-METRIC yielded multiple diastolic indices: E:A ratio, peak filling rate (PFR), time to peak filling rate (TPFR), and diastolic volume recovery (DVR80, the proportion of diastole required to recover 80% of stroke volume). Echo was the reference for DD. RESULTS: LV-METRIC successfully generated LV filling curves in all patients. CMR indices were reproducible (≤1% inter-reader differences) and required minimal processing time (175 ± 34 images/exam, 2:09 ± 0:51 minutes). The CMR E:A ratio decreased with grade 1 and increased with grades 2-3 DD. Diastolic filling intervals, measured by DVR80 or TPFR, prolonged with grade 1 and shortened with grade 3 DD, paralleling echo deceleration time (p < 0.001). PFR by CMR increased with DD grade, similar to E/e' (p < 0.001). Prolonged DVR80 identified 71% of patients with echo-evidenced grade 1 but no patients with grade 3 DD, and stroke-volume-adjusted PFR identified 67% with grade 3 but none with grade 1 DD (matched specificity = 83%). The combination of DVR80 and PFR identified 53% of patients with grade 2 DD. Prolonged DVR80 was associated with grade 1 (OR 2.79, CI 1.65-4.05, p = 0.001) with a similar trend for grade 2 (OR 1.35, CI 0.98-1.74, p = 0.06), whereas high PFR was associated with grade 3 (OR 1.14, CI 1.02-1.25, p = 0.02) DD.
CONCLUSIONS: Automated cine-CMR segmentation can discern LV filling changes that occur with increasing severity of echo-evidenced DD. Impaired relaxation is associated with prolonged filling intervals whereas restrictive filling is characterized by increased filling rates.
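DVR80, one of the indices above, is the fraction of diastole needed to recover 80% of the stroke volume, and it can be read directly off a filling curve. The volume curve below is invented for illustration.

```python
# Compute DVR80 from a toy LV volume-vs-time curve sampled across diastole.

def dvr80(volumes):
    """volumes: LV volume at evenly spaced phases from end-systole (index 0)
    to end-diastole (last index). Returns the fraction of diastole at which
    filling first reaches 80% of stroke volume."""
    esv, edv = volumes[0], volumes[-1]
    target = esv + 0.8 * (edv - esv)
    for i, v in enumerate(volumes):
        if v >= target:
            return i / (len(volumes) - 1)
    return 1.0

# Toy filling curve (mL): rapid early filling, diastasis, then atrial kick.
curve = [50, 70, 85, 95, 100, 102, 104, 107, 110, 115, 120]
fraction = dvr80(curve)
```

A larger DVR80 means filling is spread later into diastole, which is the prolonged-filling pattern the study associates with grade 1 (impaired relaxation) DD.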
Subject(s)
Magnetic Resonance Imaging, Cine, Myocardial Infarction/complications, Ventricular Dysfunction, Left/diagnosis, Ventricular Dysfunction, Left/physiopathology, Aged, Automation, Diastole, Female, Humans, Male, Middle Aged, Myocardial Infarction/physiopathology, Severity of Illness Index, Ventricular Dysfunction, Left/etiology
ABSTRACT
The rapid increase in telemedicine coupled with recent advances in diagnostic artificial intelligence (AI) create the imperative to consider the opportunities and risks of inserting AI-based support into new paradigms of care. Here we build on recent achievements in the accuracy of image-based AI for skin cancer diagnosis to address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows. We find that good quality AI-based support of clinical decision-making improves diagnostic accuracy over that of either AI or physicians alone, and that the least experienced clinicians gain the most from AI-based support. We further find that AI-based multiclass probabilities outperformed content-based image retrieval (CBIR) representations of AI in the mobile technology environment, and AI-based support had utility in simulations of second opinions and of telemedicine triage. In addition to demonstrating the potential benefits associated with good quality AI in the hands of non-expert clinicians, we find that faulty AI can mislead the entire spectrum of clinicians, including experts. Lastly, we show that insights derived from AI class-activation maps can inform improvements in human diagnosis. Together, our approach and findings offer a framework for future studies across the spectrum of image-based diagnostics to improve human-computer collaboration in clinical practice.
Subject(s)
Artificial Intelligence, Skin Neoplasms/diagnostic imaging, Telemedicine, User-Computer Interface, Clinical Decision-Making, Humans, Neural Networks, Computer, Physicians, Skin Neoplasms/pathology
ABSTRACT
Dermoscopy is a non-invasive skin imaging technique that permits visualization of features of pigmented melanocytic neoplasms that are not discernible by examination with the naked eye. Although studies on the automated analysis of dermoscopy images date back to the late 1990s, the field progressed rather slowly in its first two decades owing to various factors (lack of publicly available datasets, open-source software, and computational power). With the release of a large public dataset by the International Skin Imaging Collaboration in 2016, the development of open-source software for convolutional neural networks, and the availability of inexpensive graphics processing units, dermoscopy image analysis has recently become a very active research field. In this paper, we present a brief overview of this exciting subfield of medical image analysis, focusing primarily on three of its aspects: segmentation, feature extraction, and classification. We then provide future directions for researchers.
Subject(s)
Dermoscopy, Image Interpretation, Computer-Assisted, Humans, Melanoma/diagnostic imaging, Skin Neoplasms/diagnostic imaging
ABSTRACT
In the past decade, machine learning and artificial intelligence have made significant advancements in pattern analysis, including speech and natural language processing, image recognition, object detection, facial recognition, and action categorization. Indeed, in many of these applications, accuracy has reached or exceeded human levels of performance. Subsequently, a multitude of studies have begun to examine the application of these technologies to health care, and in particular, medical image analysis. Perhaps the most difficult subdomain involves skin imaging because of the lack of standards around imaging hardware, technique, color, and lighting conditions. In addition, unlike radiological images, skin image appearance can be significantly affected by skin tone as well as the broad range of diseases. Furthermore, automated algorithm development relies on large high-quality annotated image data sets that incorporate the breadth of this circumstantial and diagnostic variety. These issues, in combination with unique complexities regarding integrating artificial intelligence systems into a clinical workflow, have led to difficulty in using these systems to improve sensitivity and specificity of skin diagnostics in health care networks around the world. In this article, we summarize recent advancements in machine learning, with a focused perspective on the role of public challenges and data sets on the progression of these technologies in skin imaging. In addition, we highlight the remaining hurdles toward effective implementation of technologies to the clinical workflow and discuss how public challenges and data sets can catalyze the development of solutions.
Subject(s)
Algorithms; Artificial Intelligence; Benchmarking; Dermatology; Humans; Machine Learning
ABSTRACT
UNLABELLED: This retrospective analysis of existing patient data had institutional review board approval and was performed in compliance with HIPAA. No informed consent was required. The purpose of the study was to develop and validate an algorithm for automated segmentation of the left ventricular (LV) cavity that accounts for papillary and/or trabecular muscles and partial voxels in cine magnetic resonance (MR) images: LV Myocardial Effusion Threshold Reduction with Intravoxel Computation (LV-METRIC). The algorithm was validated in biologic phantoms, and its results were compared with those of manual tracing and of a commercial automated segmentation package (MASS [MR Analytical Software System]) in 38 subjects. LV-METRIC accuracy in vitro was 98.7%. Among the 38 subjects studied, LV-METRIC and MASS ejection fraction estimates were highly correlated with manual tracing (R(2) = 0.97 and R(2) = 0.95, respectively). Ventricular volume estimates were smaller with LV-METRIC and larger with MASS than those calculated by manual tracing, though all results were well correlated (R(2) = 0.99). LV-METRIC volume measurements without partial voxel interpolation were statistically equivalent to manual tracing results (P > .05). LV-METRIC had lower intraobserver and interobserver variability than the other methods. MASS required additional manual intervention in 58% of cases, whereas LV-METRIC required no additional corrections. LV-METRIC reliably and reproducibly measured LV volumes. SUPPLEMENTAL MATERIAL: http://radiology.rsnajnls.org/cgi/content/full/248/3/1004/DC1.
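The ejection fraction estimates being compared all rest on the standard volumetric relation EF = (EDV - ESV) / EDV. A minimal sketch of that computation follows; the voxel counts and voxel volume are illustrative assumptions (not the study's data), with a fractional voxel count standing in for the kind of partial-voxel accounting the abstract describes:

```python
def cavity_volume_ml(voxel_count: float, voxel_volume_mm3: float) -> float:
    """Cavity volume (mL) from a voxel count and per-voxel volume.

    The count may be fractional, as partial-voxel accounting would
    produce. 1 mL = 1000 mm^3.
    """
    return voxel_count * voxel_volume_mm3 / 1000.0

def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """LV ejection fraction (%) from end-diastolic (EDV) and
    end-systolic (ESV) cavity volumes in milliliters."""
    if edv_ml <= 0 or esv_ml < 0 or esv_ml > edv_ml:
        raise ValueError("volumes must satisfy 0 <= ESV <= EDV, EDV > 0")
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# Illustrative numbers only (fractional EDV count mimics partial voxels):
edv = cavity_volume_ml(voxel_count=96_000.5, voxel_volume_mm3=1.5)
esv = cavity_volume_ml(voxel_count=40_000.0, voxel_volume_mm3=1.5)
print(round(ejection_fraction(edv, esv), 1))  # prints 58.3
```

Because EF is a ratio of volumes, the systematic under- and over-estimation of volumes reported for the two automated methods can partially cancel in the EF itself, which is consistent with both methods correlating well with manual tracing on EF despite their opposite volume biases.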
Subject(s)
Heart Ventricles/pathology; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Magnetic Resonance Imaging/methods; Pattern Recognition, Automated/methods; Ventricular Dysfunction, Left/diagnosis; Algorithms; Artificial Intelligence; Female; Humans; Male; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
BACKGROUND: Accurate quantification of left ventricular mass and ejection fraction is important for patients with left ventricular hypertrophy. Although cardiac magnetic resonance imaging has been proposed as a standard for these indices, prior studies have variably included papillary muscles and trabeculae in myocardial volume. This study investigated the contribution of papillary muscles and trabeculae to left ventricular quantification in relation to the presence and pattern of hypertrophy. METHODS: Cardiac magnetic resonance quantification was performed in patients with concentric or eccentric hypertrophy and in normal controls (20 per group) using two established methods that include papillary muscles and trabeculae in the myocardial (method 1) or intracavitary (method 2) volume. RESULTS: Among all patients, papillary muscles and trabeculae accounted for 10.5% of ventricular mass, with a greater contribution in left ventricular hypertrophy than in normal controls (12.6% vs. 6.2%, P < 0.001). Papillary muscle and trabecular mass correlated with ventricular wall mass (r = 0.53) and end-diastolic volume (r = 0.52; P < 0.001). Inclusion of papillary muscles and trabeculae in myocardium (method 1) yielded smaller differences from a reference standard of mass quantification based on linear ventricular measurements than did method 2 (P < 0.001). Method 1 and method 2 yielded different left ventricular mass, ejection fraction, and volume in all groups, especially in patients with hypertrophy: the difference in ventricular mass index was three-fold to six-fold greater in the hypertrophy groups than in the normal group (P < 0.001). The difference in ejection fraction, greatest in concentric hypertrophy (P < 0.001), was independently related to papillary muscle and trabecular mass, ventricular wall mass, and smaller ventricular volume (R = 0.56, P < 0.001).
CONCLUSION: Established cardiac magnetic resonance methods yield differences in left ventricular quantification due to variable exclusion of papillary muscles and trabeculae from myocardium. The relative impact of papillary muscles and trabeculae exclusion on calculated mass and ejection fraction is increased among patients with hypertrophy-associated left ventricular remodeling.
Subject(s)
Hypertrophy, Left Ventricular/pathology; Magnetic Resonance Imaging/methods; Magnetic Resonance Imaging/standards; Myocardium/pathology; Papillary Muscles/pathology; Aged; Female; Humans; Male; Middle Aged; Predictive Value of Tests; Reproducibility of Results; Ventricular Remodeling
ABSTRACT
This work presents the first segmentation study of both diseased and healthy skin in standard camera photographs from a clinical environment. Challenges arise from varied lighting conditions, skin types, backgrounds, and pathological states. For this study, 400 clinical photographs (with skin segmentation masks) representing various pathological states of skin are retrospectively collected from a primary care network. One hundred images are used for training and fine-tuning, and 300 are used for evaluation. This split between training and test partitions is chosen to reflect the difficulty of amassing large quantities of labeled data in this domain. A deep learning approach is used, and three public segmentation datasets of healthy skin are collected to study the potential benefits of pretraining. Two architectures are evaluated: a classical U-Net and a Dense Residual U-Net. We find that the Dense Residual U-Net yields a 7.8% relative improvement in Jaccard index over the classical U-Net (0.55 vs. 0.51) for direct transfer, where no fine-tuning data are used. However, U-Net outperforms Dense Residual U-Net for both direct training (0.83 vs. 0.80) and fine-tuning (0.89 vs. 0.88). The stark performance improvement with fine-tuning compared with direct transfer and direct training underscores both the need for adequate representative data of diseased skin and the utility of other publicly available data sources for this task.
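The Jaccard index used to compare these models is the intersection-over-union of the predicted and reference binary masks. A minimal sketch with toy masks (the masks below are invented examples, not the study's data):

```python
def jaccard(pred, target):
    """Jaccard index (intersection over union) between two binary
    masks given as equal-length flat sequences of 0/1 values."""
    inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, target) if p == 1 or t == 1)
    # Convention: two empty masks agree perfectly.
    return inter / union if union else 1.0

# Toy 8-pixel masks: 3 overlapping foreground pixels, 5 in the union.
pred   = [1, 1, 0, 0, 1, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1, 1, 0]
print(jaccard(pred, target))  # prints 0.6
```

The metric penalizes both false positives and false negatives symmetrically, which is why it is a common summary score for segmentation quality; a score of 0.51 vs. 0.55 therefore reflects the overlap fraction between predicted and annotated skin regions.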