Búsqueda | BVS CLAP/SMR-OPS/OMS

1.

Video-based AI for beat-to-beat assessment of cardiac function.

Ouyang, David; He, Bryan; Ghorbani, Amirata; Yuan, Neal; Ebinger, Joseph; Langlotz, Curtis P; Heidenreich, Paul A; Harrington, Robert A; Liang, David H; Ashley, Euan A; Zou, James Y.

Nature ; 580(7802): 252-256, 2020 04.

Artículo en Inglés | MEDLINE | ID: mdl-32269341

RESUMEN

Accurate assessment of cardiac function is crucial for the diagnosis of cardiovascular disease1, screening for cardiotoxicity2 and decisions regarding the clinical management of patients with a critical illness3. However, human assessment of cardiac function focuses on a limited sampling of cardiac cycles and has considerable inter-observer variability despite years of training4,5. Here, to overcome this challenge, we present a video-based deep learning algorithm-EchoNet-Dynamic-that surpasses the performance of human experts in the critical tasks of segmenting the left ventricle, estimating ejection fraction and assessing cardiomyopathy. Trained on echocardiogram videos, our model accurately segments the left ventricle with a Dice similarity coefficient of 0.92, predicts ejection fraction with a mean absolute error of 4.1% and reliably classifies heart failure with reduced ejection fraction (area under the curve of 0.97). In an external dataset from another healthcare system, EchoNet-Dynamic predicts the ejection fraction with a mean absolute error of 6.0% and classifies heart failure with reduced ejection fraction with an area under the curve of 0.96. Prospective evaluation with repeated human measurements confirms that the model has variance that is comparable to or less than that of human experts. By leveraging information across multiple cardiac cycles, our model can rapidly identify subtle changes in ejection fraction, is more reproducible than human evaluation and lays the foundation for precise diagnosis of cardiovascular disease in real time. As a resource to promote further innovation, we also make publicly available a large dataset of 10,030 annotated echocardiogram videos.

Asunto(s)

Aprendizaje Profundo , Cardiopatías/diagnóstico , Cardiopatías/fisiopatología , Corazón/fisiología , Corazón/fisiopatología , Modelos Cardiovasculares , Grabación en Video , Fibrilación Atrial , Conjuntos de Datos como Asunto , Ecocardiografía , Insuficiencia Cardíaca/fisiopatología , Hospitales , Humanos , Estudios Prospectivos , Reproducibilidad de los Resultados , Función Ventricular Izquierda/fisiología

2.

External validation, radiological evaluation, and development of deep learning automatic lung segmentation in contrast-enhanced chest CT.

Dwivedi, Krit; Sharkey, Michael; Alabed, Samer; Langlotz, Curtis P; Swift, Andy J; Bluethgen, Christian.

Eur Radiol ; 34(4): 2727-2737, 2024 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-37775589

RESUMEN

OBJECTIVES: There is a need for CT pulmonary angiography (CTPA) lung segmentation models. Clinical translation requires radiological evaluation of model outputs, understanding of limitations, and identification of failure points. This multicentre study aims to develop an accurate CTPA lung segmentation model, with evaluation of outputs in two diverse patient cohorts with pulmonary hypertension (PH) and interstitial lung disease (ILD). METHODS: This retrospective study develops an nnU-Net-based segmentation model using data from two specialist centres (UK and USA). Model was trained (n = 37), tested (n = 12), and clinically evaluated (n = 176) on a diverse 'real-world' cohort of 225 PH patients with volumetric CTPAs. Dice score coefficient (DSC) and normalised surface distance (NSD) were used for testing. Clinical evaluation of outputs was performed by two radiologists who assessed clinical significance of errors. External validation was performed on heterogenous contrast and non-contrast scans from 28 ILD patients. RESULTS: A total of 225 PH and 28 ILD patients with diverse demographic and clinical characteristics were evaluated. Mean accuracy, DSC, and NSD scores were 0.998 (95% CI 0.9976, 0.9989), 0.990 (0.9840, 0.9962), and 0.983 (0.9686, 0.9972) respectively. There were no segmentation failures. On radiological review, 82% and 71% of internal and external cases respectively had no errors. Eighteen percent and 25% respectively had clinically insignificant errors. Peripheral atelectasis and consolidation were common causes for suboptimal segmentation. One external case (0.5%) with patulous oesophagus had a clinically significant error. CONCLUSION: State-of-the-art CTPA lung segmentation model provides accurate outputs with minimal clinical errors on evaluation across two diverse cohorts with PH and ILD. CLINICAL RELEVANCE: Clinical translation of artificial intelligence models requires radiological review and understanding of model limitations. This study develops an externally validated state-of-the-art model with robust radiological review. Intended clinical use is in techniques such as lung volume or parenchymal disease quantification. KEY POINTS: â¢ Accurate, externally validated CT pulmonary angiography (CTPA) lung segmentation model tested in two large heterogeneous clinical cohorts (pulmonary hypertension and interstitial lung disease). â¢ No segmentation failures and robust review of model outputs by radiologists found 1 (0.5%) clinically significant segmentation error. â¢ Intended clinical use of this model is a necessary step in techniques such as lung volume, parenchymal disease quantification, or pulmonary vessel analysis.

Asunto(s)

Aprendizaje Profundo , Hipertensión Pulmonar , Enfermedades Pulmonares Intersticiales , Humanos , Hipertensión Pulmonar/diagnóstico por imagen , Inteligencia Artificial , Estudios Retrospectivos , Tomografía Computarizada por Rayos X , Enfermedades Pulmonares Intersticiales/diagnóstico por imagen , Pulmón

3.

Improved Fine-Tuning of In-Domain Transformer Model for Inferring COVID-19 Presence in Multi-Institutional Radiology Reports.

Chambon, Pierre; Cook, Tessa S; Langlotz, Curtis P.

J Digit Imaging ; 36(1): 164-177, 2023 02.

Artículo en Inglés | MEDLINE | ID: mdl-36323915

RESUMEN

Building a document-level classifier for COVID-19 on radiology reports could help assist providers in their daily clinical routine, as well as create large numbers of labels for computer vision models. We have developed such a classifier by fine-tuning a BERT-like model initialized from RadBERT, its continuous pre-training on radiology reports that can be used on all radiology-related tasks. RadBERT outperforms all biomedical pre-trainings on this COVID-19 task (P<0.01) and helps our fine-tuned model achieve an 88.9 macro-averaged F1-score, when evaluated on both X-ray and CT reports. To build this model, we rely on a multi-institutional dataset re-sampled and enriched with concurrent lung diseases, helping the model to resist to distribution shifts. In addition, we explore a variety of fine-tuning and hyperparameter optimization techniques that accelerate fine-tuning convergence, stabilize performance, and improve accuracy, especially when data or computational resources are limited. Finally, we provide a set of visualization tools and explainability methods to better understand the performance of the model, and support its practical use in the clinical setting. Our approach offers a ready-to-use COVID-19 classifier and can be applied similarly to other radiology report classification tasks.

Asunto(s)

COVID-19 , Radiología , Humanos , Informe de Investigación , Procesamiento de Lenguaje Natural

4.

Implementation of Clinical Artificial Intelligence in Radiology: Who Decides and How?

Daye, Dania; Wiggins, Walter F; Lungren, Matthew P; Alkasab, Tarik; Kottler, Nina; Allen, Bibb; Roth, Christopher J; Bizzo, Bernardo C; Durniak, Kimberly; Brink, James A; Larson, David B; Dreyer, Keith J; Langlotz, Curtis P.

Radiology ; 305(3): 555-563, 2022 12.

Artículo en Inglés | MEDLINE | ID: mdl-35916673

RESUMEN

As the role of artificial intelligence (AI) in clinical practice evolves, governance structures oversee the implementation, maintenance, and monitoring of clinical AI algorithms to enhance quality, manage resources, and ensure patient safety. In this article, a framework is established for the infrastructure required for clinical AI implementation and presents a road map for governance. The road map answers four key questions: Who decides which tools to implement? What factors should be considered when assessing an application for implementation? How should applications be implemented in clinical practice? Finally, how should tools be monitored and maintained after clinical implementation? Among the many challenges for the implementation of AI in clinical practice, devising flexible governance structures that can quickly adapt to a changing environment will be essential to ensure quality patient care and practice improvement objectives.

Asunto(s)

Inteligencia Artificial , Radiología , Humanos , Radiografía , Algoritmos , Calidad de la Atención de Salud

5.

Artificial Intelligence Algorithm Improves Radiologist Performance in Skeletal Age Assessment: A Prospective Multicenter Randomized Controlled Trial.

Eng, David K; Khandwala, Nishith B; Long, Jin; Fefferman, Nancy R; Lala, Shailee V; Strubel, Naomi A; Milla, Sarah S; Filice, Ross W; Sharp, Susan E; Towbin, Alexander J; Francavilla, Michael L; Kaplan, Summer L; Ecklund, Kirsten; Prabhu, Sanjay P; Dillon, Brian J; Everist, Brian M; Anton, Christopher G; Bittman, Mark E; Dennis, Rebecca; Larson, David B; Seekins, Jayne M; Silva, Cicero T; Zandieh, Arash R; Langlotz, Curtis P; Lungren, Matthew P; Halabi, Safwan S.

Radiology ; 301(3): 692-699, 2021 12.

Artículo en Inglés | MEDLINE | ID: mdl-34581608

RESUMEN

Background Previous studies suggest that use of artificial intelligence (AI) algorithms as diagnostic aids may improve the quality of skeletal age assessment, though these studies lack evidence from clinical practice. Purpose To compare the accuracy and interpretation time of skeletal age assessment on hand radiograph examinations with and without the use of an AI algorithm as a diagnostic aid. Materials and Methods In this prospective randomized controlled trial, the accuracy of skeletal age assessment on hand radiograph examinations was performed with (n = 792) and without (n = 739) the AI algorithm as a diagnostic aid. For examinations with the AI algorithm, the radiologist was shown the AI interpretation as part of their routine clinical work and was permitted to accept or modify it. Hand radiographs were interpreted by 93 radiologists from six centers. The primary efficacy outcome was the mean absolute difference between the skeletal age dictated into the radiologists' signed report and the average interpretation of a panel of four radiologists not using a diagnostic aid. The secondary outcome was the interpretation time. A linear mixed-effects regression model with random center- and radiologist-level effects was used to compare the two experimental groups. Results Overall mean absolute difference was lower when radiologists used the AI algorithm compared with when they did not (5.36 months vs 5.95 months; P = .04). The proportions at which the absolute difference exceeded 12 months (9.3% vs 13.0%, P = .02) and 24 months (0.5% vs 1.8%, P = .02) were lower with the AI algorithm than without it. Median radiologist interpretation time was lower with the AI algorithm than without it (102 seconds vs 142 seconds, P = .001). Conclusion Use of an artificial intelligence algorithm improved skeletal age assessment accuracy and reduced interpretation times for radiologists, although differences were observed between centers. Clinical trial registration no. NCT03530098 © RSNA, 2021 Online supplemental material is available for this article. See also the editorial by Rubin in this issue.

Asunto(s)

Determinación de la Edad por el Esqueleto/métodos , Inteligencia Artificial , Interpretación de Imagen Radiográfica Asistida por Computador/métodos , Radiografía/métodos , Adolescente , Adulto , Niño , Preescolar , Femenino , Humanos , Lactante , Masculino , Estudios Prospectivos , Radiólogos , Reproducibilidad de los Resultados , Sensibilidad y Especificidad

6.

Prospective Deployment of Deep Learning in MRI: A Framework for Important Considerations, Challenges, and Recommendations for Best Practices.

Chaudhari, Akshay S; Sandino, Christopher M; Cole, Elizabeth K; Larson, David B; Gold, Garry E; Vasanawala, Shreyas S; Lungren, Matthew P; Hargreaves, Brian A; Langlotz, Curtis P.

J Magn Reson Imaging ; 54(2): 357-371, 2021 08.

Artículo en Inglés | MEDLINE | ID: mdl-32830874

RESUMEN

Artificial intelligence algorithms based on principles of deep learning (DL) have made a large impact on the acquisition, reconstruction, and interpretation of MRI data. Despite the large number of retrospective studies using DL, there are fewer applications of DL in the clinic on a routine basis. To address this large translational gap, we review the recent publications to determine three major use cases that DL can have in MRI, namely, that of model-free image synthesis, model-based image reconstruction, and image or pixel-level classification. For each of these three areas, we provide a framework for important considerations that consist of appropriate model training paradigms, evaluation of model robustness, downstream clinical utility, opportunities for future advances, as well recommendations for best current practices. We draw inspiration for this framework from advances in computer vision in natural imaging as well as additional healthcare fields. We further emphasize the need for reproducibility of research studies through the sharing of datasets and software. LEVEL OF EVIDENCE: 5 TECHNICAL EFFICACY STAGE: 2.

Asunto(s)

Inteligencia Artificial , Aprendizaje Profundo , Algoritmos , Procesamiento de Imagen Asistido por Computador , Imagen por Resonancia Magnética , Redes Neurales de la Computación , Estudios Prospectivos , Reproducibilidad de los Resultados , Estudios Retrospectivos

7.

Ethics of Using and Sharing Clinical Imaging Data for Artificial Intelligence: A Proposed Framework.

Larson, David B; Magnus, David C; Lungren, Matthew P; Shah, Nigam H; Langlotz, Curtis P.

Radiology ; 295(3): 675-682, 2020 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-32208097

RESUMEN

In this article, the authors propose an ethical framework for using and sharing clinical data for the development of artificial intelligence (AI) applications. The philosophical premise is as follows: when clinical data are used to provide care, the primary purpose for acquiring the data is fulfilled. At that point, clinical data should be treated as a form of public good, to be used for the benefit of future patients. In their 2013 article, Faden et al argued that all who participate in the health care system, including patients, have a moral obligation to contribute to improving that system. The authors extend that framework to questions surrounding the secondary use of clinical data for AI applications. Specifically, the authors propose that all individuals and entities with access to clinical data become data stewards, with fiduciary (or trust) responsibilities to patients to carefully safeguard patient privacy, and to the public to ensure that the data are made widely available for the development of knowledge and tools to benefit future patients. According to this framework, the authors maintain that it is unethical for providers to "sell" clinical data to other parties by granting access to clinical data, especially under exclusive arrangements, in exchange for monetary or in-kind payments that exceed costs. The authors also propose that patient consent is not required before the data are used for secondary purposes when obtaining such consent is prohibitively costly or burdensome, as long as mechanisms are in place to ensure that ethical standards are strictly followed. Rather than debate whether patients or provider organizations "own" the data, the authors propose that clinical data are not owned at all in the traditional sense, but rather that all who interact with or control the data have an obligation to ensure that the data are used for the benefit of future patients and society.

Asunto(s)

Inteligencia Artificial/ética , Diagnóstico por Imagen/ética , Ética Médica , Difusión de la Información/ética , Humanos

8.

Human-AI Symbiosis: A Path Forward to Improve Chest Radiography and the Role of Radiologists in Patient Care.

Gefter, Warren B; Prokop, Mathias; Seo, Joon Beom; Raoof, Suhail; Langlotz, Curtis P; Hatabu, Hiroto.

Radiology ; 310(1): e232778, 2024 01.

Artículo en Inglés | MEDLINE | ID: mdl-38259206

Asunto(s)

Atención al Paciente , Simbiosis , Humanos , Radiografía , Radiólogos

9.

Assessment of Convolutional Neural Networks for Automated Classification of Chest Radiographs.

Dunnmon, Jared A; Yi, Darvin; Langlotz, Curtis P; Ré, Christopher; Rubin, Daniel L; Lungren, Matthew P.

Radiology ; 290(2): 537-544, 2019 02.

Artículo en Inglés | MEDLINE | ID: mdl-30422093

RESUMEN

Purpose To assess the ability of convolutional neural networks (CNNs) to enable high-performance automated binary classification of chest radiographs. Materials and Methods In a retrospective study, 216 431 frontal chest radiographs obtained between 1998 and 2012 were procured, along with associated text reports and a prospective label from the attending radiologist. This data set was used to train CNNs to classify chest radiographs as normal or abnormal before evaluation on a held-out set of 533 images hand-labeled by expert radiologists. The effects of development set size, training set size, initialization strategy, and network architecture on end performance were assessed by using standard binary classification metrics; detailed error analysis, including visualization of CNN activations, was also performed. Results Average area under the receiver operating characteristic curve (AUC) was 0.96 for a CNN trained with 200 000 images. This AUC value was greater than that observed when the same model was trained with 2000 images (AUC = 0.84, P < .005) but was not significantly different from that observed when the model was trained with 20 000 images (AUC = 0.95, P > .05). Averaging the CNN output score with the binary prospective label yielded the best-performing classifier, with an AUC of 0.98 (P < .005). Analysis of specific radiographs revealed that the model was heavily influenced by clinically relevant spatial regions but did not reliably generalize beyond thoracic disease. Conclusion CNNs trained with a modestly sized collection of prospectively labeled chest radiographs achieved high diagnostic performance in the classification of chest radiographs as normal or abnormal; this function may be useful for automated prioritization of abnormal chest radiographs. © RSNA, 2018 Online supplemental material is available for this article. See also the editorial by van Ginneken in this issue.

Asunto(s)

Redes Neurales de la Computación , Interpretación de Imagen Radiográfica Asistida por Computador/métodos , Radiografía Torácica/métodos , Femenino , Humanos , Pulmón/diagnóstico por imagen , Masculino , Curva ROC , Radiólogos , Estudios Retrospectivos

10.

A Roadmap for Foundational Research on Artificial Intelligence in Medical Imaging: From the 2018 NIH/RSNA/ACR/The Academy Workshop.

Langlotz, Curtis P; Allen, Bibb; Erickson, Bradley J; Kalpathy-Cramer, Jayashree; Bigelow, Keith; Cook, Tessa S; Flanders, Adam E; Lungren, Matthew P; Mendelson, David S; Rudie, Jeffrey D; Wang, Ge; Kandarpa, Krishna.

Radiology ; 291(3): 781-791, 2019 06.

Artículo en Inglés | MEDLINE | ID: mdl-30990384

RESUMEN

Imaging research laboratories are rapidly creating machine learning systems that achieve expert human performance using open-source methods and tools. These artificial intelligence systems are being developed to improve medical image reconstruction, noise reduction, quality assurance, triage, segmentation, computer-aided detection, computer-aided classification, and radiogenomics. In August 2018, a meeting was held in Bethesda, Maryland, at the National Institutes of Health to discuss the current state of the art and knowledge gaps and to develop a roadmap for future research initiatives. Key research priorities include: 1, new image reconstruction methods that efficiently produce images suitable for human interpretation from source data; 2, automated image labeling and annotation methods, including information extraction from the imaging report, electronic phenotyping, and prospective structured image reporting; 3, new machine learning methods for clinical imaging data, such as tailored, pretrained model architectures, and federated machine learning methods; 4, machine learning methods that can explain the advice they provide to human users (so-called explainable artificial intelligence); and 5, validated methods for image de-identification and data sharing to facilitate wide availability of clinical imaging data sets. This research roadmap is intended to identify and prioritize these needs for academic research laboratories, funding agencies, professional societies, and industry.

Asunto(s)

Inteligencia Artificial , Investigación Biomédica , Diagnóstico por Imagen , Interpretación de Imagen Asistida por Computador , Algoritmos , Humanos , Aprendizaje Automático

11.

Effect of Clinical Decision Support-Generated Report Cards Versus Real-Time Alerts on Primary Care Provider Guideline Adherence for Low Back Pain Outpatient Lumbar Spine MRI Orders.

Zafar, Hanna M; Ip, Ivan K; Mills, Angela M; Raja, Ali S; Langlotz, Curtis P; Khorasani, Ramin.

AJR Am J Roentgenol ; 212(2): 386-394, 2019 02.

Artículo en Inglés | MEDLINE | ID: mdl-30476451

RESUMEN

OBJECTIVE: The purpose of this study is to determine whether the type of feedback on evidence-based guideline adherence influences adult primary care provider (PCP) lumbar spine (LS) MRI orders for low back pain (LBP). MATERIALS AND METHODS: Four types of guideline adherence feedback were tested on eight tertiary health care system outpatient PCP practices: no feedback during baseline (March 1, 2012-October 4, 2012), randomization by practice to either clinical decision support (CDS)-generated report cards comparing providers to peers only or real-time CDS alerts at order entry during intervention 1 (February 6, 2013-December 31, 2013), and both feedback types for all practices during intervention 2 (January 14, 2014-June 20, 2014, and September 4, 2014-January 21, 2015). International Classification of Disease codes identified LBP visits (excluding Medicare fee-for-service). The primary outcome of the likelihood of LS MRI order being made on the day of or 1-30 days after the outpatient LBP visit was adjusted by feedback type (none, report cards only, real-time alerts only, or both); patient age, sex, race, and insurance status; and provider sex and experience. RESULTS: Half of PCPs (54/108) remained for all three periods, conducting 9394 of 107,938 (8.7%) outpatient LBP visits. The proportion of LBP visits increased over the course of the study (p = 0.0001). In multilevel hierarchic regression, report cards resulted in a lower likelihood of LS MRI orders made the day of and 1-30 days after the visit versus baseline: 38% (p = 0.009) and 37% (p = 0.006) for report cards alone, and 27% (p = 0.020) and 27% (p = 0.016) with alerts, respectively. Real-time alerts alone did not affect MRI orders made the day of (p = 0.585) or 1-30 days after (p = 0.650) the visit. No patient or provider variables were associated with LS MRI orders being generated on the day of or 1-30 days after the LBP visit. CONCLUSION: CDS-generated evidence-based report cards can substantially reduce outpatient PCP LS MRI orders on the day of and 1-30 days after the LBP visit. Real-time CDS alerts do not.

Asunto(s)

Atención Ambulatoria , Toma de Decisiones Clínicas/métodos , Sistemas de Apoyo a Decisiones Clínicas , Adhesión a Directriz/estadística & datos numéricos , Dolor de la Región Lumbar/diagnóstico por imagen , Imagen por Resonancia Magnética/estadística & datos numéricos , Pautas de la Práctica en Medicina/estadística & datos numéricos , Prescripciones/estadística & datos numéricos , Atención Primaria de Salud , Columna Vertebral/diagnóstico por imagen , Sistemas de Computación , Retroalimentación , Femenino , Humanos , Masculino , Persona de Mediana Edad

12.

Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet.

Bien, Nicholas; Rajpurkar, Pranav; Ball, Robyn L; Irvin, Jeremy; Park, Allison; Jones, Erik; Bereket, Michael; Patel, Bhavik N; Yeom, Kristen W; Shpanskaya, Katie; Halabi, Safwan; Zucker, Evan; Fanton, Gary; Amanatullah, Derek F; Beaulieu, Christopher F; Riley, Geoffrey M; Stewart, Russell J; Blankenberg, Francis G; Larson, David B; Jones, Ricky H; Langlotz, Curtis P; Ng, Andrew Y; Lungren, Matthew P.

PLoS Med ; 15(11): e1002699, 2018 11.

Artículo en Inglés | MEDLINE | ID: mdl-30481176

RESUMEN

BACKGROUND: Magnetic resonance imaging (MRI) of the knee is the preferred method for diagnosing knee injuries. However, interpretation of knee MRI is time-intensive and subject to diagnostic error and variability. An automated system for interpreting knee MRI could prioritize high-risk patients and assist clinicians in making diagnoses. Deep learning methods, in being able to automatically learn layers of features, are well suited for modeling the complex relationships between medical images and their interpretations. In this study we developed a deep learning model for detecting general abnormalities and specific diagnoses (anterior cruciate ligament [ACL] tears and meniscal tears) on knee MRI exams. We then measured the effect of providing the model's predictions to clinical experts during interpretation. METHODS AND FINDINGS: Our dataset consisted of 1,370 knee MRI exams performed at Stanford University Medical Center between January 1, 2001, and December 31, 2012 (mean age 38.0 years; 569 [41.5%] female patients). The majority vote of 3 musculoskeletal radiologists established reference standard labels on an internal validation set of 120 exams. We developed MRNet, a convolutional neural network for classifying MRI series and combined predictions from 3 series per exam using logistic regression. In detecting abnormalities, ACL tears, and meniscal tears, this model achieved area under the receiver operating characteristic curve (AUC) values of 0.937 (95% CI 0.895, 0.980), 0.965 (95% CI 0.938, 0.993), and 0.847 (95% CI 0.780, 0.914), respectively, on the internal validation set. We also obtained a public dataset of 917 exams with sagittal T1-weighted series and labels for ACL injury from Clinical Hospital Centre Rijeka, Croatia. On the external validation set of 183 exams, the MRNet trained on Stanford sagittal T2-weighted series achieved an AUC of 0.824 (95% CI 0.757, 0.892) in the detection of ACL injuries with no additional training, while an MRNet trained on the rest of the external data achieved an AUC of 0.911 (95% CI 0.864, 0.958). We additionally measured the specificity, sensitivity, and accuracy of 9 clinical experts (7 board-certified general radiologists and 2 orthopedic surgeons) on the internal validation set both with and without model assistance. Using a 2-sided Pearson's chi-squared test with adjustment for multiple comparisons, we found no significant differences between the performance of the model and that of unassisted general radiologists in detecting abnormalities. General radiologists achieved significantly higher sensitivity in detecting ACL tears (p-value = 0.002; q-value = 0.019) and significantly higher specificity in detecting meniscal tears (p-value = 0.003; q-value = 0.019). Using a 1-tailed t test on the change in performance metrics, we found that providing model predictions significantly increased clinical experts' specificity in identifying ACL tears (p-value < 0.001; q-value = 0.006). The primary limitations of our study include lack of surgical ground truth and the small size of the panel of clinical experts. CONCLUSIONS: Our deep learning model can rapidly generate accurate clinical pathology classifications of knee MRI exams from both internal and external datasets. Moreover, our results support the assertion that deep learning models can improve the performance of clinical experts during medical imaging interpretation. Further research is needed to validate the model prospectively and to determine its utility in the clinical setting.

Asunto(s)

Lesiones del Ligamento Cruzado Anterior/diagnóstico por imagen , Aprendizaje Profundo , Diagnóstico por Computador/métodos , Interpretación de Imagen Asistida por Computador/métodos , Rodilla/diagnóstico por imagen , Imagen por Resonancia Magnética/métodos , Lesiones de Menisco Tibial/diagnóstico por imagen , Adulto , Automatización , Bases de Datos Factuales , Femenino , Humanos , Masculino , Persona de Mediana Edad , Valor Predictivo de las Pruebas , Reproducibilidad de los Resultados , Estudios Retrospectivos , Adulto Joven

13.

Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists.

Rajpurkar, Pranav; Irvin, Jeremy; Ball, Robyn L; Zhu, Kaylie; Yang, Brandon; Mehta, Hershel; Duan, Tony; Ding, Daisy; Bagul, Aarti; Langlotz, Curtis P; Patel, Bhavik N; Yeom, Kristen W; Shpanskaya, Katie; Blankenberg, Francis G; Seekins, Jayne; Amrhein, Timothy J; Mong, David A; Halabi, Safwan S; Zucker, Evan J; Ng, Andrew Y; Lungren, Matthew P.

PLoS Med ; 15(11): e1002686, 2018 11.

Artículo en Inglés | MEDLINE | ID: mdl-30457988

RESUMEN

BACKGROUND: Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and lack of diagnostic expertise in areas of the world where radiologists are not available. Recently, deep learning approaches have been able to achieve expert-level performance in medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study is to investigate the performance of a deep learning algorithm on the detection of pathologies in chest radiographs compared with practicing radiologists. METHODS AND FINDINGS: We developed CheXNeXt, a convolutional neural network to concurrently detect the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set consisting of 420 images, sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as reference standard. We compared CheXNeXt's discriminative performance on the validation set to the performance of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists included 6 board-certified radiologists (average experience 12 years, range 4-28 years) and 3 senior radiology residents, from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies. The radiologists achieved statistically significantly higher AUC performance on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863-0.910), 0.911 (95% CI 0.866-0.947), and 0.985 (95% CI 0.974-0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790-0.870), 0.704 (95% CI 0.567-0.833), and 0.851 (95% CI 0.785-0.909), respectively. CheXNeXt performed better than radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825-0.895), statistically significantly higher than radiologists' AUC of 0.808 (95% CI 0.777-0.838); there were no statistically significant differences in AUCs for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations and that evaluation was limited to a dataset from a single institution. CONCLUSIONS: In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to practicing radiologists. Once tested prospectively in clinical settings, the algorithm could have the potential to expand patient access to chest radiograph diagnostics.

Asunto(s)

Competencia Clínica , Aprendizaje Profundo , Diagnóstico por Computador/métodos , Neumonía/diagnóstico por imagen , Interpretación de Imagen Radiográfica Asistida por Computador/métodos , Radiografía Torácica/métodos , Radiólogos , Humanos , Valor Predictivo de las Pruebas , Reproducibilidad de los Resultados , Estudios Retrospectivos

14.

The Future of AI and Informatics in Radiology: 10 Predictions.

Langlotz, Curtis P.

Radiology ; 309(1): e231114, 2023 10.

Artículo en Inglés | MEDLINE | ID: mdl-37874234

Asunto(s)

Radiología , Humanos , Radiografía , Informática

15.

Performance of a Deep-Learning Neural Network Model in Assessing Skeletal Maturity on Pediatric Hand Radiographs.

Larson, David B; Chen, Matthew C; Lungren, Matthew P; Halabi, Safwan S; Stence, Nicholas V; Langlotz, Curtis P.

Radiology ; 287(1): 313-322, 2018 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-29095675

RESUMEN

Purpose To compare the performance of a deep-learning bone age assessment model based on hand radiographs with that of expert radiologists and that of existing automated models. Materials and Methods The institutional review board approved the study. A total of 14 036 clinical hand radiographs and corresponding reports were obtained from two children's hospitals to train and validate the model. For the first test set, composed of 200 examinations, the mean of bone age estimates from the clinical report and three additional human reviewers was used as the reference standard. Overall model performance was assessed by comparing the root mean square (RMS) and mean absolute difference (MAD) between the model estimates and the reference standard bone ages. Ninety-five percent limits of agreement were calculated in a pairwise fashion for all reviewers and the model. The RMS of a second test set composed of 913 examinations from the publicly available Digital Hand Atlas was compared with published reports of an existing automated model. Results The mean difference between bone age estimates of the model and of the reviewers was 0 years, with a mean RMS and MAD of 0.63 and 0.50 years, respectively. The estimates of the model, the clinical report, and the three reviewers were within the 95% limits of agreement. RMS for the Digital Hand Atlas data set was 0.73 years, compared with 0.61 years of a previously reported model. Conclusion A deep-learning convolutional neural network model can estimate skeletal maturity with accuracy similar to that of an expert radiologist and to that of existing automated models. © RSNA, 2017 An earlier incorrect version of this article appeared online. This article was corrected on January 19, 2018.

Asunto(s)

Determinación de la Edad por el Esqueleto/métodos , Mano/anatomía & histología , Aprendizaje Automático , Redes Neurales de la Computación , Radiografía/métodos , Adolescente , Adulto , Niño , Preescolar , Femenino , Mano/diagnóstico por imagen , Humanos , Lactante , Masculino , Adulto Joven

16.

Deep Learning to Classify Radiology Free-Text Reports.

Chen, Matthew C; Ball, Robyn L; Yang, Lingyao; Moradzadeh, Nathaniel; Chapman, Brian E; Larson, David B; Langlotz, Curtis P; Amrhein, Timothy J; Lungren, Matthew P.

Radiology ; 286(3): 845-852, 2018 03.

Artículo en Inglés | MEDLINE | ID: mdl-29135365

RESUMEN

Purpose To evaluate the performance of a deep learning convolutional neural network (CNN) model compared with a traditional natural language processing (NLP) model in extracting pulmonary embolism (PE) findings from thoracic computed tomography (CT) reports from two institutions. Materials and Methods Contrast material-enhanced CT examinations of the chest performed between January 1, 1998, and January 1, 2016, were selected. Annotations by two human radiologists were made for three categories: the presence, chronicity, and location of PE. Classification of performance of a CNN model with an unsupervised learning algorithm for obtaining vector representations of words was compared with the open-source application PeFinder. Sensitivity, specificity, accuracy, and F1 scores for both the CNN model and PeFinder in the internal and external validation sets were determined. Results The CNN model demonstrated an accuracy of 99% and an area under the curve value of 0.97. For internal validation report data, the CNN model had a statistically significant larger F1 score (0.938) than did PeFinder (0.867) when classifying findings as either PE positive or PE negative, but no significant difference in sensitivity, specificity, or accuracy was found. For external validation report data, no statistical difference between the performance of the CNN model and PeFinder was found. Conclusion A deep learning CNN model can classify radiology free-text reports with accuracy equivalent to or beyond that of an existing traditional NLP model. © RSNA, 2017 Online supplemental material is available for this article.

Asunto(s)

Aprendizaje Automático , Redes Neurales de la Computación , Embolia Pulmonar/diagnóstico por imagen , Algoritmos , Humanos , Procesamiento de Lenguaje Natural , Curva ROC , Radiografía Torácica/métodos , Reproducibilidad de los Resultados , Sensibilidad y Especificidad , Tomografía Computarizada por Rayos X/métodos

17.

Truth and Transformation: RSNA's Journey Toward Equity.

Langlotz, Curtis P; Mauro, Matthew A; Mahmood, Umar; Klein, Jeffrey S; Meltzer, Carolyn C; Bhalla, Sanjeev; Heller, Richard E; Scott, Jinel A; Flanders, Adam E; Pandharipande, Pari V.

Radiology ; 307(3): e239008, 2023 05.

Artículo en Inglés | MEDLINE | ID: mdl-36862088

Asunto(s)

Sistemas de Información Radiológica , Radiología , Humanos

18.

Truth and Transformation: RSNA's Journey Toward Equity.

Langlotz, Curtis P; Mauro, Matthew A; Mahmood, Umar; Klein, Jeffrey S; Meltzer, Carolyn C; Bhalla, Sanjeev; Heller, Richard E; Scott, Jinel A; Flanders, Adam E; Pandharipande, Pari V.

Radiographics ; 43(4): e239005, 2023 04.

Artículo en Inglés | MEDLINE | ID: mdl-36862085

19.

Clinical decision support increases diagnostic yield of computed tomography for suspected pulmonary embolism.

Mills, Angela M; Ip, Ivan K; Langlotz, Curtis P; Raja, Ali S; Zafar, Hanna M; Khorasani, Ramin.

Am J Emerg Med ; 36(4): 540-544, 2018 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-28970024

RESUMEN

OBJECTIVE: Determine effects of evidence-based clinical decision support (CDS) on the use and yield of computed tomographic pulmonary angiography for suspected pulmonary embolism (CTPE) in Emergency Department (ED) patients. METHODS: This multi-site prospective quality improvement intervention conducted in three urban EDs used a pre/post design. For ED patients aged 18+years with suspected PE, CTPE use and yield were compared 19months pre- and 32months post-implementation of CDS intervention based on the Wells criteria, provided at the time of CTPE order, deployed in April 2012. Primary outcome was the yield (percentage of studies positive for acute PE). Secondary outcome was utilization (number of studies/100 ED visits) of CTPE. Chi-square and statistical process control chart assessed pre- and post-intervention differences. An interrupted time series analysis was also performed. RESULTS: Of 558,795 patients presenting October 2010-December 2014, 7987 (1.4%) underwent CTPE (mean age 52±17.5years, 66% female, 60.1% black); 34.7% of patients presented pre- and 65.3% post-CDS implementation. Overall CTPE diagnostic yield was 9.8% (779/7987 studies positive for PE). Yield increased a relative 30.8% after CDS implementation (8.1% vs. 10.6%; p=0.0003). There was no statistically significant change in CTPE utilization (1.4% pre- vs. 1.4% post-implementation; p=0.25). A statistical process control chart demonstrated immediate and sustained improvement in CTPE yield post-implementation. Interrupted time series analysis demonstrated the slope of PE findings versus time to be unchanged before and after the intervention (p=0.9). However, there was a trend that the intervention was associated with a 50% increased probability of PE finding (p=0.08), suggesting an immediate rather than gradual change after the intervention. CONCLUSIONS: Implementing evidence-based CDS in the ED was associated with an immediate, significant and sustained increase in CTPE yield without a measurable decrease in CTPE utilization. Further studies will be needed to assess whether stronger interventions could further improve appropriate use of CTPE.

Asunto(s)

Angiografía por Tomografía Computarizada , Sistemas de Apoyo a Decisiones Clínicas/normas , Embolia Pulmonar/diagnóstico por imagen , Adulto , Anciano , Servicio de Urgencia en Hospital , Femenino , Humanos , Masculino , Persona de Mediana Edad , Estudios Prospectivos , Mejoramiento de la Calidad

20.

Using Natural Language Processing of Free-Text Radiology Reports to Identify Type 1 Modic Endplate Changes.

Huhdanpaa, Hannu T; Tan, W Katherine; Rundell, Sean D; Suri, Pradeep; Chokshi, Falgun H; Comstock, Bryan A; Heagerty, Patrick J; James, Kathryn T; Avins, Andrew L; Nedeljkovic, Srdjan S; Nerenz, David R; Kallmes, David F; Luetmer, Patrick H; Sherman, Karen J; Organ, Nancy L; Griffith, Brent; Langlotz, Curtis P; Carrell, David; Hassanpour, Saeed; Jarvik, Jeffrey G.

J Digit Imaging ; 31(1): 84-90, 2018 02.

Artículo en Inglés | MEDLINE | ID: mdl-28808792

RESUMEN

Electronic medical record (EMR) systems provide easy access to radiology reports and offer great potential to support quality improvement efforts and clinical research. Harnessing the full potential of the EMR requires scalable approaches such as natural language processing (NLP) to convert text into variables used for evaluation or analysis. Our goal was to determine the feasibility of using NLP to identify patients with Type 1 Modic endplate changes using clinical reports of magnetic resonance (MR) imaging examinations of the spine. Identifying patients with Type 1 Modic change who may be eligible for clinical trials is important as these findings may be important targets for intervention. Four annotators identified all reports that contained Type 1 Modic change, using N = 458 randomly selected lumbar spine MR reports. We then implemented a rule-based NLP algorithm in Java using regular expressions. The prevalence of Type 1 Modic change in the annotated dataset was 10%. Results were recall (sensitivity) 35/50 = 0.70 (95% confidence interval (C.I.) 0.52-0.82), specificity 404/408 = 0.99 (0.97-1.0), precision (positive predictive value) 35/39 = 0.90 (0.75-0.97), negative predictive value 404/419 = 0.96 (0.94-0.98), and F1-score 0.79 (0.43-1.0). Our evaluation shows the efficacy of rule-based NLP approach for identifying patients with Type 1 Modic change if the emphasis is on identifying only relevant cases with low concern regarding false negatives. As expected, our results show that specificity is higher than recall. This is due to the inherent difficulty of eliciting all possible keywords given the enormous variability of lumbar spine reporting, which decreases recall, while availability of good negation algorithms improves specificity.

Asunto(s)

Dolor de la Región Lumbar/patología , Vértebras Lumbares/diagnóstico por imagen , Vértebras Lumbares/patología , Imagen por Resonancia Magnética/métodos , Procesamiento de Lenguaje Natural , Informe de Investigación , Humanos , Estudios Prospectivos , Radiología , Reproducibilidad de los Resultados , Sensibilidad y Especificidad

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA