Results 1 - 20 of 112
1.
Nature ; 580(7802): 252-256, 2020 04.
Article in English | MEDLINE | ID: mdl-32269341

ABSTRACT

Accurate assessment of cardiac function is crucial for the diagnosis of cardiovascular disease [1], screening for cardiotoxicity [2] and decisions regarding the clinical management of patients with a critical illness [3]. However, human assessment of cardiac function focuses on a limited sampling of cardiac cycles and has considerable inter-observer variability despite years of training [4,5]. Here, to overcome this challenge, we present a video-based deep learning algorithm, EchoNet-Dynamic, that surpasses the performance of human experts in the critical tasks of segmenting the left ventricle, estimating ejection fraction and assessing cardiomyopathy. Trained on echocardiogram videos, our model accurately segments the left ventricle with a Dice similarity coefficient of 0.92, predicts ejection fraction with a mean absolute error of 4.1% and reliably classifies heart failure with reduced ejection fraction (area under the curve of 0.97). In an external dataset from another healthcare system, EchoNet-Dynamic predicts the ejection fraction with a mean absolute error of 6.0% and classifies heart failure with reduced ejection fraction with an area under the curve of 0.96. Prospective evaluation with repeated human measurements confirms that the model has variance that is comparable to or less than that of human experts. By leveraging information across multiple cardiac cycles, our model can rapidly identify subtle changes in ejection fraction, is more reproducible than human evaluation and lays the foundation for precise diagnosis of cardiovascular disease in real time. As a resource to promote further innovation, we also make publicly available a large dataset of 10,030 annotated echocardiogram videos.
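The headline metrics in this abstract (Dice similarity coefficient for segmentation, mean absolute error for ejection fraction) reduce to a few lines of code. A minimal sketch follows; the function names are illustrative and not taken from the EchoNet-Dynamic codebase:

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary segmentation masks:
    2 * |intersection| / (|pred| + |target|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

def mean_absolute_error(predicted, actual):
    """Mean absolute error, e.g. between predicted and measured ejection fractions (%)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.abs(predicted - actual).mean())
```

A Dice score of 0.92, as reported here, means the predicted and reference left-ventricle masks overlap almost completely relative to their combined area.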


Subject(s)
Deep Learning, Heart Diseases/diagnosis, Heart Diseases/physiopathology, Heart/physiology, Heart/physiopathology, Cardiovascular Models, Video Recording, Atrial Fibrillation, Datasets as Topic, Echocardiography, Heart Failure/physiopathology, Hospitals, Humans, Prospective Studies, Reproducibility of Results, Left Ventricular Function/physiology
2.
Eur Radiol ; 34(4): 2727-2737, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37775589

ABSTRACT

OBJECTIVES: There is a need for CT pulmonary angiography (CTPA) lung segmentation models. Clinical translation requires radiological evaluation of model outputs, understanding of limitations, and identification of failure points. This multicentre study aims to develop an accurate CTPA lung segmentation model, with evaluation of outputs in two diverse patient cohorts with pulmonary hypertension (PH) and interstitial lung disease (ILD). METHODS: This retrospective study develops an nnU-Net-based segmentation model using data from two specialist centres (UK and USA). The model was trained (n = 37), tested (n = 12), and clinically evaluated (n = 176) on a diverse 'real-world' cohort of 225 PH patients with volumetric CTPAs. Dice score coefficient (DSC) and normalised surface distance (NSD) were used for testing. Clinical evaluation of outputs was performed by two radiologists who assessed the clinical significance of errors. External validation was performed on heterogeneous contrast and non-contrast scans from 28 ILD patients. RESULTS: A total of 225 PH and 28 ILD patients with diverse demographic and clinical characteristics were evaluated. Mean accuracy, DSC, and NSD scores were 0.998 (95% CI 0.9976, 0.9989), 0.990 (0.9840, 0.9962), and 0.983 (0.9686, 0.9972), respectively. There were no segmentation failures. On radiological review, 82% and 71% of internal and external cases, respectively, had no errors; 18% and 25%, respectively, had clinically insignificant errors. Peripheral atelectasis and consolidation were common causes of suboptimal segmentation. One external case (0.5%) with a patulous oesophagus had a clinically significant error. CONCLUSION: This state-of-the-art CTPA lung segmentation model provides accurate outputs with minimal clinical errors on evaluation across two diverse cohorts with PH and ILD. CLINICAL RELEVANCE: Clinical translation of artificial intelligence models requires radiological review and understanding of model limitations.
This study develops an externally validated state-of-the-art model with robust radiological review. Intended clinical use is in techniques such as lung volume or parenchymal disease quantification. KEY POINTS: • Accurate, externally validated CT pulmonary angiography (CTPA) lung segmentation model tested in two large heterogeneous clinical cohorts (pulmonary hypertension and interstitial lung disease). • No segmentation failures; robust review of model outputs by radiologists found 1 (0.5%) clinically significant segmentation error. • Intended clinical use of this model is a necessary step in techniques such as lung volume, parenchymal disease quantification, or pulmonary vessel analysis.


Subject(s)
Deep Learning, Pulmonary Hypertension, Interstitial Lung Diseases, Humans, Pulmonary Hypertension/diagnostic imaging, Artificial Intelligence, Retrospective Studies, X-Ray Computed Tomography, Interstitial Lung Diseases/diagnostic imaging, Lung
3.
J Digit Imaging ; 36(1): 164-177, 2023 02.
Article in English | MEDLINE | ID: mdl-36323915

ABSTRACT

Building a document-level classifier for COVID-19 on radiology reports could assist providers in their daily clinical routine, as well as create large numbers of labels for computer vision models. We developed such a classifier by fine-tuning a BERT-like model initialized from RadBERT, a model continuously pre-trained on radiology reports that can be used for all radiology-related tasks. RadBERT outperforms all biomedical pre-trainings on this COVID-19 task (P < 0.01) and helps our fine-tuned model achieve a macro-averaged F1 score of 88.9 when evaluated on both X-ray and CT reports. To build this model, we rely on a multi-institutional dataset re-sampled and enriched with concurrent lung diseases, helping the model resist distribution shifts. In addition, we explore a variety of fine-tuning and hyperparameter optimization techniques that accelerate fine-tuning convergence, stabilize performance, and improve accuracy, especially when data or computational resources are limited. Finally, we provide a set of visualization tools and explainability methods to better understand the performance of the model and support its practical use in the clinical setting. Our approach offers a ready-to-use COVID-19 classifier and can be applied similarly to other radiology report classification tasks.
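A macro-averaged F1 score, as reported above, computes F1 per class and then averages with equal class weight, so rare report categories count as much as common ones. A minimal sketch, not the authors' evaluation code:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight,
    regardless of how many examples each class has."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if (precision + recall) else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

This is why macro-F1 is a natural choice for the enriched, class-imbalanced dataset the abstract describes.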


Subject(s)
COVID-19, Radiology, Humans, Research Report, Natural Language Processing
4.
Radiology ; 305(3): 555-563, 2022 12.
Article in English | MEDLINE | ID: mdl-35916673

ABSTRACT

As the role of artificial intelligence (AI) in clinical practice evolves, governance structures oversee the implementation, maintenance, and monitoring of clinical AI algorithms to enhance quality, manage resources, and ensure patient safety. This article establishes a framework for the infrastructure required for clinical AI implementation and presents a road map for governance. The road map answers four key questions: Who decides which tools to implement? What factors should be considered when assessing an application for implementation? How should applications be implemented in clinical practice? And how should tools be monitored and maintained after clinical implementation? Among the many challenges for the implementation of AI in clinical practice, devising flexible governance structures that can quickly adapt to a changing environment will be essential to meet quality patient care and practice improvement objectives.


Subject(s)
Artificial Intelligence, Radiology, Humans, Radiography, Algorithms, Quality of Health Care
5.
Radiology ; 301(3): 692-699, 2021 12.
Article in English | MEDLINE | ID: mdl-34581608

ABSTRACT

Background Previous studies suggest that use of artificial intelligence (AI) algorithms as diagnostic aids may improve the quality of skeletal age assessment, though these studies lack evidence from clinical practice. Purpose To compare the accuracy and interpretation time of skeletal age assessment on hand radiograph examinations with and without the use of an AI algorithm as a diagnostic aid. Materials and Methods In this prospective randomized controlled trial, skeletal age assessment on hand radiograph examinations was performed with (n = 792) and without (n = 739) the AI algorithm as a diagnostic aid. For examinations with the AI algorithm, the radiologist was shown the AI interpretation as part of their routine clinical work and was permitted to accept or modify it. Hand radiographs were interpreted by 93 radiologists from six centers. The primary efficacy outcome was the mean absolute difference between the skeletal age dictated into the radiologists' signed report and the average interpretation of a panel of four radiologists not using a diagnostic aid. The secondary outcome was the interpretation time. A linear mixed-effects regression model with random center- and radiologist-level effects was used to compare the two experimental groups. Results Overall mean absolute difference was lower when radiologists used the AI algorithm compared with when they did not (5.36 months vs 5.95 months; P = .04). The proportions at which the absolute difference exceeded 12 months (9.3% vs 13.0%, P = .02) and 24 months (0.5% vs 1.8%, P = .02) were lower with the AI algorithm than without it. Median radiologist interpretation time was lower with the AI algorithm than without it (102 seconds vs 142 seconds, P = .001). Conclusion Use of an artificial intelligence algorithm improved skeletal age assessment accuracy and reduced interpretation times for radiologists, although differences were observed between centers. Clinical trial registration no.
NCT03530098 © RSNA, 2021 Online supplemental material is available for this article. See also the editorial by Rubin in this issue.


Subject(s)
Age Determination by Skeleton/methods, Artificial Intelligence, Computer-Assisted Radiographic Image Interpretation/methods, Radiography/methods, Adolescent, Adult, Child, Child Preschool, Female, Humans, Infant, Male, Prospective Studies, Radiologists, Reproducibility of Results, Sensitivity and Specificity
6.
J Magn Reson Imaging ; 54(2): 357-371, 2021 08.
Article in English | MEDLINE | ID: mdl-32830874

ABSTRACT

Artificial intelligence algorithms based on principles of deep learning (DL) have made a large impact on the acquisition, reconstruction, and interpretation of MRI data. Despite the large number of retrospective studies using DL, there are fewer applications of DL in the clinic on a routine basis. To address this large translational gap, we review the recent publications to determine three major use cases that DL can have in MRI, namely, that of model-free image synthesis, model-based image reconstruction, and image or pixel-level classification. For each of these three areas, we provide a framework for important considerations that consist of appropriate model training paradigms, evaluation of model robustness, downstream clinical utility, opportunities for future advances, as well as recommendations for best current practices. We draw inspiration for this framework from advances in computer vision in natural imaging as well as additional healthcare fields. We further emphasize the need for reproducibility of research studies through the sharing of datasets and software. LEVEL OF EVIDENCE: 5 TECHNICAL EFFICACY STAGE: 2.


Subject(s)
Artificial Intelligence, Deep Learning, Algorithms, Computer-Assisted Image Processing, Magnetic Resonance Imaging, Neural Networks (Computer), Prospective Studies, Reproducibility of Results, Retrospective Studies
7.
Radiology ; 295(3): 675-682, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32208097

ABSTRACT

In this article, the authors propose an ethical framework for using and sharing clinical data for the development of artificial intelligence (AI) applications. The philosophical premise is as follows: when clinical data are used to provide care, the primary purpose for acquiring the data is fulfilled. At that point, clinical data should be treated as a form of public good, to be used for the benefit of future patients. In their 2013 article, Faden et al argued that all who participate in the health care system, including patients, have a moral obligation to contribute to improving that system. The authors extend that framework to questions surrounding the secondary use of clinical data for AI applications. Specifically, the authors propose that all individuals and entities with access to clinical data become data stewards, with fiduciary (or trust) responsibilities to patients to carefully safeguard patient privacy, and to the public to ensure that the data are made widely available for the development of knowledge and tools to benefit future patients. According to this framework, the authors maintain that it is unethical for providers to "sell" clinical data to other parties by granting access to clinical data, especially under exclusive arrangements, in exchange for monetary or in-kind payments that exceed costs. The authors also propose that patient consent is not required before the data are used for secondary purposes when obtaining such consent is prohibitively costly or burdensome, as long as mechanisms are in place to ensure that ethical standards are strictly followed. Rather than debate whether patients or provider organizations "own" the data, the authors propose that clinical data are not owned at all in the traditional sense, but rather that all who interact with or control the data have an obligation to ensure that the data are used for the benefit of future patients and society.


Subject(s)
Artificial Intelligence/ethics, Diagnostic Imaging/ethics, Medical Ethics, Information Dissemination/ethics, Humans
8.
Bioinformatics ; 35(10): 1745-1752, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30307536

ABSTRACT

MOTIVATION: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases. Although recent studies explored using neural network models for BioNER to free experts from manual feature engineering, the performance remains limited by the available training data for each entity type. RESULTS: We propose a multi-task learning framework for BioNER to collectively use the training data of different types of entities and improve the performance on each of them. In experiments on 15 benchmark BioNER datasets, our multi-task model achieves substantially better performance compared with state-of-the-art BioNER systems and baseline neural sequence labeling models. Further analysis shows that the large performance gains come from sharing character- and word-level information among relevant biomedical entities across differently labeled corpora. AVAILABILITY AND IMPLEMENTATION: Our source code is available at https://github.com/yuzhimanhua/lm-lstm-crf. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning, Neural Networks (Computer), Benchmarking, Software
9.
Eur Radiol ; 30(6): 3576-3584, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32064565

ABSTRACT

Artificial intelligence (AI) has the potential to significantly disrupt the way radiology will be practiced in the near future, but several issues need to be resolved before AI can be widely implemented in daily practice. These include the role of the different stakeholders in the development of AI for imaging, the ethical development and use of AI in healthcare, the appropriate validation of each developed AI algorithm, the development of effective data sharing mechanisms, regulatory hurdles for the clearance of AI algorithms, and the development of AI educational resources for both practicing radiologists and radiology trainees. This paper details these issues and presents possible solutions based on discussions held at the 2019 meeting of the International Society for Strategic Studies in Radiology. KEY POINTS: • Radiologists should be aware of the different types of bias commonly encountered in AI studies, and understand their possible effects. • Methods for effective data sharing to train, validate, and test AI algorithms need to be developed. • It is essential for all radiologists to gain an understanding of the basic principles, potentials, and limits of AI.


Subject(s)
Artificial Intelligence, Radiology, Algorithms, Deep Learning, Forecasting, Humans, Information Dissemination, Machine Learning, Radiologists, Reproducibility of Results, Validation Studies as Topic
11.
Radiology ; 290(2): 537-544, 2019 02.
Article in English | MEDLINE | ID: mdl-30422093

ABSTRACT

Purpose To assess the ability of convolutional neural networks (CNNs) to enable high-performance automated binary classification of chest radiographs. Materials and Methods In a retrospective study, 216 431 frontal chest radiographs obtained between 1998 and 2012 were procured, along with associated text reports and a prospective label from the attending radiologist. This data set was used to train CNNs to classify chest radiographs as normal or abnormal before evaluation on a held-out set of 533 images hand-labeled by expert radiologists. The effects of development set size, training set size, initialization strategy, and network architecture on end performance were assessed by using standard binary classification metrics; detailed error analysis, including visualization of CNN activations, was also performed. Results Average area under the receiver operating characteristic curve (AUC) was 0.96 for a CNN trained with 200 000 images. This AUC value was greater than that observed when the same model was trained with 2000 images (AUC = 0.84, P < .005) but was not significantly different from that observed when the model was trained with 20 000 images (AUC = 0.95, P > .05). Averaging the CNN output score with the binary prospective label yielded the best-performing classifier, with an AUC of 0.98 (P < .005). Analysis of specific radiographs revealed that the model was heavily influenced by clinically relevant spatial regions but did not reliably generalize beyond thoracic disease. Conclusion CNNs trained with a modestly sized collection of prospectively labeled chest radiographs achieved high diagnostic performance in the classification of chest radiographs as normal or abnormal; this function may be useful for automated prioritization of abnormal chest radiographs. © RSNA, 2018 Online supplemental material is available for this article. See also the editorial by van Ginneken in this issue.
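The study's best classifier averages the CNN output score with the binary prospective label and evaluates with AUC. Both steps are simple to express; the sketch below uses the rank-based (Mann-Whitney) form of the AUC, with hypothetical function names:

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative
    (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ensemble_score(cnn_prob, prospective_label):
    """Average the CNN probability with the binary prospective label (0 or 1),
    mirroring the averaging ensemble described in the abstract."""
    return 0.5 * (cnn_prob + prospective_label)
```

Averaging with a noisy human label is a crude but effective ensemble: it pulls the CNN's score toward the attending radiologist's call, which the study reports lifted AUC from 0.96 to 0.98.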


Subject(s)
Neural Networks (Computer), Computer-Assisted Radiographic Image Interpretation/methods, Thoracic Radiography/methods, Female, Humans, Lung/diagnostic imaging, Male, ROC Curve, Radiologists, Retrospective Studies
12.
Radiology ; 291(3): 781-791, 2019 06.
Article in English | MEDLINE | ID: mdl-30990384

ABSTRACT

Imaging research laboratories are rapidly creating machine learning systems that achieve expert human performance using open-source methods and tools. These artificial intelligence systems are being developed to improve medical image reconstruction, noise reduction, quality assurance, triage, segmentation, computer-aided detection, computer-aided classification, and radiogenomics. In August 2018, a meeting was held in Bethesda, Maryland, at the National Institutes of Health to discuss the current state of the art and knowledge gaps and to develop a roadmap for future research initiatives. Key research priorities include: (1) new image reconstruction methods that efficiently produce images suitable for human interpretation from source data; (2) automated image labeling and annotation methods, including information extraction from the imaging report, electronic phenotyping, and prospective structured image reporting; (3) new machine learning methods for clinical imaging data, such as tailored, pretrained model architectures and federated machine learning methods; (4) machine learning methods that can explain the advice they provide to human users (so-called explainable artificial intelligence); and (5) validated methods for image de-identification and data sharing to facilitate wide availability of clinical imaging data sets. This research roadmap is intended to identify and prioritize these needs for academic research laboratories, funding agencies, professional societies, and industry.


Subject(s)
Artificial Intelligence, Biomedical Research, Diagnostic Imaging, Computer-Assisted Image Interpretation, Algorithms, Humans, Machine Learning
13.
AJR Am J Roentgenol ; 212(2): 386-394, 2019 02.
Article in English | MEDLINE | ID: mdl-30476451

ABSTRACT

OBJECTIVE: The purpose of this study is to determine whether the type of feedback on evidence-based guideline adherence influences adult primary care provider (PCP) lumbar spine (LS) MRI orders for low back pain (LBP). MATERIALS AND METHODS: Four types of guideline adherence feedback were tested on eight tertiary health care system outpatient PCP practices: no feedback during baseline (March 1, 2012-October 4, 2012), randomization by practice to either clinical decision support (CDS)-generated report cards comparing providers to peers only or real-time CDS alerts at order entry during intervention 1 (February 6, 2013-December 31, 2013), and both feedback types for all practices during intervention 2 (January 14, 2014-June 20, 2014, and September 4, 2014-January 21, 2015). International Classification of Diseases (ICD) codes identified LBP visits (excluding Medicare fee-for-service). The primary outcome of the likelihood of LS MRI order being made on the day of or 1-30 days after the outpatient LBP visit was adjusted by feedback type (none, report cards only, real-time alerts only, or both); patient age, sex, race, and insurance status; and provider sex and experience. RESULTS: Half of PCPs (54/108) remained for all three periods, conducting 9394 of 107,938 (8.7%) outpatient LBP visits. The proportion of LBP visits increased over the course of the study (p = 0.0001). In multilevel hierarchic regression, report cards resulted in a lower likelihood of LS MRI orders made the day of and 1-30 days after the visit versus baseline: 38% (p = 0.009) and 37% (p = 0.006) for report cards alone, and 27% (p = 0.020) and 27% (p = 0.016) with alerts, respectively. Real-time alerts alone did not affect MRI orders made the day of (p = 0.585) or 1-30 days after (p = 0.650) the visit. No patient or provider variables were associated with LS MRI orders being generated on the day of or 1-30 days after the LBP visit.
CONCLUSION: CDS-generated evidence-based report cards can substantially reduce outpatient PCP LS MRI orders on the day of and 1-30 days after the LBP visit. Real-time CDS alerts do not.


Subject(s)
Ambulatory Care, Clinical Decision-Making/methods, Clinical Decision Support Systems, Guideline Adherence/statistics & numerical data, Low Back Pain/diagnostic imaging, Magnetic Resonance Imaging/statistics & numerical data, Physicians' Practice Patterns/statistics & numerical data, Prescriptions/statistics & numerical data, Primary Health Care, Spine/diagnostic imaging, Computer Systems, Feedback, Female, Humans, Male, Middle Aged
14.
Radiographics ; 44(11): e249008, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39480702
15.
PLoS Med ; 15(11): e1002699, 2018 11.
Article in English | MEDLINE | ID: mdl-30481176

ABSTRACT

BACKGROUND: Magnetic resonance imaging (MRI) of the knee is the preferred method for diagnosing knee injuries. However, interpretation of knee MRI is time-intensive and subject to diagnostic error and variability. An automated system for interpreting knee MRI could prioritize high-risk patients and assist clinicians in making diagnoses. Deep learning methods, in being able to automatically learn layers of features, are well suited for modeling the complex relationships between medical images and their interpretations. In this study we developed a deep learning model for detecting general abnormalities and specific diagnoses (anterior cruciate ligament [ACL] tears and meniscal tears) on knee MRI exams. We then measured the effect of providing the model's predictions to clinical experts during interpretation. METHODS AND FINDINGS: Our dataset consisted of 1,370 knee MRI exams performed at Stanford University Medical Center between January 1, 2001, and December 31, 2012 (mean age 38.0 years; 569 [41.5%] female patients). The majority vote of 3 musculoskeletal radiologists established reference standard labels on an internal validation set of 120 exams. We developed MRNet, a convolutional neural network for classifying MRI series and combined predictions from 3 series per exam using logistic regression. In detecting abnormalities, ACL tears, and meniscal tears, this model achieved area under the receiver operating characteristic curve (AUC) values of 0.937 (95% CI 0.895, 0.980), 0.965 (95% CI 0.938, 0.993), and 0.847 (95% CI 0.780, 0.914), respectively, on the internal validation set. We also obtained a public dataset of 917 exams with sagittal T1-weighted series and labels for ACL injury from Clinical Hospital Centre Rijeka, Croatia. 
On the external validation set of 183 exams, the MRNet trained on Stanford sagittal T2-weighted series achieved an AUC of 0.824 (95% CI 0.757, 0.892) in the detection of ACL injuries with no additional training, while an MRNet trained on the rest of the external data achieved an AUC of 0.911 (95% CI 0.864, 0.958). We additionally measured the specificity, sensitivity, and accuracy of 9 clinical experts (7 board-certified general radiologists and 2 orthopedic surgeons) on the internal validation set both with and without model assistance. Using a 2-sided Pearson's chi-squared test with adjustment for multiple comparisons, we found no significant differences between the performance of the model and that of unassisted general radiologists in detecting abnormalities. General radiologists achieved significantly higher sensitivity in detecting ACL tears (p-value = 0.002; q-value = 0.019) and significantly higher specificity in detecting meniscal tears (p-value = 0.003; q-value = 0.019). Using a 1-tailed t test on the change in performance metrics, we found that providing model predictions significantly increased clinical experts' specificity in identifying ACL tears (p-value < 0.001; q-value = 0.006). The primary limitations of our study include lack of surgical ground truth and the small size of the panel of clinical experts. CONCLUSIONS: Our deep learning model can rapidly generate accurate clinical pathology classifications of knee MRI exams from both internal and external datasets. Moreover, our results support the assertion that deep learning models can improve the performance of clinical experts during medical imaging interpretation. Further research is needed to validate the model prospectively and to determine its utility in the clinical setting.
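MRNet's exam-level decision combines the three per-series probabilities with logistic regression. A minimal sketch of that combination step using plain gradient descent; the toy data, learning rate, and function names are illustrative assumptions, not the authors' code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit logistic-regression weights by gradient descent.
    X: (n_exams, 3) per-series probabilities; y: (n_exams,) binary labels."""
    X = np.hstack([np.ones((len(X), 1)), np.asarray(X, dtype=float)])  # intercept column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the log loss
    return w

def combine(series_probs, w):
    """Collapse three per-series probabilities into one exam-level probability."""
    x = np.concatenate([[1.0], np.asarray(series_probs, dtype=float)])
    return float(sigmoid(x @ w))
```

The appeal of this design is that each MRI series (sagittal, coronal, axial) keeps its own CNN, and the logistic layer learns how much weight to give each view per diagnosis.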


Subject(s)
Anterior Cruciate Ligament Injuries/diagnostic imaging, Deep Learning, Computer-Aided Diagnosis/methods, Computer-Assisted Image Interpretation/methods, Knee/diagnostic imaging, Magnetic Resonance Imaging/methods, Tibial Meniscus Injuries/diagnostic imaging, Adult, Automation, Factual Databases, Female, Humans, Male, Middle Aged, Predictive Value of Tests, Reproducibility of Results, Retrospective Studies, Young Adult
16.
PLoS Med ; 15(11): e1002686, 2018 11.
Article in English | MEDLINE | ID: mdl-30457988

ABSTRACT

BACKGROUND: Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and lack of diagnostic expertise in areas of the world where radiologists are not available. Recently, deep learning approaches have been able to achieve expert-level performance in medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study is to investigate the performance of a deep learning algorithm on the detection of pathologies in chest radiographs compared with practicing radiologists. METHODS AND FINDINGS: We developed CheXNeXt, a convolutional neural network to concurrently detect the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set consisting of 420 images, sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as reference standard. We compared CheXNeXt's discriminative performance on the validation set to the performance of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists included 6 board-certified radiologists (average experience 12 years, range 4-28 years) and 3 senior radiology residents, from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies. 
The radiologists achieved statistically significantly higher AUC performance on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863-0.910), 0.911 (95% CI 0.866-0.947), and 0.985 (95% CI 0.974-0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790-0.870), 0.704 (95% CI 0.567-0.833), and 0.851 (95% CI 0.785-0.909), respectively. CheXNeXt performed better than radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825-0.895), statistically significantly higher than radiologists' AUC of 0.808 (95% CI 0.777-0.838); there were no statistically significant differences in AUCs for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations and that evaluation was limited to a dataset from a single institution. CONCLUSIONS: In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to practicing radiologists. Once tested prospectively in clinical settings, the algorithm could have the potential to expand patient access to chest radiograph diagnostics.


Subject(s)
Clinical Competence, Deep Learning, Computer-Aided Diagnosis/methods, Pneumonia/diagnostic imaging, Computer-Assisted Radiographic Image Interpretation/methods, Thoracic Radiography/methods, Radiologists, Humans, Predictive Value of Tests, Reproducibility of Results, Retrospective Studies
17.
Radiology; 309(1): e231114, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37874234
18.
Radiology; 287(1): 313-322, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29095675

ABSTRACT

Purpose To compare the performance of a deep-learning bone age assessment model based on hand radiographs with that of expert radiologists and that of existing automated models. Materials and Methods The institutional review board approved the study. A total of 14 036 clinical hand radiographs and corresponding reports were obtained from two children's hospitals to train and validate the model. For the first test set, composed of 200 examinations, the mean of bone age estimates from the clinical report and three additional human reviewers was used as the reference standard. Overall model performance was assessed by comparing the root mean square (RMS) and mean absolute difference (MAD) between the model estimates and the reference standard bone ages. Ninety-five percent limits of agreement were calculated in a pairwise fashion for all reviewers and the model. The RMS of a second test set composed of 913 examinations from the publicly available Digital Hand Atlas was compared with published reports of an existing automated model. Results The mean difference between bone age estimates of the model and of the reviewers was 0 years, with a mean RMS and MAD of 0.63 and 0.50 years, respectively. The estimates of the model, the clinical report, and the three reviewers were within the 95% limits of agreement. RMS for the Digital Hand Atlas data set was 0.73 years, compared with 0.61 years of a previously reported model. Conclusion A deep-learning convolutional neural network model can estimate skeletal maturity with accuracy similar to that of an expert radiologist and to that of existing automated models. © RSNA, 2017 An earlier incorrect version of this article appeared online. This article was corrected on January 19, 2018.
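The bone age study above summarizes agreement with the reference standard using the root mean square (RMS) difference, the mean absolute difference (MAD), and Bland-Altman 95% limits of agreement. A minimal sketch of these three metrics over hypothetical model-minus-reference differences (illustrative values only, not the study's data):

```python
import math

def rms(diffs):
    # Root mean square of estimate-minus-reference differences (years).
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def mad(diffs):
    # Mean absolute difference (years).
    return sum(abs(d) for d in diffs) / len(diffs)

def limits_of_agreement(diffs):
    # Bland-Altman 95% limits: mean difference +/- 1.96 * sample SD.
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean - 1.96 * sd, mean + 1.96 * sd

# Hypothetical model-minus-reference bone age differences (years).
diffs = [0.4, -0.3, 0.7, -0.6, 0.1, -0.2, 0.5, -0.5]
print(round(rms(diffs), 2), round(mad(diffs), 2))
print(limits_of_agreement(diffs))
```

RMS penalizes large errors more heavily than MAD, which is why the study reports both; the limits of agreement describe the range within which roughly 95% of individual differences are expected to fall.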


Subject(s)
Age Determination by Skeleton/methods , Hand/anatomy & histology , Machine Learning , Neural Networks, Computer , Radiography/methods , Adolescent , Adult , Child , Child, Preschool , Female , Hand/diagnostic imaging , Humans , Infant , Male , Young Adult
19.
Radiology; 286(3): 845-852, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29135365

ABSTRACT

Purpose To evaluate the performance of a deep learning convolutional neural network (CNN) model compared with a traditional natural language processing (NLP) model in extracting pulmonary embolism (PE) findings from thoracic computed tomography (CT) reports from two institutions. Materials and Methods Contrast material-enhanced CT examinations of the chest performed between January 1, 1998, and January 1, 2016, were selected. Annotations by two human radiologists were made for three categories: the presence, chronicity, and location of PE. Classification performance of a CNN model, which used an unsupervised learning algorithm to obtain vector representations of words, was compared with that of the open-source application PeFinder. Sensitivity, specificity, accuracy, and F1 scores for both the CNN model and PeFinder in the internal and external validation sets were determined. Results The CNN model demonstrated an accuracy of 99% and an area under the curve value of 0.97. For internal validation report data, the CNN model had a statistically significantly larger F1 score (0.938) than did PeFinder (0.867) when classifying findings as either PE positive or PE negative, but no significant difference in sensitivity, specificity, or accuracy was found. For external validation report data, no statistical difference between the performance of the CNN model and PeFinder was found. Conclusion A deep learning CNN model can classify radiology free-text reports with accuracy equivalent to or beyond that of an existing traditional NLP model. © RSNA, 2017 Online supplemental material is available for this article.
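The PE study above evaluates both classifiers with sensitivity, specificity, accuracy, and F1 score, all of which derive from the confusion matrix of binary predictions. A minimal sketch with hypothetical report-level labels (not the study's data):

```python
def binary_metrics(preds, labels):
    """Sensitivity, specificity, accuracy, and F1 from binary
    predictions (1 = PE positive) against reference labels."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    sens = tp / (tp + fn)            # recall on positive reports
    spec = tn / (tn + fp)            # recall on negative reports
    acc = (tp + tn) / len(labels)
    f1 = 2 * tp / (2 * tp + fp + fn) # harmonic mean of precision and recall
    return sens, spec, acc, f1

# Hypothetical report-level predictions vs. radiologist annotations.
preds  = [1, 1, 0, 0, 1, 0, 0, 1]
labels = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(preds, labels))  # (0.75, 0.75, 0.75, 0.75)
```

Unlike accuracy, F1 ignores true negatives, which is why the study can find a significant F1 difference between the CNN model and PeFinder without a corresponding difference in accuracy.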


Subject(s)
Machine Learning , Neural Networks, Computer , Pulmonary Embolism/diagnostic imaging , Algorithms , Humans , Natural Language Processing , ROC Curve , Radiography, Thoracic/methods , Reproducibility of Results , Sensitivity and Specificity , Tomography, X-Ray Computed/methods