Pesquisa | Portal Regional da BVS

1.

Evaluating GPT-V4 (GPT-4 with Vision) on Detection of Radiologic Findings on Chest Radiographs.

Zhou, Yiliang; Ong, Hanley; Kennedy, Patrick; Wu, Carol C; Kazam, Jacob; Hentel, Keith; Flanders, Adam; Shih, George; Peng, Yifan.

Radiology ; 311(2): e233270, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38713028

RESUMO

Background Generating radiologic findings from chest radiographs is pivotal in medical image analysis. The emergence of OpenAI's generative pretrained transformer, GPT-4 with vision (GPT-4V), has opened new perspectives on the potential for automated image-text pair generation. However, the application of GPT-4V to real-world chest radiography is yet to be thoroughly examined. Purpose To investigate the capability of GPT-4V to generate radiologic findings from real-world chest radiographs. Materials and Methods In this retrospective study, 100 chest radiographs with free-text radiology reports were annotated by a cohort of radiologists, two attending physicians and three residents, to establish a reference standard. Of 100 chest radiographs, 50 were randomly selected from the National Institutes of Health (NIH) chest radiographic data set, and 50 were randomly selected from the Medical Imaging and Data Resource Center (MIDRC). The performance of GPT-4V at detecting imaging findings from each chest radiograph was assessed in the zero-shot setting (where it operates without prior examples) and few-shot setting (where it operates with two examples). Its outcomes were compared with the reference standard with regards to clinical conditions and their corresponding codes in the International Statistical Classification of Diseases, Tenth Revision (ICD-10), including the anatomic location (hereafter, laterality). Results In the zero-shot setting, in the task of detecting ICD-10 codes alone, GPT-4V attained an average positive predictive value (PPV) of 12.3%, average true-positive rate (TPR) of 5.8%, and average F1 score of 7.3% on the NIH data set, and an average PPV of 25.0%, average TPR of 16.8%, and average F1 score of 18.2% on the MIDRC data set. When both the ICD-10 codes and their corresponding laterality were considered, GPT-4V produced an average PPV of 7.8%, average TPR of 3.5%, and average F1 score of 4.5% on the NIH data set, and an average PPV of 10.9%, average TPR of 4.9%, and average F1 score of 6.4% on the MIDRC data set. With few-shot learning, GPT-4V showed improved performance on both data sets. When contrasting zero-shot and few-shot learning, there were improved average TPRs and F1 scores in the few-shot setting, but there was not a substantial increase in the average PPV. Conclusion Although GPT-4V has shown promise in understanding natural images, it had limited effectiveness in interpreting real-world chest radiographs. © RSNA, 2024 Supplemental material is available for this article.

Assuntos

Radiografia Torácica , Humanos , Radiografia Torácica/métodos , Estudos Retrospectivos , Feminino , Masculino , Pessoa de Meia-Idade , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Idoso , Adulto

2.

Lessons Learned in Building Expertly Annotated Multi-Institution Datasets and Hosting the RSNA AI Challenges.

Kitamura, Felipe C; Prevedello, Luciano M; Colak, Errol; Halabi, Safwan S; Lungren, Matthew P; Ball, Robyn L; Kalpathy-Cramer, Jayashree; Kahn, Charles E; Richards, Tyler; Talbott, Jason F; Shih, George; Lin, Hui Ming; Andriole, Katherine P; Vazirabad, Maryam; Erickson, Bradley J; Flanders, Adam E; Mongan, John.

Radiol Artif Intell ; 6(3): e230227, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38477659

RESUMO

The Radiological Society of North America (RSNA) has held artificial intelligence competitions to tackle real-world medical imaging problems at least annually since 2017. This article examines the challenges and processes involved in organizing these competitions, with a specific emphasis on the creation and curation of high-quality datasets. The collection of diverse and representative medical imaging data involves dealing with issues of patient privacy and data security. Furthermore, ensuring quality and consistency in data, which includes expert labeling and accounting for various patient and imaging characteristics, necessitates substantial planning and resources. Overcoming these obstacles requires meticulous project management and adherence to strict timelines. The article also highlights the potential of crowdsourced annotation to progress medical imaging research. Through the RSNA competitions, an effective global engagement has been realized, resulting in innovative solutions to complex medical imaging problems, thus potentially transforming health care by enhancing diagnostic accuracy and patient outcomes. Keywords: Use of AI in Education, Artificial Intelligence © RSNA, 2024.

Assuntos

Inteligência Artificial , Radiologia , Humanos , Diagnóstico por Imagem/métodos , Sociedades Médicas , América do Norte

3.

Machine Learning Classification of Body Part, Imaging Axis, and Intravenous Contrast Enhancement on CT Imaging.

Li, Wuqi; Lin, Hui Ming; Lin, Amy; Napoleone, Marc; Moreland, Robert; Murari, Alexis; Stepanov, Maxim; Ivanov, Eric; Prasad, Abhinav Sanjeeva; Shih, George; Hu, Zixuan; Zulbayar, Suvd; Sejdic, Ervin; Colak, Errol.

Can Assoc Radiol J ; 75(1): 82-91, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-37439250

RESUMO

Purpose: The development and evaluation of machine learning models that automatically identify the body part(s) imaged, axis of imaging, and the presence of intravenous contrast material of a CT series of images. Methods: This retrospective study included 6955 series from 1198 studies (501 female, 697 males, mean age 56.5 years) obtained between January 2010 and September 2021. Each series was annotated by a trained board-certified radiologist with labels consisting of 16 body parts, 3 imaging axes, and whether an intravenous contrast agent was used. The studies were randomly assigned to the training, validation and testing sets with a proportion of 70%, 20% and 10%, respectively, to develop a 3D deep neural network for each classification task. External validation was conducted with a total of 35,272 series from 7 publicly available datasets. The classification accuracy for each series was independently assessed for each task to evaluate model performance. Results: The accuracies for identifying the body parts, imaging axes, and the presence of intravenous contrast were 96.0% (95% CI: 94.6%, 97.2%), 99.2% (95% CI: 98.5%, 99.7%), and 97.5% (95% CI: 96.4%, 98.5%) respectively. The generalizability of the models was demonstrated through external validation with accuracies of 89.7 - 97.8%, 98.6 - 100%, and 87.8 - 98.6% for the same tasks. Conclusions: The developed models demonstrated high performance on both internal and external testing in identifying key aspects of a CT series.

Assuntos

Aprendizado Profundo , Masculino , Humanos , Feminino , Pessoa de Meia-Idade , Estudos Retrospectivos , Corpo Humano , Aprendizado de Máquina , Tomografia Computadorizada por Raios X/métodos , Meios de Contraste

4.

Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge.

Holste, Gregory; Zhou, Yiliang; Wang, Song; Jaiswal, Ajay; Lin, Mingquan; Zhuge, Sherry; Yang, Yuzhe; Kim, Dongkyun; Nguyen-Mau, Trong-Hieu; Tran, Minh-Triet; Jeong, Jaehyup; Park, Wongi; Ryu, Jongbin; Hong, Feng; Verma, Arsh; Yamagishi, Yosuke; Kim, Changhyun; Seo, Hyeryeong; Kang, Myungjoo; Celi, Leo Anthony; Lu, Zhiyong; Summers, Ronald M; Shih, George; Wang, Zhangyang; Peng, Yifan.

ArXiv ; 2024 Apr 01.

Artigo em Inglês | MEDLINE | ID: mdl-37986726

RESUMO

Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" - there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.

5.

Test Retest Reproducibility of Organ Volume Measurements in ADPKD Using 3D Multimodality Deep Learning.

He, Xinzi; Hu, Zhongxiu; Dev, Hreedi; Romano, Dominick J; Sharbatdaran, Arman; Raza, Syed I; Wang, Sophie J; Teichman, Kurt; Shih, George; Chevalier, James M; Shimonov, Daniil; Blumenfeld, Jon D; Goel, Akshay; Sabuncu, Mert R; Prince, Martin R.

Acad Radiol ; 31(3): 889-899, 2024 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-37798206

RESUMO

RATIONALE AND OBJECTIVES: Following autosomal dominant polycystic kidney disease (ADPKD) progression by measuring organ volumes requires low measurement variability. The objective of this study is to reduce organ volume measurement variability on MRI of ADPKD patients by utilizing all pulse sequences to obtain multiple measurements which allows outlier analysis to find errors and averaging to reduce variability. MATERIALS AND METHODS: In order to make measurements on multiple pulse sequences practical, a 3D multi-modality multi-class segmentation model based on nnU-net was trained/validated using T1, T2, SSFP, DWI and CT from 413 subjects. Reproducibility was assessed with test-re-test methodology on ADPKD subjects (n = 19) scanned twice within a 3-week interval correcting outliers and averaging the measurements across all sequences. Absolute percent differences in organ volumes were compared to paired students t-test. RESULTS: Dice similarlity coefficient > 97%, Jaccard Index > 0.94, mean surface distance < 1 mm and mean Hausdorff Distance < 2 cm for all three organs and all five sequences were found on internal (n = 25), external (n = 37) and test-re-test reproducibility assessment (38 scans in 19 subjects). When averaging volumes measured from five MRI sequences, the model automatically segmented kidneys with test-re-test reproducibility (percent absolute difference between exam 1 and exam 2) of 1.3% which was better than all five expert observers. It reliably stratified ADPKD into Mayo Imaging Classification (area under the curve=100%) compared to radiologist. CONCLUSION: 3D deep learning measures organ volumes on five MRI sequences leveraging the power of outlier analysis and averaging to achieve 1.3% total kidney test-re-test reproducibility.

Assuntos

Aprendizado Profundo , Rim Policístico Autossômico Dominante , Humanos , Rim Policístico Autossômico Dominante/diagnóstico por imagem , Tamanho do Órgão , Reprodutibilidade dos Testes , Rim/diagnóstico por imagem , Imageamento por Ressonância Magnética/métodos

6.

Improving model fairness in image-based computer-aided diagnosis.

Lin, Mingquan; Li, Tianhao; Yang, Yifan; Holste, Gregory; Ding, Ying; Van Tassel, Sarah H; Kovacs, Kyle; Shih, George; Wang, Zhangyang; Lu, Zhiyong; Wang, Fei; Peng, Yifan.

Nat Commun ; 14(1): 6261, 2023 10 06.

Artigo em Inglês | MEDLINE | ID: mdl-37803009

RESUMO

Deep learning has become a popular tool for computer-aided diagnosis using medical images, sometimes matching or exceeding the performance of clinicians. However, these models can also reflect and amplify human bias, potentially resulting inaccurate missed diagnoses. Despite this concern, the problem of improving model fairness in medical image classification by deep learning has yet to be fully studied. To address this issue, we propose an algorithm that leverages the marginal pairwise equal opportunity to reduce bias in medical image classification. Our evaluations across four tasks using four independent large-scale cohorts demonstrate that our proposed algorithm not only improves fairness in individual and intersectional subgroups but also maintains overall performance. Specifically, the relative change in pairwise fairness difference between our proposed model and the baseline model was reduced by over 35%, while the relative change in AUC value was typically within 1%. By reducing the bias generated by deep learning models, our proposed approach can potentially alleviate concerns about the fairness and reliability of image-based computer-aided diagnosis.

Assuntos

Algoritmos , Diagnóstico por Computador , Humanos , Reprodutibilidade dos Testes , Diagnóstico por Computador/métodos , Computadores

7.

How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers?

Holste, Gregory; Jiang, Ziyu; Jaiswal, Ajay; Hanna, Maria; Minkowitz, Shlomo; Legasto, Alan C; Escalon, Joanna G; Steinberger, Sharon; Bittman, Mark; Shen, Thomas C; Ding, Ying; Summers, Ronald M; Shih, George; Peng, Yifan; Wang, Zhangyang.

ArXiv ; 2023 Aug 17.

Artigo em Inglês | MEDLINE | ID: mdl-37791108

RESUMO

Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance. However, the nuanced ways in which pruning impacts model behavior are not well understood, particularly for long-tailed, multi-label datasets commonly found in clinical settings. This knowledge gap could have dangerous implications when deploying a pruned model for diagnosis, where unexpected model behavior could impact patient well-being. To fill this gap, we perform the first analysis of pruning's effect on neural networks trained to diagnose thorax diseases from chest X-rays (CXRs). On two large CXR datasets, we examine which diseases are most affected by pruning and characterize class "forgettability" based on disease frequency and co-occurrence behavior. Further, we identify individual CXRs where uncompressed and heavily pruned models disagree, known as pruning-identified exemplars (PIEs), and conduct a human reader study to evaluate their unifying qualities. We find that radiologists perceive PIEs as having more label noise, lower image quality, and higher diagnosis difficulty. This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification. All code, model weights, and data access instructions can be found at https://github.com/VITA-Group/PruneCXR.

8.

Beyond the AJR: Don't Believe Everything You Read-Nearly One-Fifth of Cleared Artificial Intelligence Devices May Have Discordant Documentation.

Shih, George; Flanders, Adam E.

AJR Am J Roentgenol ; 2023 Oct 11.

Artigo em Inglês | MEDLINE | ID: mdl-37818958

9.

How Does Pruning Impact Long-Tailed Multi-label Medical Image Classifiers?

Holste, Gregory; Jiang, Ziyu; Jaiswal, Ajay; Hanna, Maria; Minkowitz, Shlomo; Legasto, Alan C; Escalon, Joanna G; Steinberger, Sharon; Bittman, Mark; Shen, Thomas C; Ding, Ying; Summers, Ronald M; Shih, George; Peng, Yifan; Wang, Zhangyang.

Med Image Comput Comput Assist Interv ; 14224: 663-673, 2023 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-37829549

RESUMO

Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance. However, the nuanced ways in which pruning impacts model behavior are not well understood, particularly for long-tailed, multi-label datasets commonly found in clinical settings. This knowledge gap could have dangerous implications when deploying a pruned model for diagnosis, where unexpected model behavior could impact patient well-being. To fill this gap, we perform the first analysis of pruning's effect on neural networks trained to diagnose thorax diseases from chest X-rays (CXRs). On two large CXR datasets, we examine which diseases are most affected by pruning and characterize class "forgettability" based on disease frequency and co-occurrence behavior. Further, we identify individual CXRs where uncompressed and heavily pruned models disagree, known as pruning-identified exemplars (PIEs), and conduct a human reader study to evaluate their unifying qualities. We find that radiologists perceive PIEs as having more label noise, lower image quality, and higher diagnosis difficulty. This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification. All code, model weights, and data access instructions can be found at https://github.com/VITA-Group/PruneCXR.

10.

Clinical Quality Control of MRI Total Kidney Volume Measurements in Autosomal Dominant Polycystic Kidney Disease.

Zhu, Chenglin; Dev, Hreedi; Sharbatdaran, Arman; He, Xinzi; Shimonov, Daniil; Chevalier, James M; Blumenfeld, Jon D; Wang, Yi; Teichman, Kurt; Shih, George; Goel, Akshay; Prince, Martin R.

Tomography ; 9(4): 1341-1355, 2023 07 12.

Artigo em Inglês | MEDLINE | ID: mdl-37489475

RESUMO

Total kidney volume measured on MRI is an important biomarker for assessing the progression of autosomal dominant polycystic kidney disease and response to treatment. However, we have noticed that there can be substantial differences in the kidney volume measurements obtained from the various pulse sequences commonly included in an MRI exam. Here we examine kidney volume measurement variability among five commonly acquired MRI pulse sequences in abdominal MRI exams in 105 patients with ADPKD. Right and left kidney volumes were independently measured by three expert observers using model-assisted segmentation for axial T2, coronal T2, axial single-shot fast spin echo (SSFP), coronal SSFP, and axial 3D T1 images obtained on a single MRI from ADPKD patients. Outlier measurements were analyzed for data acquisition errors. Most of the outlier values (88%) were due to breathing during scanning causing slice misregistration with gaps or duplication of imaging slices (n = 35), slice misregistration from using multiple breath holds during acquisition (n = 25), composing of two overlapping acquisitions (n = 17), or kidneys not entirely within the field of view (n = 4). After excluding outlier measurements, the coefficient of variation among the five measurements decreased from 4.6% pre to 3.2%. Compared to the average of all sequences without errors, TKV measured on axial and coronal T2 weighted imaging were 1.2% and 1.8% greater, axial SSFP was 0.4% greater, coronal SSFP was 1.7% lower and axial T1 was 1.5% lower than the mean, indicating intrinsic measurement biases related to the different MRI contrast mechanisms. In conclusion, MRI data acquisition errors are common but can be identified using outlier analysis and excluded to improve organ volume measurement consistency. Bias toward larger volume measurements on T2 sequences and smaller volumes on axial T1 sequences can also be mitigated by averaging data from all error-free sequences acquired.

Assuntos

Rim Policístico Autossômico Dominante , Humanos , Rim , Imageamento por Ressonância Magnética , Controle de Qualidade

11.

Evaluating GPT4 on Impressions Generation in Radiology Reports.

Sun, Zhaoyi; Ong, Hanley; Kennedy, Patrick; Tang, Liyan; Chen, Shirley; Elias, Jonathan; Lucas, Eugene; Shih, George; Peng, Yifan.

Radiology ; 307(5): e231259, 2023 06.

Artigo em Inglês | MEDLINE | ID: mdl-37367439

Assuntos

Inteligência Artificial , Radiologia , Humanos

12.

Enhancing thoracic disease detection using chest X-rays from PubMed Central Open Access.

Lin, Mingquan; Hou, Bojian; Mishra, Swati; Yao, Tianyuan; Huo, Yuankai; Yang, Qian; Wang, Fei; Shih, George; Peng, Yifan.

Comput Biol Med ; 159: 106962, 2023 06.

Artigo em Inglês | MEDLINE | ID: mdl-37094464

RESUMO

Large chest X-rays (CXR) datasets have been collected to train deep learning models to detect thorax pathology on CXR. However, most CXR datasets are from single-center studies and the collected pathologies are often imbalanced. The aim of this study was to automatically construct a public, weakly-labeled CXR database from articles in PubMed Central Open Access (PMC-OA) and to assess model performance on CXR pathology classification by using this database as additional training data. Our framework includes text extraction, CXR pathology verification, subfigure separation, and image modality classification. We have extensively validated the utility of the automatically generated image database on thoracic disease detection tasks, including Hernia, Lung Lesion, Pneumonia, and pneumothorax. We pick these diseases due to their historically poor performance in existing datasets: the NIH-CXR dataset (112,120 CXR) and the MIMIC-CXR dataset (243,324 CXR). We find that classifiers fine-tuned with additional PMC-CXR extracted by the proposed framework consistently and significantly achieved better performance than those without (e.g., Hernia: 0.9335 vs 0.9154; Lung Lesion: 0.7394 vs. 0.7207; Pneumonia: 0.7074 vs. 0.6709; Pneumothorax 0.8185 vs. 0.7517, all in AUC with p< 0.0001) for CXR pathology detection. In contrast to previous approaches that manually submit the medical images to the repository, our framework can automatically collect figures and their accompanied figure legends. Compared to previous studies, the proposed framework improved subfigure segmentation and incorporates our advanced self-developed NLP technique for CXR pathology verification. We hope it complements existing resources and improves our ability to make biomedical image data findable, accessible, interoperable, and reusable.

Assuntos

Pneumonia , Pneumotórax , Doenças Torácicas , Humanos , Pneumotórax/diagnóstico por imagem , Radiografia Torácica/métodos , Raios X , Acesso à Informação , Pneumonia/diagnóstico por imagem

13.

Effect of Averaging Measurements From Multiple MRI Pulse Sequences on Kidney Volume Reproducibility in Autosomal Dominant Polycystic Kidney Disease.

Dev, Hreedi; Zhu, Chenglin; Sharbatdaran, Arman; Raza, Syed I; Wang, Sophie J; Romano, Dominick J; Goel, Akshay; Teichman, Kurt; Moghadam, Mina C; Shih, George; Blumenfeld, Jon D; Shimonov, Daniil; Chevalier, James M; Prince, Martin R.

J Magn Reson Imaging ; 58(4): 1153-1160, 2023 10.

Artigo em Inglês | MEDLINE | ID: mdl-36645114

RESUMO

BACKGROUND: Total kidney volume (TKV) is an important biomarker for assessing kidney function, especially for autosomal dominant polycystic kidney disease (ADPKD). However, TKV measurements from a single MRI pulse sequence have limited reproducibility, ± ~5%, similar to ADPKD annual kidney growth rates. PURPOSE: To improve TKV measurement reproducibility on MRI by extending artificial intelligence algorithms to automatically segment kidneys on T1-weighted, T2-weighted, and steady state free precession (SSFP) sequences in axial and coronal planes and averaging measurements. STUDY TYPE: Retrospective training, prospective testing. SUBJECTS: Three hundred ninety-seven patients (356 with ADPKD, 41 without), 75% for training and 25% for validation, 40 ADPKD patients for testing and 17 ADPKD patients for assessing reproducibility. FIELD STRENGTH/SEQUENCE: T2-weighted single-shot fast spin echo (T2), SSFP, and T1-weighted 3D spoiled gradient echo (T1) at 1.5 and 3T. ASSESSMENT: 2D U-net segmentation algorithm was trained on images from all sequences. Five observers independently measured each kidney volume manually on axial T2 and using model-assisted segmentations on all sequences and image plane orientations for two MRI exams in two sessions separated by 1-3 weeks to assess reproducibility. Manual and model-assisted segmentation times were recorded. STATISTICAL TESTS: Bland-Altman, Schapiro-Wilk (normality assessment), Pearson's chi-squared (categorical variables); Dice similarity coefficient, interclass correlation coefficient, and concordance correlation coefficient for analyzing TKV reproducibility. P-value < 0.05 was considered statistically significant. RESULTS: In 17 ADPKD subjects, model-assisted segmentations of axial T2 images were significantly faster than manual segmentations (2:49 minute vs. 11:34 minute), with no significant absolute percent difference in TKV (5.9% vs. 5.3%, P = 0.88) between scans 1 and 2. Absolute percent differences between the two scans for model-assisted segmentations on other sequences were 5.5% (axial T1), 4.5% (axial SSFP), 4.1% (coronal SSFP), and 3.2% (coronal T2). Averaging measurements from all five model-assisted segmentations significantly reduced absolute percent difference to 2.5%, further improving to 2.1% after excluding an outlier. DATA CONCLUSION: Measuring TKV on multiple MRI pulse sequences in coronal and axial planes is practical with deep learning model-assisted segmentations and can improve TKV measurement reproducibility more than 2-fold in ADPKD. EVIDENCE LEVEL: 2 TECHNICAL EFFICACY: Stage 1.

Assuntos

Rim Policístico Autossômico Dominante , Humanos , Rim Policístico Autossômico Dominante/diagnóstico por imagem , Estudos Retrospectivos , Estudos Prospectivos , Reprodutibilidade dos Testes , Inteligência Artificial , Rim/diagnóstico por imagem , Imageamento por Ressonância Magnética/métodos

14.

ChatGPT and Other Large Language Models Are Double-edged Swords.

Shen, Yiqiu; Heacock, Laura; Elias, Jonathan; Hentel, Keith D; Reig, Beatriu; Shih, George; Moy, Linda.

Radiology ; 307(2): e230163, 2023 04.

Artigo em Inglês | MEDLINE | ID: mdl-36700838

Assuntos

Escrita Médica , Humanos , Previsões

15.

Few-Shot Learning Geometric Ensemble for Multi-label Classification of Chest X-Rays.

Moukheiber, Dana; Mahindre, Saurabh; Moukheiber, Lama; Moukheiber, Mira; Wang, Song; Ma, Chunwei; Shih, George; Peng, Yifan; Gao, Mingchen.

Data Augment Label Imperfections (2022) ; 13567: 112-122, 2022 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-36383493

RESUMO

This paper aims to identify uncommon cardiothoracic diseases and patterns on chest X-ray images. Training a machine learning model to classify rare diseases with multi-label indications is challenging without sufficient labeled training samples. Our model leverages the information from common diseases and adapts to perform on less common mentions. We propose to use multi-label few-shot learning (FSL) schemes including neighborhood component analysis loss, generating additional samples using distribution calibration and fine-tuning based on multi-label classification loss. We utilize the fact that the widely adopted nearest neighbor-based FSL schemes like ProtoNet are Voronoi diagrams in feature space. In our method, the Voronoi diagrams in the features space generated from multi-label schemes are combined into our geometric DeepVoro Multi-label ensemble. The improved performance in multi-label few-shot classification using the multi-label ensemble is demonstrated in our experiments (The code is publicly available at https://github.com/Saurabh7/Few-shot-learning-multilabel-cxray).

16.

Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study.

Holste, Gregory; Wang, Song; Jiang, Ziyu; Shen, Thomas C; Shih, George; Summers, Ronald M; Peng, Yifan; Wang, Zhangyang.

Data Augment Label Imperfections (2022) ; 13567: 22-32, 2022 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-36318048

RESUMO

Imaging exams, such as chest radiography, will yield a small set of common findings and a much larger set of uncommon findings. While a trained radiologist can learn the visual presentation of rare conditions by studying a few representative examples, teaching a machine to learn from such a "long-tailed" distribution is much more difficult, as standard methods would be easily biased toward the most frequent classes. In this paper, we present a comprehensive benchmark study of the long-tailed learning problem in the specific domain of thorax diseases on chest X-rays. We focus on learning from naturally distributed chest X-ray data, optimizing classification accuracy over not only the common "head" classes, but also the rare yet critical "tail" classes. To accomplish this, we introduce a challenging new long-tailed chest X-ray benchmark to facilitate research on developing long-tailed learning methods for medical image classification. The benchmark consists of two chest X-ray datasets for 19- and 20-way thorax disease classification, containing classes with as many as 53,000 and as few as 7 labeled training images. We evaluate both standard and state-of-the-art long-tailed learning methods on this new benchmark, analyzing which aspects of these methods are most beneficial for long-tailed medical image classification and summarizing insights for future algorithm design. The datasets, trained models, and code are available at https://github.com/VITA-Group/LongTailCXR.

17.

Radiology Text Analysis System (RadText): Architecture and Evaluation.

Wang, Song; Lin, Mingquan; Ding, Ying; Shih, George; Lu, Zhiyong; Peng, Yifan.

IEEE Int Conf Healthc Inform ; 2022: 288-296, 2022 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-36128510

RESUMO

Analyzing radiology reports is a time-consuming and error-prone task, which raises the need for an efficient automated radiology report analysis system to alleviate the workloads of radiologists and encourage precise diagnosis. In this work, we present RadText, a high-performance open-source Python radiology text analysis system. RadText offers an easy-to-use text analysis pipeline, including de-identification, section segmentation, sentence split and word tokenization, named entity recognition, parsing, and negation detection. Superior to existing widely used toolkits, RadText features a hybrid text processing schema, supports raw text processing and local processing, which enables higher accuracy, better usability and improved data privacy. RadText adopts BioC as the unified interface, and also standardizes the output into a structured representation that is compatible with Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), which allows for a more systematic approach to observational research across multiple, disparate data sources. We evaluated RadText on the MIMIC-CXR dataset, with five new disease labels that we annotated for this work. RadText demonstrates highly accurate classification performances, with a 0.91 average precision, 0.94 average recall and 0.92 average F-1 score. We also annotated a test set for the five new disease labels to facilitate future research or applications. We have made our code, documentations, examples and the test set available at https://github.com/bionlplab/radtext.

18.

Deep Learning Automation of Kidney, Liver, and Spleen Segmentation for Organ Volume Measurements in Autosomal Dominant Polycystic Kidney Disease.

Sharbatdaran, Arman; Romano, Dominick; Teichman, Kurt; Dev, Hreedi; Raza, Syed I; Goel, Akshay; Moghadam, Mina C; Blumenfeld, Jon D; Chevalier, James M; Shimonov, Daniil; Shih, George; Wang, Yi; Prince, Martin R.

Tomography ; 8(4): 1804-1819, 2022 07 13.

Artigo em Inglês | MEDLINE | ID: mdl-35894017

RESUMO

Organ volume measurements are a key metric for managing ADPKD (the most common inherited renal disease). However, measuring organ volumes is tedious and involves manually contouring organ outlines on multiple cross-sectional MRI or CT images. The automation of kidney contouring using deep learning has been proposed, as it has small errors compared to manual contouring. Here, a deployed open-source deep learning ADPKD kidney segmentation pipeline is extended to also measure liver and spleen volumes, which are also important. This 2D U-net deep learning approach was developed with radiologist labeled T2-weighted images from 215 ADPKD subjects (70% training = 151, 30% validation = 64). Additional ADPKD subjects were utilized for prospective (n = 30) and external (n = 30) validations for a total of 275 subjects. Image cropping previously optimized for kidneys was included in training but removed for the validation and inference to accommodate the liver which is closer to the image border. An effective algorithm was developed to adjudicate overlap voxels that are labeled as more than one organ. Left kidney, right kidney, liver and spleen labels had average errors of 3%, 7%, 3%, and 1%, respectively, on external validation and 5%, 6%, 5%, and 1% on prospective validation. Dice scores also showed that the deep learning model was close to the radiologist contouring, measuring 0.98, 0.96, 0.97 and 0.96 on external validation and 0.96, 0.96, 0.96 and 0.95 on prospective validation for left kidney, right kidney, liver and spleen, respectively. The time required for manual correction of deep learning segmentation errors was only 19:17 min compared to 33:04 min for manual segmentations, a 42% time saving (p = 0.004). Standard deviation of model assisted segmentations was reduced to 7, 5, 11, 5 mL for right kidney, left kidney, liver and spleen respectively from 14, 10, 55 and 14 mL for manual segmentations. Thus, deep learning reduces the radiologist time required to perform multiorgan segmentations in ADPKD and reduces measurement variability.

Assuntos

Aprendizado Profundo , Rim Policístico Autossômico Dominante , Automação , Estudos Transversais , Humanos , Rim/diagnóstico por imagem , Fígado/diagnóstico por imagem , Tamanho do Órgão , Rim Policístico Autossômico Dominante/diagnóstico por imagem , Baço/diagnóstico por imagem

19.

Trustworthy assertion classification through prompting.

Wang, Song; Tang, Liyan; Majety, Akash; Rousseau, Justin F; Shih, George; Ding, Ying; Peng, Yifan.

J Biomed Inform ; 132: 104139, 2022 08.

Artigo em Inglês | MEDLINE | ID: mdl-35811026

RESUMO

Accurate identification of the presence, absence or possibility of relevant entities in clinical notes is important for healthcare professionals to quickly understand crucial clinical information. This introduces the task of assertion classification - to correctly identify the assertion status of an entity in the unstructured clinical notes. Recent rule-based and machine-learning approaches suffer from labor-intensive pattern engineering and severe class bias toward majority classes. To solve this problem, in this study, we propose a prompt-based learning approach, which treats the assertion classification task as a masked language auto-completion problem. We evaluated the model on six datasets. Our prompt-based method achieved a micro-averaged F-1 of 0.954 on the i2b2 2010 assertion dataset, with â¼1.8% improvements over previous works. In particular, our model showed excellence in detecting classes with few instances (few-shot). Evaluations on five external datasets showcase the outstanding generalizability of the prompt-based method to unseen data. To examine the rationality of our model, we further introduced two rationale faithfulness metrics: comprehensiveness and sufficiency. The results reveal that compared to the "pre-train, fine-tune" procedure, our prompt-based model has a stronger capability of identifying the comprehensive (â¼63.93%) and sufficient (â¼11.75%) linguistic features from free text. We further evaluated the model-agnostic explanations using LIME. The results imply a better rationale agreement between our model and human beings (â¼71.93% in average F-1), which demonstrates the superior trustworthiness of our model.

Assuntos

Registros Eletrônicos de Saúde , Processamento de Linguagem Natural , Humanos , Linguística , Aprendizado de Máquina

20.

Prior Knowledge Enhances Radiology Report Generation.

Wang, Song; Tang, Liyan; Lin, Mingquan; Shih, George; Ding, Ying; Peng, Yifan.

AMIA Jt Summits Transl Sci Proc ; 2022: 486-495, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35854760

RESUMO

Radiology report generation aims to produce computer-aided diagnoses to alleviate the workload of radiologists and has drawn increasing attention recently. However, previous deep learning methods tend to neglect the mutual influences between medical findings, which can be the bottleneck that limits the quality of generated reports. In this work, we propose to mine and represent the associations among medical findings in an informative knowledge graph and incorporate this prior knowledge with radiology report generation to help improve the quality of generated reports. Experiment results demonstrate the superior performance of our proposed method on the IU X-ray dataset with a ROUGE-L of 0.384±0.007 and CIDEr of 0.340±0.011. Compared with previous works, our model achieves an average of 1.6% improvement (2.0% and 1.5% improvements in CIDEr and ROUGE-L, respectively). The experiments suggest that prior knowledge can bring performance gains to accurate radiology report generation. We will make the code publicly available at https://github.com/bionlplab/report_generation_amia2022.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA