Search | BVS España Search Portal

1.

ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax.

Huemann, Zachary; Tie, Xin; Hu, Junjie; Bradshaw, Tyler J.

J Imaging Inform Med ; 2024 Mar 14.

Article in English | MEDLINE | ID: mdl-38485899

ABSTRACT

Radiology narrative reports often describe characteristics of a patient's disease, including its location, size, and shape. Motivated by the recent success of multimodal learning, we hypothesized that this descriptive text could guide medical image analysis algorithms. We proposed a novel vision-language model, ConTEXTual Net, for the task of pneumothorax segmentation on chest radiographs. ConTEXTual Net extracts language features from physician-generated free-form radiology reports using a pre-trained language model. We then introduced cross-attention between the language features and the intermediate embeddings of an encoder-decoder convolutional neural network to enable language guidance for image analysis. ConTEXTual Net was trained on the CANDID-PTX dataset consisting of 3196 positive cases of pneumothorax with segmentation annotations from 6 different physicians as well as clinical radiology reports. Using cross-validation, ConTEXTual Net achieved a Dice score of 0.716±0.016, which was similar to the degree of inter-reader variability (0.712±0.044) computed on a subset of the data. It outperformed vision-only models (Swin UNETR: 0.670±0.015, ResNet50 U-Net: 0.677±0.015, GLoRIA: 0.686±0.014, and nnUNet 0.694±0.016) and a competing vision-language model (LAVT: 0.706±0.009). Ablation studies confirmed that it was the text information that led to the performance gains. Additionally, we show that certain augmentation methods degraded ConTEXTual Net's segmentation performance by breaking the image-text concordance. We also evaluated the effects of using different language models and activation functions in the cross-attention module, highlighting the efficacy of our chosen architectural design.
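The Dice score reported in this abstract measures overlap between a predicted mask and a reference mask. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name, the flat 0/1 mask representation, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labeled positive in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total positive pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks (made up for illustration): 3 positives each, 2 shared.
pred_mask = [0, 1, 1, 1, 0, 0]
true_mask = [0, 0, 1, 1, 1, 0]
print(round(dice_score(pred_mask, true_mask), 3))  # 2*2 / (3+3) = 0.667
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no positive pixels.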

2.

Personalized Impression Generation for PET Reports Using Large Language Models.

Tie, Xin; Shin, Muheon; Pirasteh, Ali; Ibrahim, Nevein; Huemann, Zachary; Castellino, Sharon M; Kelly, Kara M; Garrett, John; Hu, Junjie; Cho, Steve Y; Bradshaw, Tyler J.

J Imaging Inform Med ; 37(2): 471-488, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38308070

ABSTRACT

Large language models (LLMs) have shown promise in accelerating radiology reporting by summarizing clinical findings into impressions. However, automatic impression generation for whole-body PET reports presents unique challenges and has received little attention. Our study aimed to evaluate whether LLMs can create clinically useful impressions for PET reporting. To this end, we fine-tuned twelve open-source language models on a corpus of 37,370 retrospective PET reports collected from our institution. All models were trained using the teacher-forcing algorithm, with the report findings and patient information as input and the original clinical impressions as reference. An extra input token encoded the reading physician's identity, allowing models to learn physician-specific reporting styles. To compare the performances of different models, we computed various automatic evaluation metrics and benchmarked them against physician preferences, ultimately selecting PEGASUS as the top LLM. To evaluate its clinical utility, three nuclear medicine physicians assessed the PEGASUS-generated impressions and original clinical impressions across 6 quality dimensions (3-point scales) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. When physicians assessed LLM impressions generated in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08/5. On average, physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P = 0.41). In summary, our study demonstrated that personalized impressions generated by PEGASUS were clinically useful in most cases, highlighting its potential to expedite PET reporting by automatically drafting impressions.

3.

Automatic Quantification of Serial PET/CT Images for Pediatric Hodgkin Lymphoma Patients Using a Longitudinally-Aware Segmentation Network.

Tie, Xin; Shin, Muheon; Lee, Changhee; Perlman, Scott B; Huemann, Zachary; Weisman, Amy J; Castellino, Sharon M; Kelly, Kara M; McCarten, Kathleen M; Alazraki, Adina L; Hu, Junjie; Cho, Steve Y; Bradshaw, Tyler J.

ArXiv ; 2024 Apr 12.

Article in English | MEDLINE | ID: mdl-38659641

ABSTRACT

Purpose: Automatic quantification of longitudinal changes in PET scans for lymphoma patients has proven challenging, as residual disease in interim-therapy scans is often subtle and difficult to detect. Our goal was to develop a longitudinally-aware segmentation network (LAS-Net) that can quantify serial PET/CT images for pediatric Hodgkin lymphoma patients. Materials and Methods: This retrospective study included baseline (PET1) and interim (PET2) PET/CT images from 297 patients enrolled in two Children's Oncology Group clinical trials (AHOD1331 and AHOD0831). LAS-Net incorporates longitudinal cross-attention, allowing relevant features from PET1 to inform the analysis of PET2. Model performance was evaluated using Dice coefficients for PET1 and detection F1 scores for PET2. Additionally, we extracted and compared quantitative PET metrics, including metabolic tumor volume (MTV) and total lesion glycolysis (TLG) in PET1, as well as qPET and ΔSUVmax in PET2, against physician measurements. We quantified their agreement using Spearman's ρ correlations and employed bootstrap resampling for statistical analysis. Results: LAS-Net detected residual lymphoma in PET2 with an F1 score of 0.606 (precision/recall: 0.615/0.600), outperforming all comparator methods (P<0.01). For baseline segmentation, LAS-Net achieved a mean Dice score of 0.772. In PET quantification, LAS-Net's measurements of qPET, ΔSUVmax, MTV and TLG were strongly correlated with physician measurements, with Spearman's ρ of 0.78, 0.80, 0.93 and 0.96, respectively. The performance remained high, with a slight decrease, in an external testing cohort. Conclusion: LAS-Net achieved high performance in quantifying PET metrics across serial scans, highlighting the value of longitudinal awareness in evaluating multi-time-point imaging datasets.
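Agreement between model and physician measurements is summarized in this abstract with Spearman's ρ, which is the Pearson correlation of the ranked values. The sketch below shows that computation using only the standard library. The helper names and the example numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up measurements for four scans (illustration only).
model_values = [10.0, 20.0, 30.0, 40.0]
physician_values = [12.0, 18.0, 45.0, 33.0]
print(round(spearman_rho(model_values, physician_values), 3))  # 0.8
```

Because only ranks enter the calculation, ρ is insensitive to any monotonic rescaling of either measurement.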

4.

A Guide to Cross-Validation for Artificial Intelligence in Medical Imaging.

Bradshaw, Tyler J; Huemann, Zachary; Hu, Junjie; Rahmim, Arman.

Radiol Artif Intell ; 5(4): e220232, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37529208

ABSTRACT

Artificial intelligence (AI) is being increasingly used to automate and improve technologies within the field of medical imaging. A critical step in the development of an AI algorithm is estimating its prediction error through cross-validation (CV). The use of CV can help prevent overoptimism in AI algorithms and can mitigate certain biases associated with hyperparameter tuning and algorithm selection. This article introduces the principles of CV and provides a practical guide on the use of CV for AI algorithm development in medical imaging. Different CV techniques are described, as well as their advantages and disadvantages under different scenarios. Common pitfalls in prediction error estimation and guidance on how to avoid them are also discussed. Keywords: Education, Research Design, Technical Aspects, Statistics, Supervised Learning, Convolutional Neural Network (CNN). Supplemental material is available for this article. © RSNA, 2023.
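As a rough illustration of the basic technique this guide covers, the sketch below generates k-fold cross-validation splits with the standard library only. It is one simple variant written under assumed names and parameters (`k_fold_splits`, `seed`), not a procedure taken from the article.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample appears in exactly one test fold across the k rounds.
for train_idx, test_idx in k_fold_splits(n_samples=10, k=5):
    print(sorted(test_idx), len(train_idx))
```

The prediction error would then be estimated by training on each `train_idx` set, evaluating on the matching `test_idx` set, and averaging the k results.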

5.

Domain-adapted Large Language Models for Classifying Nuclear Medicine Reports.

Huemann, Zachary; Lee, Changhee; Hu, Junjie; Cho, Steve Y; Bradshaw, Tyler J.

Radiol Artif Intell ; 5(6): e220281, 2023 Nov.

Article in English | MEDLINE | ID: mdl-38074793

ABSTRACT

Purpose: To evaluate the impact of domain adaptation on the performance of language models in predicting five-point Deauville scores on the basis of clinical fluorine 18 fluorodeoxyglucose PET/CT reports. Materials and Methods: The authors retrospectively retrieved 4542 text reports and images for fluorodeoxyglucose PET/CT lymphoma examinations from 2008 to 2018 in the University of Wisconsin-Madison institutional clinical imaging database. Of these total reports, 1664 had Deauville scores that were extracted from the reports and served as training labels. The bidirectional encoder representations from transformers (BERT) model and initialized BERT models BioClinicalBERT, RadBERT, and RoBERTa were adapted to the nuclear medicine domain by pretraining using masked language modeling. These domain-adapted models were then compared with the non-domain-adapted versions on the task of five-point Deauville score prediction. The language models were compared against vision models, multimodal vision-language models, and a nuclear medicine physician, with sevenfold Monte Carlo cross-validation. Means and SDs for accuracy are reported, with P values from paired t testing. Results: Domain adaptation improved the performance of all language models (P = .01). For example, BERT improved from 61.3% ± 2.9 (SD) five-class accuracy to 65.7% ± 2.2 (P = .01) following domain adaptation. Domain-adapted RoBERTa (named DA RoBERTa) performed best, achieving 77.4% ± 3.4 five-class accuracy; this model performed similarly to its multimodal counterpart (named Multimodal DA RoBERTa) (77.2% ± 3.2) and outperformed the best vision-only model (48.1% ± 3.5, P ≤ .001). A physician given the task on a subset of the data had a five-class accuracy of 66%. Conclusion: Domain adaptation improved the performance of large language models in predicting Deauville scores in PET/CT reports. Keywords: Lymphoma, PET, PET/CT, Transfer Learning, Unsupervised Learning, Convolutional Neural Network (CNN), Nuclear Medicine, Deauville, Natural Language Processing, Multimodal Learning, Artificial Intelligence, Machine Learning, Language Modeling. Supplemental material is available for this article. © RSNA, 2023. See also the commentary by Abajian in this issue.

6.

Automatic Personalized Impression Generation for PET Reports Using Large Language Models.

Tie, Xin; Shin, Muheon; Pirasteh, Ali; Ibrahim, Nevein; Huemann, Zachary; Castellino, Sharon M; Kelly, Kara M; Garrett, John; Hu, Junjie; Cho, Steve Y; Bradshaw, Tyler J.

ArXiv ; 2023 Oct 17.

Article in English | MEDLINE | ID: mdl-37904738

ABSTRACT

Purpose: To determine if fine-tuned large language models (LLMs) can generate accurate, personalized impressions for whole-body PET reports. Materials and Methods: Twelve language models were trained on a corpus of PET reports using the teacher-forcing algorithm, with the report findings as input and the clinical impressions as reference. An extra input token encodes the reading physician's identity, allowing models to learn physician-specific reporting styles. Our corpus comprised 37,370 retrospective PET reports collected from our institution between 2010 and 2022. To identify the best LLM, 30 evaluation metrics were benchmarked against quality scores from two nuclear medicine (NM) physicians, with the most aligned metrics selecting the model for expert evaluation. In a subset of data, model-generated impressions and original clinical impressions were assessed by three NM physicians according to 6 quality dimensions (3-point scale) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. Bootstrap resampling was used for statistical analysis. Results: Of all evaluation metrics, domain-adapted BARTScore and PEGASUSScore showed the highest Spearman's ρ correlations (ρ=0.568 and 0.563) with physician preferences. Based on these metrics, the fine-tuned PEGASUS model was selected as the top LLM. When physicians reviewed PEGASUS-generated impressions in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08 out of 5. Physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P=0.41). Conclusion: Personalized impressions generated by PEGASUS were clinically useful, highlighting its potential to expedite PET reporting.
