Results 1 - 3 of 3
1.
J Imaging Inform Med ; 37(2): 471-488, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38308070

ABSTRACT

Large language models (LLMs) have shown promise in accelerating radiology reporting by summarizing clinical findings into impressions. However, automatic impression generation for whole-body PET reports presents unique challenges and has received little attention. Our study aimed to evaluate whether LLMs can create clinically useful impressions for PET reporting. To this end, we fine-tuned twelve open-source language models on a corpus of 37,370 retrospective PET reports collected from our institution. All models were trained using the teacher-forcing algorithm, with the report findings and patient information as input and the original clinical impressions as reference. An extra input token encoded the reading physician's identity, allowing models to learn physician-specific reporting styles. To compare the performances of different models, we computed various automatic evaluation metrics and benchmarked them against physician preferences, ultimately selecting PEGASUS as the top LLM. To evaluate its clinical utility, three nuclear medicine physicians assessed the PEGASUS-generated impressions and original clinical impressions across 6 quality dimensions (3-point scales) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. When physicians assessed LLM impressions generated in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08/5. On average, physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P = 0.41). In summary, our study demonstrated that personalized impressions generated by PEGASUS were clinically useful in most cases, highlighting its potential to expedite PET reporting by automatically drafting impressions.
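The teacher-forcing training described above can be illustrated with a minimal sketch: at every decoding step the model is fed the ground-truth previous token, and the loss is the cross-entropy of the reference token under the model's predicted distribution. This is not the study's actual training code; the physician-token names and `build_input` helper are hypothetical illustrations of the extra identity token.

```python
import numpy as np

def teacher_forcing_loss(logits, target_ids):
    """Mean cross-entropy when the decoder is conditioned on the
    ground-truth previous token at every step (teacher forcing).

    logits: array of shape (T, V) -- one row of vocabulary scores per step
    target_ids: length-T sequence of reference token ids
    """
    # numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # negative log-probability of each reference token
    nll = -log_probs[np.arange(len(target_ids)), target_ids]
    return nll.mean()

# Hypothetical physician-identity tokens prepended to the findings text,
# mirroring the extra input token used to learn reporting styles.
PHYSICIAN_TOKENS = {"dr_a": "<phys_0>", "dr_b": "<phys_1>"}

def build_input(physician, findings):
    return f"{PHYSICIAN_TOKENS[physician]} {findings}"
```

With uniform logits the loss reduces to log(vocabulary size), a useful sanity check when wiring up such a pipeline.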

2.
ArXiv ; 2023 Oct 17.
Article in English | MEDLINE | ID: mdl-37904738

ABSTRACT

Purpose: To determine if fine-tuned large language models (LLMs) can generate accurate, personalized impressions for whole-body PET reports. Materials and Methods: Twelve language models were trained on a corpus of PET reports using the teacher-forcing algorithm, with the report findings as input and the clinical impressions as reference. An extra input token encoded the reading physician's identity, allowing models to learn physician-specific reporting styles. Our corpus comprised 37,370 retrospective PET reports collected from our institution between 2010 and 2022. To identify the best LLM, 30 evaluation metrics were benchmarked against quality scores from two nuclear medicine (NM) physicians, with the most aligned metrics selecting the model for expert evaluation. In a subset of data, model-generated impressions and original clinical impressions were assessed by three NM physicians according to 6 quality dimensions (3-point scale) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. Bootstrap resampling was used for statistical analysis. Results: Of all evaluation metrics, domain-adapted BARTScore and PEGASUSScore showed the highest Spearman's ρ correlations (ρ=0.568 and 0.563) with physician preferences. Based on these metrics, the fine-tuned PEGASUS model was selected as the top LLM. When physicians reviewed PEGASUS-generated impressions in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08 out of 5. Physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P=0.41). Conclusion: Personalized impressions generated by PEGASUS were clinically useful, highlighting its potential to expedite PET reporting.
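The metric-selection step above ranks candidate metrics by how well they agree with physician quality scores, using Spearman's ρ, which is simply the Pearson correlation of the two rank sequences. A minimal, dependency-free sketch (not the study's evaluation code; in practice a library routine such as `scipy.stats.spearmanr` would be used):

```python
def ranks(values):
    """Return 1-based average ranks, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Bootstrap resampling, as used in the study for statistical analysis, would then recompute ρ over many resampled report subsets to obtain a confidence interval.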

3.
Am J Nucl Med Mol Imaging ; 6(1): 102-9, 2016.
Article in English | MEDLINE | ID: mdl-27069770

ABSTRACT

Idiopathic Parkinson's disease (PD) is the second most common neurodegenerative disorder. Early PD may present a diagnostic challenge, with broad differential diagnoses that are not associated with nigral degeneration or striatal dopamine deficiency. Therefore, early clinical diagnosis alone may not be accurate, which reinforces the importance of functional imaging targeting the pathophysiology of the disease process. L-6-[(18)F]fluoro-3,4-dihydroxyphenylalanine ((18)F-DOPA) is a positron emission tomography (PET) agent that measures the uptake of dopamine precursors for assessment of presynaptic dopaminergic integrity and has been shown to accurately reflect the monoaminergic disturbances in PD. In this study, we aimed to draw on our local experience to determine the accuracy of (18)F-DOPA PET for diagnosis of PD. We studied a total of 27 patients. A retrospective analysis was carried out for all patients who underwent an (18)F-DOPA PET brain scan for motor symptoms suspicious for PD between 2001 and 2008. Both qualitative and semi-quantitative analyses of the scans were performed. The patients' medical records were then assessed for length of follow-up, response to levodopa, clinical course of illness, and laterality of symptoms at the time of (18)F-DOPA PET. The eventual diagnosis by the referring neurologist, a movement disorder specialist, was used as the reference standard for further analysis. Of the 28 scans, we found that one was a false negative, 20 were true positives, and 7 were true negatives. The resultant values were Sensitivity 95.4% (95% CI: 75.3%-100%), Specificity 100% (95% CI: 59.0%-100%), PPV 100% (95% CI: 80.7%-100%), and NPV 87.5% (95% CI: 50.5%-99.5%).
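The accuracy figures above follow from the standard 2x2 confusion-matrix definitions. A minimal sketch of the arithmetic, assuming zero false positives (implied by the reported 100% specificity and PPV; note that exact percentages may differ slightly from the published values depending on rounding):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Counts reported above: 20 true positives, 1 false negative,
# 7 true negatives, and (by assumption) 0 false positives.
m = diagnostic_metrics(tp=20, fp=0, tn=7, fn=1)
```

With these counts, specificity and PPV are exactly 100% and NPV is 7/8 = 87.5%, matching the reported values.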
