Personalized Impression Generation for PET Reports Using Large Language Models.
Tie, Xin; Shin, Muheon; Pirasteh, Ali; Ibrahim, Nevein; Huemann, Zachary; Castellino, Sharon M; Kelly, Kara M; Garrett, John; Hu, Junjie; Cho, Steve Y; Bradshaw, Tyler J.
Affiliation
  • Tie X; Department of Radiology, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
  • Shin M; Department of Medical Physics, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
  • Pirasteh A; Department of Radiology, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
  • Ibrahim N; Department of Radiology, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
  • Huemann Z; Department of Medical Physics, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
  • Castellino SM; Department of Radiology, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
  • Kelly KM; Department of Radiology, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA.
  • Garrett J; Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, USA.
  • Hu J; Aflac Cancer and Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, GA, USA.
  • Cho SY; Department of Pediatric Oncology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA.
  • Bradshaw TJ; Department of Pediatrics, University at Buffalo Jacobs School of Medicine and Biomedical Sciences, Buffalo, NY, USA.
J Imaging Inform Med; 37(2): 471-488, 2024 Apr.
Article in En | MEDLINE | ID: mdl-38308070
ABSTRACT
Large language models (LLMs) have shown promise in accelerating radiology reporting by summarizing clinical findings into impressions. However, automatic impression generation for whole-body PET reports presents unique challenges and has received little attention. Our study aimed to evaluate whether LLMs can create clinically useful impressions for PET reporting. To this end, we fine-tuned twelve open-source language models on a corpus of 37,370 retrospective PET reports collected from our institution. All models were trained using the teacher-forcing algorithm, with the report findings and patient information as input and the original clinical impressions as reference. An extra input token encoded the reading physician's identity, allowing models to learn physician-specific reporting styles. To compare the performance of the different models, we computed various automatic evaluation metrics and benchmarked them against physician preferences, ultimately selecting PEGASUS as the top LLM. To evaluate its clinical utility, three nuclear medicine physicians assessed the PEGASUS-generated impressions and the original clinical impressions across six quality dimensions (3-point scales) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. When physicians assessed LLM impressions generated in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08/5. On average, physicians rated these personalized impressions as comparable in overall utility to impressions dictated by other physicians (4.03, P = 0.41). In summary, our study demonstrated that personalized impressions generated by PEGASUS were clinically useful in most cases, highlighting its potential to expedite PET reporting by automatically drafting impressions.
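
For illustration, the fine-tuning setup described in the abstract (physician-identity token prepended to the findings, teacher-forced training against the original impression) can be sketched roughly as follows. This is a minimal sketch assuming the Hugging Face transformers library and a google/pegasus-large checkpoint; the physician token names, field names, and example text are hypothetical, not the authors' code.

    # Hedged sketch (not the authors' implementation): fine-tune PEGASUS to draft
    # PET impressions from report findings, prepending a special token that
    # identifies the reading physician so the model can learn per-physician style.
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")
    model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large")

    # One special token per reading physician (hypothetical IDs).
    tokenizer.add_tokens(["<physician_01>", "<physician_02>"], special_tokens=True)
    model.resize_token_embeddings(len(tokenizer))

    def encode_example(physician_token, patient_info, findings, impression):
        # Encoder input: physician identity + patient information + report findings.
        source = f"{physician_token} {patient_info} {findings}"
        batch = tokenizer(source, truncation=True, max_length=1024, return_tensors="pt")
        # Supplying the reference impression as labels gives standard teacher-forced
        # training: the decoder is fed the gold tokens and predicts each next token.
        batch["labels"] = tokenizer(
            text_target=impression, truncation=True, max_length=256, return_tensors="pt"
        ).input_ids
        return batch

    batch = encode_example(
        physician_token="<physician_01>",
        patient_info="Indication: lymphoma, restaging.",
        findings="FDG-avid left axillary lymph node, decreased in size and avidity ...",
        impression="1. Favorable interval response of left axillary nodal disease.",
    )
    loss = model(**batch).loss   # cross-entropy loss under teacher forcing
    loss.backward()              # an optimizer step would follow in a training loop

In this sketch, personalization is carried entirely by the added physician token in the encoder input, which is one straightforward way to let a single model produce style-conditioned impressions at inference time.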
Full text: 1 Collection: 01-internacional Database: MEDLINE Type of study: Prognostic_studies Language: En Journal: J Imaging Inform Med Year: 2024 Document type: Article Affiliation country: