Search | Portal Regional da BVS

Large Language Models for Automated Synoptic Reports and Resectability Categorization in Pancreatic Cancer.

Bhayana, Rajesh; Nanda, Bipin; Dehkharghanian, Taher; Deng, Yangqing; Bhambra, Nishaant; Elias, Gavin; Datta, Daksh; Kambadakone, Avinash; Shwaartz, Chaya G; Moulton, Carol-Anne; Henault, David; Gallinger, Steven; Krishna, Satheesh.

Radiology ; 311(3): e233117, 2024 06.

Article in English | MEDLINE | ID: mdl-38888478

ABSTRACT

Background: Structured radiology reports for pancreatic ductal adenocarcinoma (PDAC) improve surgical decision-making over free-text reports, but radiologist adoption is variable. Resectability criteria are applied inconsistently.

Purpose: To evaluate the performance of large language models (LLMs) in automatically creating PDAC synoptic reports from original reports and to explore performance in categorizing tumor resectability.

Materials and Methods: In this institutional review board-approved retrospective study, 180 consecutive PDAC staging CT reports on patients referred to the authors' European Society for Medical Oncology-designated cancer center from January to December 2018 were included. Reports were reviewed by two radiologists to establish the reference standard for 14 key findings and National Comprehensive Cancer Network (NCCN) resectability category. GPT-3.5 and GPT-4 (accessed September 18-29, 2023) were prompted to create synoptic reports from original reports with the same 14 features, and their performance was evaluated (recall, precision, F1 score). To categorize resectability, three prompting strategies (default knowledge, in-context knowledge, chain-of-thought) were used for both LLMs. Hepatopancreaticobiliary surgeons reviewed original and artificial intelligence (AI)-generated reports to determine resectability, with accuracy and review time compared. The McNemar test, t test, Wilcoxon signed-rank test, and mixed effects logistic regression models were used where appropriate.

Results: GPT-4 outperformed GPT-3.5 in the creation of synoptic reports (F1 score: 0.997 vs 0.967, respectively). Compared with GPT-3.5, GPT-4 achieved equal or higher F1 scores for all 14 extracted features. GPT-4 had higher precision than GPT-3.5 for extracting superior mesenteric artery involvement (100% vs 88.8%, respectively). For categorizing resectability, GPT-4 outperformed GPT-3.5 for each prompting strategy. For GPT-4, chain-of-thought prompting was most accurate, outperforming in-context knowledge prompting (92% vs 83%, respectively; P = .002), which outperformed the default knowledge strategy (83% vs 67%, P < .001). Surgeons were more accurate in categorizing resectability using AI-generated reports than original reports (83% vs 76%, respectively; P = .03), while spending less time on each report (58%; 95% CI: 0.53, 0.62).

Conclusion: GPT-4 created near-perfect PDAC synoptic reports from original reports. GPT-4 with chain-of-thought achieved high accuracy in categorizing resectability. Surgeons were more accurate and efficient using AI-generated reports.

© RSNA, 2024. Supplemental material is available for this article. See also the editorial by Chang in this issue.
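The abstract reports recall, precision, and F1 score for feature extraction and names the McNemar test among its paired comparisons. As a rough illustration of how such quantities are typically computed, here is a minimal Python sketch. It is not the authors' analysis code: the function names and all counts are hypothetical, and the exact (binomial) form of the McNemar test is an assumption, since the abstract does not say which variant was used.

```python
from math import comb


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts. Undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value (assumed variant).

    b and c are the discordant pair counts: cases where only the first
    method was correct, and cases where only the second was correct.
    Under the null hypothesis, min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts for one extracted feature, NOT taken from the study.
    p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=0)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")

    # Hypothetical discordant pairs between two prompting strategies.
    print(f"McNemar exact p={mcnemar_exact(b=1, c=9):.4f}")
```

The McNemar test uses only the discordant pairs because the two strategies are evaluated on the same reports; cases where both are right or both are wrong carry no information about which one is more accurate.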

Subjects

Pancreatic Ductal Carcinoma , Pancreatic Neoplasms , Humans , Pancreatic Neoplasms/surgery , Pancreatic Neoplasms/diagnostic imaging , Pancreatic Neoplasms/pathology , Retrospective Studies , Pancreatic Ductal Carcinoma/surgery , Pancreatic Ductal Carcinoma/diagnostic imaging , Pancreatic Ductal Carcinoma/pathology , Female , Male , Aged , Middle Aged , X-Ray Computed Tomography/methods , Natural Language Processing , Artificial Intelligence , Aged 80 and over

Reply to "Zero-, Single-, and Few-Shot Learning in Large Language Models to Identify Incidental Findings From Radiology Reports".

Bhayana, Rajesh; Elias, Gavin; Datta, Daksh; Bhambra, Nishaant; Deng, Yangqing; Krishna, Satheesh.

AJR Am J Roentgenol ; 222(3): e2431060, 2024 03.

Article in English | MEDLINE | ID: mdl-38447023

Subjects

Incidental Findings , Radiology , Humans , Radiography , Language

Use of GPT-4 With Single-Shot Learning to Identify Incidental Findings in Radiology Reports.

Bhayana, Rajesh; Elias, Gavin; Datta, Daksh; Bhambra, Nishaant; Deng, Yangqing; Krishna, Satheesh.

AJR Am J Roentgenol ; 222(3): e2330651, 2024 03.

Article in English | MEDLINE | ID: mdl-38197759

ABSTRACT

GPT-4 identified incidental adrenal nodules, pancreatic cystic lesions, and vascular calcifications in radiology reports with F1 scores of 1.00, 0.91, and 0.99, respectively. The findings indicate a potential role for large language models to help improve recognition and management of incidental imaging findings and to be applied flexibly in a medical context.

Subjects

Incidental Findings , Radiology , Humans , X-Ray Computed Tomography , Learning
