Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 1 de 1
Filtrar
Más filtros

Banco de datos
Tipo del documento
Intervalo de año de publicación
1.
Radiology ; 311(3): e233117, 2024 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-38888478

RESUMEN

Background Structured radiology reports for pancreatic ductal adenocarcinoma (PDAC) improve surgical decision-making over free-text reports, but radiologist adoption is variable. Resectability criteria are applied inconsistently. Purpose To evaluate the performance of large language models (LLMs) in automatically creating PDAC synoptic reports from original reports and to explore performance in categorizing tumor resectability. Materials and Methods In this institutional review board-approved retrospective study, 180 consecutive PDAC staging CT reports on patients referred to the authors' European Society for Medical Oncology-designated cancer center from January to December 2018 were included. Reports were reviewed by two radiologists to establish the reference standard for 14 key findings and National Comprehensive Cancer Network (NCCN) resectability category. GPT-3.5 and GPT-4 (accessed September 18-29, 2023) were prompted to create synoptic reports from original reports with the same 14 features, and their performance was evaluated (recall, precision, F1 score). To categorize resectability, three prompting strategies (default knowledge, in-context knowledge, chain-of-thought) were used for both LLMs. Hepatopancreaticobiliary surgeons reviewed original and artificial intelligence (AI)-generated reports to determine resectability, with accuracy and review time compared. The McNemar test, t test, Wilcoxon signed-rank test, and mixed effects logistic regression models were used where appropriate. Results GPT-4 outperformed GPT-3.5 in the creation of synoptic reports (F1 score: 0.997 vs 0.967, respectively). Compared with GPT-3.5, GPT-4 achieved equal or higher F1 scores for all 14 extracted features. GPT-4 had higher precision than GPT-3.5 for extracting superior mesenteric artery involvement (100% vs 88.8%, respectively). For categorizing resectability, GPT-4 outperformed GPT-3.5 for each prompting strategy. For GPT-4, chain-of-thought prompting was most accurate, outperforming in-context knowledge prompting (92% vs 83%, respectively; P = .002), which outperformed the default knowledge strategy (83% vs 67%, P < .001). Surgeons were more accurate in categorizing resectability using AI-generated reports than original reports (83% vs 76%, respectively; P = .03), while spending less time on each report (58%; 95% CI: 0.53, 0.62). Conclusion GPT-4 created near-perfect PDAC synoptic reports from original reports. GPT-4 with chain-of-thought achieved high accuracy in categorizing resectability. Surgeons were more accurate and efficient using AI-generated reports. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Chang in this issue.


Asunto(s)
Carcinoma Ductal Pancreático , Neoplasias Pancreáticas , Humanos , Neoplasias Pancreáticas/cirugía , Neoplasias Pancreáticas/diagnóstico por imagen , Neoplasias Pancreáticas/patología , Estudios Retrospectivos , Carcinoma Ductal Pancreático/cirugía , Carcinoma Ductal Pancreático/diagnóstico por imagen , Carcinoma Ductal Pancreático/patología , Femenino , Masculino , Anciano , Persona de Mediana Edad , Tomografía Computarizada por Rayos X/métodos , Procesamiento de Lenguaje Natural , Inteligencia Artificial , Anciano de 80 o más Años
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA