ABSTRACT
INTRODUCTION: GPT-4 is a large language model with potential for multiple applications in urology. Our study sought to evaluate GPT-4's performance in data extraction from renal surgery operative notes. METHODS: GPT-4 was queried to extract information on laterality, surgery, approach, estimated blood loss, and ischemia time from deidentified operative notes. Match rates were defined as the proportion of data points that "matched" between GPT-4 and human-curated extraction. Accuracy rates were calculated after manually reviewing "not matched" data points. Cohen's kappa and the intraclass correlation coefficient were used to evaluate interrater agreement/reliability. RESULTS: Our cohort consisted of 1498 renal surgeries from 2003 to 2023. Match rates were high for laterality (94.4%), surgery (92.5%), and approach (89.4%), but lower for estimated blood loss (77.1%) and ischemia time (25.6%). GPT-4 was more accurate for estimated blood loss (90.3% vs 85.5% human curated) and similarly accurate for laterality (95.2% vs 95.3% human curated). Human-curated accuracy rates were higher for surgery (99.3% vs 93% GPT-4), approach (97.9% vs 90.8% GPT-4), and ischemia time (95.6% vs 30.7% GPT-4). Cohen's kappa was 0.96 for laterality, 0.83 for approach, and 0.71 for surgery. The intraclass correlation coefficient was 0.62 for estimated blood loss and 0.09 for ischemia time. CONCLUSIONS: Match and accuracy rates were higher for categorical variables. GPT-4 data extraction was particularly error prone for variables with heterogeneous documentation styles. The role of a standard operative template to aid data extraction will be explored in the future. GPT-4 can be utilized as a helpful and efficient data extraction tool with manual feedback.
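As an illustration of the agreement statistics named in the methods, the following minimal sketch computes a match rate and an unweighted Cohen's kappa for one categorical variable. The function names and the example labels are hypothetical and are not taken from the study; the study's actual analysis code is not described in the abstract.

```python
from collections import Counter


def match_rate(human, model):
    """Proportion of records where both extractions give the same value."""
    if len(human) != len(model) or not human:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohens_kappa(human, model):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(human)
    p_o = match_rate(human, model)  # observed agreement
    human_counts, model_counts = Counter(human), Counter(model)
    categories = human_counts.keys() | model_counts.keys()
    # Agreement expected by chance, from each rater's marginal frequencies.
    p_e = sum(human_counts[c] * model_counts[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical laterality labels for four operative notes (not study data).
human_labels = ["left", "left", "right", "right"]
model_labels = ["left", "left", "right", "left"]

example_match = match_rate(human_labels, model_labels)
example_kappa = cohens_kappa(human_labels, model_labels)
print(example_match, example_kappa)
```

Kappa discounts agreement that would be expected by chance, which is why it can be noticeably lower than the raw match rate for the same variable.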
Subjects
Feasibility Studies, Humans, Kidney/surgery, Female, Male, Electronic Health Records, Information Storage and Retrieval/methods, Nephrectomy/methods
ABSTRACT
Background and objective: Focal therapy (FT) is increasingly recognized as a promising approach for managing localized prostate cancer (PCa), notably reducing treatment-related morbidities. However, post-treatment anatomical changes present significant challenges for surveillance using current imaging techniques. This study aimed to evaluate the inter-reader agreement and efficacy of the Prostate Imaging after Focal Ablation (PI-FAB) scoring system in detecting clinically significant prostate cancer (csPCa) on post-FT multiparametric magnetic resonance imaging (mpMRI). Methods: A retrospective cohort study was conducted involving patients who underwent primary FT for localized csPCa between 2013 and 2023, followed by post-FT mpMRI and a prostate biopsy. Two expert genitourinary radiologists retrospectively evaluated post-FT mpMRI using PI-FAB. The key measures included inter-reader agreement of PI-FAB scores, assessed by quadratic weighted Cohen's kappa (κ), and the system's efficacy in predicting in-field recurrence of csPCa, with a PI-FAB score cutoff of 3. Additional diagnostic metrics including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy were also evaluated. Key findings and limitations: Scans from 38 patients were analyzed, revealing a moderate level of agreement in PI-FAB scoring (κ = 0.56). Both radiologists achieved sensitivity of 93% in detecting csPCa, although specificity, PPVs, NPVs, and accuracy varied. Conclusions and clinical implications: The PI-FAB scoring system exhibited high sensitivity with moderate inter-reader agreement in detecting in-field recurrence of csPCa. Despite promising results, its low specificity and PPV necessitate further refinement. These findings underscore the need for larger studies to validate the clinical utility of PI-FAB, potentially aiding in standardizing post-treatment surveillance. 
Patient summary: Focal therapy has emerged as a promising approach for managing localized prostate cancer, but limitations in current imaging techniques present significant challenges for post-treatment surveillance. The Prostate Imaging after Focal Ablation (PI-FAB) scoring system showed high sensitivity for detecting in-field recurrence of clinically significant prostate cancer. However, its low specificity and positive predictive value necessitate further refinement. Larger, more comprehensive studies are needed to fully validate its clinical utility.
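The diagnostic metrics listed in the PI-FAB abstract can be derived from a two-by-two table of imaging result against biopsy result. The sketch below is illustrative only: it assumes that a PI-FAB score at or above the cutoff of 3 counts as a positive scan (the abstract states the cutoff but not the direction of the comparison), and the function name and example data are hypothetical.

```python
def diagnostic_metrics(scores, biopsy_positive, cutoff=3):
    """Sensitivity, specificity, PPV, NPV, and accuracy for one reader.

    Assumption: a score >= cutoff is treated as a positive scan.
    A metric with a zero denominator is returned as None.
    """
    if len(scores) != len(biopsy_positive) or not scores:
        raise ValueError("inputs must be non-empty and of equal length")

    tp = fp = tn = fn = 0
    for score, truth in zip(scores, biopsy_positive):
        predicted = score >= cutoff
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and truth:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical scores and biopsy outcomes for six patients (not study data).
example_scores = [1, 2, 3, 3, 2, 1]
example_biopsy = [False, False, True, False, True, False]
example_metrics = diagnostic_metrics(example_scores, example_biopsy)
print(example_metrics)
```

A low cutoff favors sensitivity at the expense of specificity and PPV, which is consistent with the pattern the abstract reports.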