Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: a performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods.
Bhattarai, Kriti; Oh, Inez Y; Sierra, Jonathan Moran; Tang, Jonathan; Payne, Philip R O; Abrams, Zach; Lai, Albert M.
Affiliation
  • Bhattarai K; Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, United States.
  • Oh IY; Department of Computer Science, Washington University in St Louis, St. Louis, MO 63110, United States.
  • Sierra JM; Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, United States.
  • Tang J; Medical Scientist Training Program, Washington University School of Medicine, St. Louis, MO 63110, United States.
  • Payne PRO; Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, United States.
  • Abrams Z; Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, United States.
  • Lai AM; Department of Computer Science, Washington University in St Louis, St. Louis, MO 63110, United States.
JAMIA Open ; 7(3): ooae060, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38962662
ABSTRACT

Objective:

Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, and 2 rule-based and machine learning-based methods, namely, scispaCy and medspaCy.

Materials and Methods:

Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13 646 clinical notes for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, Llama-3-8B, medspaCy, and scispaCy by comparing precision, recall, and micro-F1 scores.
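
As an illustration of the evaluation metrics named above, the following is a minimal sketch of micro-averaged precision, recall, and F1, in which true positives, false positives, and false negatives are pooled across all patients and phenotype categories before the ratios are computed. The function name `micro_prf`, the representation of each patient's phenotypes as a set of `(phenotype, value)` pairs, and the example values are assumptions made for illustration; the abstract does not describe the study's actual scoring code.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall, and F1.

    gold, pred: parallel lists with one set of (phenotype, value) pairs
    per patient. This data layout is hypothetical, not the study's format.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # extracted and present in the reference
        fp += len(p - g)   # extracted but absent from the reference
        fn += len(g - p)   # in the reference but not extracted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: two patients, one wrong treatment extraction.
gold = [
    {("initial_stage", "IIIA"), ("initial_treatment", "chemotherapy")},
    {("initial_stage", "IV")},
]
pred = [
    {("initial_stage", "IIIA"), ("initial_treatment", "surgery")},
    {("initial_stage", "IV")},
]
precision, recall, f1 = micro_prf(gold, pred)
```

Because counts are pooled before dividing, frequent phenotype categories weigh more heavily in the micro-averaged score than rare ones.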

Results:

GPT-4 achieved higher F1 score, precision, and recall compared to Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy's models. GPT-3.5-turbo performed similarly to GPT-4. GPT, Flan-T5, and Llama models were not constrained by explicit rule requirements for contextual pattern recognition. spaCy models relied on predefined patterns, leading to their suboptimal performance.

Discussion and Conclusion:

GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
Full text: 1 Collections: 01-international Database: MEDLINE Language: En Publication year: 2024 Document type: Article