Structuring medication signeturs as a language regression task: comparison of zero- and few-shot GPT with fine-tuned models.
Garcia-Agundez, Augusto; Kay, Julia L; Li, Jing; Gianfrancesco, Milena; Rai, Baljeet; Hu, Angela; Schmajuk, Gabriela; Yazdany, Jinoos.
Affiliation
  • Garcia-Agundez A; Division of Rheumatology, University of California San Francisco, San Francisco, CA 94110, United States.
  • Kay JL; Division of Rheumatology, University of California San Francisco, San Francisco, CA 94110, United States.
  • Li J; Division of Rheumatology, University of California San Francisco, San Francisco, CA 94110, United States.
  • Gianfrancesco M; Division of Rheumatology, University of California San Francisco, San Francisco, CA 94110, United States.
  • Rai B; Division of Rheumatology, University of California San Francisco, San Francisco, CA 94110, United States.
  • Hu A; Division of Rheumatology, University of California San Francisco, San Francisco, CA 94110, United States.
  • Schmajuk G; Division of Rheumatology, University of California San Francisco, San Francisco, CA 94110, United States.
  • Yazdany J; Division of Rheumatology, University of California San Francisco, San Francisco, CA 94110, United States.
JAMIA Open; 7(2): ooae051, 2024 Jul.
Article in En | MEDLINE | ID: mdl-38915730
ABSTRACT
Importance:

Electronic health record textual sources such as medication signeturs (sigs) contain valuable information that is not always available in structured form. These sources are commonly processed through manual annotation, a repetitive and time-consuming task that could be fully automated using large language models (LLMs). While most sigs include simple instructions, some include complex patterns.

Objectives:

We aimed to compare the performance of GPT-3.5 and GPT-4 with smaller fine-tuned models (ClinicalBERT, BlueBERT) in extracting the average daily dose of 2 immunomodulating medications with frequently complex sigs: hydroxychloroquine and prednisone.

Methods:

Using manually annotated sigs as the gold standard, we compared the performance of these models in 702 hydroxychloroquine and 22 104 prednisone prescriptions.

Results:

GPT-4 vastly outperformed all other models for this task at every level of in-context learning. With 100 in-context examples, it correctly annotated 94% of hydroxychloroquine and 95% of prednisone sigs to within 1 significant digit. Error analysis conducted by 2 additional manual annotators on annotator-model disagreements suggests that the vast majority of disagreements were model errors. Many model errors related to ambiguous sigs on which there was also frequent annotator disagreement.
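To illustrate the in-context learning setup described above, the following is a minimal sketch of few-shot dose extraction with the OpenAI chat API. It is not the authors' code: the model name, prompt wording, and example sigs are assumptions for illustration only, and the study used 100 in-context examples rather than the two shown here.

```python
# Minimal sketch of few-shot extraction of average daily dose from a sig.
# Prompt wording, model name, and example sigs are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical in-context examples: (sig text, average daily dose in mg)
FEW_SHOT = [
    ("take 1 tablet (200 mg) by mouth twice daily", "400"),
    ("take 1 tablet (5 mg) by mouth every other day", "2.5"),
]

def extract_daily_dose(sig: str) -> str:
    """Ask the model for the average daily dose implied by a medication sig."""
    messages = [{"role": "system",
                 "content": "Return only the average daily dose in mg as a number."}]
    # Each few-shot pair is added as a user/assistant turn before the query.
    for example_sig, dose in FEW_SHOT:
        messages.append({"role": "user", "content": example_sig})
        messages.append({"role": "assistant", "content": dose})
    messages.append({"role": "user", "content": sig})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content.strip()

print(extract_daily_dose("take 2 tablets (200 mg each) by mouth once daily"))
```

The model output would then be compared against the manually annotated gold-standard dose, as in the evaluation reported here.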

Discussion:

Paired with minimal manual annotation, GPT-4 achieved excellent performance for language regression of complex medication sigs and vastly outperformed GPT-3.5, ClinicalBERT, and BlueBERT. However, the number of in-context examples needed to reach maximum performance was similar to that required by GPT-3.5.

Conclusion:

LLMs show great potential to rapidly extract structured data from sigs in a no-code fashion for clinical and research applications.

Full text: 1 Collections: 01-international Database: MEDLINE Language: En Year of publication: 2024 Document type: Article