Can GPT-3.5 generate and code discharge summaries?
Falis, Matús; Gema, Aryo Pradipta; Dong, Hang; Daines, Luke; Basetti, Siddharth; Holder, Michael; Penfold, Rose S; Birch, Alexandra; Alex, Beatrice.
Affiliation
  • Falis M; School of Informatics, The University of Edinburgh, Edinburgh EH8 9AB, United Kingdom.
  • Gema AP; School of Informatics, The University of Edinburgh, Edinburgh EH8 9AB, United Kingdom.
  • Dong H; Department of Computer Science, University of Exeter, Exeter EX4 4QF, United Kingdom.
  • Daines L; Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh EH16 4UX, United Kingdom.
  • Basetti S; Department of Research, Development and Innovation, National Health Service Highland, Inverness IV2 3JH, United Kingdom.
  • Holder M; Centre for Population Health Sciences, Usher Institute, The University of Edinburgh, Edinburgh EH16 4UX, United Kingdom.
  • Penfold RS; Ageing and Health, Usher Institute, The University of Edinburgh, Edinburgh EH16 4UX, United Kingdom.
  • Birch A; Advanced Care Research Centre, The University of Edinburgh, Edinburgh EH16 4UX, United Kingdom.
  • Alex B; School of Informatics, The University of Edinburgh, Edinburgh EH8 9AB, United Kingdom.
J Am Med Inform Assoc ; 31(10): 2284-2293, 2024 Oct 01.
Article in En | MEDLINE | ID: mdl-39271171
ABSTRACT

OBJECTIVES:

The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels.

MATERIALS AND METHODS:

Employing GPT-3.5, we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent codes (termed generation codes) within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on a MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, generation codes, and their families. Weak Hierarchical Confusion Matrices determined within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents.
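The difference between the micro- and macro-F1 scores mentioned above can be illustrated with a minimal sketch for multi-label coding. This is not the authors' evaluation code: the function names, the toy gold and predicted code sets, and the choice to macro-average over every code seen in either the gold or predicted labels are assumptions made for illustration only.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro_f1, macro_f1) for lists of per-document code sets.

    Assumption: macro-F1 averages over codes that appear in gold or pred.
    A real evaluation may instead average over a fixed codeset.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for code in g & p:
            tp[code] += 1  # code assigned and correct
        for code in p - g:
            fp[code] += 1  # code assigned but not in gold
        for code in g - p:
            fn[code] += 1  # gold code that was missed
    labels = set(tp) | set(fp) | set(fn)
    # Micro: pool counts over all codes, so frequent codes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: average per-code F1, so each rare code counts as much as a frequent one.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    return micro, macro


# Hypothetical toy data: three documents, one frequent code and two rare ones.
gold = [{"I10", "E11.9"}, {"I10"}, {"J45.909"}]
pred = [{"I10"}, {"I10", "E11.9"}, set()]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")
```

In this toy case the frequent code is always predicted correctly while both rare codes are missed, so the macro score falls well below the micro score. That gap is why macro-F1 is the more sensitive measure when the question is performance on infrequent codes.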

RESULTS:

Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including 1 absent from the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlight the correctness of generated concepts, but the documents suffer in variety, supporting information, and narrative.

DISCUSSION AND CONCLUSION:

While GPT-3.5 alone, given our prompt setting, is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety and authenticity in narratives.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: International Classification of Diseases / Clinical Coding / Patient Discharge Summaries Limit: Humans Language: En Journal: J Am Med Inform Assoc Publication year: 2024 Document type: Article
