Estimating redundancy in clinical text.
Searle, Thomas; Ibrahim, Zina; Teo, James; Dobson, Richard.
Affiliations
  • Searle T; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK. Electronic address: thomas.searle@kcl.ac.uk.
  • Ibrahim Z; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK. Electronic address: zina.ibrahim@kcl.ac.uk.
  • Teo J; King's College Hospital NHS Foundation Trust, London, UK. Electronic address: jamesteo@nhs.net.
  • Dobson R; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Institute of Health Informatics, University College London, London, UK. Electronic address: richard.j.dobson@kcl.ac.uk.
J Biomed Inform; 124: 103938, 2021 Dec.
Article in En | MEDLINE | ID: mdl-34695581
ABSTRACT
The current mode of use of Electronic Health Records (EHR) elicits text redundancy. Clinicians often populate new documents by duplicating existing notes and then updating them. Data duplication can lead to propagation of errors, inconsistencies and misreporting of care. Measures that quantify information redundancy therefore play an essential role in evaluating innovations that operate on clinical narratives. This work is a quantitative examination of information redundancy in EHR notes. We present and evaluate two methods to measure redundancy: an information-theoretic approach and a lexicosyntactic and semantic model. Our first measure trains large Transformer-based language models using clinical text from a large openly available US-based ICU dataset and a large multi-site UK-based hospital. By comparing the information-theoretic efficient encoding of clinical text against open-domain corpora, we find that clinical text is ∼1.5× to ∼3× less efficient than open-domain corpora at conveying information. Our second measure uses the automated summarisation metrics ROUGE and BERTScore to evaluate successive note pairs, demonstrating lexicosyntactic and semantic redundancy with averages from ∼43% to ∼65%.
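As an illustration of the second measure, the idea of scoring successive note pairs for lexical overlap can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses whitespace tokenisation and a simplified ROUGE-1-style unigram overlap rather than an official ROUGE or BERTScore package, and the function names `unigram_overlap` and `mean_successive_overlap` are hypothetical.

```python
from collections import Counter
from typing import List


def unigram_overlap(previous_note: str, current_note: str) -> float:
    """Share of the current note's tokens that already appear in the previous note.

    Assumption: lowercased whitespace tokens with clipped counts, a
    simplified ROUGE-1-style overlap, not an official ROUGE scorer.
    """
    prev_counts = Counter(previous_note.lower().split())
    curr_counts = Counter(current_note.lower().split())
    total = sum(curr_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    shared = sum((prev_counts & curr_counts).values())
    return shared / total


def mean_successive_overlap(notes: List[str]) -> float:
    """Average overlap over each (note[i-1], note[i]) pair in chronological order."""
    pairs = list(zip(notes, notes[1:]))
    if not pairs:
        return 0.0
    return sum(unigram_overlap(prev, curr) for prev, curr in pairs) / len(pairs)


# Made-up example notes, for illustration only.
notes = [
    "patient stable on ward",
    "patient stable on ward plan discharge",
]
score = mean_successive_overlap(notes)
```

A score near 1 would indicate that the newer note mostly repeats the older one, while a score near 0 would indicate mostly new wording. Semantic similarity of the kind BERTScore captures is not covered by this token-level sketch.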

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Natural Language Processing / Electronic Health Records Type of study: Prognostic_studies / Qualitative_research Language: En Year of publication: 2021 Document type: Article
