Estimating redundancy in clinical text.

Searle, Thomas; Ibrahim, Zina; Teo, James; Dobson, Richard

Affiliation

Searle T; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK. Electronic address: thomas.searle@kcl.ac.uk.
Ibrahim Z; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK. Electronic address: zina.ibrahim@kcl.ac.uk.
Teo J; King's College Hospital NHS Foundation Trust, London, UK. Electronic address: jamesteo@nhs.net.
Dobson R; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Institute of Health Informatics, University College London, London, UK. Electronic address: richard.j.dobson@kcl.ac.uk.

J Biomed Inform; 124: 103938, 2021 Dec.

Article in En | MEDLINE | ID: mdl-34695581

ABSTRACT

The current mode of use of Electronic Health Records (EHR) elicits text redundancy. Clinicians often populate new documents by duplicating existing notes, then updating accordingly. Data duplication can lead to propagation of errors, inconsistencies and misreporting of care. Therefore, measures to quantify information redundancy play an essential role in evaluating innovations that operate on clinical narratives. This work is a quantitative examination of information redundancy in EHR notes. We present and evaluate two methods to measure redundancy: an information-theoretic approach and a lexicosyntactic and semantic model. Our first measure trains large Transformer-based language models using clinical text from a large openly available US-based ICU dataset and a large multi-site UK-based hospital. By comparing the information-theoretic efficient encoding of clinical text against open-domain corpora, we find that clinical text is ~1.5× to ~3× less efficient than open-domain corpora at conveying information. Our second measure uses the automated summarisation metrics Rouge and BERTScore to evaluate successive note pairs, demonstrating lexicosyntactic and semantic redundancy, with averages from ~43% to ~65%.
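To make the second measure concrete, the sketch below scores each note against the one written immediately before it using a ROUGE-1-style unigram overlap. It is illustrative only and is not the authors' implementation: the whitespace tokenisation, the function names `unigram_overlap` and `successive_redundancy`, and the choice of F1 as the summary score are all assumptions, and BERTScore is not covered.

```python
from collections import Counter
from typing import Dict, List


def unigram_overlap(previous_note: str, current_note: str) -> Dict[str, float]:
    """ROUGE-1-style precision, recall and F1 between two notes.

    Assumption: lower-cased whitespace tokens stand in for a real tokeniser.
    The previous note is treated as the reference, the current note as the candidate.
    """
    prev_counts = Counter(previous_note.lower().split())
    curr_counts = Counter(current_note.lower().split())
    # Clipped overlap: a token counts only as often as it occurs in both notes.
    overlap = sum((prev_counts & curr_counts).values())
    precision = overlap / max(sum(curr_counts.values()), 1)
    recall = overlap / max(sum(prev_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def successive_redundancy(notes: List[str]) -> List[float]:
    """F1 overlap of each note with its immediate predecessor (hypothetical helper)."""
    return [unigram_overlap(a, b)["f1"] for a, b in zip(notes, notes[1:])]


if __name__ == "__main__":
    # Made-up example notes, in chronological order.
    example_notes = [
        "patient stable on ward",
        "patient stable on ward today",
        "discharged home",
    ]
    print(successive_redundancy(example_notes))
```

A score near 1 for a pair suggests the later note largely repeats the earlier one, while a score near 0 suggests mostly new wording. Averaging these scores over many successive pairs gives a corpus-level figure in the same spirit as the percentages reported above.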

Subjects

Electronic Health Records; Natural Language Processing; Language; Narration; Semantics

Keywords

Deep transfer learning for language modelling of clinical text; Natural language processing methods to estimate redundancy of clinical text

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Natural Language Processing / Electronic Health Records Study type: Prognostic_studies / Qualitative_research Language: En Publication year: 2021 Document type: Article
