Generation and evaluation of artificial mental health records for Natural Language Processing.

Ive, Julia; Viani, Natalia; Kam, Joyce; Yin, Lucia; Verma, Somain; Puntis, Stephen; Cardinal, Rudolf N; Roberts, Angus; Stewart, Robert; Velupillai, Sumithra

Affiliation

Ive J; Department of Computing, Imperial College London, London, SW7 2AZ UK.
Viani N; IoPPN, King's College London, SE5 8AF London, UK.
Kam J; IoPPN, King's College London, SE5 8AF London, UK.
Yin L; IoPPN, King's College London, SE5 8AF London, UK.
Verma S; IoPPN, King's College London, SE5 8AF London, UK.
Puntis S; Department of Psychiatry, University of Oxford, Warneford Hospital, OX3 7JX Oxford, UK.
Cardinal RN; Department of Psychiatry, University of Cambridge, Downing Street, Cambridge, CB2 3EB UK.
Roberts A; Cambridge Biomedical Campus, Cambridgeshire and Peterborough NHS Foundation Trust, Box 190, Cambridge, CB2 0QQ UK.
Stewart R; IoPPN, King's College London, SE5 8AF London, UK.
Velupillai S; IoPPN, King's College London, SE5 8AF London, UK.

NPJ Digit Med; 3: 69, 2020.

Article in English | MEDLINE | ID: mdl-32435697

ABSTRACT

A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.

Keywords

Medical research; Scientific community

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: English Journal: NPJ Digit Med Year: 2020 Document type: Article