Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models.

Richter-Pechanski, Phillip; Geis, Nicolas A; Kiriakou, Christina; Schwab, Dominic M; Dieterich, Christoph

Affiliation

Richter-Pechanski P; Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg, Germany.
Geis NA; Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany.
Kiriakou C; German Center for Cardiovascular Research (DZHK) - Partner Site Heidelberg/Mannheim, Mannheim, Germany.
Schwab DM; Informatics for Life, Heidelberg, Germany.
Dieterich C; Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany.

Digit Health; 7: 20552076211057662, 2021.

Article in English | MEDLINE | ID: mdl-34868618

ABSTRACT

OBJECTIVE:

A vast amount of medical data is still stored in unstructured text documents. We present an automated method for extracting information from unstructured German clinical routine data in the cardiology domain, making these data usable in state-of-the-art data-driven deep learning projects.

METHODS:

We evaluated pre-trained language models for extracting a set of 12 cardiovascular concepts from German discharge letters. We compared three bidirectional encoder representations from transformers (BERT) models pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field.

RESULTS:

Our best-performing model, based on a publicly available German pre-trained bidirectional encoder representations from transformers model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity).
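
For readers unfamiliar with the metric, the following is a minimal sketch of how a token-wise micro-averaged precision, recall and F1-score can be computed. It is not the authors' evaluation code. The function name `micro_prf`, the outside label "O", and the concept labels "Medication" and "Diagnosis" are hypothetical and chosen only for illustration; the abstract does not name the 12 concepts or describe the labelling scheme.

```python
from collections import Counter


def micro_prf(gold, pred, outside="O"):
    """Token-wise micro-averaged precision, recall and F1.

    Counts are pooled over all concept labels before the ratios are
    computed. The `outside` label (assumed here to be "O") marks tokens
    that belong to no concept and is never counted as a true positive.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    counts = Counter()
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            counts["tp"] += 1          # correct concept label
        else:
            if p != outside:
                counts["fp"] += 1      # predicted a concept that is wrong
            if g != outside:
                counts["fn"] += 1      # missed (or mislabelled) a gold concept
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical six-token example with two illustrative concept labels.
gold = ["O", "Medication", "Medication", "O", "Diagnosis", "O"]
pred = ["O", "Medication", "O", "Diagnosis", "Diagnosis", "O"]
precision, recall, f1 = micro_prf(gold, pred)
```

Because the counts are pooled before averaging, frequent concepts weigh more heavily in the micro-average than rare ones; a macro-average would instead weight each concept equally.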

CONCLUSION:

Our results show that state-of-the-art deep learning methods using pre-trained language models are applicable to the task of cardiovascular concept extraction with limited training data. This minimizes annotation efforts, which are currently the bottleneck for data-driven deep learning projects in the clinical domain for German and many other European languages.

Keywords

Deep learning; bidirectional encoder representations from transformer; fine-tuning; medical information extraction; natural language processing; pre-trained language models

Full text: 1 Database: MEDLINE Language: En Year of publication: 2021 Document type: Article
