Assigning clinical codes with data-driven concept representation on Dutch clinical free text.
Scheurwegs, Elyne; Luyckx, Kim; Luyten, Léon; Goethals, Bart; Daelemans, Walter.
Affiliations
  • Scheurwegs E; University of Antwerp, Advanced Database Research and Modelling Research Group (ADReM), Middelheimlaan 1, B-2020 Antwerp, Belgium; University of Antwerp, Computational Linguistics and Psycholinguistics (CLiPS) Research Center, Lange Winkelstraat 40-42, B-2000 Antwerp, Belgium. Electronic address: el
  • Luyckx K; Antwerp University Hospital, ICT Department, Wilrijkstraat 10, B-2650 Edegem, Belgium.
  • Luyten L; Antwerp University Hospital, Medical Information Department, Wilrijkstraat 10, B-2650 Edegem, Belgium.
  • Goethals B; University of Antwerp, Advanced Database Research and Modelling Research Group (ADReM), Middelheimlaan 1, B-2020 Antwerp, Belgium.
  • Daelemans W; University of Antwerp, Computational Linguistics and Psycholinguistics (CLiPS) Research Center, Lange Winkelstraat 40-42, B-2000 Antwerp, Belgium.
J Biomed Inform; 69: 118-127, May 2017.
Article in English | MEDLINE | ID: mdl-28400312
ABSTRACT
Clinical codes are used for public reporting purposes, are fundamental to determining public financing for hospitals, and form the basis for reimbursement claims to insurance providers. They are assigned to a patient stay to reflect the diagnoses made and the procedures performed during that stay. This paper aims to enrich algorithms for automated clinical coding by taking a data-driven approach and by using unsupervised and semi-supervised techniques for the extraction of multi-word expressions that convey a generalisable medical meaning (referred to as concepts). Several methods for extracting concepts from text are compared, two of which are constructed from a large unannotated corpus of clinical free text. A distributional semantic model (in this case, the word2vec skip-gram model) is used to generalise over concepts and retrieve relations between them. These methods are validated on three sets of patient stay data, in the disease areas of urology, cardiology, and gastroenterology. The datasets are in Dutch, which limits the concept definitions available from expert-based ontologies (e.g. UMLS). The results show that when expert-based knowledge in ontologies is unavailable, concepts derived from raw clinical texts are a reliable alternative. Both concepts derived from raw clinical texts and concepts derived from expert-created dictionaries outperform a bag-of-words approach in clinical code assignment. Adding features based on tokens that appear in a semantically similar context has a positive influence on the prediction of diagnostic codes. Furthermore, the experiments indicate that a distributional semantics model can find relations between semantically related concepts in texts but also introduces erroneous and redundant relations, which can undermine clinical coding performance.

Full text: 1 | Database: MEDLINE | Main subject: Semantics / Natural Language Processing / Knowledge Bases / Clinical Coding | Type of study: Prognostic_studies | Limits: Humans | Country as subject: Europe | Language: En | Year of publication: 2017 | Document type: Article
