Determining and assessing characteristics of data element names impacting the performance of annotation using Usagi.
de Groot, Rowdy; Püttmann, Daniel P; Fleuren, Lucas M; Thoral, Patrick J; Elbers, Paul W G; de Keizer, Nicolette F; Cornet, Ronald.
Affiliation
  • de Groot R; Amsterdam UMC Location University of Amsterdam, Department of Medical Informatics, Amsterdam, the Netherlands. Electronic address: rowdy.degroot@amsterdamumc.nl.
  • Püttmann DP; Amsterdam UMC Location University of Amsterdam, Department of Medical Informatics, Amsterdam, the Netherlands.
  • Fleuren LM; Department of Intensive Care Medicine, Center for Critical Care Computation Intelligence (C4i), Amsterdam Medical Data Science (AMDS), Amsterdam Public Health (APH), Amsterdam Cardiovascular Science (ACS), Amsterdam Institute for Infection and Immunity (AII), Amsterdam UMC, Vrije Universiteit, Amste
  • Thoral PJ; Department of Intensive Care Medicine, Center for Critical Care Computation Intelligence (C4i), Amsterdam Medical Data Science (AMDS), Amsterdam Public Health (APH), Amsterdam Cardiovascular Science (ACS), Amsterdam Institute for Infection and Immunity (AII), Amsterdam UMC, Vrije Universiteit, Amste
  • Elbers PWG; Department of Intensive Care Medicine, Center for Critical Care Computation Intelligence (C4i), Amsterdam Medical Data Science (AMDS), Amsterdam Public Health (APH), Amsterdam Cardiovascular Science (ACS), Amsterdam Institute for Infection and Immunity (AII), Amsterdam UMC, Vrije Universiteit, Amste
  • de Keizer NF; Amsterdam UMC Location University of Amsterdam, Department of Medical Informatics, Amsterdam, the Netherlands.
  • Cornet R; Amsterdam UMC Location University of Amsterdam, Department of Medical Informatics, Amsterdam, the Netherlands.
Int J Med Inform; 178: 105200, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37703800
ABSTRACT

INTRODUCTION:

Hospitals generate large amounts of data, which are generally modeled and labeled in a proprietary way, hampering their exchange and integration. Manually annotating data element names with internationally standardized data element identifiers is time-consuming. Tools can support this task by performing it automatically. This study aimed to determine which factors influence the quality of automatic annotations.

METHODS:

Data element names were taken from the Dutch COVID-19 ICU Data Warehouse, which contains data on intensive care patients with COVID-19 from 25 hospitals in the Netherlands. In this data warehouse, the data had been merged using a proprietary terminology system, while the original hospital labels (synonymous names) were also stored. Usagi, an OHDSI annotation tool, was used to annotate the data. A gold standard was used to determine whether Usagi's annotations were correct. Logistic regression was used to determine whether the number of characters, the number of words, the match score (Usagi's certainty) and the hospital label origin influenced whether Usagi annotated correctly.
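The Methods describe a per-name binary outcome (correct or not against the gold standard) regressed on name length, word count and match score. The following is a minimal Python sketch of that setup under stated assumptions, not the authors' code: the `Annotation` record and its field names are hypothetical, words are assumed to be whitespace-separated tokens, features are standardized, and the model is fit by plain batch gradient descent. Hospital label origin is left out; it would typically enter as one-hot indicator columns. The annotations below are made-up toy examples, not study data.

```python
import math
from dataclasses import dataclass


@dataclass
class Annotation:
    name: str           # data element name or synonymous (hospital) name
    suggested_id: int   # concept Usagi proposed (hypothetical field name)
    match_score: float  # Usagi's certainty for that suggestion
    gold_id: int        # concept in the gold standard (hypothetical field name)


def features(a: Annotation) -> list[float]:
    """Number of characters, number of whitespace-separated words, match score."""
    return [float(len(a.name)), float(len(a.name.split())), a.match_score]


def is_correct(a: Annotation) -> int:
    """Outcome variable: 1 if Usagi's suggestion equals the gold-standard concept."""
    return int(a.suggested_id == a.gold_id)


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Scale each column to mean 0 and SD 1 (SD of 0 is treated as 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss; returns [intercept, w1, ..., wk]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    """Predicted probability that the annotation is correct."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Toy, made-up annotations (not data from the study).
toy = [
    Annotation("heart rate", 1, 0.92, 1),
    Annotation("temperature", 2, 0.88, 2),
    Annotation("respiratory rate", 3, 0.81, 3),
    Annotation("arterial blood pressure mean invasive", 4, 0.46, 5),
    Annotation("peep set ventilator mode", 6, 0.38, 7),
    Annotation("fio2 measured blood gas", 8, 0.52, 9),
]
X = standardize([features(a) for a in toy])
y = [is_correct(a) for a in toy]
w = fit_logistic(X, y)  # w = [intercept, chars, words, match score]
probs = [predict(w, x) for x in X]
```

In this toy set the correct annotations have shorter names and higher match scores, so the fitted coefficient on match score comes out positive and those on length and word count negative. A real analysis would use a statistics package that also reports standard errors and confidence intervals; the hand-written fit only keeps the sketch dependency-free.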

RESULTS:

Usagi automatically annotated 30.5% of the data element names and 5.5% of the synonymous names correctly. The match score was the best predictor of Usagi finding the correct annotation. The AUC was 0.651 for the data element names and 0.752 for the synonymous names. The AUC for the individual hospital label origins ranged from 0.460 to 0.905.
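The abstract reports AUCs without stating how they were computed. One standard way to compute an AUC is the rank (Mann-Whitney) formulation: the probability that a randomly chosen correct annotation receives a higher score than a randomly chosen incorrect one, with ties counted as one half. The minimal sketch below assumes the scores are either Usagi's match scores or a fitted model's predicted probabilities; the function names `auc` and `auc_by_group` are hypothetical, and the per-hospital helper simply groups records by a hospital label before computing one AUC per group.

```python
from collections import defaultdict


def auc(scores, labels):
    """P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(groups, scores, labels):
    """One AUC per group, e.g. per hospital label origin."""
    buckets = defaultdict(lambda: ([], []))
    for g, s, y in zip(groups, scores, labels):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: auc(s, y) for g, (s, y) in buckets.items()}
```

For example, `auc([0.9, 0.8, 0.3, 0.4], [1, 0, 0, 1])` returns 0.75, because three of the four correct/incorrect pairs are ranked correctly. An AUC of 0.5 corresponds to chance, so a per-hospital value of 0.460 is slightly worse than chance. The pairwise loop is O(n·m); for large sets a sort-based rank computation gives the same value in O(n log n).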

DISCUSSION:

The results show that Usagi annotated the data element names better than the synonymous names. In the synonymous names dataset, hospital origin was associated with the number of correctly annotated concepts: hospitals with better annotation performance had shorter synonymous names with fewer words. Using shorter data element names or synonymous names should be considered to optimize the automatic annotation process. Overall, Usagi's performance is too poor to rely on completely for automatic annotation.

Full text: 1 Collections: 01-internacional Database: MEDLINE Main subject: COVID-19 Study type: Prognostic_studies Limits: Humans Country/Region as subject: Europe Language: En Publication year: 2023 Document type: Article