Evaluating Linkage Quality of Population-Based Administrative Data for Health Service Research.
Kim, Ji-Woo; Choi, Hyojung; Lim, Hyun Jeung; Oh, Miae; Ahn, Jae Joon.
  • Kim JW; Big Data Linkage Division, Health Insurance Review & Assessment Service, Wonju, Korea.
  • Choi H; Digital Medical Technology Listing Division, Health Insurance Review & Assessment Service, Wonju, Korea.
  • Lim HJ; DRG Administration Division, Health Insurance Review & Assessment Service, Wonju, Korea.
  • Oh M; Center for Research on Big Data Information, Korea Institute for Health and Social Affairs, Sejong, Korea.
  • Ahn JJ; Division of Data Science, Yonsei University, Wonju, Korea. ahn2615@yonsei.ac.kr.
J Korean Med Sci ; 39(14): e127, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622936
ABSTRACT

BACKGROUND:

To overcome the limitations of relying on data from a single institution, many researchers have studied data linkage methodologies. Data linkage is prone to errors arising from legal restrictions on personal information and from technical issues in data processing. Linkage errors can introduce selection bias and undermine external and internal validity. Therefore, verifying the quality of each linkage method while protecting personal information is an important issue. This study evaluated the quality of linked data and analyzed the potential bias resulting from linkage errors.

METHODS:

This study analyzed claims data submitted to the Health Insurance Review and Assessment Service (HIRA DATA). Linkage errors were evaluated for two deterministic linkage methods that differ in the match key used. The first deterministic linkage uses a unique identification number, and the second uses name, gender, and date of birth as a set of partial identifiers. The linkage errors of these methods were compared using Cohen's absolute standardized difference (ASD) across baseline characteristics, and linkage quality was evaluated with the following indicators: linked rate, false match rate, missed match rate, positive predictive value, sensitivity, specificity, and F1-score.
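
The abstract does not give formulas for these indicators, so the following is only a minimal sketch under stated assumptions. It treats links made with the unique identification number as the reference standard, links two files deterministically on a partial-identifier key (name, gender, date of birth), and scores those links with common record-linkage definitions (false match rate = 1 - PPV, missed match rate = 1 - sensitivity). `Record`, `partial_key`, `deterministic_link`, `linkage_quality` and the synthetic records are hypothetical illustrations, not HIRA's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    rec_id: str      # record identifier within its own file (hypothetical field)
    uid: str         # unique identification number, assumed here to be the reference standard
    name: str
    gender: str
    birth_date: str  # ISO date, e.g. "1980-01-01"


def partial_key(r: Record) -> tuple:
    """Match key built from partial identifiers (name, gender, date of birth)."""
    # Case folding and whitespace trimming are illustrative assumptions.
    return (r.name.strip().lower(), r.gender.strip().upper(), r.birth_date)


def deterministic_link(file_a, file_b, key):
    """Link a file_a record only when exactly one file_b record shares its key."""
    index = {}
    for rb in file_b:
        index.setdefault(key(rb), []).append(rb.rec_id)
    links = {}
    for ra in file_a:
        candidates = index.get(key(ra), [])
        if len(candidates) == 1:  # keys shared by several records are left unlinked
            links[ra.rec_id] = candidates[0]
    return links


def _ratio(num, den):
    return num / den if den else float("nan")


def linkage_quality(file_a, file_b, links):
    """Score links against the pairs implied by a shared unique identification number."""
    b_by_uid = {rb.uid: rb.rec_id for rb in file_b}
    truth = {ra.rec_id: b_by_uid.get(ra.uid) for ra in file_a}  # None = no true match

    tp = sum(1 for a, b in links.items() if truth[a] == b)  # true matches
    fp = len(links) - tp                                     # false matches
    # Missed matches: a true pair that was not linked (or was linked to the wrong record).
    fn = sum(1 for a, b in truth.items() if b is not None and links.get(a) != b)
    tn = sum(1 for a, b in truth.items() if b is None and a not in links)

    ppv = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return {
        "linked_rate": _ratio(len(links), len(file_a)),
        "false_match_rate": _ratio(fp, tp + fp),
        "missed_match_rate": _ratio(fn, tp + fn),
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


# Synthetic records for illustration only.
file_a = [
    Record("A1", "U1", "Kim Minsu", "M", "1980-01-01"),
    Record("A2", "U2", "Lee Jiyoung", "F", "1975-05-05"),
    Record("A3", "U3", "Park Sora", "F", "1990-09-09"),
    Record("A4", "U9", "Choi Hana", "F", "1985-03-03"),
    Record("A5", "U8", "Jung Woo", "M", "2000-12-12"),
]
file_b = [
    Record("B1", "U1", "Kim Minsu", "M", "1980-01-01"),
    Record("B2", "U2", "Lee Ji-young", "F", "1975-05-05"),  # spelling variant: missed match
    Record("B3", "U3", "Park Sora", "F", "1990-09-09"),
    Record("B4", "U4", "Choi Hana", "F", "1985-03-03"),     # different person, same key: false match
]

links = deterministic_link(file_a, file_b, partial_key)
print(links)
print(linkage_quality(file_a, file_b, links))
```

On these five synthetic records, the spelling variant in `B2` produces one missed match and the shared key for `A4`/`B4` produces one false match. This gives a linked rate of 0.6, with PPV, sensitivity and F1 each equal to 2/3. Requiring a unique candidate is one conservative design choice. Other linkers may resolve ties differently.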

RESULTS:

For the deterministic linkage method that used name, gender, and date of birth as a set of partial identifiers, the true match rate was 83.5% and the missed match rate was 16.5%. Although some characteristics of the data showed bias, most ASD values were below 0.1, and none exceeded 0.5. Therefore, it is difficult to conclude that linked data constructed with deterministic linkage differ substantially.
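
The abstract reports ASD values against cut-offs of 0.1 and 0.5 but does not state the formula. The sketch below is minimal and rests on two assumptions. First, it uses the common pooled-standard-deviation form: the difference in means (or proportions) divided by the square root of the average of the two group variances. Second, it assumes the comparison is between linked and unlinked records. The function names and values are hypothetical.

```python
import math
from statistics import mean, variance


def asd_continuous(x, y):
    """Absolute standardized difference for a continuous characteristic (sample variances)."""
    pooled_sd = math.sqrt((variance(x) + variance(y)) / 2)
    return abs(mean(x) - mean(y)) / pooled_sd if pooled_sd > 0 else 0.0


def asd_binary(x, y):
    """ASD for a 0/1 characteristic, using the pooled Bernoulli standard deviation."""
    p1, p2 = mean(x), mean(y)
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return abs(p1 - p2) / pooled_sd if pooled_sd > 0 else 0.0


def asd_band(asd):
    """Bands matching the cut-offs quoted in the abstract (0.1 and 0.5)."""
    if asd < 0.1:
        return "below 0.1"
    if asd <= 0.5:
        return "0.1 to 0.5"
    return "above 0.5"


# Hypothetical example: age and sex among linked vs. unlinked records (made-up values).
age_linked, age_unlinked = [60, 62, 58, 64], [61, 63, 59, 65]
female_linked = [1, 0] * 5
female_unlinked = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]

for label, asd in [
    ("age", asd_continuous(age_linked, age_unlinked)),
    ("female", asd_binary(female_linked, female_unlinked)),
]:
    print(f"{label}: ASD = {asd:.3f} ({asd_band(asd)})")
```

Because the ASD is scaled by a pooled standard deviation rather than tested for significance, it does not grow with sample size. This is why it is often preferred to p-values when comparing large administrative datasets.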

CONCLUSION:

As the first data linkage quality verification study using big data from HIRA, this study confirms the feasibility of building linked health and medical data at the national level. Analyzing linkage quality is crucial for understanding linkage errors and generating reliable analytical results. Data linkers should increase the reliability of linked data by providing researchers with information on linkage errors. The results of this study will serve as reference data for improving the reliability of multicenter data linkage studies.

Full text: 1 Database: MEDLINE Main subject: Medical Record Linkage / Information Storage and Retrieval Limits: Humans Language: English Year: 2024 Document type: Article

Texto completo: 1 Banco de datos: MEDLINE Asunto principal: Registro Médico Coordinado / Almacenamiento y Recuperación de la Información Límite: Humans Idioma: En Año: 2024 Tipo del documento: Article