SIMpat: A Synthetic Benchmark for Similarity Metrics on Patient Representations.
Stud Health Technol Inform; 316: 1647-1651, 2024 Aug 22.
Article in En | MEDLINE | ID: mdl-39176526
ABSTRACT
Patient-level similarity and clustering tasks based on data extracted from electronic health records suffer from the curse of dimensionality and from a lack of inter-patient data comparability. Indeed, for many health institutions, there are many more variables, and ways of expressing those variables, to represent patients than there are patients sharing the same set of data. One strategy to lower redundancy and increase interoperability is to map data to semantics-driven representations through medical knowledge graphs such as SNOMED-CT. However, patient similarity metrics based on this knowledge-graph information lack quantitative evaluation and comparison with purely data-driven methods. The reasons are twofold. First, it is hard to conceptually assess and formalize a gold-standard similarity between patients, which results in poor inter-annotator agreement in qualitative evaluations. Second, the community has lacked a clear benchmark for comparing existing metrics developed by scientific communities from various fields such as ontology, data science, and medical informatics. This study addresses the known challenges of evaluating patient similarities by proposing SIMpat, a synthetic benchmark for quantitatively evaluating available metrics on controlled cohorts, which could later be used to assess their sensitivity to aspects such as the sparsity of variables or the specificities of patient disease patterns.
Full text: 1
Database: MEDLINE
Main subject: Benchmarking / Electronic Health Records
Limits: Humans
Language: En
Year: 2024
Document type: Article