SIMpat: A Synthetic Benchmark for Similarity Metrics on Patient Representations.

Voegeli, Jean-Virgile; Bjelogrlic, Mina; Gaudet-Blavignac, Christophe; Dubos, Richard; Zimmermann, Myriam; Bensahla Talet, Adel; Zheng, Yuanyuan; Ehrsam, Julien; Lovis, Christian

Voegeli JV; Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland.
Bjelogrlic M; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland.
Gaudet-Blavignac C; Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland.
Dubos R; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland.
Zimmermann M; Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland.
Bensahla Talet A; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland.
Zheng Y; Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland.
Ehrsam J; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland.
Lovis C; Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland.

Stud Health Technol Inform; 316: 1647-1651, 2024 Aug 22.

Article in English | MEDLINE | ID: mdl-39176526

ABSTRACT

Similarity and clustering tasks based on patient-level data extracted from electronic health records suffer from the curse of dimensionality and from a lack of comparability of data across patients. Indeed, in many health institutions, there are far more variables, and ways of expressing those variables, to represent patients than there are patients sharing the same set of data. To lower redundancy and increase interoperability, one strategy is to map data to semantics-driven representations through medical knowledge graphs such as SNOMED-CT. However, patient similarity metrics based on this knowledge-graph information lack quantitative evaluation and comparison with purely data-driven methods. The reasons are twofold. First, it is hard to conceptually assess and formalize a gold-standard similarity between patients, which results in poor inter-annotator agreement in qualitative evaluations. Second, the community has lacked a clear benchmark to compare existing metrics developed by scientific communities from various fields such as ontology, data science, and medical informatics. This study proposes to address these known challenges of evaluating patient similarity with SIMpat, a synthetic benchmark based on controlled cohorts for quantitatively evaluating available metrics, which could later be used to assess their sensitivity to aspects such as the sparsity of variables or the specificities of patient disease patterns.
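
The general idea of benchmarking similarity metrics on controlled synthetic cohorts can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption and not the SIMpat implementation: a toy concept hierarchy stands in for SNOMED-CT (the concept names are made up, not real SNOMED-CT codes), two synthetic cohorts draw codes from disjoint sub-hierarchies, a flat Jaccard index plays the role of a purely data-driven metric, an ancestor-expanded Jaccard index plays the role of a knowledge-graph-aware metric, and nearest-neighbour cohort recovery serves as the evaluation criterion.

```python
import random

# Hypothetical toy concept hierarchy (child -> parent); NOT real SNOMED-CT codes.
PARENT = {
    "viral_pneumonia": "pneumonia",
    "bacterial_pneumonia": "pneumonia",
    "pneumonia": "lung_disease",
    "asthma": "lung_disease",
    "type1_diabetes": "diabetes",
    "type2_diabetes": "diabetes",
    "diabetes": "endocrine_disease",
    "hypothyroidism": "endocrine_disease",
}

# Hypothetical controlled cohorts: each draws its codes from its own pool.
COHORTS = {
    "respiratory": ["viral_pneumonia", "bacterial_pneumonia", "asthma"],
    "endocrine": ["type1_diabetes", "type2_diabetes", "hypothyroidism"],
}


def ancestors(code):
    """Return the code together with all of its ancestors in the toy hierarchy."""
    out = {code}
    while code in PARENT:
        code = PARENT[code]
        out.add(code)
    return out


def jaccard(a, b):
    """Purely data-driven similarity: overlap of the raw code sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def kg_jaccard(a, b):
    """Knowledge-graph-aware similarity: Jaccard after ancestor expansion."""
    expanded_a = set().union(*(ancestors(c) for c in a))
    expanded_b = set().union(*(ancestors(c) for c in b))
    return jaccard(expanded_a, expanded_b)


def make_cohort(n_per_cohort, codes_per_patient, seed=0):
    """Generate labelled synthetic patients; fewer codes per patient = sparser data."""
    rng = random.Random(seed)
    patients = []
    for label, pool in COHORTS.items():
        for _ in range(n_per_cohort):
            patients.append((label, set(rng.sample(pool, codes_per_patient))))
    return patients


def nn_accuracy(patients, metric):
    """Fraction of patients whose most similar other patient is in the same cohort."""
    hits = 0
    for i, (label_i, codes_i) in enumerate(patients):
        best = max(
            (j for j in range(len(patients)) if j != i),
            key=lambda j: metric(codes_i, patients[j][1]),
        )
        hits += int(patients[best][0] == label_i)
    return hits / len(patients)


if __name__ == "__main__":
    cohort = make_cohort(n_per_cohort=10, codes_per_patient=1, seed=42)
    print("flat Jaccard:", nn_accuracy(cohort, jaccard))
    # The KG metric scores 1.0 here because the two toy sub-hierarchies share no ancestor.
    print("KG Jaccard:", nn_accuracy(cohort, kg_jaccard))
```

With one code per patient, two patients from the same cohort often share no raw code, so the flat metric cannot distinguish them from patients of the other cohort, whereas the ancestor expansion still links them through common parent concepts. This mirrors, in a very simplified way, the sparsity aspect the abstract mentions.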

Subject(s)

Benchmarking; Electronic Health Records; Humans; Systematized Nomenclature of Medicine; Semantics

Keywords

Benchmark; Patient representations; similarity metrics

Database: MEDLINE | Main subject: Benchmarking / Electronic Health Records | Limit: Humans | Language: English | Year: 2024 | Document type: Article