Search | VHL Regional Portal

Clustering Demographics and Sequences of Diagnosis Codes.

Zhong, Haodi; Loukides, Grigorios; Pissis, Solon P.

IEEE J Biomed Health Inform ; 26(5): 2351-2359, 2022 05.

Article in English | MEDLINE | ID: mdl-34797768

ABSTRACT

A Relational-Sequential dataset (RS-dataset for short) contains records, each consisting of a patient's values in demographic attributes and their sequence of diagnosis codes. Clustering an RS-dataset is helpful for analyses ranging from pattern mining to classification. However, existing methods are not suited to this task. Thus, we initiate a study of how an RS-dataset can be clustered effectively and efficiently. We formalize the task of clustering an RS-dataset as an optimization problem. At the heart of the problem is a distance measure we design to quantify the pairwise similarity between records of an RS-dataset. Our measure uses a tree structure that encodes hierarchical relationships between records, based on their demographics, as well as an edit-distance-like measure that captures both the sequentiality and the semantic similarity of diagnosis codes. We also develop an algorithm that first identifies k representative records (centers), for a given k, and then constructs k clusters, each containing one center and the records that are closer to that center than to any other center. Experiments using two Electronic Health Record datasets demonstrate that our algorithm constructs compact and well-separated clusters, which preserve meaningful relationships between demographics and sequences of diagnosis codes, while being efficient and scalable.
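The two-phase scheme described in the abstract (pick k centers, then assign each record to its nearest center) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' method: the demographic distance is a plain attribute-mismatch fraction rather than the paper's tree structure, the substitution cost treats codes sharing a three-character prefix as semantically close, the weight `w` is arbitrary, and centers are chosen by farthest-first traversal as a stand-in for the paper's center-selection step.

```python
from typing import Dict, List, Sequence, Tuple

# A record is (demographic values, sequence of diagnosis codes).
Record = Tuple[Tuple[str, ...], Tuple[str, ...]]

def sub_cost(a: str, b: str) -> float:
    """Substitution cost: 0 if equal, 0.5 if the codes share a 3-character
    prefix (assumed proxy for semantic similarity), 1 otherwise."""
    if a == b:
        return 0.0
    return 0.5 if a[:3] == b[:3] else 1.0

def seq_distance(s: Sequence[str], t: Sequence[str]) -> float:
    """Edit distance with unit insert/delete cost and sub_cost substitutions."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        cur = [float(i)]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1.0,                  # delete a
                           cur[j - 1] + 1.0,               # insert b
                           prev[j - 1] + sub_cost(a, b)))  # substitute
        prev = cur
    return prev[-1]

def rs_distance(r1: Record, r2: Record, w: float = 0.5) -> float:
    """Weighted sum of a demographic mismatch fraction and a
    length-normalized sequence distance (w is a hypothetical weight)."""
    demo = sum(x != y for x, y in zip(r1[0], r2[0])) / max(len(r1[0]), 1)
    seq = seq_distance(r1[1], r2[1]) / max(len(r1[1]), len(r2[1]), 1)
    return w * demo + (1 - w) * seq

def pick_centers(records: List[Record], k: int) -> List[int]:
    """Farthest-first traversal (assumes k <= len(records)): start from
    record 0, then repeatedly add the record farthest from its nearest center."""
    centers = [0]
    while len(centers) < k:
        far = max(range(len(records)),
                  key=lambda i: min(rs_distance(records[i], records[c])
                                    for c in centers))
        centers.append(far)
    return centers

def assign(records: List[Record], centers: List[int]) -> Dict[int, List[int]]:
    """Place each record in the cluster of its nearest center."""
    clusters: Dict[int, List[int]] = {c: [] for c in centers}
    for i, r in enumerate(records):
        nearest = min(centers, key=lambda c: rs_distance(r, records[c]))
        clusters[nearest].append(i)
    return clusters

# Toy data, invented for this example.
records = [
    (("M", "60-69"), ("I10", "E11.9", "I25.1")),
    (("M", "60-69"), ("I10", "E11.65", "I25.1")),
    (("F", "20-29"), ("J45.0", "J30.1")),
    (("F", "20-29"), ("J45.9", "J30.1")),
]
clusters = assign(records, pick_centers(records, k=2))
print(clusters)
```

On this toy input the two older male patients end up in one cluster and the two younger female patients in the other, because both the demographics and the code sequences agree within each pair.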

Subject(s)

Algorithms , Electronic Health Records , Cluster Analysis , Demography , Humans , Semantics

Clustering datasets with demographics and diagnosis codes.

Zhong, Haodi; Loukides, Grigorios; Gwadera, Robert.

J Biomed Inform ; 102: 103360, 2020 02.

Article in English | MEDLINE | ID: mdl-31904428

ABSTRACT

Clustering data derived from Electronic Health Record (EHR) systems is important for discovering relationships between the clinical profiles of patients and as a preprocessing step for analysis tasks, such as classification. However, the heterogeneity of these data makes the application of existing clustering methods difficult and calls for new clustering approaches. In this paper, we propose the first approach for clustering a dataset in which each record contains a patient's values in demographic attributes and their set of diagnosis codes. Our approach represents the dataset in a binary form in which the features are selected demographic values, as well as combinations (patterns) of frequent and correlated diagnosis codes. This representation enables measuring similarity between records using cosine similarity, an effective measure for binary-represented data, and finding compact, well-separated clusters through hierarchical clustering. Our experiments using two publicly available EHR datasets, containing over 26,000 and over 52,000 records respectively, demonstrate that our approach is able to construct clusters with correlated demographics and diagnosis codes, and that it is efficient and scalable.
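The pipeline in this abstract (binary features, cosine similarity, hierarchical clustering) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the feature list and diagnosis-code patterns are supplied by hand rather than mined for frequency and correlation, the linkage criterion is assumed to be average linkage (the abstract does not name one), and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import FrozenSet, List, Set, Tuple

# A record is (set of demographic values, set of diagnosis codes).
Record = Tuple[Set[str], Set[str]]

def binarize(record: Record, demo_features: List[str],
             patterns: List[FrozenSet[str]]) -> List[int]:
    """One bit per selected demographic value, plus one bit per pattern
    that is fully contained in the record's diagnosis codes."""
    demographics, codes = record
    bits = [1 if f in demographics else 0 for f in demo_features]
    bits += [1 if p <= codes else 0 for p in patterns]
    return bits

def cosine_similarity(u: List[int], v: List[int]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0

def agglomerate(vectors: List[List[int]], k: int) -> List[List[int]]:
    """Bottom-up clustering: repeatedly merge the two clusters with the
    smallest average cosine distance until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_dist(c1: List[int], c2: List[int]) -> float:
        total = sum(1.0 - cosine_similarity(vectors[i], vectors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index b is still valid
        del clusters[b]
    return clusters

# Toy data, invented for this example.
demo_features = ["M", "F", "60-69", "20-29"]
patterns = [frozenset({"I10", "E11"}), frozenset({"J45"}),
            frozenset({"J45", "J30"})]
records = [
    ({"M", "60-69"}, {"I10", "E11", "I25"}),
    ({"M", "60-69"}, {"I10", "E11"}),
    ({"F", "20-29"}, {"J45", "J30"}),
    ({"F", "20-29"}, {"J45"}),
]
vectors = [binarize(r, demo_features, patterns) for r in records]
print(agglomerate(vectors, k=2))
```

With these inputs the first two records share an identical binary vector and merge first, and the last two merge next, leaving two clusters that each pair matching demographics with matching diagnosis patterns.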

Subject(s)

Electronic Health Records , Cluster Analysis , Demography , Humans
