Automated grouping of medical codes via multiview banded spectral clustering.
Zhang, Luwan; Zhang, Yichi; Cai, Tianrun; Ahuja, Yuri; He, Zeling; Ho, Yuk-Lam; Beam, Andrew; Cho, Kelly; Carroll, Robert; Denny, Joshua; Kohane, Isaac; Liao, Katherine; Cai, Tianxi.
Affiliation
  • Zhang L; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA. Electronic address: lzhang@hsph.harvard.edu.
  • Zhang Y; Department of Computer Science and Statistics, University of Rhode Island, Kingston, RI, USA.
  • Cai T; Division of Rheumatology, Brigham and Women's Hospital, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA.
  • Ahuja Y; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
  • He Z; Division of Rheumatology, Brigham and Women's Hospital, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA.
  • Ho YL; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA.
  • Beam A; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Cho K; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA; Division of Aging, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA.
  • Carroll R; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
  • Denny J; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
  • Kohane I; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Liao K; Division of Rheumatology, Brigham and Women's Hospital, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Cai T; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
J Biomed Inform; 100: 103322, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31672532
ABSTRACT

OBJECTIVE:

With their increasingly widespread adoption, electronic health records (EHRs) have enabled phenotypic information extraction at an unprecedented granularity and scale. However, a medical concept (e.g., diagnosis, prescription, symptom) is often described by various synonyms across different EHR systems, hindering data integration for signal enhancement and complicating dimensionality reduction for knowledge discovery. Although ontologies and hierarchies exist, their curation and maintenance demand tremendous human effort, a process that is both unscalable and susceptible to subjective biases. This paper aims to develop a data-driven approach that automates the grouping of medical terms into clinically relevant concepts by combining multiple up-to-date data sources in an unbiased manner.

METHODS:

We present a novel data-driven grouping approach, multi-view banded spectral clustering (mvBSC), which combines summary data from multiple healthcare systems. The proposed method consists of a banding step that leverages prior knowledge from the existing coding hierarchy, and a combining step that performs spectral clustering on an optimally weighted matrix.
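The two steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how banding is defined or how the optimal weights are obtained, so the sketch assumes that banding means zeroing similarities between codes in different prior categories (here a hypothetical three-character code prefix) and uses fixed equal weights in place of the optimal ones. All function names and the toy similarity values are invented for illustration.

```python
from typing import List

Matrix = List[List[float]]


def band(similarity: Matrix, prior_groups: List[str]) -> Matrix:
    """Keep similarities only between codes sharing a prior category.

    Assumption: 'banding' is read here as zeroing every entry whose row
    and column codes belong to different categories of the hierarchy.
    """
    n = len(similarity)
    return [
        [similarity[i][j] if prior_groups[i] == prior_groups[j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def combine(views: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted average of several same-sized similarity matrices.

    Assumption: the weights are supplied by the caller; the paper's
    optimal weighting scheme is not reproduced here.
    """
    total = sum(weights)
    n = len(views[0])
    return [
        [sum(w * v[i][j] for w, v in zip(weights, views)) / total
         for j in range(n)]
        for i in range(n)
    ]


# Toy example: four ICD-9-style codes, two hypothetical healthcare systems.
codes = ["250.0", "250.1", "401.1", "401.9"]
prior_groups = [code[:3] for code in codes]  # hypothetical prior category

view_a = [
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.2, 0.2],
    [0.3, 0.2, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
view_b = [
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.4, 0.1],
    [0.1, 0.4, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]

banded_views = [band(view, prior_groups) for view in (view_a, view_b)]
combined = combine(banded_views, weights=[0.5, 0.5])
```

In a full pipeline, the `combined` matrix would then be passed to a standard spectral clustering routine (eigenvectors of a graph Laplacian followed by k-means), which this sketch leaves out.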

RESULTS:

We apply the proposed method to group ICD-9 and ICD-10-CM codes together by integrating data from two healthcare systems. We show grouping results and hierarchies for 13 representative disease categories. Individual grouping quality was evaluated using normalized mutual information, adjusted Rand index, and F1-measure, and the groupings were consistently found to closely match their existing manual counterparts. The resulting ICD groupings also offer comparable interpretability and are well aligned with the current ICD hierarchy.
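For readers unfamiliar with the three agreement measures, the following sketch shows how each can be computed for a predicted grouping against a reference grouping. It rests on assumptions: the abstract does not state which NMI normalization or which F1 definition was used, so this version normalizes mutual information by the arithmetic mean of the two entropies and computes F1 over pairs of codes placed in the same group. The label lists are toy data.

```python
from collections import Counter
from math import comb, log


def _pair_counts(pred, ref):
    """Return (pairs together in both, in pred, in ref, all pairs)."""
    both = sum(comb(c, 2) for c in Counter(zip(pred, ref)).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred).values())
    in_ref = sum(comb(c, 2) for c in Counter(ref).values())
    return both, in_pred, in_ref, comb(len(pred), 2)


def adjusted_rand_index(pred, ref):
    both, in_pred, in_ref, total = _pair_counts(pred, ref)
    expected = in_pred * in_ref / total
    max_index = (in_pred + in_ref) / 2
    if max_index == expected:  # degenerate case, e.g. all singletons
        return 1.0
    return (both - expected) / (max_index - expected)


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(pred, ref):
    """NMI with arithmetic-mean normalization (an assumption)."""
    n = len(pred)
    count_pred, count_ref = Counter(pred), Counter(ref)
    mi = sum(
        (c / n) * log(c * n / (count_pred[p] * count_ref[r]))
        for (p, r), c in Counter(zip(pred, ref)).items()
    )
    denom = _entropy(pred) + _entropy(ref)
    return 1.0 if denom == 0 else 2 * mi / denom


def pairwise_f1(pred, ref):
    """F1 over pairs of items grouped together (an assumption)."""
    both, in_pred, in_ref, _ = _pair_counts(pred, ref)
    if both == 0:
        return 0.0
    precision, recall = both / in_pred, both / in_ref
    return 2 * precision * recall / (precision + recall)


# Toy data: reference (manual) grouping and two candidate groupings.
reference = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]   # same partition, different label names
merged = [0, 0, 0, 1]      # third item placed in the wrong group
```

All three measures depend only on the partition, not on the label names, so `relabeled` scores as a perfect match against `reference`, while `merged` is penalized.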

CONCLUSION:

The proposed approach, by systematically leveraging multiple data sources, is able to overcome bias while maximizing consensus to achieve generalizability. It has the advantage of being efficient, scalable, and adaptive to the evolving human knowledge reflected in the data, showing a significant step toward automating medical knowledge integration.

Full text: 1 | Databases: MEDLINE | Main subject: International Classification of Diseases / Electronic Health Records | Type of study: Guideline | Limits: Humans | Language: En | Journal: J Biomed Inform | Journal subject: MEDICAL INFORMATICS | Year: 2019 | Document type: Article
