ICD2Vec: Mathematical representation of diseases.
Lee, Yeong Chan; Jung, Sang-Hyuk; Kumar, Aman; Shim, Injeong; Song, Minku; Kim, Min Seo; Kim, Kyunga; Myung, Woojae; Park, Woong-Yang; Won, Hong-Hee.
Affiliations
  • Lee YC; Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea; Sungkyunkwan University School of Medicine, Seoul, Republic of Korea.
  • Jung SH; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA.
  • Kumar A; Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India.
  • Shim I; Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea.
  • Song M; Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea.
  • Kim MS; Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea.
  • Kim K; Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea; Statistics and Data Center, Research Institute for Future Medicine, Samsung Medical Center, Seoul, Republic of Korea.
  • Myung W; Department of Neuropsychiatry, Seoul National University Bundang Hospital, Seongnam, Republic of Korea.
  • Park WY; Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea.
  • Won HH; Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea; Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea. Electronic address: wonhh@skku.edu.
J Biomed Inform; 141: 104361, 2023 May.
Article in English | MEDLINE | ID: mdl-37054960
ABSTRACT

BACKGROUND:

The International Classification of Diseases (ICD) codes represent the global standard for reporting disease conditions. The current ICD codes connote direct human-defined relationships among diseases in a hierarchical tree structure. Representing the ICD codes as mathematical vectors helps to capture nonlinear relationships in medical ontologies across diseases.

METHODS:

We propose a universally applicable framework called "ICD2Vec", designed to provide mathematical representations of diseases by encoding the corresponding information. First, we present the arithmetical and semantic relationships between diseases by mapping composite vectors for symptoms or diseases to the most similar ICD codes. Second, we investigate the validity of ICD2Vec by comparing the biological relationships and cosine similarities among the vectorized ICD codes. Third, we propose a new risk score called IRIS, derived from ICD2Vec, and demonstrate its clinical utility with large cohorts from the UK and South Korea.
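The mapping step in the first point can be illustrated with a minimal sketch. It assumes that symptom or disease descriptions and ICD codes are already embedded in the same vector space, that a composite vector is formed by simple addition, and that candidates are ranked by cosine similarity. The function names and the three-dimensional toy vectors are hypothetical and chosen only for illustration; they are not the published ICD2Vec embeddings or the authors' implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def add_vectors(*vectors):
    """Element-wise sum, used here as an assumed way to build a composite vector."""
    return [sum(components) for components in zip(*vectors)]

def most_similar_codes(query, code_vectors, top_k=3):
    """Rank ICD codes by cosine similarity to the query vector, highest first."""
    scored = [(code, cosine_similarity(query, vec))
              for code, vec in code_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors (hypothetical values, NOT real ICD2Vec embeddings).
toy_code_vectors = {
    "J00": [0.9, 0.1, 0.0],  # common cold
    "A99": [0.1, 0.9, 0.1],  # unspecified viral hemorrhagic fever
    "B03": [0.0, 0.1, 0.9],  # smallpox
}
toy_symptom_vectors = {
    "runny nose": [0.8, 0.0, 0.1],
    "sore throat": [0.7, 0.2, 0.0],
}

# Composite vector for two symptoms, then nearest ICD codes.
composite = add_vectors(toy_symptom_vectors["runny nose"],
                        toy_symptom_vectors["sore throat"])
ranking = most_similar_codes(composite, toy_code_vectors, top_k=2)
for code, score in ranking:
    print(f"{code}: {score:.3f}")
```

With real embeddings the dictionaries would hold one high-dimensional vector per ICD code, and the same ranking function would apply unchanged.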

RESULTS:

Semantic compositionality was qualitatively confirmed between descriptions of symptoms and ICD2Vec. For example, the diseases most similar to COVID-19 were found to be the common cold (ICD-10 J00), unspecified viral hemorrhagic fever (ICD-10 A99), and smallpox (ICD-10 B03). Using disease-to-disease pairs, we show significant associations between the cosine similarities derived from ICD2Vec and the biological relationships. Furthermore, we observed significant adjusted hazard ratios (HR) and areas under the receiver operating characteristic curve (AUROC) between IRIS and the risks for eight diseases. For instance, a higher IRIS for coronary artery disease (CAD) was associated with a higher probability of incident CAD (HR 2.15 [95% CI 2.02-2.28] and AUROC 0.587 [95% CI 0.583-0.591]). We identified individuals at substantially increased risk of CAD using IRIS and 10-year atherosclerotic cardiovascular disease risk (adjusted HR 4.26 [95% CI 3.59-5.05]).

CONCLUSIONS:

ICD2Vec, a proposed universal framework for converting qualitatively measured ICD codes into quantitative vectors containing semantic relationships between diseases, exhibited a significant correlation with actual biological significance. In addition, the IRIS was a significant predictor of major diseases in a prospective study using two large-scale datasets. Based on this clinical validity and utility evidence, we suggest that publicly available ICD2Vec can be used in diverse research and clinical practices and has important clinical implications.

Full text: 1 Database: MEDLINE Main subject: Coronary Artery Disease / COVID-19 Language: En Year of publication: 2023 Document type: Article
