A unified graph model based on molecular data binning for disease subtyping.

Hassan Zada, Muhammad Sadiq; Yuan, Bo; Khan, Wajahat Ali; Anjum, Ashiq; Reiff-Marganiec, Stephan; Saleem, Rabia

Affiliation

Hassan Zada MS; School of Computing and Engineering, University of Derby, United Kingdom. Electronic address: m.hassanzada@derby.ac.uk.
Yuan B; School of Computing and Mathematical Sciences, University of Leicester, United Kingdom. Electronic address: b.yuan@leicester.ac.uk.
Khan WA; School of Computing and Engineering, University of Derby, United Kingdom. Electronic address: w.khan@derby.ac.uk.
Anjum A; School of Computing and Mathematical Sciences, University of Leicester, United Kingdom. Electronic address: aa1180@leicester.ac.uk.
Reiff-Marganiec S; School of Computing and Engineering, University of Derby, United Kingdom. Electronic address: s.reiff-marganiec@derby.ac.uk.
Saleem R; School of Computing and Engineering, University of Derby, United Kingdom. Electronic address: r.saleem@derby.ac.uk.

J Biomed Inform; 134: 104187, 2022 Oct.

Article in En | MEDLINE | ID: mdl-36055637

ABSTRACT

Molecular disease subtype discovery from omics data is an important research problem in precision medicine. The biggest challenges are the skewed distribution and the variability of omics measurements. These challenges complicate the efficient identification of molecular disease subtypes defined by clinical differences, such as survival. Existing approaches adopt kernels to construct patient similarity graphs from each view through pairwise matching. However, the distance functions used in these kernels cannot exploit the potentially critical information carried by extreme values and data variability, which leads to a lack of robustness. In this paper, a novel robust distance metric (ROMDEX) is proposed to construct similarity graphs for molecular disease subtyping from omics data; it is designed to address both data variability and extreme values. The proposed approach is validated on multiple TCGA cancer datasets, and the results are compared with multiple baseline disease subtyping methods. The evaluation is based on Kaplan-Meier survival time analysis, which is validated using statistical tests, e.g., the Cox proportional hazards test (Cox p-value). The null hypothesis that the cohorts have the same hazard is rejected for P-values less than 0.05. The proposed approach achieved best P-values of 0.00181, 0.00171, and 0.00758 for Gene Expression, DNA Methylation, and MicroRNA data, respectively, which indicates a significant difference in survival between the cohorts. The proposed approach outperformed the existing state-of-the-art disease subtyping approaches (MRGC, PINS, SNF, Consensus Clustering, and iCluster+) on various individual disease views of multiple TCGA datasets.
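
The evaluation step described above (testing whether discovered subtypes differ in survival, and rejecting the null hypothesis at P < 0.05) can be illustrated with a minimal sketch. This is not the paper's code: the abstract reports Cox p-values, whereas the sketch swaps in a plain two-group log-rank test, a closely related test of the same null hypothesis of equal hazards. The function name `logrank_p_value`, the toy follow-up times, and the two subtype labels are all hypothetical.

```python
import math
import numpy as np

def logrank_p_value(time, event, group):
    """Two-group log-rank test (illustrative stand-in for a Cox p-value).

    time  : follow-up time per patient
    event : 1 if the event (death) was observed, 0 if censored
    group : 0/1 subtype label per patient (hypothetical clustering output)
    Returns (chi-square statistic, p-value).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group, dtype=int)

    obs_minus_exp = 0.0  # observed minus expected events in group 1
    variance = 0.0       # hypergeometric variance accumulated over event times

    # Visit each distinct time at which at least one event occurred.
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                       # patients at risk overall
        n1 = (at_risk & (group == 1)).sum()     # patients at risk in group 1
        died = (time == t) & (event == 1)
        d = died.sum()                          # events at time t overall
        d1 = (died & (group == 1)).sum()        # events at time t in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("both groups must be at risk at some event time")

    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Toy cohort (made-up numbers): subtype 0 has shorter survival than subtype 1.
time = [5, 8, 12, 14, 20, 30, 35, 40, 48, 60]
event = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

chi2, p = logrank_p_value(time, event, group)
print(f"chi2 = {chi2:.3f}, p = {p:.5f}")
print("reject equal-hazard null at 0.05" if p < 0.05 else "fail to reject")
```

In practice a survival library would normally be used for this test and for Cox regression; the hand-rolled version here only makes the arithmetic behind the reported significance threshold visible.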

Subjects

MicroRNAs; Neoplasms; Cluster Analysis; Humans; Kaplan-Meier Estimate; MicroRNAs/genetics; Neoplasms/diagnosis; Neoplasms/genetics; Precision Medicine

Keywords

Clustering analysis; Disease subtyping; Graph modelling; Patient similarity; Robust statistics; Similarity kernels

Full text: 1 Database: MEDLINE Main subject: MicroRNAs / Neoplasms Language: En Publication year: 2022 Document type: Article
