A systematic comparison of data- and knowledge-driven approaches to disease subtype discovery.
Rintala, Teemu J; Federico, Antonio; Latonen, Leena; Greco, Dario; Fortino, Vittorio.
  • Rintala TJ; Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1 E, 70210 Kuopio, Finland.
  • Federico A; Faculty of Medicine and Health Technology, Tampere University, Kalevantie 4, 33100 Tampere, Finland.
  • Latonen L; BioMediTech Institute, Tampere University, Kalevantie 4, 33100 Tampere, Finland.
  • Greco D; Institute of Biomedicine, University of Eastern Finland, Yliopistonranta 1 E, 70210 Kuopio, Finland.
  • Fortino V; Faculty of Medicine and Health Technology, Tampere University, Kalevantie 4, 33100 Tampere, Finland.
Brief Bioinform; 22(6), 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34396389
ABSTRACT
Typical clustering analysis of large-scale genomics data combines two unsupervised learning techniques: dimensionality reduction and clustering (DR-CL). It has been demonstrated that transforming gene expression into pathway-level information can improve the robustness and interpretability of disease grouping results. This approach, referred to as the biological knowledge-driven clustering (BK-CL) approach, is often neglected due to a lack of tools enabling systematic comparisons with more established DR-based methods. Moreover, classic clustering metrics based on group separability tend to favor the DR-CL paradigm, which may increase the risk of identifying less actionable disease subtypes that have ambiguous biological and clinical explanations. Hence, there is a need for metrics that assess biological and clinical relevance. To facilitate the systematic analysis of BK-CL methods, we propose a computational protocol for the quantitative analysis of clustering results derived from both DR-CL and BK-CL methods. In addition, we propose a new BK-CL method that combines prior knowledge of disease-relevant genes, network diffusion algorithms and gene set enrichment analysis to generate robust pathway-level information. Benchmarking studies were conducted to compare the grouping results from different DR-CL and BK-CL approaches with respect to standard clustering evaluation metrics, concordance with known subtypes, association with clinical outcomes and disease modules in gene co-expression networks. No single approach dominated every metric, which shows the importance of multi-objective evaluation in clustering analysis. However, we demonstrated that, on gene expression data sets derived from TCGA samples, the BK-CL approach can find groupings that provide significant prognostic value in both breast and prostate cancers.
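To make the idea of clustering on pathway-level information concrete, the following is a minimal sketch under stated assumptions, not the method proposed in the paper. The abstract's BK-CL method relies on network diffusion and gene set enrichment analysis; this toy instead averages per-gene z-scores over each gene set and then runs a plain k-means. The gene names, gene sets, sample names and helper functions (`zscore_genes`, `pathway_scores`, `kmeans`) are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

def zscore_genes(expr):
    """Standardize each gene across samples (expr: sample -> gene -> value)."""
    genes = list(next(iter(expr.values())).keys())
    stats = {}
    for g in genes:
        vals = [expr[s][g] for s in expr]
        sd = pstdev(vals)
        stats[g] = (mean(vals), sd if sd > 0 else 1.0)  # guard against constant genes
    return {s: {g: (expr[s][g] - stats[g][0]) / stats[g][1] for g in genes}
            for s in expr}

def pathway_scores(expr, gene_sets):
    """Toy pathway profile: mean z-score of the member genes, per sample.

    Assumption: a simple average stands in for enrichment scoring here.
    """
    z = zscore_genes(expr)
    names = sorted(gene_sets)
    return {s: [mean(z[s][g] for g in gene_sets[p] if g in z[s]) for p in names]
            for s in z}

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with a deterministic init from the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

# Hypothetical expression matrix: 4 samples x 4 genes.
expr = {
    "s1": {"A": 5, "B": 6, "C": 1, "D": 2},
    "s2": {"A": 1, "B": 2, "C": 6, "D": 5},
    "s3": {"A": 6, "B": 5, "C": 2, "D": 1},
    "s4": {"A": 2, "B": 1, "C": 5, "D": 6},
}
# Hypothetical gene sets standing in for curated pathways.
gene_sets = {"P1": ["A", "B"], "P2": ["C", "D"]}

scores = pathway_scores(expr, gene_sets)   # sample -> [P1 score, P2 score]
labels = kmeans(list(scores.values()), k=2)
print(dict(zip(scores, labels)))
```

In this toy data, samples s1 and s3 are high in the P1 genes and samples s2 and s4 are high in the P2 genes, so the two groups separate along the pathway axes. A DR-CL pipeline would replace `pathway_scores` with a data-driven projection such as PCA before the clustering step.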
Full text: 1 | Database: MEDLINE | Main subject: Biomarkers / Computational Biology / Disease Susceptibility / Data Mining | Type of study: Guideline / Prognostic_studies | Limits: Humans | Language: En | Year: 2021 | Document type: Article