Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale.

Sarkar, Jnanendra Prasad; Saha, Indrajit; Lancucki, Adrian; Ghosh, Nimisha; Wlasnowolski, Michal; Bokota, Grzegorz; Dey, Ashmita; Lipinski, Piotr; Plewczynski, Dariusz

Affiliation

Sarkar JP; Data, Analytics & AI, Larsen & Toubro Infotech Ltd., Pune, India.
Saha I; Department of Computer Science & Engineering, Jadavpur University, Kolkata, India.
Lancucki A; Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India.
Ghosh N; Computational Intelligence Research Group, Institute of Computer Science, University of Wroclaw, Wroclaw, Poland.
Wlasnowolski M; Department of Computer Science and Information Technology, SOA University, Bhubaneshwar, India.
Bokota G; Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.
Dey A; Institute of Informatics, University of Warsaw, Warsaw, Poland.
Lipinski P; Centre of New Technologies, University of Warsaw, Warsaw, Poland.
Plewczynski D; Department of Computer Science & Engineering, Jadavpur University, Kolkata, India.

Front Genet; 11: 982, 2020.

Article in En | MEDLINE | ID: mdl-33281862

ABSTRACT

Genome-wide analysis of miRNA molecules can reveal important information for understanding the biology of cancer. Typically, miRNAs are used as features in statistical learning methods to train models that predict cancer. This motivates us to propose a method that integrates clustering and classification techniques for diverse cancer types with survival analysis via regression, in order to identify miRNAs that can potentially play a crucial role in the prediction of different types of tumors. Our method has two parts. The first part is a feature selection procedure, called the stochastic covariance evolutionary strategy with forward selection (SCES-FS), which is developed by integrating stochastic neighbor embedding (SNE), the covariance matrix adaptation evolutionary strategy (CMA-ES), and classifiers, with the primary objective of selecting biomarkers. SNE is used to reorder the features by performing an implicit clustering with highly correlated neighboring features. A subset of features is then selected heuristically to perform multi-class classification for diverse cancer types. In the second part of our method, the most important features identified in the first part are used to perform survival analysis via Cox regression, primarily to examine the effectiveness of the selected features. For this purpose, we have analyzed next-generation sequencing data from The Cancer Genome Atlas in the form of miRNA expression of 1,707 samples of 10 different cancer types and 333 normal samples. The SCES-FS method is compared with well-known feature selection methods and is found to perform better in multi-class classification for the 17 selected miRNAs, achieving an accuracy of 96%. Moreover, the biological significance of the selected miRNAs is demonstrated with the help of network analysis, expression analysis using hierarchical clustering, KEGG pathway analysis, GO enrichment analysis, and protein-protein interaction analysis. Overall, the results indicate that the 17 selected miRNAs are associated with many key cancer regulators, such as MYC, VEGFA, AKT1, CDKN1A, RHOA, and PTEN, through their targets. Therefore, the selected miRNAs can be regarded as putative biomarkers for 10 types of cancer.
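
The forward-selection idea in the first part of the method can be illustrated with a minimal sketch. This is not the authors' SCES-FS procedure: it omits the SNE reordering and the CMA-ES search entirely, and it substitutes a simple nearest-centroid classifier scored by leave-one-out accuracy. The function names (`nearest_centroid_accuracy`, `forward_select`) and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    that only looks at the feature indices in `features`."""
    labels = sorted(set(y))  # sorted so that ties resolve deterministically
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except sample i.
        centroids = {}
        for label in labels:
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [mean(r[f] for r in rows) for f in features]
        point = [X[i][f] for f in features]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        predicted = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def forward_select(X, y, max_features):
    """Greedy forward selection: repeatedly add the single feature that most
    improves accuracy, and stop when no candidate improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(nearest_centroid_accuracy(X, y, selected + [f]), f)
                  for f in remaining]
        # Highest score wins; on ties, prefer the lower feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy expression matrix (rows = samples, columns = features); made-up values.
X = [
    [5.0, 0.1, 3.0],
    [1.0, 0.2, 3.1],
    [4.0, 0.0, 2.9],
    [2.0, 5.0, 3.0],
    [5.5, 5.2, 3.1],
    [1.5, 4.9, 2.9],
]
y = ["A", "A", "A", "B", "B", "B"]

selected, score = forward_select(X, y, max_features=2)
print(selected, score)
```

In this toy example only the second column separates the two classes, so the greedy loop picks it first and then stops because no further feature raises the accuracy. A real pipeline would use a held-out split or cross-validation on far more samples, and the stopping rule here is only one of several reasonable choices.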

Keywords

KEGG pathway; cancer; Cox regression; feature selection; gene ontology; machine learning; next generation sequencing; stochastic neighbor embedding

Full text: 1 | Database: MEDLINE | Study type: Diagnostic_studies / Prognostic_studies | Language: En | Publication year: 2020 | Document type: Article
