3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection.
Mallik, Saurav; Sarkar, Anasua; Nath, Sagnik; Maulik, Ujjwal; Das, Supantha; Pati, Soumen Kumar; Ghosh, Soumadip; Zhao, Zhongming.
Affiliation
  • Mallik S; Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, United States.
  • Sarkar A; Department of Computer Science & Engineering, Jadavpur University, Kolkata, India.
  • Nath S; Department of Computer Science & Engineering, Jadavpur University, Kolkata, India.
  • Maulik U; Department of Computer Science & Engineering, Jadavpur University, Kolkata, India.
  • Das S; Department of Information Technology, Academy of Technology, Hooghly, West Bengal, India.
  • Pati SK; Department of Bioinformatics, Maulana Abul Kalam Azad University, Kolkata, West Bengal, India.
  • Ghosh S; Department of Computer Science & Engineering, Sister Nivedita University, New Town, West Bengal, India.
  • Zhao Z; Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States.
Front Genet ; 14: 1095330, 2023.
Article in English | MEDLINE | ID: mdl-36865387
Handling biomedical big data remains a challenging task. In particular, integrating multi-modal data and then mining the significant features (gene signature detection) is daunting. With this in mind, we propose a novel framework, three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which employs empirical Bayes statistics, was first applied to each individual molecular profile to extract the statistically significant features. The three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy scores and the area under the curve (AUC). Gene modules were identified by average linkage clustering followed by dynamic tree cut. The module with the highest correlation was taken as the potential gene signature. We used an acute myeloid leukemia dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC of 0.827. We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed the state-of-the-art methods in terms of AUC. We also included comparative studies with other related methods to strengthen the case for our method. Finally, our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery.
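
The two evaluation quantities named in the abstract, soft margin hinge loss and AUC, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the +1/-1 label encoding, and the toy decision scores are assumptions made for illustration and are unrelated to the TCGA data.

```python
def hinge_loss(labels, scores):
    """Mean soft-margin hinge loss, max(0, 1 - y*s); labels are assumed to be +1 or -1."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(labels, scores)) / len(labels)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy decision scores from a hypothetical classifier (illustrative values only).
toy_labels = [1, 1, -1, -1, 1]
toy_scores = [2.0, 0.3, -1.5, 0.5, 1.2]

print(round(hinge_loss(toy_labels, toy_scores), 4))
print(round(auc(toy_labels, toy_scores), 4))
```

The pairwise form of AUC is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be the usual choice for larger cohorts.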
Full text: 1 Collections: 01-international Database: MEDLINE Study type: Diagnostic_studies / Prognostic_studies Language: En Journal: Front Genet Publication year: 2023 Document type: Article
