Tumor type classification and candidate cancer-specific biomarkers discovery via semi-supervised learning.

Chen, Peng; Li, Zhenlei; Hong, Zhaolin; Zheng, Haoran; Zeng, Rong

Affiliation

Chen P; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China.
Li Z; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China.
Hong Z; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China.
Zheng H; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China.
Zeng R; Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Hefei 230026, China.

Biophys Rep ; 9(2): 57-66, 2023 Apr 30.

Article in En | MEDLINE | ID: mdl-37753058

ABSTRACT

Identifying cancer-related differentially expressed genes provides significant information for diagnosing tumors, predicting prognoses, and selecting effective treatments. Recently, deep learning methods have been applied to differential gene expression analysis of microarray-based high-throughput gene profiling data and have achieved good results. In this study, we propose a new robust multiple-datasets-based semi-supervised learning model, MSSL, to perform tumor type classification and candidate cancer-specific biomarker discovery across multiple tumor types and multiple datasets. MSSL addresses the following long-standing obstacles: (1) the data volume of any existing single dataset is too small to fully exploit the advantages of deep learning; (2) a large number of datasets from different research institutions cannot be used effectively because of inconsistent internal variances and low quality; (3) deep learning methods have limited effectiveness on relatively uncommon cancers. We applied MSSL to pan-cancer normalized level-3 RNA-seq data from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) and obtained a final classification accuracy of 97.6%, a significant improvement over previous approaches. Finally, we ranked the importance of genes for each cancer type based on the classification results and validated that the top-ranked genes were biologically meaningful for the corresponding tumors and that some of them had already been used as biomarkers, which demonstrates the efficacy of our method.
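The abstract does not describe how MSSL works internally, so the following is only a generic illustration of the semi-supervised idea it names: a small labeled set is used to assign pseudo-labels to confident unlabeled samples, which then refine the classifier. The nearest-centroid classifier, the function names (`centroids`, `predict`, `self_train`), the `min_margin` threshold, and the toy two-gene expression values are all assumptions made for this sketch and are not taken from the paper.

```python
from math import dist

def centroids(samples, labels):
    """Mean expression vector per class (hypothetical helper)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, x):
    """Return (label, margin): nearest class and the gap to the runner-up distance."""
    ranked = sorted((dist(c, x), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin

def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, rounds=5):
    """Generic self-training loop; NOT the MSSL algorithm from the paper."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        cents = centroids(xs, ys)
        keep = []
        for x in pool:
            label, margin = predict(cents, x)
            if margin >= min_margin:
                # Confident prediction: adopt it as a pseudo-label.
                xs.append(x)
                ys.append(label)
            else:
                keep.append(x)
        if len(keep) == len(pool):
            break  # no new pseudo-labels were added, so stop early
        pool = keep
    return centroids(xs, ys)

# Toy data: two "genes" per sample, two made-up tumor types.
labeled_x = [[5.0, 1.0], [1.0, 5.0]]
labeled_y = ["tumor_A", "tumor_B"]
unlabeled_x = [[4.5, 1.5], [1.2, 4.8], [3.0, 3.0]]

model = self_train(labeled_x, labeled_y, unlabeled_x)
print(predict(model, [4.0, 2.0])[0])
```

In this toy run the ambiguous sample `[3.0, 3.0]` never clears the margin threshold and is left unlabeled, which is the usual safeguard against reinforcing uncertain predictions.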

Keywords

Cancer-specific biomarkers; Deep learning; MSSL; Tumor type classification

Full text: 1 | Collections: 01-international | Database: MEDLINE | Study type: Prognostic_studies | Language: En | Publication year: 2023 | Document type: Article