Deep learning approach for cancer subtype classification using high-dimensional gene expression data.

Shen, Jiquan; Shi, Jiawei; Luo, Junwei; Zhai, Haixia; Liu, Xiaoyan; Wu, Zhengjiang; Yan, Chaokun; Luo, Huimin

Affiliation

Shen J; School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
Shi J; School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
Luo J; School of Software, Henan Polytechnic University, Jiaozuo, 454003, China. luojunwei@hpu.edu.cn.
Zhai H; School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
Liu X; School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
Wu Z; School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
Yan C; School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China.
Luo H; School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China.

BMC Bioinformatics; 23(1): 430, 2022 Oct 17.

Article in En | MEDLINE | ID: mdl-36253710

ABSTRACT

MOTIVATION: Studies have shown that classifying cancer subtypes can provide valuable information for cancer research ranging from aetiology and tumour biology to prognosis and personalized treatment. Current methods usually rely on gene expression data to classify cancer subtypes. However, cancer samples are scarce, and their gene expression data are high-dimensional and sparse, which prevents most methods from achieving satisfactory classification results.

RESULTS: In this paper, we propose DCGN, a deep learning approach that combines a convolutional neural network (CNN) with a bidirectional gated recurrent unit (BiGRU). DCGN aims to achieve nonlinear dimensionality reduction and to learn features that filter out irrelevant factors in gene expression data. Specifically, DCGN first applies the synthetic minority oversampling technique (SMOTE) to balance the class distribution of the data. The CNN handles the high-dimensional input and extracts important local features, and the BiGRU analyses these deeper features and retains their important information. By combining both networks, DCGN captures key features and addresses the challenges of small sample sizes and sparse, high-dimensional features. In the experiments, we compared DCGN with seven other cancer subtype classification methods on breast and bladder cancer gene expression datasets. The experimental results show that DCGN outperforms the other seven methods and provides more satisfactory classification results.
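The oversampling step mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch of SMOTE-style interpolation. This is not the authors' implementation, which the abstract does not describe. The function names `smote_oversample` and `balance`, the neighbour count `k`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between minority
    samples and one of their k nearest minority neighbours (SMOTE-style).
    Hypothetical helper, not taken from the paper."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [s for j, s in enumerate(minority) if j != i]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(samples, labels, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = list(samples), list(labels)
    for y, xs in by_class.items():
        if len(xs) < target:
            new = smote_oversample(xs, target - len(xs), k=k, seed=seed)
            out_x.extend(new)
            out_y.extend([y] * len(new))
    return out_x, out_y


# Toy expression profiles: four samples of subtype "A", two of subtype "B".
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["A", "A", "A", "A", "B", "B"]
X_bal, y_bal = balance(X, y, k=1)
```

Each synthetic sample lies on the line segment between two real minority samples, so the balanced set stays inside the region already covered by that class. In practice a maintained implementation such as the one in the imbalanced-learn package would likely be preferable to this sketch.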

Key words

Cancer subtype; Classification; Deep learning

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Deep Learning / Neoplasms
Type of study: Prognostic studies
Language: En
Journal: BMC Bioinformatics
Journal subject: Medical Informatics
Year: 2022
Document type: Article
Affiliation country: China
Country of publication: United Kingdom
