Construction of a diagnostic classifier for cervical intraepithelial neoplasia and cervical cancer based on XGBoost feature selection and random forest model.
Zhang, Jing; Yang, Xiuqing; Chen, Jia; Han, Jing; Chen, Xiaofeng; Fan, Yueping; Zheng, Hui.
Affiliation
  • Zhang J; Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China.
  • Yang X; Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China.
  • Chen J; Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China.
  • Han J; Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China.
  • Chen X; Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China.
  • Fan Y; Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China.
  • Zheng H; Department of Gynaecology and Obstetrics, Jiangsu Xiangshui Hospital of Chinese Medicine, Yancheng, Jiangsu, China.
J Obstet Gynaecol Res; 49(1): 296-303, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36220631
BACKGROUND: The pathological phenotype of early-stage cervical cancer (CC) resembles that of cervical intraepithelial neoplasia (CIN), which makes cervical precancerous lesions difficult to diagnose. Existing diagnostic methods are also partly subjective and limited, so misdiagnosis or missed diagnosis remains possible. Methods that assist the diagnosis of CC and CIN are therefore needed.
METHODS: Using CIN and CC data from the Gene Expression Omnibus (GEO) database, the eXtreme Gradient Boosting (XGBoost) algorithm was applied to screen feature genes that separate CIN from CC for classifier construction. An incremental feature selection (IFS) curve was also used for screening. The reliability of the classifier was validated by principal component analysis (PCA) dimensionality reduction and by heat map analysis of gene expression. Differentially expressed genes between CIN and CC were then intersected with the classifier genes. Genes in the intersection served as seeds for protein-protein interaction network construction and random walk with restart analysis. The genes with the top 50 affinity coefficients were selected for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses to identify biological functions that differ between CIN and CC.
RESULTS: Peripheral blood gene expression in CIN and CC was analyzed, and seven feature genes were screened. When these genes were used for classifier construction, IFS curve screening showed that the three-gene classifier built with the random forest model performed best. PCA dimensionality reduction and gene expression heat map analysis showed that the three-gene classifier could effectively distinguish CIN from CC.
CONCLUSION: A three-gene diagnostic classifier can effectively distinguish CIN patients from CC patients and may provide a reference for the clinical diagnosis of early CC.
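The random walk with restart step named in METHODS can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not state the restart probability, the network source, or the software used, so the value `restart=0.5`, the function names `random_walk_with_restart` and `top_affinity_genes`, and the toy gene labels `G1` to `G5` are all assumptions made for illustration. The walker moves to a uniformly chosen neighbor with probability `1 - restart` and jumps back to a seed gene with probability `restart`; the steady-state visiting probability of each gene plays the role of its affinity coefficient.

```python
def random_walk_with_restart(edges, seeds, restart=0.5, tol=1e-12, max_iter=1000):
    """Return {node: steady-state score} for an undirected, unweighted network.

    `restart` is an assumed value; the abstract does not report one.
    """
    # Build an undirected adjacency list from the edge pairs.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Keep only seeds that are present in the network, without duplicates.
    seeds = sorted(set(seeds) & set(neighbors))
    if not seeds:
        raise ValueError("no seed gene is present in the network")

    # Restart vector: probability mass spread uniformly over the seeds.
    p0 = {node: 0.0 for node in neighbors}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction of the mass returns to the seeds.
        nxt = {node: restart * p0[node] for node in neighbors}
        # Walk term: each node passes the rest equally to its neighbors.
        for node, nbrs in neighbors.items():
            share = (1.0 - restart) * p[node] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        delta = sum(abs(nxt[node] - p[node]) for node in neighbors)
        p = nxt
        if delta < tol:
            break
    return p


def top_affinity_genes(scores, k):
    """Return the k nodes with the highest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy chain network with hypothetical gene labels (not data from the study).
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]
toy_scores = random_walk_with_restart(toy_edges, seeds=["G1"])
print(top_affinity_genes(toy_scores, 3))
```

With a real protein-protein interaction network, `edges` would hold the interaction pairs, `seeds` the genes in the intersection, and `top_affinity_genes(scores, 50)` would give a candidate list for enrichment analysis. Edge confidence weights, which many interaction databases provide, are ignored in this sketch.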

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Uterine Cervical Dysplasia / Uterine Cervical Neoplasms Study type: Clinical_trials / Diagnostic_studies Limits: Female / Humans Language: En Journal: J Obstet Gynaecol Res Journal subject: GYNECOLOGY / OBSTETRICS Publication year: 2023 Document type: Article Affiliation country: China