A hybrid feature selection algorithm and its application in bioinformatics.
Wang, Yangyang; Gao, Xiaoguang; Ru, Xinxin; Sun, Pengzhan; Wang, Jihan.
Affiliation
  • Wang Y; School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi, China.
  • Gao X; School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi, China.
  • Ru X; School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi, China.
  • Sun P; School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi, China.
  • Wang J; Institute of Medical Research, Northwestern Polytechnical University, Xi'an, Shaanxi, China.
PeerJ Comput Sci ; 8: e933, 2022.
Article in English | MEDLINE | ID: mdl-35494789
ABSTRACT
Feature selection is an independent technology for high-dimensional datasets that has been widely applied in a variety of fields. With the vast expansion of information, such as bioinformatics data, there has been an urgent need in recent decades to investigate more effective and accurate feature selection methods. Here, we propose the hybrid MMPSO method, which combines a feature ranking method with a heuristic search method to obtain an optimal subset that yields higher classification accuracy. Ten datasets obtained from the UCI Machine Learning Repository were analyzed to demonstrate the superiority of our method. The MMPSO algorithm outperformed other algorithms in terms of classification accuracy while utilizing the same number of features. We then applied the method to a biological dataset containing gene expression information about liver hepatocellular carcinoma (LIHC) samples obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). On the basis of the MMPSO algorithm, we identified an 18-gene signature that performed well in distinguishing normal samples from tumours. Nine of the 18 differentially expressed genes were significantly up-regulated in LIHC tumour samples, and the area under the curve (AUC) of a combination of seven genes (ADRA2B, ERAP2, NPC1L1, PLVAP, POMC, PYROXD2, TRIM29) in distinguishing tumours from normal samples was greater than 0.99. Six genes (ADRA2B, PYROXD2, CACHD1, FKBP1B, PRKD1 and RPL7AP6) were significantly correlated with survival time. The MMPSO algorithm can be used to effectively extract features from a high-dimensional dataset, which will provide new clues for identifying biomarkers or therapeutic targets from biological data and more perspectives in tumour research.
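The general hybrid idea described in the abstract (rank the features first, then run a heuristic search over the top-ranked candidates) can be illustrated with a small sketch. This is not the MMPSO algorithm itself: the abstract does not specify the ranking criterion, the search procedure, or the classifier, so the sketch substitutes absolute Pearson correlation for the ranking step, a basic binary particle swarm search for the heuristic step, and leave-one-out 1-nearest-neighbour accuracy with a small size penalty as the fitness. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def abs_corr(col, y):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(y)
    mx, my = sum(col) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
    vx = sum((a - mx) ** 2 for a in col)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov) / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out 1-nearest-neighbour accuracy using only the features in feats."""
    if not feats:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[f] - X[k][f]) ** 2 for f in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)


def hybrid_select(X, y, top_k=3, particles=6, iters=20, seed=0):
    """Filter by ranking, then search subsets of the top_k candidates (assumed design)."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    # Stage 1 (ranking): score every feature and keep the top_k as candidates.
    scores = [abs_corr([row[f] for row in X], y) for f in range(n_feat)]
    cand = sorted(range(n_feat), key=lambda f: -scores[f])[:top_k]

    def fitness(bits):
        feats = [f for f, b in zip(cand, bits) if b]
        # Accuracy minus a small penalty per feature (penalty is an assumption).
        return loo_1nn_accuracy(X, y, feats) - 0.01 * len(feats)

    # Stage 2 (heuristic search): binary PSO; one particle starts with all candidates.
    pos = [[1] * len(cand)]
    pos += [[rng.randint(0, 1) for _ in cand] for _ in range(particles - 1)]
    vel = [[0.0] * len(cand) for _ in pos]
    pbest = [p[:] for p in pos]
    pfit = [fitness(p) for p in pos]
    g = max(range(len(pos)), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for i, p in enumerate(pos):
            for d in range(len(cand)):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - p[d])
                             + 1.5 * rng.random() * (gbest[d] - p[d]))
                # The sigmoid of the velocity is the probability that the bit is 1.
                p[d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(p)
            if f > pfit[i]:
                pbest[i], pfit[i] = p[:], f
            if f > gfit:
                gbest, gfit = p[:], f
    return [f for f, b in zip(cand, gbest) if b], gfit


# Toy data: feature 0 separates the classes; features 1 to 3 carry no class signal.
X = [[0.0, 0.2, 0.9, 0.5], [0.1, 0.8, 0.1, 0.4], [0.2, 0.5, 0.6, 0.9],
     [10.0, 0.2, 0.9, 0.5], [10.1, 0.8, 0.1, 0.4], [10.2, 0.5, 0.6, 0.9]]
y = [0, 0, 0, 1, 1, 1]
selected, fit = hybrid_select(X, y)
print(selected, fit)
```

The ranking stage shrinks the search space cheaply, so the more expensive classifier-based search only has to explore subsets of a few candidates. On real gene expression data, the ranking criterion, swarm settings, and classifier would all need to be chosen and validated separately.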
Full text: 1 Collections: 01-international Database: MEDLINE Study type: Prognostic_studies Language: En Journal: PeerJ Comput Sci Publication year: 2022 Document type: Article Country of affiliation: China