ABSTRACT
Gene expression profiling has played a significant role in the identification and classification of tumor molecules. In gene expression data, only a few feature genes are closely related to tumors. Selecting highly discriminative feature genes is a challenging task, and existing methods fail to deal with this problem efficiently. This article proposes a novel metaheuristic approach for gene feature extraction, called the variable neighborhood learning Harris Hawks optimizer (VNLHHO). First, the F-score is used for a primary selection of the genes in gene expression data to narrow down the selection range of the feature genes. Subsequently, a variable neighborhood learning strategy is constructed to balance the global exploration and local exploitation of the Harris Hawks optimization. Finally, mutation operations are employed to increase the diversity of the population and so prevent the algorithm from falling into a local optimum. In addition, a novel activation function is used to convert the continuous solution of the VNLHHO into binary values, and a naive Bayesian classifier is utilized as a fitness function to select feature genes that can help classify biological tissues of binary and multi-class cancers. An experiment is conducted on gene expression profile data of eight types of tumors. The results show that the classification accuracy of the VNLHHO is greater than 96.128% for tumors in the colon, nervous system and lungs and 100% for the rest. A comparison with seven other algorithms demonstrates the superiority of the VNLHHO in terms of classification accuracy, fitness value and AUC value in feature selection for gene expression data.
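As an illustration of the F-score pre-selection step mentioned in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes the common two-class F-score (between-class separation of the means divided by the sum of within-class sample variances); the abstract does not state which variant is used, and the names `f_score` and `top_k_genes` and the toy data are hypothetical.

```python
from statistics import mean, variance


def f_score(values, labels):
    """Two-class F-score of one gene (assumed variant; labels are 0 or 1)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    # Squared distance of each class mean from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Sum of within-class sample variances (n - 1 in the denominator).
    within = variance(pos) + variance(neg)
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return between / within


def top_k_genes(matrix, labels, k):
    """matrix[i][j] is the expression of gene j in sample i; return k best gene indices."""
    n_genes = len(matrix[0])
    scores = [f_score([row[j] for row in matrix], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (hypothetical): gene 0 separates the two classes, gene 1 does not.
X = [[1.0, 5.0], [1.2, 4.0], [3.0, 5.1], [3.2, 4.1]]
y = [0, 0, 1, 1]
selected = top_k_genes(X, y, k=1)
print(selected)
```

A filter of this kind only ranks genes one at a time; the wrapper stage described in the abstract is what evaluates subsets of the retained genes.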
Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Machine Learning , Neoplasms/genetics , Animals , Cluster Analysis , Databases, Factual/statistics & numerical data , Gene Expression Profiling/classification , Gene Expression Regulation, Neoplastic , Humans , Internet , Models, Genetic , Mutation , Neoplasms/classification , Reproducibility of Results
ABSTRACT
In biomedical data mining, the gene dimension is often much larger than the sample size. To solve this problem, a feature selection algorithm is needed to select feature gene subsets that correlate strongly with the phenotype and so ensure the accuracy of subsequent analysis. This paper presents a new three-stage hybrid feature gene selection method that combines a variance filter, an extremely randomized tree, and the whale optimization algorithm. First, a variance filter is used to reduce the dimension of the feature gene space, and an extremely randomized tree is used to further reduce the feature gene set. Finally, the whale optimization algorithm is used to select the optimal feature gene subset. We evaluate the proposed method with three different classifiers on seven published gene expression profile datasets and compare it with other advanced feature selection algorithms. The results show that the proposed method has significant advantages on a variety of evaluation indicators.
Subjects
Algorithms , Whales , Animals , Data Mining , Phenotype , Sample Size
ABSTRACT
Background: Gene expression data are often used to classify cancer genes. In such high-dimensional datasets, however, only a few feature genes are closely related to tumors. Therefore, it is important to accurately select a subset of feature genes with high contributions to cancer classification. Methods: In this article, a new three-stage hybrid gene selection method is proposed that combines a variance filter, an extremely randomized tree and the Harris Hawks algorithm (VEH). In the first stage, we evaluated each gene in the dataset through the variance filter and selected the feature genes that met the variance threshold. In the second stage, we used an extremely randomized tree to further eliminate irrelevant genes. Finally, we used the Harris Hawks algorithm to search the genes retained by the previous two stages for the optimal feature gene subset. Results: We evaluated the proposed method using three different classifiers on eight published microarray gene expression datasets. The results showed a 100% classification accuracy for VEH in gastric cancer, acute lymphoblastic leukemia and ovarian cancer, and an average classification accuracy of 95.33% across a variety of other cancers. Compared with other advanced feature selection algorithms, VEH has obvious advantages when measured by many evaluation criteria.
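The first stage described above, the variance filter, can be sketched in a few lines. This is an illustrative sketch only: the function name `variance_filter`, the threshold value, the toy matrix, and the choice of a strict "greater than" comparison on the population variance are all assumptions, since the abstract does not specify them.

```python
from statistics import pvariance


def variance_filter(matrix, threshold):
    """Return indices of genes whose population variance exceeds `threshold`.

    matrix[i][j] is the expression of gene j in sample i.
    """
    n_genes = len(matrix[0])
    return [
        j
        for j in range(n_genes)
        if pvariance([row[j] for row in matrix]) > threshold
    ]


# Toy data (hypothetical): gene 0 is constant, gene 1 varies strongly,
# gene 2 varies only slightly.
X = [
    [0.0, 2.0, 1.0],
    [0.0, 4.0, 1.1],
    [0.0, 6.0, 0.9],
]
kept = variance_filter(X, threshold=0.05)
print(kept)
```

Because this filter ignores the class labels, it can only discard genes that barely change across samples; the later stages are what relate the remaining genes to the classification task.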
ABSTRACT
RNA modification is a key regulatory mechanism involved in tumorigenesis, tumor progression, and the immune response. However, the potential role of RNA modification "writer" genes in the immune microenvironment of gliomas and their effect on the response to immunotherapy remain unclear. The purpose of this study was to evaluate the role of RNA modification "writer" genes in the prognosis and immunotherapy response of low-grade glioma (LGG). The consensus non-negative matrix factorization (CNMF) method was used to identify different RNA modification subtypes. We used a novel eigengene screening method, the variable neighborhood learning Harris Hawks optimizer (VNLHHO), to screen for eigengenes among the RNA modification subtypes. We constructed a principal components analysis score (PCA_score)-based prognostic prediction model and validated it using an independent cohort. We also analyzed the association between PCA_score and the immune and molecular features of LGG. The results suggested that LGG can be divided into two different RNA modification-based subtypes with distinct prognostic and molecular features. A high PCA_score was significantly associated with a poor prognosis in LGG and was an independent prognostic factor. A nomogram containing PCA_score and clinical features was constructed, and it showed a significant predictive value. PCA_score was negatively correlated with tumor purity and the abundance of CD4+ T cells in LGG patients. LGG patients with a high PCA_score had lower Tumor Immune Dysfunction and Exclusion scores and showed an immunotherapy response. In conclusion, we report a novel RNA modification-based prognostic model for LGG that lays the foundation for evaluating LGG prognosis and developing more effective therapeutic strategies for these tumors.
Subjects
Glioma , Humans , Glioma/diagnosis , Glioma/genetics , Glioma/therapy , Immunotherapy , Nomograms , Prognosis , RNA , Tumor Microenvironment/genetics
ABSTRACT
Abundant evidence has indicated that the prognosis of cutaneous melanoma (CM) patients is highly complicated by the tumour immune microenvironment. We retrieved the clinical data and gene expression data of CM patients in The Cancer Genome Atlas (TCGA) database for modelling and validation analysis. Based on single-sample gene set enrichment analysis (ssGSEA) and consensus clustering analysis, CM patients were classified into three immune level groups, and the differences in the tumour immune microenvironment and clinical characteristics were evaluated. Seven immune-related CM prognostic molecules, including three mRNAs (SUCO, BTN3A1 and TBC1D2), three lncRNAs (HLA-DQB1-AS1, C9orf139 and C22orf34) and one miRNA (hsa-miR-17-5p), were screened by differential expression analysis, ceRNA network analysis, LASSO Cox regression analysis and univariate Cox regression analysis. Their biological functions were mainly concentrated in the phospholipid metabolic process, transcription regulator complex, protein serine/threonine kinase activity and MAPK signalling pathway. We established a novel prognostic model for CM integrating clinical variables and immune molecules that showed promising predictive performance demonstrated by receiver operating characteristic curves (AUC ≥ 0.74), providing a scientific basis for predicting the prognosis and improving the clinical outcomes of CM patients.
Subjects
Melanoma , Skin Neoplasms , Humans , Melanoma/genetics , Prognosis , Skin Neoplasms/genetics , Biomarkers, Tumor/genetics , Tumor Microenvironment/genetics , Butyrophilins , Antigens, CD , Melanoma, Cutaneous Malignant
ABSTRACT
Because the prognosis of endometrial carcinoma (EC) patients is difficult to predict from clinical variables alone, this study aims to build a new EC prognosis model that integrates clinical and molecular information to improve prediction accuracy. The clinical and gene expression data of 496 EC patients in the TCGA database were used to establish and validate this model. General Cox regression was applied to analyze clinical variables and RNAs. Elastic net-penalized Cox proportional hazard regression was employed to select the best EC prognosis-related RNAs, and ridge regression was used to construct the EC prognostic model. The predictive ability of the prognostic model was evaluated by the Kaplan-Meier curve and the area under the receiver operating characteristic curve (AUC-ROC). A clinical-RNA prognostic model integrating two clinical variables and 28 RNAs was established. The 5-year AUC of the clinical-RNA prognostic model was 0.932, which is higher than that of the clinical-alone (0.897) or RNA-alone (0.836) prognostic model. This clinical-RNA prognostic model can better classify the prognosis risk of EC patients. In the training group (396 patients), the overall survival of EC patients was lower in the high-risk group than in the low-risk group [HR = 32.263 (95% CI, 7.707-135.058), P = 8e-14]. The same result was observed in the validation group. In summary, a novel EC prognosis model integrating clinical variables and RNAs was established, which can better predict the prognosis and help to improve the clinical management of EC patients.
ABSTRACT
To address the deficiencies of the basic sine-cosine algorithm in global optimization problems, such as low solution precision and slow convergence speed, an improved sine-cosine algorithm is proposed in this paper. The improvement involves three optimization strategies. First, an exponentially decreasing conversion parameter and a linearly decreasing inertia weight are adopted to balance the global exploration and local exploitation abilities of the algorithm. Second, random individuals near the optimal individual replace the optimal individual of the original algorithm, which helps the algorithm jump out of local optima and effectively increases the search range. Finally, a greedy Lévy mutation strategy is applied to the optimal individual to enhance the local exploitation ability of the algorithm. The experimental results show that the proposed algorithm can effectively avoid falling into local optima and that it has faster convergence speed and higher optimization accuracy.
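To make the first strategy concrete, the sketch below shows a sine-cosine position update with an exponentially decreasing conversion parameter and a linearly decreasing inertia weight. It is a hedged illustration, not the paper's algorithm: the abstract gives no formulas, so the decay law `a * exp(-t / n_iter)`, the weight bounds `w_max` and `w_min`, and all function and parameter names are assumptions. The neighbor-replacement and greedy Lévy mutation strategies are omitted.

```python
import math
import random


def sphere(x):
    """Simple test objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def sca_minimize(objective, dim, bounds, n_agents=20, n_iter=200,
                 a=2.0, w_max=0.9, w_min=0.4, seed=0):
    """Sine-cosine search with assumed parameter schedules (see lead-in)."""
    rng = random.Random(seed)
    lo, hi = bounds
    agents = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    best = min(agents, key=objective)[:]
    best_val = objective(best)
    for t in range(n_iter):
        r1 = a * math.exp(-t / n_iter)            # assumed exponential decay
        w = w_max - (w_max - w_min) * t / n_iter  # linear inertia weight decay
        for agent in agents:
            for d in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                # Choose the sine or cosine branch with equal probability.
                trig = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
                step = r1 * trig * abs(r3 * best[d] - agent[d])
                # Inertia-weighted move, clipped to the search bounds.
                agent[d] = min(hi, max(lo, w * agent[d] + step))
            val = objective(agent)
            if val < best_val:
                best, best_val = agent[:], val
    return best, best_val


best, best_val = sca_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```

The best-so-far solution is only ever replaced by a strictly better one, so the returned value never gets worse over the iterations; how quickly it improves depends on the assumed schedules.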