ABSTRACT
Diverse fields, including healthcare and drug development, have benefited greatly from the adoption of deep learning (DL), a subset of artificial intelligence (AI) and machine learning (ML). Cancer accounts for a significant share of the illnesses that cause premature human mortality worldwide, and this burden is likely to rise in the coming years, especially when non-communicable illnesses are not considered. Cancer patients would therefore benefit greatly from precise and timely diagnosis and prediction. DL has become a common technique in healthcare thanks to the abundance of computational power. Gene expression datasets are frequently used in major DL-based applications for illness detection, notably in cancer therapy. The quantity of medical data, however, is often insufficient to meet deep learning requirements. Microarray gene expression datasets are used for training despite their extreme dimensionality, limited number of samples, and sparse information. Data augmentation is commonly used to expand the training sample size for gene data. In this work, the Wasserstein Tabular Generative Adversarial Network (WT-GAN) model is used to generate synthetic data for augmentation. A correlation-based feature selection technique selects the most relevant features based on threshold values. Deep FNN and ML algorithms are then trained to classify the gene expression samples. With WT-GAN, the augmented data give better classification results (>97%) for cancer diagnosis.
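The threshold-based feature selection step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure or threshold was used, so this assumes the simplest variant: keep a gene when the absolute Pearson correlation between its expression values and the class label reaches a cut-off. The gene names, values, and the 0.9 threshold are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def select_genes(expression, labels, threshold):
    """Keep genes whose |correlation with the class label| meets the threshold.

    expression: dict mapping gene name -> list of expression values (one per sample)
    labels: list of 0/1 class labels, one per sample
    """
    return [gene for gene, values in expression.items()
            if abs(pearson(values, labels)) >= threshold]

# Hypothetical toy data: four samples, two per class
expression = {
    "GENE_A": [1.0, 1.2, 3.1, 3.3],  # rises with the label
    "GENE_B": [2.0, 2.1, 2.1, 2.0],  # unrelated to the label
    "GENE_C": [5.0, 4.8, 1.1, 0.9],  # falls with the label
}
labels = [0, 0, 1, 1]
selected = select_genes(expression, labels, threshold=0.9)
```

Taking the absolute value keeps both positively and negatively associated genes. A filter like this scores each gene independently and does not account for redundancy between selected genes.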
Subjects
Deep Learning; Humans; Neoplasms/genetics; Gene Expression Profiling; Neural Networks, Computer
ABSTRACT
BACKGROUND: Protein kinase CK2 activity is implicated in the pathogenesis of various hematological malignancies, such as Acute Myeloid Leukemia (AML), which remains challenging to treat. This kinase has emerged as an attractive molecular target for therapy. The antitumoral peptide CIGB-300 blocks CK2 phospho-acceptor sites on their substrates, but it also binds to the CK2α catalytic subunit. Previous proteomic and phosphoproteomic experiments revealed molecular and cellular processes relevant to the peptide's action in diverse AML backgrounds, but earlier events at the transcriptional level might also support the anti-leukemic effect of CIGB-300. Here we used a Clariom S HT assay for gene expression profiling to study the molecular events supporting the anti-leukemic effect of the CIGB-300 peptide on HL-60 and OCI-AML3 cell lines. RESULTS: We found that 183 and 802 genes were significantly modulated in HL-60 cells at 30 min and 3 h of incubation with CIGB-300, respectively (p < 0.01 and FC ≥ |1.5|), while 221 and 332 genes were modulated in OCI-AML3 cells. Importantly, functional enrichment analysis showed that genes and transcription factors related to apoptosis, cell cycle, leukocyte differentiation, signaling by cytokines/interleukins, and the NF-kB and TNF signaling pathways were significantly represented in the transcriptomic profiles of AML cells. The influence of CIGB-300 on these biological processes and pathways depends first on the cellular background and then on treatment duration. Of note, the impact of the peptide on NF-kB signaling was corroborated by the quantification of selected NF-kB target genes, as well as by the measurement of p50 binding activity and soluble TNF-α induction. Quantification of CSF1/M-CSF and CDKN1A/P21 by qPCR supports the peptide's effects on differentiation and cell cycle.
CONCLUSIONS: We explored for the first time the temporal dynamics of the gene expression profile regulated by CIGB-300 which, along with the antiproliferative mechanism, can stimulate immune responses by increasing immunomodulatory cytokines. We provided fresh molecular clues concerning the antiproliferative effect of CIGB-300 in two relevant AML backgrounds.
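The gene counts in this abstract come from applying a p-value cut-off and a fold-change cut-off together. A minimal sketch of such a filter is given below. It assumes that down-regulation is encoded as a negative fold change, so the magnitude is compared against 1.5; the gene IDs and statistics are hypothetical and are not taken from the study.

```python
def significant_genes(results, p_cutoff=0.01, fc_cutoff=1.5):
    """Return gene IDs with p < p_cutoff and |fold change| >= fc_cutoff.

    results: iterable of (gene_id, p_value, signed_fold_change) tuples.
    Assumption: down-regulated genes carry a negative fold change.
    """
    return [gene for gene, p, fc in results
            if p < p_cutoff and abs(fc) >= fc_cutoff]

# Hypothetical differential-expression output
toy_results = [
    ("G1", 0.001, 2.3),   # up-regulated, passes both cut-offs
    ("G2", 0.005, -1.8),  # down-regulated, passes both cut-offs
    ("G3", 0.200, 3.0),   # large change but not significant
    ("G4", 0.002, 1.2),   # significant but change too small
]
hits = significant_genes(toy_results)
```

Requiring both conditions is what separates statistically significant genes from those whose change is also large enough to be of interest. Correction for multiple testing is a separate step that this sketch leaves out.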
Subjects
Leukemia, Myeloid, Acute; Transcriptome; Humans; Cell Line, Tumor; NF-kappa B; Proteomics; Peptides/pharmacology; Gene Expression Profiling; Apoptosis; Leukemia, Myeloid, Acute/genetics; Cytokines
ABSTRACT
BACKGROUND: Most eukaryotic genes produce multiple transcript isoforms through the inclusion or exclusion of particular exons. The isoforms of a gene often play diverse functional roles, so it is necessary to measure isoform expression accurately as well as gene expression. While previous studies have demonstrated strong agreement between mRNA sequencing (RNA-seq) and array-based gene and/or isoform quantification platforms (Microarray gene expression and Exon-array), the more recently developed NanoString platform has not been systematically evaluated and compared, especially in large-scale studies across different cancer domains. RESULTS: In this paper, we present a large-scale comparative study among RNA-seq, NanoString, array-based, and RT-qPCR platforms using 46 cancer cell lines across different cancer types. The goal is to understand and evaluate the capabilities of the platforms for measuring gene and isoform expression in cancer studies. We first performed NanoString experiments on 59 cancer cell lines with 404 custom-designed probes for measuring the expression of 478 isoforms in 155 genes, and additional RT-qPCR experiments for a subset of the measured isoforms in 13 cell lines. We then combined the data with the matched RNA-seq, Exon-array, and Microarray data of 46 of the 59 cell lines for the comparative analysis.
CONCLUSION: In the comparisons of the platforms for measuring the expressions at both isoform and gene levels, we found that (1) the agreement on isoform expressions is lower than the agreement on gene expressions across the four platforms; (2) NanoString and Exon-array are not consistent on isoform quantification even though both techniques are based on hybridization reactions; (3) RT-qPCR experiments are more consistent with RNA-seq and Exon-array than NanoString in isoform quantification; (4) different RNA-seq isoform quantification methods show varying estimation results, and among the methods, Net-RSTQ and eXpress are more consistent across the platforms; and (5) RNA-seq has the best overall consistency with the other platforms on gene expression quantification.
Subjects
Gene Expression Profiling/methods; Algorithms; Exons/genetics; Exons/physiology; Humans; Oligonucleotide Array Sequence Analysis/methods; Protein Isoforms/genetics; Protein Isoforms/metabolism; Sequence Analysis, RNA/methods; Software
ABSTRACT
The evaluation of intra-tumour heterogeneity (ITH) from a transcriptomic point of view is limited. Single-cell cancer studies reveal significant genomic and transcriptomic ITH within a tumour and it is no longer adequate to employ single-subtype assignment as this does not acknowledge the ITH that exists. Molecular assessment of subtype heterogeneity (MASH) was developed to comprehensively report on the composition of all transcriptomic subtypes within a tumour lesion. Using MASH on 3431 ovarian cancer samples, correlation and association analyses with survival, metastasis and clinical outcomes were performed to assess the impact of subtype composition as a surrogate for ITH. The association was validated on two independent cohorts. We identified that 30% of ovarian tumours consist of two or more subtypes. When biological features of the subtype constituents were examined, we identified significant impact on clinical outcomes with the presence of poor prognostic subtypes (Mes or Stem-A). Poorer outcomes correlated with having higher degrees of poor prognostic subtype populations within the tumour. Subtype prediction in several independent datasets reflected a similar prognostic trend. In addition, paired analysis of primary and recurrent/metastatic tumours demonstrated Mes and/or Stem-A subtypes predominated in recurrent and metastatic tumours regardless of the original primary subtype. Given the biological and prognostic value in delineating individual subtypes within a tumour, a clinically applicable MASH assay using NanoString® technology was developed as a classification tool to comprehensively describe constituents of molecular subtypes. Copyright © 2018 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Subjects
Ovarian Neoplasms/genetics; Precision Medicine/methods; Transcriptome; Adult; Aged; Female; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Humans; Kaplan-Meier Estimate; Middle Aged; Neoplasm Grading; Neoplasm Metastasis; Neoplasm Staging; Ovarian Neoplasms/pathology; Prognosis; Recurrence
ABSTRACT
INTRODUCTION: Abnormal gene expression patterns may contribute to the onset and progression of late-onset Alzheimer's disease (LOAD). METHODS: We performed transcriptome-wide meta-analysis (N = 1440) of blood-based microarray gene expression profiles as well as neuroimaging and cerebrospinal fluid (CSF) endophenotype analysis. RESULTS: We identified and replicated five genes (CREB5, CD46, TMBIM6, IRAK3, and RPAIN) as significantly dysregulated in LOAD. The most significantly altered gene, CREB5, was also associated with brain atrophy and increased amyloid beta (Aβ) accumulation, especially in the entorhinal cortex region. cis-expression quantitative trait loci mapping analysis of CREB5 detected five significant associations (P < 5 × 10⁻⁸), where rs56388170 (most significant) was also significantly associated with global cortical Aβ deposition measured by [¹⁸F]florbetapir positron emission tomography and CSF Aβ1-42. DISCUSSION: RNA from peripheral blood indicated a differential gene expression pattern in LOAD. Genes identified have been implicated in biological processes relevant to Alzheimer's disease. CREB, in particular, plays a key role in nervous system development, cell survival, plasticity, and learning and memory.
Subjects
Alzheimer Disease/genetics; Alzheimer Disease/pathology; Cyclic AMP Response Element-Binding Protein A/genetics; Gene Expression Profiling; Aged; Alzheimer Disease/blood; Amyloid beta-Peptides/cerebrospinal fluid; Amyloid beta-Peptides/metabolism; Aniline Compounds; Atrophy/pathology; Brain/pathology; Entorhinal Cortex/pathology; Ethylene Glycols; Female; Genotyping Techniques; Humans; Male; Positron-Emission Tomography
ABSTRACT
BACKGROUND: Pulpitis is an inflammatory disease, the grade of which is classified according to the level of inflammation. Traditional methods of evaluating the status of dental pulp tissue in clinical practice have limitations. The rapid and accurate diagnosis of pulpitis is essential for determining the appropriate treatment. By integrating different datasets from the Gene Expression Omnibus (GEO) database, we analysed a merged expression matrix of pulpitis, aiming to identify biological pathways and diagnostic biomarkers of pulpitis. METHODS: By integrating two datasets (GSE77459 and GSE92681) in the GEO database using the sva and limma packages of R, differentially expressed genes (DEGs) of pulpitis were identified. Then, the DEGs were analysed to identify biological pathways of dental pulp inflammation with Gene Ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis and Gene Set Enrichment Analysis (GSEA). Protein-protein interaction (PPI) networks and modules were constructed to identify hub genes with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and Cytoscape. RESULTS: A total of 470 DEGs comprising 394 upregulated and 76 downregulated genes were found in pulpitis tissue. GO analysis revealed that the DEGs were enriched in biological processes related to inflammation, and the enriched pathways in the KEGG pathway analysis were cytokine-cytokine receptor interaction, chemokine signalling pathway and NF-κB signalling pathway. The GSEA results provided further functional annotations, including complement system, IL6/JAK/STAT3 signalling pathway and inflammatory response pathways. According to the degrees of nodes in the PPI network, 10 hub genes were identified, and 8 diagnostic biomarker candidates were screened: PTPRC, CD86, CCL2, IL6, TLR8, MMP9, CXCL8 and ICAM1. 
CONCLUSIONS: Bioinformatics analysis of the merged datasets yielded biomarker candidates for pulpitis, and the findings may serve as a reference for developing a new method of pulpitis diagnosis.
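Hub genes in this abstract are chosen by node degree in the PPI network, and that ranking is easy to sketch. The example below assumes an undirected edge list and uses placeholder node names; it is not the STRING or Cytoscape workflow itself, only the degree count behind it.

```python
from collections import Counter

def top_hubs(edges, k):
    """Rank nodes of an undirected network by degree and return the top k."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name so that ties are deterministic
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]

# Placeholder interaction list (invented for illustration only)
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g4", "g5")]
hubs = top_hubs(edges, 2)
```

Degree is only one centrality measure. Analyses of this kind sometimes use betweenness or module-based scores as well, which can give a different ranking.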
Subjects
Computational Biology; Pulpitis; Biomarkers; Gene Expression Profiling; Gene Ontology; Humans; Pulpitis/diagnosis; Pulpitis/genetics
ABSTRACT
OBJECTIVE: The aim of this study is to filter out the most informative genes that mainly regulate the target tissue class, increase classification accuracy, reduce the curse of dimensionality, and discard redundant and irrelevant genes. METHOD: This paper presents the idea of gene selection using bagging sub-forest (BSF). The proposed method derives gene importance from the idea underlying the standard random forest algorithm. The new method is compared with three state-of-the-art methods, i.e., Wilcoxon, masked painter and proportional overlapped score (POS). These methods were applied to 5 data sets, i.e., Colon, Lymph node breast cancer, Leukaemia, Serrated colorectal carcinomas, and Breast Cancer. For the comparison, the top 20 genes were selected with each gene selection method, and random forest (RF) and support vector machine (SVM) classifiers were then applied to assess predictive performance on the datasets restricted to the selected genes. Classification accuracy, Brier score, and sensitivity were used as performance measures. RESULTS: The proposed method gave better results than the other feature selection methods with both the random forest and SVM classifiers on all the datasets. CONCLUSIONS: The proposed method showed improved performance in terms of classification accuracy, Brier score and sensitivity, and hence could be used as a novel method for gene selection to classify tissue samples into their correct classes.
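Of the performance measures named in this abstract, the Brier score is the least familiar, so a short worked sketch may help. It uses the standard binary definition: the mean squared difference between the predicted probability of the positive class and the observed 0/1 outcome. The probabilities below are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted P(class = 1) and the 0/1 outcome."""
    if len(probabilities) != len(outcomes):
        raise ValueError("inputs must have the same length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical classifier output for four tissue samples
predicted = [0.9, 0.2, 0.7, 0.4]
actual = [1, 0, 1, 1]
score = brier_score(predicted, actual)  # (0.01 + 0.04 + 0.09 + 0.36) / 4
```

Lower is better: 0 means perfectly confident, correct predictions, and always predicting 0.5 scores 0.25. Unlike accuracy, the score penalizes confident wrong predictions more heavily than hesitant ones.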
Subjects
Machine Learning; Support Vector Machine; Algorithms; Genes, Regulator; Genomics; Humans
ABSTRACT
BACKGROUND: Chronic periodontitis (CP) is a multifactorial inflammatory disease. For the diagnosis of CP, it is necessary to investigate molecular biomarkers and the biological pathway of CP. Although analysis of mRNA expression profiling with microarray is useful to elucidate pathological mechanisms of multifactorial diseases, it is expensive. Therefore, we utilized pooled microarray gene expression data on the basis of data sharing to reduce hybridization costs and compensate for insufficient mRNA sampling. The aim of the present study was to identify molecular biomarker candidates and biological pathways of CP using pooled datasets in the Gene Expression Omnibus (GEO) database. METHODS: Three pooled transcriptomic datasets (GSE10334, GSE16134, and GSE23586) of gingival tissue with CP in the GEO database were analyzed for differentially expressed genes (DEGs) using GEO2R, functional analysis and biological pathways with the Database of Annotation Visualization and Integrated Discovery database, Protein-Protein Interaction (PPI) network and hub gene with the Search Tool for the Retrieval of Interaction Genes database, and biomarker candidates for diagnosis and prognosis and upstream regulators of dominant biomarker candidates with the Ingenuity Pathway Analysis database. RESULTS: We shared pooled microarray datasets in the GEO database. One hundred and twenty-three common DEGs were found in gingival tissue with CP, including 81 upregulated genes and 42 downregulated genes. Upregulated genes in Gene Ontology were significantly enriched in immune responses, and those in the Kyoto Encyclopedia of Genes and Genomes pathway were significantly enriched in the cytokine-cytokine receptor interaction pathway, cell adhesion molecules, and hematopoietic cell lineage. From the PPI network, the 12 nodes with the highest degree were screened as hub genes. Additionally, six biomarker candidates for CP diagnosis and prognosis were screened. 
CONCLUSIONS: We identified several potential biomarkers for CP diagnosis and prognosis (e.g., CSF3, CXCL12, IL1B, MS4A1, PECAM1, and TAGLN) and upstream regulators of biomarker candidates for CP diagnosis (TNF and TGF2). We also confirmed key genes of CP pathogenesis such as CD19, IL8, CD79A, FCGR3B, SELL, CSF3, IL1B, FCGR2B, CXCL12, C3, CD53, and IL10RA. To our knowledge, this is the first report to reveal associations of CD53, CD79A, MS4A1, PECAM1, and TAGLN with CP.
Subjects
Chronic Periodontitis; Computational Biology; Biomarkers; GPI-Linked Proteins; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; Prognosis; Receptors, IgG
ABSTRACT
BACKGROUND: A family of parsimonious Gaussian mixture models for the biclustering of gene expression data is introduced. Biclustering is accommodated by adopting a mixture of factor analyzers model with a binary, row-stochastic factor loadings matrix. This particular form of factor loadings matrix results in a block-diagonal covariance matrix, which is a useful property in gene expression analyses, specifically in biomarker discovery scenarios where blood can potentially act as a surrogate tissue for other less accessible tissues. Prior knowledge of the factor loadings matrix is useful in this application and is reflected in the one-way supervised nature of the algorithm. Additionally, the factor loadings matrix can be assumed to be constant across all components because of the relationship desired between the various types of tissue samples. Parameter estimates are obtained through a variant of the expectation-maximization algorithm and the best-fitting model is selected using the Bayesian information criterion. The family of models is demonstrated using simulated data and two real microarray data sets. The first real data set is from a rat study that investigated the influence of diabetes on gene expression in different tissues. The second real data set is from a human transcriptomics study that focused on blood and immune tissues. The microarray data sets illustrate the biclustering family's performance in biomarker discovery involving peripheral blood as surrogate biopsy material. RESULTS: The simulation studies indicate that the algorithm identifies the correct biclusters, most optimally when the number of observation clusters is known. Moreover, the biclustering algorithm identified biclusters comprised of biologically meaningful data related to insulin resistance and immune function in the rat and human real data sets, respectively. 
CONCLUSIONS: Initial results using real data show that this biclustering technique provides a novel approach for biomarker discovery by enabling blood to be used as a surrogate for hard-to-obtain tissues.
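Model selection by the Bayesian information criterion, as described in this abstract, can be sketched in a few lines. The example assumes the convention BIC = k ln(n) - 2 ln(L), where a lower value is better; some mixture-model software reports the negated quantity and maximizes it instead. The model names, log-likelihoods, and parameter counts are hypothetical.

```python
from math import log

def bic(log_likelihood, n_params, n_obs):
    """BIC = k * ln(n) - 2 * ln(L); lower values indicate a better trade-off."""
    return n_params * log(n_obs) - 2.0 * log_likelihood

# Hypothetical fitted models: name -> (maximized log-likelihood, free parameters)
candidates = {
    "model_small": (-520.0, 10),
    "model_large": (-500.0, 30),
}
n_obs = 100
scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

Here the larger model fits better, but its 20 extra parameters cost more than the gain in log-likelihood, so the smaller model is selected. This penalty is why parsimonious covariance structures can win under BIC.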
Subjects
Databases, Genetic; Gene Expression; Supervised Machine Learning; Transcriptome; Animals; Bayes Theorem; Biomarkers/blood; Cluster Analysis; Diabetes Mellitus, Experimental/genetics; Humans; Male; Models, Theoretical; Rats; Rats, Zucker
ABSTRACT
The clinical characteristics of clear cell carcinoma (CCC) and endometrioid carcinoma (EC) are concomitant with endometriosis (ES), which leads to the postulation of malignant transformation of ES to endometriosis-associated ovarian carcinoma (EAOC). Different deregulated functional areas have been proposed to account for the pathogenesis of EAOC transformation, but there is still no data-driven analysis that uses the experimental data accumulated in publicly available databases to integrate the deregulated functions involved in the malignant transformation of EAOC. We used the microarray gene expression datasets of ES, CCC and EC downloaded from the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) database. We then investigated the pathogenesis of EAOC with a data-driven, function-based analytic model in which molecular functions are quantified using 1454 Gene Ontology (GO) term gene sets. This model converts the gene expression profiles to a functionome consisting of 1454 quantified GO functions, from which the key functions involved in the malignant transformation of EAOC can be extracted by a series of filters. Our results demonstrate that deregulated oxidoreductase activity, metabolism, hormone activity, inflammatory response, innate immune response and cell-cell signaling play key roles in the malignant transformation of EAOC. These results provide evidence supporting the specific molecular pathways involved in the malignant transformation of EAOC.
Subjects
Carcinoma/genetics; Endometriosis/genetics; Neoplasm Proteins/genetics; Ovarian Neoplasms/genetics; Carcinoma/complications; Carcinoma/pathology; Cell Transformation, Neoplastic/genetics; Endometriosis/complications; Endometriosis/pathology; Female; Gene Expression Regulation, Neoplastic/genetics; Humans; Microarray Analysis; Ovarian Neoplasms/complications; Ovarian Neoplasms/pathology; Transcriptome
ABSTRACT
The brown adipocyte phenotype (BAP) in white adipose tissue (WAT) is transiently induced in adult mammals in response to reduced ambient temperature. Since it is unknown whether a cold challenge can permanently induce brown adipocytes (BAs), we reared C57BL/6J (B6) and AxB8/PgJ (AxB8) mice at 17 or 29°C from birth to weaning, to assess the BAP in young and adult mice. Energy balance measurements showed that 17°C reduced fat mass in the preweaning mice by increasing energy expenditure and suppressed diet-induced obesity in adults. Microarray analysis of global gene expression of inguinal fat (ING) from 10-day-old (D) mice indicates that expression at 17°C vs. 29°C was not different. Between 10 and 21 days of age, the BAP was induced coincident with morphologic remodeling of ING and marked changes in expression of neural development genes (e.g., Akap12 and Ngfr). Analyses of Ucp1 mRNA and protein showed that 17°C transiently increased the BAP in ING from 21D mice; however, BAs were unexpectedly present in mice reared at 29°C. The involution of the BAP in WAT occurred after weaning in mice reared at 23°C. Therefore, the capacity to stimulate thermogenically competent BAs in WAT is set by a temperature-independent, genetically controlled program between birth and weaning.
Subjects
Adipocytes, Brown/physiology; Adipose Tissue, White/physiology; Embryonic Development/physiology; Adipose Tissue, Brown/physiology; Animals; Cold Temperature; Energy Metabolism/physiology; Male; Mice; Mice, Inbred C57BL; Obesity/physiopathology; Phenotype
ABSTRACT
Genomic data integration is a key goal on the way to large-scale genomic data analysis. The process is very challenging because genomics experiments produce information from diverse sources. In this work, we review methods designed to combine genomic data recorded from microarray gene expression (MAGE) experiments. It has been acknowledged that the main source of variation between different MAGE datasets is the so-called 'batch effects'. The methods reviewed here perform data integration by removing (or more precisely attempting to remove) the unwanted variation associated with batch effects. They are presented in a unified framework together with a wide range of evaluation tools, which are mandatory for assessing the efficiency and quality of the data integration process. We provide a systematic description of the MAGE data integration methodology together with some basic recommendations to help users choose the appropriate tools to integrate MAGE data for large-scale analysis, and to evaluate those tools from different perspectives in order to quantify their efficiency. All genomic data used in this study for illustration purposes were retrieved from InSilicoDB (http://insilico.ulb.ac.be).
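To make the idea of removing batch-associated variation concrete, the sketch below applies per-batch mean-centering to a single gene. This is one simple baseline, and the abstract does not say which methods the review recommends. The batch labels and expression values are hypothetical.

```python
def center_by_batch(values, batches):
    """Subtract each batch's mean from its own samples (one gene at a time)."""
    sums, counts = {}, {}
    for value, batch in zip(values, batches):
        sums[batch] = sums.get(batch, 0.0) + value
        counts[batch] = counts.get(batch, 0) + 1
    means = {batch: sums[batch] / counts[batch] for batch in sums}
    return [value - means[batch] for value, batch in zip(values, batches)]

# Hypothetical expression of one gene measured in two laboratories
gene_values = [5.0, 7.0, 1.0, 3.0]
batch_labels = ["lab1", "lab1", "lab2", "lab2"]
adjusted = center_by_batch(gene_values, batch_labels)
```

After centering, the four-unit offset between the two laboratories disappears while the within-batch differences are preserved. The caveat is that if the batches differ in biological composition, centering removes real signal along with the batch effect, which is why evaluation tools matter.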
Subjects
Genomics/methods; Oligonucleotide Array Sequence Analysis; Transcriptome; Computer Simulation; Databases, Genetic; Gene Expression; Genetic Variation; Genome
ABSTRACT
INTRODUCTION: This study provides global transcriptomic profiling and analysis of botulinum toxin A (BoNT-A)-treated muscle over a 1-year period. METHODS: Microarray analysis was performed on rat tibialis anterior muscles from 4 groups (n = 4/group) at 1, 4, 12, and 52 weeks after BoNT-A injection compared with saline-injected rats at 12 weeks. RESULTS: Dramatic transcriptional adaptation occurred at 1 week with a paradoxical increase in expression of slow and immature isoforms, activation of genes in competing pathways of repair and atrophy, impaired mitochondrial biogenesis, and increased metal ion imbalance. Adaptations of the basal lamina and fibrillar extracellular matrix (ECM) occurred by 4 weeks. The muscle transcriptome returned to its unperturbed state 12 weeks after injection. CONCLUSIONS: Acute transcriptional adaptations resemble denervated muscle with some subtle differences, but resolved more quickly compared with denervation. Overall, gene expression across time correlates with the generally accepted BoNT-A time course and suggests that the direct action of BoNT-A in skeletal muscle is relatively rapid.
Subjects
Acetylcholine Release Inhibitors/pharmacology; Botulinum Toxins, Type A/pharmacology; Muscle, Skeletal/drug effects; Transcriptional Activation/drug effects; Adaptation, Physiological/drug effects; Animals; Extracellular Matrix/drug effects; Extracellular Matrix/metabolism; Gene Expression Profiling; Male; Mitochondrial Turnover/drug effects; Muscle Contraction/drug effects; Muscle, Skeletal/metabolism; Muscular Atrophy/chemically induced; Muscular Atrophy/physiopathology; Neuromuscular Junction/drug effects; Neuromuscular Junction/metabolism; Oligonucleotide Array Sequence Analysis; Oxidative Stress/drug effects; Rats; Rats, Sprague-Dawley; Statistics as Topic; Time Factors
ABSTRACT
The ARID1A gene is pivotal in chromatin remodeling and genomic integrity and is frequently mutated in various cancer types. ARID1A is the second most frequently mutated tumor suppressor gene, and its mutation has been suggested as a predictor of immunotherapeutic responsiveness in gastric carcinoma (GC). Despite its significance, the relationship among ARID1A somatic mutations, RNA expression levels, and protein expression remains unclear, particularly in GC. To address this, we performed a comparative study in two cohorts. Cohort 1 used next-generation sequencing (NGS) to identify 112 GC cases with ARID1A mutations. These cases were compared with ARID1A immunohistochemistry (IHC) results. Cohort 2 employed microarray gene expression data to assess ARID1A RNA levels and compare them with ARID1A IHC results. In Cohort 1, 38.4% of ARID1A-mutated GC exhibited a complete loss of ARID1A protein when assessed by IHC, whereas the remaining 61.6% displayed intact ARID1A. Discordance between NGS and IHC results was not associated with specific mutation sites, variant classifications, or variant allele frequencies. In Cohort 2, 24.1% of the patients demonstrated a loss of ARID1A protein, and there was no significant difference in mRNA levels between the ARID1A protein-intact and -loss groups. Our study revealed a substantial discrepancy between ARID1A mutations detected using NGS and protein expression assessed using IHC in GC. Moreover, ARID1A mRNA expression levels did not correlate well with protein expression. These findings highlight the complexity of ARID1A expression in GC.
Subjects
Carcinoma; DNA-Binding Proteins; Stomach Neoplasms; Transcription Factors; Humans; Carcinoma/genetics; DNA-Binding Proteins/genetics; Mutation; RNA, Messenger/genetics; Transcription Factors/genetics; Stomach Neoplasms/genetics
ABSTRACT
In high-dimensional gene expression data, selecting an optimal subset of genes is crucial for achieving high classification accuracy and reliable diagnosis of diseases. This paper proposes a two-stage hybrid model for gene selection based on clustering and a swarm intelligence algorithm to identify the most informative genes with high accuracy. First, a clustering-based multivariate filter approach is performed to explore the interactions between the features and eliminate any redundant or irrelevant ones. Then, by controlling for the problem of premature convergence in the binary Bat algorithm, the optimal gene subset is determined using different classifiers with the Monte Carlo cross-validation data partitioning model. The effectiveness of our proposed framework is evaluated using eight gene expression datasets, by comparison with other recently published algorithms in the literature. Experiments confirm that in seven out of eight datasets, the proposed method can achieve superior results in terms of classification accuracy and gene subset size. In particular, it achieves a classification accuracy of 100% in Lymphoma and Ovarian datasets and above 97.4% in the rest with a minimum number of genes. The results demonstrate that our proposed algorithm has the potential to solve the feature selection problem in different applications with high-dimensional datasets.
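The Monte Carlo cross-validation scheme mentioned in this abstract differs from k-fold cross-validation in that each repeat draws a fresh random train/test split. A minimal sketch of the partitioning follows; the 30% test fraction, the number of repeats, and the seed are assumptions, since the abstract does not state them.

```python
import random

def monte_carlo_splits(n_samples, n_repeats, test_fraction=0.3, seed=0):
    """Yield (train_indices, test_indices) pairs from repeated random splits."""
    rng = random.Random(seed)  # local generator keeps the splits reproducible
    n_test = max(1, int(round(n_samples * test_fraction)))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # First n_test shuffled indices form the test set, the rest the training set
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])

splits = list(monte_carlo_splits(n_samples=10, n_repeats=5))
```

A classifier would be trained and scored once per split and the scores averaged. Because test sets from different repeats can overlap, a sample may be tested several times or not at all, unlike in k-fold cross-validation.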
Subjects
Algorithms; Neoplasms; Humans; Neoplasms/genetics; Neoplasms/classification; Cluster Analysis; Gene Expression Profiling/methods; Databases, Genetic; Computational Biology/methods; Female
ABSTRACT
The identification of tumors through gene analysis in microarray data is a pivotal area of research in artificial intelligence and bioinformatics. This task is challenging due to the large number of genes relative to the limited number of observations, making feature selection a critical step. This paper introduces a novel wrapper feature selection method that leverages a hybrid optimization algorithm combining a genetic operator with a Sinh Cosh Optimizer (SCHO), termed SCHO-GO. The SCHO-GO algorithm is designed to avoid local optima, streamline the search process, and select the most relevant features without compromising classifier performance. Traditional methods often falter with extensive search spaces, necessitating hybrid approaches. Our method aims to reduce the dimensionality and improve the classification accuracy, which is essential in pattern recognition and data analysis. The SCHO-GO algorithm, integrated with a support vector machine (SVM) classifier, significantly enhances cancer classification accuracy. We evaluated the performance of SCHO-GO using the CEC'2022 benchmark function and compared it with seven well-known metaheuristic algorithms. Statistical analyses indicate that SCHO-GO consistently outperforms these algorithms. Experimental tests on eight microarray gene expression datasets, particularly the Gene Expression Cancer RNA-Seq dataset, demonstrate an impressive accuracy of 99.01% with the SCHO-GO-SVM model, highlighting its robustness and precision in handling complex datasets. Furthermore, the SCHO-GO algorithm excels in feature selection and solving mathematical benchmark problems, presenting a promising approach for tumor identification and classification in microarray data analysis.
Subjects
Neoplasms; Support Vector Machine; Humans; Neoplasms/genetics; Algorithms; Computational Biology/methods; Gene Expression Profiling/methods
ABSTRACT
Hepatitis C virus (HCV) infection poses a significant public health challenge and often leads to long-term health complications and even death. Parkinson's disease (PD) is a progressive neurodegenerative disorder with a proposed viral etiology. HCV infection and PD have been previously suggested to be related. This work aimed to identify potential biomarkers and pathways that may play a role in the joint development of PD and HCV infection. Using BioOptimatics (bioinformatics driven by mathematical global optimization), 22 publicly available microarray and RNAseq datasets for both diseases were analyzed, focusing on sex-specific differences. Our results revealed that 19 genes, including MT1H, MYOM2, and RPL18, exhibited significant changes in expression in both diseases. Pathway and network analyses stratified by sex indicated that these gene expression changes were enriched in processes related to immune response regulation in females and immune cell activation in males. These findings suggest a potential link between HCV infection and PD, highlighting the importance of further investigation into the underlying mechanisms and potential therapeutic targets involved.
Subjects
Hepatitis C , Parkinson Disease , Female , Humans , Male , Biomarkers , Computational Biology/methods , Gene Expression Profiling , Gene Regulatory Networks , Hepacivirus/genetics , Hepatitis C/complications , Hepatitis C/virology , Parkinson Disease/genetics , Parkinson Disease/virology , Sex Factors
ABSTRACT
Microarray gene expression data pose a tremendous challenge because of the curse of dimensionality. The number of features far surpasses the number of available samples, leading to overfitting and reduced classification accuracy. The dimensionality of microarray gene expression data must therefore be reduced with efficient feature extraction methods that shrink the data volume and extract meaningful information, enhancing classification accuracy and interpretability. In this research, we explore the application of STFT (Short-Time Fourier Transform), LASSO (Least Absolute Shrinkage and Selection Operator), and EHO (Elephant Herding Optimisation) for extracting significant features from lung cancer data and reducing the dimensionality of the microarray gene expression database. The classification of lung cancer is performed using the following classifiers: Gaussian Mixture Model (GMM), Particle Swarm Optimization (PSO) with GMM, Detrended Fluctuation Analysis (DFA), Naive Bayes classifier (NBC), Firefly with GMM, Support Vector Machine with Radial Basis Kernel (SVM-RBF) and Flower Pollination Optimization (FPO) with GMM. The EHO feature extraction with the FPO-GMM classifier attained the highest accuracy, 96.77%, with an F1 score of 97.5%, MCC of 0.92 and Kappa of 0.92. The reported results underline the significance of utilizing STFT, LASSO, and EHO for feature extraction in reducing the dimensionality of microarray gene expression data. These methodologies also support improved and early diagnosis of lung cancer with enhanced classification accuracy and interpretability.
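As an illustration of why LASSO is useful for dimensionality reduction, the sketch below shows how its L1 penalty drives the coefficients of uninformative genes to exactly zero. The abstract does not state which solver or penalty strength was used, so the cyclic coordinate descent, the objective scaling, the value of `lam`, and all function names here are assumptions chosen for a small standard-library example.

```python
def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam`; anything within [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlate column j with the residual that excludes gene j's own term.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale if scale > 0 else 0.0
    return beta


def selected_genes(beta, tol=1e-12):
    """Indices of genes whose coefficient survived the shrinkage."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]
```

For example, with `X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]` and `y = [2, -2, 2, -2]` (so that only the first column explains `y`), calling `lasso_coordinate_descent(X, y, lam=0.5)` keeps the first gene with a shrunken coefficient and zeroes out the second.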
Subjects
Colonic Neoplasms , Gene Expression Profiling , Machine Learning , Humans , Colonic Neoplasms/genetics , Gene Expression Profiling/methods , Support Vector Machine , Algorithms , Oligonucleotide Array Sequence Analysis/methods , Bayes Theorem , Gene Expression Regulation, Neoplastic , Lung Neoplasms/genetics , Lung Neoplasms/classification , Fourier Analysis
ABSTRACT
Background: Gene expression data are often used to classify cancer genes. In such high-dimensional datasets, however, only a few feature genes are closely related to tumors. Therefore, it is important to accurately select a subset of feature genes that contribute strongly to cancer classification. Methods: In this article, a new three-stage hybrid gene selection method is proposed that combines a variance filter, extremely randomized trees and the Harris Hawks algorithm (VEH). In the first stage, we evaluate each gene in the dataset with the variance filter and select the feature genes that meet the variance threshold. In the second stage, we use extremely randomized trees to further eliminate irrelevant genes. Finally, we use the Harris Hawks algorithm to search the genes retained by the previous two stages for the optimal feature gene subset. Results: We evaluated the proposed method using three different classifiers on eight published microarray gene expression datasets. The results showed a 100% classification accuracy for VEH in gastric cancer, acute lymphoblastic leukemia and ovarian cancer, and an average classification accuracy of 95.33% across a variety of other cancers. Compared with other advanced feature selection algorithms, VEH has obvious advantages when measured by many evaluation criteria.
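The staged structure described in the Methods can be sketched as a chain of filters, each operating only on the genes the previous stage kept. The code below is an illustrative outline, not the VEH implementation: the threshold value and the helper names are hypothetical, and a simple class-mean gap ranking stands in for the extremely randomized trees stage (the Harris Hawks stage is omitted because the abstract gives no details of it).

```python
from statistics import mean, pvariance


def variance_filter(X, threshold):
    """Stage 1: positions of genes whose population variance meets the threshold."""
    return [j for j in range(len(X[0]))
            if pvariance([row[j] for row in X]) >= threshold]


def top_k_by_class_gap(X, y, k):
    """Stand-in for stage 2: keep the k genes with the largest gap between class means.

    The paper uses extremely randomized trees here; this ranking is only a placeholder
    and assumes exactly two class labels.
    """
    first, second = sorted(set(y))

    def gap(j):
        mean_first = mean([row[j] for row, label in zip(X, y) if label == first])
        mean_second = mean([row[j] for row, label in zip(X, y) if label == second])
        return abs(mean_first - mean_second)

    ranked = sorted(range(len(X[0])), key=gap, reverse=True)
    return sorted(ranked[:k])


def run_stages(X, y, stages):
    """Apply each stage to the genes kept so far and map results to original indices."""
    kept = list(range(len(X[0])))
    for stage in stages:
        reduced = [[row[j] for j in kept] for row in X]
        positions = stage(reduced, y)
        kept = [kept[p] for p in positions]
    return kept
```

A possible call is `run_stages(X, y, [lambda X, y: variance_filter(X, 0.5), lambda X, y: top_k_by_class_gap(X, y, 1)])`, where each later stage sees only the columns that survived the earlier ones, which is what keeps the final search space small.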
ABSTRACT
Microarray gene expression data are useful for identifying gene expression patterns associated with cancer outcomes; however, their high dimensionality makes it difficult to extract meaningful information and accurately classify tumors. Hence, developing effective methods for reducing dimensionality while preserving relevant information is a crucial task. Hybrid gene selection methods are widely proposed in the gene expression analysis domain, but they can still be improved in terms of efficiency and reliability. This study proposes a new hybrid gene selection method, called multi-filter embedded mountain gazelle optimizer (MUL-MGO), which uses two filters and an embedded method to remove irrelevant genes and then selects the most relevant genes using the recently developed MGO algorithm. To the best of our knowledge, this is the first work to exploit MGO as a gene or feature selection method. We also develop a new version of MGO, called recursive mountain gazelle optimizer (RMGO), which applies the MGO algorithm recursively to avoid local optima, minimize the search space, and obtain the minimum gene count without decreasing the classifier's performance. The proposed RMGO is used to build a second hybrid gene selection method that employs the same kind of filters and embedded methods as MUL-MGO but with the recursive version of MGO. The resulting method is called multi-filter embedded recursive mountain gazelle optimizer (MUL-RMGO). Several classifiers are used for cancer classification, and experimental studies are performed on eight microarray gene expression datasets to demonstrate the capabilities of the MUL-MGO and MUL-RMGO methods. The experimental findings indicate the efficiency and productivity of the suggested MUL-MGO and MUL-RMGO methods for gene selection. The methods outperform cutting-edge methods in the literature, with MUL-RMGO exceeding MUL-MGO in terms of accuracy and selected gene count.
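One way to read the recursive idea is that the optimizer is re-run on its own output for as long as the gene subset keeps shrinking without a loss in classifier score. The sketch below encodes that reading only; the abstract does not spell out the recursion or its stopping rule, so the control flow, the `optimize` and `score` callables, and the toy stand-ins in the usage note are all assumptions, and MGO itself is not implemented.

```python
def recursive_select(genes, optimize, score):
    """Re-apply `optimize` to its own result while the subset shrinks without losing score.

    `optimize(genes)` returns a candidate subset of `genes` (a stand-in for one MGO run);
    `score(genes)` returns a quality measure such as classifier accuracy (higher is better).
    """
    candidate = optimize(genes)
    shrank = 0 < len(candidate) < len(genes)
    if not shrank or score(candidate) < score(genes):
        return list(genes)  # no further improvement: stop the recursion
    return recursive_select(candidate, optimize, score)
```

As a toy usage, suppose genes 3 and 7 are the only informative ones, `score` counts how many informative genes a subset contains, and `optimize` drops one uninformative gene per call. Then `recursive_select(list(range(10)), optimize, score)` narrows the ten genes down to `[3, 7]`, each pass searching a smaller space than the one before.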