ABSTRACT
This study aimed to perform a comprehensive bioinformatic analysis of the GSE29221 microarray dataset obtained from healthy controls and patients with type 2 diabetes mellitus (T2DM). Raw data were downloaded from the Gene Expression Omnibus database and processed with the limma package in R to identify differentially expressed genes (DEGs). Gene Ontology functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed to determine the biological functions and pathways of the DEGs. A protein interaction network was constructed using the STRING database and Cytoscape software to identify key genes. Finally, immune infiltration analysis was performed using the CIBERSORT method. This study has implications for understanding the underlying molecular mechanism of T2DM and provides potential targets for further research.
Subjects
Computational Biology , Diabetes Mellitus, Type 2 , Gene Expression Profiling , Humans , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/immunology , Protein Interaction Maps/genetics , Gene Regulatory Networks/genetics , Gene Ontology , Databases, Genetic , Case-Control Studies
ABSTRACT
The Allele Frequency Net Database (AFND, http://www.allelefrequencies.net) is an online repository that contains information on the frequencies of immune-related genes and their corresponding alleles in worldwide human populations. At present, the website contains data from 1784 population samples comprising more than 14 million individuals from 129 countries on the frequency of genes from different polymorphic regions, including data for the human leukocyte antigen (HLA) system. In addition, over the last four years, AFND has also incorporated genotype raw data from 85,000 individuals comprising 215 population samples from 39 countries. Moreover, more population data sets containing next generation sequencing data spanning >3 million individuals have been added. This resource has been widely used in a variety of contexts such as histocompatibility, immunology, epidemiology, pharmacogenetics, epitope prediction algorithms for population coverage in vaccine development, and population genetics, among many others. In this chapter, we present an update of the most used searching mechanisms as described in a previous volume and some of the latest developments included in AFND.
Subjects
Databases, Genetic , Gene Frequency , Genetics, Population , Humans , Genetics, Population/methods , HLA Antigens/genetics , Alleles , Computational Biology/methods , Internet , Web Browser , Genotype , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Advancements in variant curation challenges: minority representation and incomplete data reporting.
Subjects
BRCA1 Protein , BRCA2 Protein , Genetic Variation , Humans , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Genetic Predisposition to Disease , Female , Breast Neoplasms/genetics , Databases, Genetic
ABSTRACT
OBJECTIVE: In vitro maturation of oocytes has gained much attention as a technique for patients undergoing assisted reproduction. However, its results are still inferior to those of controlled ovarian stimulation methodologies. Understanding the maturation mechanisms through computational analyses can help improve the results of this methodology. This work aims to identify the central genes differentially expressed in oocytes after in vitro maturation in the germinal vesicle and metaphase II stages. METHODS: This work is a computational analysis. An advanced search of the Gene Expression Omnibus (GEO) database was carried out for the period from January 1, 2013, to January 1, 2023. A total of 27 genomic datasets were available in the GEO database, of which only two were used. RESULTS: Two datasets were identified in the GEO database: GSE158802 and GSE95477. From the analysis, we identified five downregulated and thirty-six upregulated genes; the central genes, which correlated with the main proteins identified, were CLTA and PANK1. CONCLUSIONS: Gene expression was differentially regulated. The most central genes are related to energy capture.
Subjects
Computational Biology , In Vitro Oocyte Maturation Techniques , Oocytes , Humans , Oocytes/metabolism , Computational Biology/methods , Female , Gene Expression Profiling/methods , Databases, Genetic
ABSTRACT
The identification of orthologous genes is relevant for comparative genomics, phylogenetic analysis, and functional annotation. There are many computational tools for the prediction of orthologous groups as well as web-based resources that offer orthology datasets for download and online analysis. This chapter presents a simple and practical guide to the process of orthologous group prediction, using a dataset of 10 prokaryotic proteomes as example. The orthology methods covered are OrthoMCL, COGtriangles, OrthoFinder2, and OMA. The authors compare the number of orthologous groups predicted by these various methods, and present a brief workflow for the functional annotation and reconstruction of phylogenies from inferred single-copy orthologous genes. The chapter also demonstrates how to explore two orthology databases: eggNOG6 and OrthoDB.
Subjects
Genomics , Phylogeny , Genomics/methods , Computational Biology/methods , Software , Prokaryotic Cells/metabolism , Databases, Genetic , Molecular Sequence Annotation/methods , Multigene Family , Genome, Bacterial
ABSTRACT
Metagenome-assembled genomes, or MAGs, are genomes retrieved from metagenome datasets. In the vast majority of cases, MAGs are genomes from prokaryotic species that have not been isolated or cultivated in the lab. They therefore provide us with information on these species that is impossible to obtain otherwise, at least until new cultivation methods are devised. Thanks to improvements and cost reductions in DNA sequencing technologies and growing interest in microbial ecology, the rise in the number of MAGs in genome repositories has been exponential. This chapter covers the basics of MAG retrieval and processing and provides a practical step-by-step guide using a real dataset and state-of-the-art tools for MAG analysis and comparison.
Subjects
Metagenome , Metagenomics , Metagenome/genetics , Metagenomics/methods , Software , Computational Biology/methods , Databases, Genetic , Sequence Analysis, DNA/methods , Genome, Bacterial
ABSTRACT
Thanks to advancements in genome sequencing and bioinformatics, thousands of bacterial genome sequences are available in public databases. This presents an opportunity to study bacterial diversity in unprecedented detail. This chapter describes a complete bioinformatics workflow for comparative genomics of bacterial genomes, including genome annotation, pangenome reconstruction and visualization, phylogenetic analysis, and identification of sequences of interest such as antimicrobial-resistance genes, virulence factors, and phage sequences. The workflow uses state-of-the-art, open-source tools. The workflow is presented by means of a comparative analysis of Salmonella enterica serovar Typhimurium genomes. The workflow is based on Linux commands and scripts, and result visualization relies on the R environment. The chapter provides a step-by-step protocol that researchers with basic expertise in bioinformatics can easily follow to conduct investigations on their own genome datasets.
Subjects
Computational Biology , Genome, Bacterial , Genomics , Phylogeny , Software , Genomics/methods , Computational Biology/methods , Workflow , Databases, Genetic , Molecular Sequence Annotation , Salmonella typhimurium/genetics
ABSTRACT
The field of viral genomic studies has experienced an unprecedented increase in data volume. New strains of known viruses are constantly being added to the GenBank database and so are completely new species with little or no resemblance to our databases of sequences. In addition to this, metagenomic techniques have the potential to further increase the number and rate of sequenced genomes. Besides, it is important to consider that viruses have a set of unique features that often break down molecular biology dogmas, e.g., the flux of information from RNA to DNA in retroviruses and the use of RNA molecules as genomes. As a result, extracting meaningful information from viral genomes remains a challenge and standard methods for comparing the unknown and our databases of characterized sequences may need adaptations. Thus, several bioinformatic approaches and tools have been created to address the challenge of analyzing viral data. This chapter offers descriptions and protocols of some of the most important bioinformatic techniques for comparative analysis of viruses. The authors also provide comments and discussion on how viruses' unique features can affect standard analyses and how to overcome some of the major sources of problems. Protocols and topics emphasize online tools (which are more accessible to users) and give the real experience of what most bioinformaticians do in day-by-day work with command-line pipelines. The topics discussed include (1) clustering related genomes, (2) whole genome multiple sequence alignments for small RNA viruses, (3) protein alignment for marker genes and species affiliation, (4) variant calling and annotation, and (5) virome analyses and pathogen identification.
Subjects
Computational Biology , Genome, Viral , Viruses , Computational Biology/methods , Viruses/genetics , Viruses/classification , Software , Databases, Genetic
ABSTRACT
BACKGROUND: The reconstruction of the evolutionary history of organisms has been greatly influenced by the advent of molecular techniques, leading to a significant increase in studies utilizing genomic data from different species. However, the lack of standardization in gene nomenclature poses a challenge in database searches and evolutionary analyses, impacting the accuracy of the results obtained. RESULTS: To address this issue, a Python class for standardizing gene nomenclatures, SynGenes, has been developed. It automatically recognizes and converts different nomenclature variations into a standardized form, facilitating comprehensive and accurate searches. Additionally, SynGenes offers a web form for individual searches using different names associated with the same gene. The SynGenes database contains a total of 545 gene name variations for mitochondrial genes and 2485 for chloroplast genes, providing a valuable resource for researchers. CONCLUSIONS: The SynGenes platform offers a solution for standardizing the nomenclature of mitochondrial and chloroplast genes and provides a standardized search solution for specific markers in GenBank. Evaluation of SynGenes' effectiveness through searches conducted on GenBank and PubMed Central demonstrated its ability to yield a greater number of results than conventional searches, ensuring more comprehensive and accurate results. This tool is crucial for accurate database searches and, consequently, evolutionary analyses, addressing the challenges posed by non-standardized gene nomenclature.
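The idea of mapping nomenclature variants to one standardized name can be illustrated with a minimal Python sketch. This is not the SynGenes class or its API: the synonym table, the `standardize` function, and the normalization rule are hypothetical and only show the general lookup approach.

```python
# Hypothetical synonym table for illustration; NOT the SynGenes database.
SYNONYMS = {
    "COI": {"COI", "COX1", "CO1", "MT-CO1", "cytochrome c oxidase subunit I"},
    "CYTB": {"CYTB", "COB", "MT-CYB", "cytochrome b"},
}


def _normalize(name):
    """Upper-case the name and drop spaces, hyphens and other punctuation."""
    return "".join(ch for ch in name.upper() if ch.isalnum())


# Reverse index: normalized variant -> standardized gene name.
LOOKUP = {
    _normalize(variant): canonical
    for canonical, variants in SYNONYMS.items()
    for variant in variants
}


def standardize(name):
    """Return the standardized gene name, or None if the variant is unknown."""
    return LOOKUP.get(_normalize(name))


print(standardize("cox1"))          # COI
print(standardize("Cytochrome B"))  # CYTB
```

A reverse index keeps each lookup to a single dictionary access, and normalizing case and punctuation lets one table entry cover several spellings of the same variant.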
Subjects
Evolution, Molecular , Terminology as Topic , Genes, Chloroplast , Genes, Mitochondrial , Databases, Genetic , Chloroplasts/genetics , Internet , Software
ABSTRACT
Clustered regularly interspaced short palindromic repeats (CRISPR) has been widely characterized as a defense system against phages and other invading elements in bacteria and archaea. A low percentage of Ralstonia solanacearum species complex (RSSC) strains possess the CRISPR array and the CRISPR-associated proteins (Cas) that would confer immunity against various phages. To provide a wide-range screen of the CRISPR presence in the RSSC, we analyzed 378 genomes of RSSC strains to find the CRISPR locus. We found that 20.1, 14.3, and 54.5% of the R. solanacearum, R. pseudosolanacearum, and R. syzygii strains, respectively, possess the CRISPR locus. In addition, we performed further analysis to identify the respective phages that are restricted by the CRISPR arrays. We found 252 different phages infecting different strains of the RSSC, by means of the identification of similarities between the protospacers in phages and spacers in bacteria. We compiled this information in a database with web access called CRISPRals (https://crisprals.yachaytech.edu.ec/). Additionally, we made available a number of tools to detect and identify CRISPR array and Cas genes in genomic sequences that could be uploaded by users. Finally, a matching tool to relate bacteria spacer with phage protospacer sequences is available. CRISPRals is a valuable resource for the scientific community that contributes to the study of bacteria-phage interaction and a starting point that will help to design efficient phage therapy strategies.
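Relating bacterial spacers to phage protospacers by sequence similarity can be sketched as follows. The abstract does not describe the matching algorithm used by CRISPRals, so this Python example is only an assumption: a brute-force scan of both strands with a mismatch threshold, where `find_protospacers` and `max_mismatches` are hypothetical names and the sequences are toy data.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(spacer, phage_genome, max_mismatches=1):
    """Scan both strands of a phage genome for sites similar to a spacer.

    Returns a list of (start, strand, mismatches) tuples, where start is the
    0-based position on the forward strand of the genome.
    """
    genome = phage_genome.upper()
    hits = []
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        for start in range(len(genome) - len(query) + 1):
            window = genome[start:start + len(query)]
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


# Toy sequences, not real RSSC or phage data.
print(find_protospacers("GATTACAG", "TTGATTACAGCC", max_mismatches=0))
```

Real spacers are longer (typically around 30 nucleotides) and genome-scale searches would normally rely on an aligner such as BLAST rather than this quadratic scan.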
Subjects
Bacteriophages , Clustered Regularly Interspaced Short Palindromic Repeats , Ralstonia solanacearum , Ralstonia solanacearum/virology , Ralstonia solanacearum/genetics , Bacteriophages/genetics , Bacteriophages/physiology , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Databases, Genetic , Internet , CRISPR-Cas Systems , Genome, Bacterial/genetics , Plant Diseases/microbiology , Plant Diseases/virology
ABSTRACT
The Pacific whiteleg shrimp Penaeus (Litopenaeus) vannamei is a highly relevant species for the world's aquaculture development, for which an incomplete genome is available in public databases. In this work, PacBio long reads from 14 publicly available genomic libraries (131.2 Gb) were mined to improve the reference genome assembly. The libraries were assembled, polished using Illumina short reads, and scaffolded with P. vannamei, Fenneropenaeus chinensis, and Penaeus monodon genomes. The reference-guided assembly, organized into 44 pseudo-chromosomes and 15,682 scaffolds, showed an improvement over previous reference genomes, with a genome size of 2.055 Gb, N50 of 40.14 Mb, L50 of 21, and a longest scaffold of 65.79 Mb. Most orthologous genes (92.6%) of the Arthropoda_odb10 database were detected as "complete," and BRAKER predicted 21,816 gene models; from these, we detected 1,814 single-copy orthologues conserved across the genomic references for Marsupenaeus japonicus, F. chinensis, and P. monodon. More than 99% of the transcriptomic assembly data aligned to the new reference-guided assembly. The collinearity analysis of the assembled pseudo-chromosomes against the P. vannamei and P. monodon reference genomes showed high conservation in different sets of pseudo-chromosomes. In addition, more than 21,000 publicly available genetic marker sequences were mapped to single-site positions. This new assembly represents a step forward from previously reported P. vannamei assemblies. It will be helpful as a reference genome for future studies on the evolutionary history of the species, the genetic architecture of physiological and sex-determination traits, and the analysis of changes in the genetic diversity and composition of cultivated stocks.
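The N50 and L50 contiguity statistics quoted above follow a standard definition: with scaffolds sorted from longest to shortest, N50 is the length of the scaffold at which the running total first reaches half of the assembly size, and L50 is the number of scaffolds needed to get there. A minimal Python sketch, with a hypothetical `n50_l50` helper and toy lengths rather than the shrimp assembly:

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (arbitrary units); total = 300, half = 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (70, 2)
```

In the toy case the two longest scaffolds (80 + 70) already cover half of the total, so N50 is 70 and L50 is 2.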
Subjects
Genome , Penaeidae , Penaeidae/genetics , Animals , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation
ABSTRACT
RegulonDB is a database that contains the most comprehensive corpus of knowledge of the regulation of transcription initiation of Escherichia coli K-12, including data from both classical molecular biology and high-throughput methodologies. Here, we describe biological advances since our last NAR paper of 2019. We explain the changes to satisfy FAIR requirements. We also present a full reconstruction of the RegulonDB computational infrastructure, which has significantly improved data storage, retrieval and accessibility and thus supports a more intuitive and user-friendly experience. The integration of graphical tools provides clear visual representations of genetic regulation data, facilitating data interpretation and knowledge integration. RegulonDB version 12.0 can be accessed at https://regulondb.ccg.unam.mx.
Subjects
Databases, Genetic , Escherichia coli K12 , Gene Expression Regulation, Bacterial , Computational Biology/methods , Escherichia coli K12/genetics , Internet , Transcription, Genetic
ABSTRACT
Females of the genus Mansonia feed on the blood of humans, livestock, and other vertebrates to develop their eggs. The females' biting behavior may cause severe disturbance to blood hosts, with a negative impact on public health and economics. Certain species have been identified as potential or effective disease vectors. The accurate species identification of field-collected specimens is of paramount importance for the success of monitoring and control strategies. Mansonia (Mansonia) morphological species boundaries are blurred by patterns of intraspecific heteromorphism and interspecific isomorphism. DNA barcodes can help to solve taxonomic controversies, especially if combined with other molecular tools. We used cytochrome c oxidase subunit I (COI) gene 5' end (DNA barcode) sequences to identify 327 field-collected specimens of Mansonia (Mansonia) spp. The sampling encompassed males and females collected from three Brazilian regions and previously assigned to species based on their morphological characteristics. Eleven GenBank and BOLD sequences were added to the DNA barcode analyses. Initial morphospecies assignments were mostly corroborated by the results of five clustering methods based on Kimura two-parameter distance and maximum likelihood phylogeny. Five to eight molecular operational taxonomic units may represent taxonomically unknown species. The first DNA barcode records for Mansonia fonsecai, Mansonia iguassuensis, and Mansonia pseudotitillans are presented.
Subjects
Malvaceae , DNA Barcoding, Taxonomic , Malvaceae/genetics , Animals , Phylogeny , Brazil , Databases, Genetic , Cluster Analysis
ABSTRACT
BACKGROUND: Acute myeloid leukemia (AML) is a highly heterogeneous hematological cancer. The current diagnosis and therapy model of AML has gradually shifted toward personalization and precision. Artesunate, a member of the artemisinin family, has anti-tumor effects on AML. This research uses network pharmacology and molecular docking to predict the potential mechanisms of action of artesunate in the therapy of AML. METHODS: The targets of artesunate were screened through the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP), PubChem, and SwissTargetPrediction databases. The Online Mendelian Inheritance in Man (OMIM), DisGeNET, GeneCards, and DrugBank databases were used to identify target genes of AML, and the effective targets of artesunate for AML treatment were obtained through cross-analysis. Protein-protein interaction (PPI) networks were built on the Cytoscape platform. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted on the relevant targets using R software. Finally, molecular docking and PyMOL were used to verify the interactions between the active component and the essential targets. RESULTS: The 30 effective targets of artesunate for treating AML include CASP3, EGFR, MAPK1, and STAT3, four target genes that may have a crucial function in disease management. The targets were enriched in 2044 GO biological processes, 125 molecular functions, 209 cellular components, and 106 KEGG pathways. Virus infection-related pathways (hepatitis B virus (HBV), human papillomavirus (HPV), and Epstein-Barr virus (EBV) infection, among others), FoxO signaling, viral carcinogenesis, and proteoglycans in cancer have all been hypothesized to be involved in the mechanism of action. Molecular docking findings indicated that artesunate has a high affinity for the four primary disease targets, supporting its importance in the therapy of AML. Molecular docking with a low binding energy yields helpful information for developing medicines against AML. CONCLUSIONS: Artesunate may act through multiple targets and multiple signaling pathways in treating AML, suggesting that it may have therapeutic potential for AML.
Subjects
Drugs, Chinese Herbal , Epstein-Barr Virus Infections , Leukemia, Myeloid, Acute , Humans , Molecular Docking Simulation , Artesunate/therapeutic use , Network Pharmacology , Herpesvirus 4, Human , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/genetics , Databases, Genetic
ABSTRACT
Biodiversity is proposed as a sustainable alternative for the economic development of high-biodiversity regions. Especially in the field of biodiversity genomics, the development of low-cost DNA sequencing opens an opportunity for new actors beyond academia to engage in genomic sequencing. However, it is challenging to adequately compensate non-academic actors such as local populations for their contribution to the innovation process, preventing better bioeconomy development. Although many repositories register genomic data to support biodiversity research, they do not facilitate the fair sharing of economic benefits. In this work, we propose the creation of the Amazon Biobank, a community-based genetic database. We employed blockchain to build a transparent and verifiable log of transactions involving genomic data, and we used smart contracts to implement an internal monetary system for all participants who collect, insert, process, store, and validate genomic data. We also used peer-to-peer solutions to allow users with commodity computers to collaborate with the storage and distribution of DNA files. By combining emerging technologies, Amazon Biobank provides adequate benefit-sharing among all participants that collaborate with data, knowledge, and computational resources. It also provides traceability and auditability, allowing easy association between biotechnological research and DNA data. In addition, the solution is highly scalable and less dependent on the trust deposited in any system player. Therefore, Amazon Biobank can become an important stepping stone to unlock the potential of bioeconomy in rich ecosystems such as the Amazon Rainforest.
Subjects
Biological Specimen Banks , Ecosystem , Humans , Genomics , Databases, Genetic , DNA
ABSTRACT
The biogeography of bacterial communities is a key topic in microbial ecology. Most studies of continental waters have been carried out in the northern hemisphere, leaving a gap in our knowledge of microbial diversity patterns on a global scale. South America harbours approximately one third of the world's total freshwater resources and is one of these understudied regions. To fill this gap, we compiled 16S rRNA amplicon sequencing data of microbial communities across South American continental water ecosystems, presenting the first such database, µSudAqua[db]. The database contains over 866 georeferenced samples from 9 different ecoregions with contextual environmental information. For its integration and validation, we constructed a curated database (µSudAqua[db.sp]) using samples sequenced on the Illumina MiSeq platform with commonly used universal prokaryote primers. This comprised ~60% of the total georeferenced samples of µSudAqua[db]. This compilation was carried out within the scope of the µSudAqua collaborative network and represents one of the most complete databases of continental water microbial communities from South America.
Subjects
Microbiota , Bacteria/genetics , Databases, Genetic , High-Throughput Nucleotide Sequencing , Microbiota/genetics , RNA, Ribosomal, 16S/genetics , South America , Water Microbiology
ABSTRACT
The reduction in the cost of DNA sequencing and in the total time needed to perform it has resulted in a significant increase in the deposit of biological information in public databases such as the NCBI (National Center for Biotechnology Information). The production of large volumes of data per run has created the need for algorithms capable of handling such data and assisting in analyses such as the assembly and annotation of prokaryotic genomes. Over the years, several pipelines and computational tools have been developed to automate this task and consequently reduce the total time needed to determine the genetic content of a given organism, especially non-model organisms, contributing to the identification of possible targets with biotechnological applicability. In the case of automatic annotation tools, the accuracy of the results is widely reported in the literature; however, this does not exclude the manual curation process, in which the information inferred by the automatic process is verified and enriched by the curators. This task requires time that is directly proportional to the number of gene products of the organism under study. To assist in this process, we present the ReNoteWeb web tool, which has a simple and intuitive interface, to perform the assembly enhancement process, with the possibility of identifying products missing from the original genomic sequence. In addition, ReNoteWeb is capable of performing the annotation process for all products, based on information obtained from highly accurate external databases. The engine responsible for the data processing was developed in Java, and the web platform uses the resources of the Yii framework. The annotation produced by this platform aims to reduce the overall time of the manual curation process. Twenty-three organisms were used to validate the tool. The efficiency was verified by comparing the annotation with that of the same organisms available in the NCBI database and with the annotation performed on the RAST platform. The tool is available at: http://biod.ufpa.br/renoteweb/.
Subjects
Genome , Genomics , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation , Sequence Analysis, DNA , Software
ABSTRACT
The surface envelope (SU) protein determines the cell tropism and consequently the pathogenesis of the feline leukemia virus (FeLV) in felids. Recombination of exogenous FeLV (exFeLV) with endogenous retroviruses (enFeLV) allows the emergence of more pathogenic variants. Currently, phenotypic testing through interference assays is the only method to distinguish among subgroups, namely FeLV-A, -B, -C, -E, and -T. This study proposes a new method for FeLV classification based on molecular analysis of the SU gene. A total of 404 publicly available SU sequences were used to reconstruct a maximum likelihood tree. However, only 63 of these sequences had available information about phenotypic tests or subgroup assignments. Two major clusters were observed: (a) clade FeLV-A, which includes FeLV-A, FeLV-C, FeLV-E, and FeLV-T sequences, and (b) clade enFeLV, which includes FeLV-B and enFeLV strains. We found that FeLV-B, FeLV-C, FeLV-E, and FeLV-T SU sequences share similarities with FeLV-A viruses and most likely arose independently through mutation or recombination from this strain. FeLV-B and FeLV-C arose from recombination between FeLV-A and enFeLV viruses, whereas FeLV-T is a monophyletic subgroup that probably originated from FeLV-A through combined deletion and insertion events. Unfortunately, this study could not identify polymorphisms that are specifically linked to the FeLV-E subgroup. We propose that phylogenetic and recombination analysis together can explain the current phenotypic classification of FeLV viruses.
Subjects
Leukemia Virus, Feline/classification , Phylogeny , Databases, Genetic , Geography , Leukemia Virus, Feline/genetics , Mutation , Recombination, Genetic , Viral Envelope Proteins/genetics
ABSTRACT
Antimicrobial resistance (AR) is a major global threat to public health. Understanding the population dynamics of AR is critical to restrain and control this issue. However, no study has provided a global picture of the whole resistome of Acinetobacter baumannii, a very important nosocomial pathogen. Here we analyse 1450+ genomes (covering >40 countries and >4 decades) to infer the global population dynamics of the resistome of this species. We show that gene flow and horizontal transfer have driven the dissemination of AR genes in A. baumannii. We found considerable variation in AR gene content across lineages. Although the individual AR gene histories have been affected by recombination, the AR gene content has been shaped by the phylogeny. Furthermore, many AR genes have been transferred to other well-known pathogens, such as Pseudomonas aeruginosa or Klebsiella pneumoniae. Despite using this massive data set, we were not able to sample the whole diversity of AR genes, which suggests that this species has an open resistome. Our results highlight the high mobilization risk of AR genes between important pathogens. On a broader perspective, this study gives a framework for an emerging perspective (resistome-centric) on the genomic epidemiology (and surveillance) of bacterial pathogens.
Subjects
Acinetobacter baumannii/classification , Bacterial Proteins/genetics , Computational Biology/methods , Drug Resistance, Multiple, Bacterial , Acinetobacter baumannii/drug effects , Acinetobacter baumannii/genetics , Databases, Genetic , Gene Flow , Gene Transfer, Horizontal , Phylogeny , Whole Genome Sequencing
ABSTRACT
Pterygium is a common ocular surface condition frequently associated with irritative symptoms. The precise identity of its critical triggers and the hierarchical relationship among the elements involved in its pathogenesis have not yet been elucidated. Meta-analysis of gene expression studies represents a novel strategy capable of identifying key pathogenic mediators and therapeutic targets in complex diseases. Samples from nine patients were collected during surgery after photo documentation and clinical characterization of pterygia. Gene expression experiments were performed using the Human Clariom D Assay gene chip. Differential gene expression analysis between active and atrophic pterygia was performed using the limma package after adjusting for age. In addition, a meta-analysis was performed including recent gene expression studies available at the Gene Expression Omnibus public repository. Two datasets including samples from adults with pterygium and controls fulfilled our inclusion criteria. Meta-analysis was performed using the Rank Product algorithm of the RankProd package. Gene set analysis was performed using ClueGO, and the transcription factor regulatory network prediction was performed using appropriate bioinformatics tools. Finally, a miRNA-mRNA regulatory network was reconstructed using the up-regulated genes identified in the gene set analysis from the meta-analysis and their interacting miRNAs from the Brazilian cohort expression data. The meta-analysis identified 154 up-regulated and 58 down-regulated genes. A gene set analysis with the top up-regulated genes showed an overrepresentation of pathways associated with remodeling of the extracellular matrix. Other pathways represented in the network included formation of cornified envelopes and unsaturated fatty acid metabolic processes. The miRNA-mRNA target prediction network, also reconstructed from the set of up-regulated genes present in the gene ontology and biological pathways network, showed that 17 target genes were negatively correlated with their interacting miRNAs from the Brazilian cohort expression data. Once again, the main cluster identified involved extracellular matrix remodeling mechanisms, while the second cluster involved formation of the cornified envelope, establishment of the skin barrier, and unsaturated fatty acid metabolic processes. Differential expression analysis comparing active with atrophic pterygium, using data generated from the Brazilian cohort, identified differentially expressed genes between the two forms of presentation of this condition. Our results reveal differentially expressed genes not only in pterygium, but also in active pterygium when compared with the atrophic form. New insights into the pathophysiology of pterygium are suggested.
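The rank product statistic behind this kind of meta-analysis can be sketched briefly: within each study, genes are ranked by fold change (rank 1 = most up-regulated), and the rank product of a gene is the geometric mean of its ranks across studies. The Python example below is a simplified illustration with a hypothetical `rank_product` helper and toy fold changes; it is not the RankProd package, and it omits the permutation step that package uses to assess significance.

```python
import math


def rank_product(fold_changes_by_study):
    """Geometric mean of per-study ranks for genes present in every study.

    fold_changes_by_study: list of dicts mapping gene -> log fold change.
    Rank 1 is the most up-regulated gene within a study, so a small rank
    product means consistent up-regulation. Ties are broken arbitrarily.
    """
    genes = set.intersection(*(set(study) for study in fold_changes_by_study))
    n_studies = len(fold_changes_by_study)
    log_rank_sum = {gene: 0.0 for gene in genes}
    for study in fold_changes_by_study:
        ordered = sorted(genes, key=lambda gene: study[gene], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            log_rank_sum[gene] += math.log(rank)
    return {gene: math.exp(total / n_studies) for gene, total in log_rank_sum.items()}


# Toy log fold changes for three hypothetical genes in two studies.
study_1 = {"geneA": 2.0, "geneB": 1.0, "geneC": -1.0}
study_2 = {"geneA": 1.5, "geneB": 2.5, "geneC": -0.5}
for gene, rp in sorted(rank_product([study_1, study_2]).items()):
    print(gene, round(rp, 3))
```

Summing log ranks rather than multiplying the ranks directly avoids overflow when many studies are combined.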