ABSTRACT
The origin of the genetic code remains enigmatic five decades after it was elucidated, although there is growing evidence that the code coevolved progressively with the ribosome. A number of primordial codes were proposed as ancestors of the modern genetic code, including comma-free codes such as the RRY, RNY, or GNC codes (R = G or A, Y = C or T, N = any nucleotide), and the X circular code, an error-correcting code that also allows identification and maintenance of the reading frame. It was demonstrated previously that motifs of the X circular code are significantly enriched in the protein-coding genes of most organisms, from bacteria to eukaryotes. Here, we show that imprints of this code also exist in the ribosomal RNA (rRNA). In a large-scale study involving 133 organisms representative of the three domains of life, we identified 32 universal X motifs that are conserved in the rRNA of >90% of the organisms. Intriguingly, most of the universal X motifs are located in rRNA regions involved in important ribosome functions, notably in the peptidyl transferase center and the decoding center that form the original "proto-ribosome." Building on the existing accretion models for ribosome evolution, we propose that error-correcting circular codes represented an important step in the emergence of the modern genetic code. Thus, circular codes would have allowed the simultaneous coding of amino acids and synchronization of the reading frame in primitive translation systems, prior to the emergence of more sophisticated start codon recognition and translation initiation mechanisms.
Subjects
Molecular Evolution, Genetic Code, Nucleotide Motifs, Protein Biosynthesis, Ribosomes/genetics, Ribosomes/metabolism, Biological Models, Molecular Models, Molecular Conformation, Nucleic Acid Conformation, Ribosomal RNA/chemistry, Ribosomal RNA/genetics, Ribosomes/chemistry, Structure-Activity Relationship
ABSTRACT
OrthoInspector is one of the leading software suites for orthology relations inference. In this paper, we describe a major redesign of the OrthoInspector online resource along with a significant increase in the number of species: 4753 organisms are now covered across the three domains of life, making OrthoInspector the most exhaustive orthology resource to date in terms of covered species (excluding viruses). The new website integrates original data exploration and visualization tools in an ergonomic interface. Distributions of protein orthologs are represented by heatmaps summarizing their evolutionary histories, and proteins with similar profiles can be directly accessed. Two novel tools have been implemented for comparative genomics: a phylogenetic profile search that can be used to find proteins with a specific presence-absence profile and investigate their functions and, inversely, a GO profiling tool aimed at deciphering evolutionary histories of molecular functions, processes or cell components. In addition to the re-designed website, the OrthoInspector resource now provides a REST interface for programmatic access. OrthoInspector 3.0 is available at http://lbgi.fr/orthoinspectorv3.
Subjects
Genetic Databases, Genomics, Algorithms, Bacteria/genetics, Classification, Eukaryota/genetics, Molecular Evolution, Forecasting, Gene Ontology, Internet, Phylogeny, Proteome, Nucleic Acid Sequence Homology, Software, Species Specificity
ABSTRACT
Cilia (flagella) are important eukaryotic organelles, present in the Last Eukaryotic Common Ancestor, and are involved in cell motility and integration of extracellular signals. Ciliary dysfunction causes a class of genetic diseases known as ciliopathies; however, current knowledge of the underlying mechanisms is still limited and a better characterization of genes is needed. As cilia have been lost independently several times during evolution and are subject to important functional variation between species, ciliary genes can be investigated through comparative genomics. We performed phylogenetic profiling by predicting orthologs of human protein-coding genes in 100 eukaryotic species. The analysis integrated three independent methods to predict a consensus set of 274 ciliary genes, including 87 new promising candidates. A fine-grained analysis of the phylogenetic profiles allowed a partitioning of ciliary genes into modules with distinct evolutionary histories and ciliary functions (assembly, movement, centriole, etc.) and thus propagation of potential annotations to previously undocumented genes. The cilia/basal body localization was experimentally confirmed for five of these previously unannotated proteins (LRRC23, LRRC34, TEX9, WDR27, and BIVM), validating the relevance of our approach. Furthermore, our multi-level analysis sheds light on the core gene sets retained, for instance, in gamete-only flagellates or Ecdysozoa. By combining gene-centric and species-oriented analyses, this work reveals new ciliary and ciliopathy gene candidates and provides clues about the evolution of ciliary processes in the eukaryotic domain. Additionally, the positive and negative reference gene sets and the phylogenetic profile of human genes constructed during this study can be exploited in future work.
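As a rough illustration of the phylogenetic profiling idea in this abstract, the sketch below encodes each gene as a presence/absence vector across species and ranks genes by their Jaccard similarity to a query profile. The gene names, the five-species toy profiles and the choice of Jaccard similarity are assumptions made for illustration only; the study's three-method consensus is not reproduced here.

```python
from typing import Dict, List, Tuple

# 1 = an ortholog is predicted in that species, 0 = no ortholog found.
Profile = Tuple[int, ...]


def jaccard(a: Profile, b: Profile) -> float:
    """Jaccard similarity between two presence/absence profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-absent profiles are treated as identical.
    return both / either if either else 1.0


def rank_by_profile(query: Profile,
                    profiles: Dict[str, Profile]) -> List[Tuple[str, float]]:
    """Rank genes from most to least similar to the query profile."""
    scored = [(gene, jaccard(query, p)) for gene, p in profiles.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: five species, three made-up genes.
TOY_PROFILES: Dict[str, Profile] = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 0, 0),
    "geneC": (0, 0, 1, 0, 1),
}
# Hypothetical pattern of ciliated (1) and non-ciliated (0) species.
CILIATED_SPECIES: Profile = (1, 1, 0, 1, 0)
```

Calling `rank_by_profile(CILIATED_SPECIES, TOY_PROFILES)` puts the gene whose distribution mirrors the ciliated species first, which is the intuition behind using presence/absence patterns to propose candidate ciliary genes.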
Subjects
Cilia/genetics, Ciliopathies/genetics, Animals, Cell Movement/genetics, Cilia/metabolism, Ciliopathies/metabolism, Nucleic Acid Databases, Eukaryota, Eukaryotic Cells, Molecular Evolution, Flagella/genetics, Flagella/metabolism, Genomics, Humans, Phylogeny, DNA Sequence Analysis/methods
ABSTRACT
Numerous mutations in each of the mitochondrial aminoacyl-tRNA synthetases (aaRSs) have been implicated in human diseases. The mutations are autosomal and recessive and lead mainly to neurological disorders, although with pleiotropic effects. The processes and interactions that drive the etiology of the disorders associated with mitochondrial aaRSs (mt-aaRSs) are far from understood. The complexity of the clinical, genetic, and structural data requires concerted, interdisciplinary efforts to understand the molecular biology of these disorders. Toward this goal, we designed MiSynPat, a comprehensive knowledge base together with an ergonomic Web server designed to organize and access all pertinent information (sequences, multiple sequence alignments, structures, disease descriptions, mutation characteristics, original literature) on the disease-linked human mt-aaRSs. With MiSynPat, a user can also evaluate the impact of a possible mutation on sequence-conservation-structure in order to foster the links between basic and clinical researchers and to facilitate future diagnosis. The proposed integrated view, coupled with research on disease-related mt-aaRSs, will help to reveal new functions for these enzymes and to open new vistas in the molecular biology of the cell. The purpose of MiSynPat, freely available at http://misynpat.org, is to constitute a reference and a converging resource for scientists and clinicians.
Subjects
Aminoacyl-tRNA Synthetases/genetics, Genetic Databases, Mitochondria/enzymology, Mutation/genetics, Amino Acid Sequence, Aminoacyl-tRNA Synthetases/chemistry, Molecular Evolution, Inborn Genetic Diseases/genetics, Humans, Mitochondria/genetics, Molecular Structure, Protein Conformation
ABSTRACT
BACKGROUND: The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction. OBJECTIVE: MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data. METHODS: MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization. RESULTS: MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user's specific interests and provides an efficient way to share information with collaborators. Furthermore, the user's behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents. CONCLUSIONS: We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data. 
MyGeneFriends is available at lbgi.fr/mygenefriends.
Subjects
Inborn Genetic Diseases/genetics, Genetic Testing/methods, Social Networking, Telemedicine/statistics & numerical data, Friends, Humans, Research Personnel
ABSTRACT
SUMMARY: We previously developed OrthoInspector, a package incorporating an original algorithm for the detection of orthology and inparalogy relations between different species. We have added new functionalities to the package. While its original algorithm was not modified, performing similar orthology predictions, we facilitated the prediction of very large databases (thousands of proteomes), refurbished its graphical interface, added new visualization tools for comparative genomics/protein family analysis and facilitated its deployment in a network environment. Finally, we have released three online databases of precomputed orthology relationships. AVAILABILITY: Package and databases are freely available at http://lbgi.fr/orthoinspector with all major browsers supported. CONTACT: odile.lecompte@unistra.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Algorithms, Computer Graphics, Factual Databases, Proteomics/methods, Protein Sequence Analysis/methods, Software, Humans, Molecular Sequence Annotation, Phylogeny
ABSTRACT
SUMMARY: We present PARSEC (PAtteRn Search and Contextualization), a new open source platform for guided discovery, allowing localization and biological characterization of short genomic sites in entire eukaryotic genomes. PARSEC can search for a sequence or a degenerate pattern. The retrieved set of genomic sites can be characterized in terms of (i) conservation in model organisms, (ii) genomic context (proximity to genes) and (iii) function of neighboring genes. These modules allow the user to explore, visualize, filter and extract biological knowledge from a set of short genomic regions such as transcription factor binding sites. AVAILABILITY: Web site implemented in Java, JavaScript and C++, with all major browsers supported. Freely available at lbgi.fr/parsec. Source code is freely available at sourceforge.net/projects/genomicparsec.
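To make the notion of a degenerate pattern search concrete, here is a minimal sketch that expands IUPAC nucleotide codes into a regular expression and reports every match, including overlapping ones. It is not PARSEC's implementation: the function names are hypothetical, and only the forward strand of a single in-memory sequence is scanned.

```python
import re
from typing import List, Tuple

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC pattern such as 'TGASTCA' into a regex string."""
    return "".join(IUPAC[ch] for ch in pattern.upper())


def find_sites(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every site on the forward strand.

    A lookahead is used so that overlapping sites are all reported.
    """
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]
```

For example, `find_sites("AATGACTCATT", "TGASTCA")` reports one site starting at offset 2. A real genome-wide search would also scan the reverse complement and stream chromosomes rather than hold them in memory.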
Subjects
Genomics/methods, Algorithms, Genome, Humans, Internet, Nonlinear Dynamics, Programming Languages, Software
ABSTRACT
A major challenge in the post-genomic era is a better understanding of how human genetic alterations involved in disease affect the gene products. The KD4v (Comprehensible Knowledge Discovery System for Missense Variant) server allows users to characterize and predict the phenotypic effects (deleterious/neutral) of missense variants. The server provides a set of rules learned by Inductive Logic Programming (ILP) on a set of missense variants described by conservation, physico-chemical, functional and 3D structure predicates. These rules are interpretable by non-expert humans and are used to accurately predict the deleterious/neutral status of an unknown mutation. The web server is available at http://decrypthon.igbmc.fr/kd4v.
Subjects
Disease/genetics, Missense Mutation, Single Nucleotide Polymorphism, Software, Genetic Association Studies, Humans, Internet, Knowledge Bases, Phenotype, Proteins/chemistry, Proteins/genetics
ABSTRACT
TFIIH is a eukaryotic complex composed of two subcomplexes, the CAK (Cdk activating kinase) and the core-TFIIH. The core-TFIIH, composed of seven subunits (XPB, XPD, P62, P52, P44, P34, and P8), plays a crucial role in transcription and repair. Here, we performed an extended sequence analysis to establish the accurate phylogenetic distribution of the core-TFIIH in 63 eukaryotic organisms. In spite of the high conservation of the seven subunits at the sequence and genomic levels, the non-enzymatic P8, P34, P52 and P62 are absent from one or a few unicellular species. To gain insight into their respective roles, we undertook a comparative genomic analysis of the whole proteome to identify the gene sets sharing similar presence/absence patterns. While little information was inferred for P8 and P62, our studies confirm the known role of P52 in repair and suggest for the first time the implication of the core TFIIH in mRNA splicing via P34.
Subjects
Molecular Evolution, Multiprotein Complexes/genetics, Phylogeny, Transcription Factor TFIIH/genetics, Animals, Cyclin-Dependent Kinases/genetics, DNA-Binding Proteins, Humans, Protein Subunits/genetics, Genetic Transcription
ABSTRACT
BACKGROUND: Membrane trafficking involves the complex regulation of the intracellular localization of proteins and lipids and is required for metabolic uptake, cell growth and development. Different trafficking pathways passing through the endosomes are coordinated by the ENTH/ANTH/VHS adaptor protein superfamily. The endosomes are crucial for eukaryotes since the acquisition of the endomembrane system was a central process in eukaryogenesis. RESULTS: Our in silico analysis of this ENTH/ANTH/VHS superfamily, consisting of proteins gathered from 84 complete genomes representative of the different eukaryotic taxa, revealed that the genomic distribution of this superfamily makes it possible to discriminate Fungi and Metazoa from Plantae and Protists. Next, in a four-way genome-wide comparison, we showed that this discriminative feature is observed not only for other membrane trafficking effectors, but also for proteins involved in metabolism and in cytokinesis, suggesting that metabolism, cytokinesis and intracellular trafficking pathways co-evolved. Moreover, some of the proteins identified were implicated in multiple functions, in either trafficking and metabolism or trafficking and cytokinesis, suggesting that membrane trafficking is central to this co-evolution process. CONCLUSIONS: Our study suggests that membrane trafficking and compartmentalization were not only key features for the emergence of eukaryotic cells but also drove the separation of the eukaryotes into the different taxa.
Subjects
Cell Membrane/metabolism, Genomics/methods, Protein Transport/physiology, Proteins/metabolism, Biological Evolution, Cytokinesis/physiology, Phylogeny, Proteins/chemistry, Proteins/classification
ABSTRACT
Rod-derived Cone Viability Factor (RdCVF) is a trophic factor with therapeutic potential for the treatment of retinitis pigmentosa, a retinal disease that commonly results in blindness. RdCVF is encoded by Nucleoredoxin-like 1 (Nxnl1), a gene homologous with the family of thioredoxins that participate in the defense against oxidative stress. RdCVF expression is lost after rod degeneration in the first phase of retinitis pigmentosa, and this loss has been implicated in the more clinically significant secondary cone degeneration that often occurs. Here, we describe a study of the Nxnl1 promoter using an approach that combines promoter and transcriptomic analysis. By transfection of selected candidate transcription factors, chosen based upon their expression pattern, we identified the homeodomain proteins CHX10/VSX2, VSX1 and PAX4, as well as the zinc finger protein SP3, as factors that can stimulate both the mouse and human Nxnl1 promoter. In addition, CHX10/VSX2 binds to the Nxnl1 promoter in vivo. Since CHX10/VSX2 is expressed predominantly in the inner retina, this finding motivated us to demonstrate that RdCVF is expressed in the inner as well as the outer retina. Interestingly, the loss of rods in the rd1 mouse, a model of retinitis pigmentosa, is associated with decreased expression of RdCVF by inner retinal cells as well as by rods. Based upon these results, we propose an alternative therapeutic strategy aimed at recapitulating RdCVF expression in the inner retina, where cell loss is not significant, to prevent secondary cone death and central vision loss in patients suffering from retinitis pigmentosa.
Subjects
Eye Proteins/genetics, Homeobox Genes, Homeodomain Proteins/metabolism, Genetic Promoter Regions, Retina/metabolism, Thioredoxins/genetics, Transcription Factors/metabolism, Animals, Eye Proteins/metabolism, Gene Expression Regulation, Homeodomain Proteins/genetics, Humans, Mice, Inbred BALB C Mice, Knockout Mice, Protein Binding, Retinitis Pigmentosa/genetics, Retinitis Pigmentosa/metabolism, Thioredoxins/metabolism, Transcription Factors/genetics
ABSTRACT
BACKGROUND: The deep-sea hydrothermal vent mussel Bathymodiolus azoricus harbors thiotrophic and methanotrophic symbiotic bacteria in its gills. While the symbiotic relationship between this hydrothermal mussel and these chemoautotrophic bacteria has been described, the molecular processes involved in the cross-talk between symbionts and host, in the maintenance of the symbiosis, in the influence of environmental parameters on gene expression, and in transcriptome variation across individuals remain poorly understood. In an attempt to understand how, and to what extent, this double symbiosis affects host gene expression, we used a transcriptomic approach to identify genes potentially regulated by symbiont characteristics, environmental conditions or both. This study was done on mussels from two contrasting populations. RESULTS: Subtractive libraries allowed the identification of about 1000 genes putatively regulated by symbiosis and/or environmental factors. Microarray analysis showed that 120 genes (3.5% of all genes) were differentially expressed between the Menez Gwen (MG) and Rainbow (Rb) vent fields. The total number of regulated genes in mussels harboring a high versus a low symbiont content did not differ significantly. With regard to the impact of symbiont content, only 1% of all genes were regulated by thiotrophic (SOX) and methanotrophic (MOX) bacteria content in MG mussels whereas 5.6% were regulated in mussels collected at Rb. MOX symbionts also impacted a higher proportion of genes than SOX in both vent fields. When host transcriptome expression was analyzed with respect to symbiont gene expression, it was related to symbiont quantity in each field. CONCLUSIONS: Our study has produced a preliminary description of a transcriptomic response in a hydrothermal vent mussel host of both thiotrophic and methanotrophic symbiotic bacteria. This model can help to identify genes involved in the maintenance of symbiosis or regulated by environmental parameters.
Our results provide evidence of a symbiont effect on transcriptome regulation, with differences related to the type of symbiont, even though the relative percentage of genes involved remains limited. Differences observed between the vent sites indicate that the environment strongly influences transcriptome regulation and impacts both the activity and the relative abundance of each symbiont. Among all these genes, those participating in recognition, the immune system, oxidative stress, and energy metabolism constitute new promising targets for extended studies on symbiosis and the effect of environmental parameters on the symbiotic relationships in B. azoricus.
Subjects
Environment, Gene Expression Regulation, Mytilidae/genetics, Symbiosis/physiology, Animals, Bacteria/enzymology, Bacteria/metabolism, Gene Library, Gills/microbiology, Methanol/metabolism, Mixed Function Oxygenases/genetics, Mixed Function Oxygenases/metabolism, Sulfate Adenylyltransferase/genetics, Sulfate Adenylyltransferase/metabolism, Sulfhydryl Compounds/metabolism, Transcriptome
ABSTRACT
The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X and called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examined recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density ds(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. 
Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
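The density ds(X) defined in this abstract, the number of X motifs per kilobase of a gene sequence s, can be sketched as follows. The abstract does not list the 20 trinucleotides or the minimum motif length: the set below is the one usually attributed to Arquès and Michel (1996) and should be checked against that paper, while `min_codons` and the restriction to a single reading frame are assumptions of this sketch.

```python
from typing import List, Tuple

# The 20 trinucleotides of the X circular code as commonly reported
# (Arques and Michel, 1996); verify against the original publication.
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def find_x_motifs(seq: str, min_codons: int = 4,
                  frame: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of runs of >= min_codons consecutive X codons.

    min_codons is a hypothetical threshold; offsets are 0-based, end-exclusive.
    """
    seq = seq.upper()
    motifs: List[Tuple[int, int]] = []
    run_start, run_len = 0, 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in X_CODE:
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            if run_len >= min_codons:
                motifs.append((run_start, run_start + 3 * run_len))
            run_len = 0
    if run_len >= min_codons:  # a run that reaches the end of the sequence
        motifs.append((run_start, run_start + 3 * run_len))
    return motifs


def x_motif_density(seq: str, min_codons: int = 4, frame: int = 0) -> float:
    """ds(X): number of X motifs per kilobase of sequence."""
    if not seq:
        return 0.0
    return 1000.0 * len(find_x_motifs(seq, min_codons, frame)) / len(seq)
```

With the default threshold, a 27-nucleotide toy sequence containing two runs of four X codons separated by `TTT` yields two motifs, that is, a density of 2000/27 motifs per kilobase.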
Subjects
Codon/genetics, Gene Expression Regulation/genetics, Nucleotide Motifs/genetics, Genetic Code/genetics, Reading Frames, Ribosomes
ABSTRACT
In the multiomics era, comparative genomics studies based on gene repertoire comparison are increasingly used to investigate evolutionary histories of species, to study genotype-phenotype relations, species adaptation to various environments, or to predict gene function using phylogenetic profiling. However, comparisons of orthologs have highlighted the prevalence of sequence plasticity among species, showing the benefits of combining protein and subprotein levels of analysis to allow for a more comprehensive study of genotype/phenotype correlations. In this article, we introduce a new approach called BLUR (BLAST Unexpected Ranking), capable of detecting genotype divergence or specialization between two related clades at different levels: gain/loss of proteins but also of subprotein regions. These regions can correspond to known domains, uncharacterized regions, or even small motifs. Our method was created to allow two types of research strategies: 1) the comparison of two groups of species with no previous knowledge, with the aim of predicting phenotype differences or specializations between close species or 2) the study of specific phenotypes by comparing species that present the phenotype of interest with species that do not. We designed a website to facilitate the use of BLUR with a possibility of in-depth analysis of the results with various tools, such as functional enrichments, protein-protein interaction networks, and multiple sequence alignments. We applied our method to the study of two different biological pathways and to the comparison of several groups of close species, all with very promising results. BLUR is freely available at http://lbgi.fr/blur/.
Subjects
Molecular Evolution, Genomics/methods, Proteins/genetics, Proteome/genetics, Proteome/metabolism, Animals, Armadillo Domain Proteins, Bacteria, Conserved Sequence/genetics, Fungi, Genotype, Humans, Phenotype, Phylogeny, Sequence Alignment, Sequence Analysis, Software
ABSTRACT
Peroxisomes are essential organelles of eukaryotic origin, ubiquitously distributed in cells and organisms, playing key roles in lipid and antioxidant metabolism. Loss or malfunction of peroxisomes causes more than 20 fatal inherited conditions. We have created a peroxisomal database (http://www.peroxisomeDB.org) that includes the complete peroxisomal proteome of Homo sapiens and Saccharomyces cerevisiae, by gathering, updating and integrating the available genetic and functional information on peroxisomal genes. PeroxisomeDB is structured in interrelated sections 'Genes', 'Functions', 'Metabolic pathways' and 'Diseases', that include hyperlinks to selected features of NCBI, ENSEMBL and UCSC databases. We have designed graphical depictions of the main peroxisomal metabolic routes and have included updated flow charts for diagnosis. Precomputed BLAST, PSI-BLAST, multiple sequence alignment (MUSCLE) and phylogenetic trees are provided to assist in direct multispecies comparison to study evolutionary conserved functions and pathways. Highlights of the PeroxisomeDB include new tools developed for facilitating (i) identification of novel peroxisomal proteins, by means of identifying proteins carrying peroxisome targeting signal (PTS) motifs, (ii) detection of peroxisomes in silico, particularly useful for screening the deluge of newly sequenced genomes. PeroxisomeDB should contribute to the systematic characterization of the peroxisomal proteome and facilitate system biology approaches on the organelle.
Subjects
Protein Databases, Peroxisomal Disorders/genetics, Peroxisomes/metabolism, Proteome/genetics, Proteome/physiology, Animals, Genomics, Humans, Internet, Mice, Protein Sorting Signals, Proteome/chemistry, Rats, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/physiology, Software, User-Computer Interface
ABSTRACT
A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses (Michel, 2015, 2017; Arquès and Michel, 1996). This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code (Arquès and Michel, 1996). Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the reading frame in genes. In a recent study of the X motifs in the complete genome of the yeast, Saccharomyces cerevisiae, it was shown that they are significantly enriched in the reading frame of the genes (protein-coding regions) of the genome (Michel et al., 2017). It was suggested that these X motifs may be evolutionary relics of a primitive code originally used for gene translation. The aim of this paper is to address two questions: are X motifs conserved during evolution? and do they continue to play a functional role in the processes of genome decoding and protein production? In a large scale analysis involving complete genomes from four mammals and nine different yeast species, we highlight specific evolutionary pressures on the X motifs in the genes of all the genomes, and identify important new properties of X motif conservation at the level of the encoded amino acids. We then compare the occurrence of X motifs with existing experimental data concerning protein expression and protein production, and report a significant correlation between the number of X motifs in a gene and increased protein abundance. In a general way, this work suggests that motifs from circular codes, i.e. motifs having the property of reading frame retrieval, may represent functional elements located within the coding regions of extant genomes.
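Two properties mentioned in this abstract lend themselves to a quick check: self-complementarity of the set X, and a higher occurrence of X trinucleotides in the reading frame than in the two shifted frames. The sketch below assumes the 20-trinucleotide set usually attributed to Arquès and Michel (1996), which the abstract itself does not list; the helper names are hypothetical, and the toy sequence is only there to show the mechanics, not to reproduce any published statistic.

```python
from typing import FrozenSet, Tuple

# X circular code as commonly reported; verify against the original paper.
X_CODE: FrozenSet[str] = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon: str) -> str:
    """Reverse complement of a DNA trinucleotide, e.g. AAC -> GTT."""
    return codon.translate(_COMPLEMENT)[::-1]


def is_self_complementary(code: FrozenSet[str]) -> bool:
    """True if the reverse complement of every trinucleotide is also in the set."""
    return all(reverse_complement(c) in code for c in code)


def frame_fractions(seq: str,
                    code: FrozenSet[str] = X_CODE) -> Tuple[float, float, float]:
    """Fraction of trinucleotides belonging to `code` in frames 0, 1 and 2."""
    seq = seq.upper()
    fractions = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        hits = sum(1 for c in codons if c in code)
        fractions.append(hits / len(codons) if codons else 0.0)
    return (fractions[0], fractions[1], fractions[2])
```

On the toy sequence `GAAGACGAGGAT`, every codon of frame 0 belongs to X while none of the shifted-frame trinucleotides do, which is the kind of frame contrast the abstract describes at genome scale.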
Subjects
Algorithms, Eukaryota/genetics, Molecular Evolution, Genetic Code, Genome, Genetic Models, Nucleotide Motifs, Animals, Base Sequence, Eukaryota/physiology, Sequence Homology
ABSTRACT
BACKGROUND: The retina is a multi-layered sensory tissue that lines the back of the eye and acts at the interface of input light and visual perception. Its main function is to capture photons and convert them into electrical impulses that travel along the optic nerve to the brain where they are turned into images. It consists of neurons, nourishing blood vessels and different cell types, of which neural cells predominate. Defects in any of these cells can lead to a variety of retinal diseases, including age-related macular degeneration, retinitis pigmentosa, Leber congenital amaurosis and glaucoma. Recent progress in genomics and microarray technology provides extensive opportunities to examine alterations in retinal gene expression profiles during development and diseases. However, there is no specific database that deals with retinal gene expression profiling. In this context we have built RETINOBASE, a dedicated microarray database for retina. DESCRIPTION: RETINOBASE is a microarray relational database, analysis and visualization system that allows simple yet powerful queries to retrieve information about gene expression in retina. It provides access to gene expression meta-data and offers significant insights into gene networks in retina, resulting in better hypothesis framing for biological problems that can subsequently be tested in the laboratory. Public and proprietary data are automatically analyzed with 3 distinct methods, RMA, dChip and MAS5, then clustered using 2 different K-means and 1 mixture models method. Thus, RETINOBASE provides a framework to compare these methods and to optimize the retinal data analysis. RETINOBASE has three different modules, "Gene Information", "Raw Data System Analysis" and "Fold change system Analysis" that are interconnected in a relational schema, allowing efficient retrieval and cross comparison of data. 
Currently, RETINOBASE contains datasets from 28 different microarray experiments performed in 5 different model systems: drosophila, zebrafish, rat, mouse and human. The database is supported by a platform that is designed to easily integrate new functionalities and is also frequently updated. CONCLUSION: The results obtained from various biological scenarios can be visualized, compared and downloaded. The results of a case study are presented that highlight the utility of RETINOBASE. Overall, RETINOBASE provides efficient access to the global expression profiling of retinal genes from different organisms under various conditions.
Subjects
Genetic Databases, Gene Expression Profiling, Internet, Retina, Database Management Systems, Information Storage and Retrieval
ABSTRACT
Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination.
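One symptom of an interrupted coding sequence, an in-frame stop codon before the end of the annotated gene, can be checked with a few lines of code. This sketch is not the similarity-based detection described in the abstract: it only scans one annotated coding sequence in frame 0, the function name is hypothetical, and it assumes the standard stop codons (some prokaryotes use other translation tables).

```python
from typing import List

# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def internal_stops(cds: str) -> List[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons in a CDS.

    The final codon is skipped because a terminal stop is expected there.
    Trailing nucleotides that do not form a full codon are ignored.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [
        3 * k
        for k in range(n_codons - 1)
        if cds[3 * k:3 * k + 3] in STOP_CODONS
    ]
```

For instance, `internal_stops("ATGAAATAGCCCTAA")` flags the `TAG` at offset 6, whereas the out-of-frame `TGA` inside `ATGA...` is correctly ignored.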
Subjects
Terminator Codon, Genetic Databases, Frameshift Mutation, Archaeal Genome, Bacterial Genome, Bacillus/genetics, Genomics, Internet, Mycobacterium smegmatis/genetics, Amino Acid Sequence Homology, User-Computer Interface
ABSTRACT
BACKGROUND: The post-genomic era is characterised by a torrent of biological information flooding the public databases. As a direct consequence, similarity searches starting with a single query sequence frequently lead to the identification of hundreds, or even thousands, of potential homologues. The huge volume of data renders the subsequent structural, functional and evolutionary analyses very difficult. It is therefore essential to develop new strategies for efficient sampling of this large sequence space, in order to reduce the number of sequences to be processed. At the same time, it is important to retain the most pertinent sequences for structural and functional studies. RESULTS: An exhaustive analysis on a large-scale test set (284 protein families) was performed to compare the efficiency of four different sampling methods aimed at selecting the most pertinent sequences. These four methods sample the proteins detected by BlastP searches and can be divided into two categories: two customisable methods, where the user defines either the maximal number or the percentage of sequences to be selected, and two automatic methods, in which the number of sequences selected is determined by the program. We focused our analysis on the potential information content of the sampled sets of sequences, using multiple alignment of complete sequences as the main validation tool. The study considered two criteria: the total number of sequences in BlastP and their associated E-values. The subsequent analyses investigated the influence of the sampling methods on the E-value distributions, the sequence coverage, the final multiple alignment quality and the active site characterisation at various residue conservation thresholds as a function of these criteria.
CONCLUSION: The comparative analysis of the four sampling methods allows us to propose a suitable sampling strategy that significantly reduces the number of homologous sequences required for alignment, while at the same time maintaining the relevant information concerning the active site residues.
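The two customisable settings mentioned above (a maximal number or a percentage of sequences) can be pictured with the minimal Python sketch below. The abstract does not say how the four evaluated methods choose sequences, so ranking hits by E-value and keeping the best ones is purely an assumption made for illustration; the function name `sample_hits` and the `(sequence_id, e_value)` tuple layout are hypothetical.

```python
import math


def sample_hits(hits, max_count=None, fraction=None):
    """Keep the best-scoring hits, capped by a count or by a fraction.

    hits: list of (sequence_id, e_value) tuples, e.g. parsed from BlastP output.
    Exactly one of `max_count` (int) or `fraction` (0 < fraction <= 1) is given.
    """
    if (max_count is None) == (fraction is None):
        raise ValueError("give exactly one of max_count or fraction")
    # Assumption: a lower E-value means a more pertinent hit.
    ranked = sorted(hits, key=lambda hit: hit[1])
    if fraction is not None:
        if not 0 < fraction <= 1:
            raise ValueError("fraction must be in (0, 1]")
        max_count = math.ceil(fraction * len(ranked))
    return ranked[:max_count]


if __name__ == "__main__":
    hits = [("a", 1e-50), ("b", 1e-3), ("c", 1e-20), ("d", 0.5)]
    print(sample_hits(hits, max_count=2))
    print(sample_hits(hits, fraction=0.25))
```

Keeping only the top-ranked hits favours close homologues; a strategy intended to preserve information about active site residues may instead need to spread the selection across the whole E-value range.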
Subjects
Algorithms, Protein Databases, Information Storage and Retrieval/methods, Proteins/chemistry, Proteins/metabolism, Sequence Alignment/methods, Protein Sequence Analysis/methods, Amino Acid Sequence, Conserved Sequence, Database Management Systems, Molecular Sequence Data, Amino Acid Sequence Homology, Structure-Activity Relationship
ABSTRACT
A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the number of occurrences of X motifs is significantly higher than that of R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes in the set of verified genes is significantly different from that observed in the set of putative or dubious genes with no experimental evidence.
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
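As an illustration of what searching for X motifs in a given frame involves, the minimal Python sketch below scans a sequence codon by codon and reports maximal runs of consecutive X trinucleotides. Several points are assumptions rather than statements from the abstract: the 20 trinucleotides are listed as they are usually given in the circular code literature; "length" is taken to mean the number of trinucleotides in a run and "cardinality" the number of distinct trinucleotides in it; and the function name `x_motifs` and the threshold `min_length` are hypothetical.

```python
# Assumption: the 20 trinucleotides of X as usually reported in the literature.
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def x_motifs(seq, frame=0, min_length=4):
    """Return (start, length, cardinality) for each maximal run of X codons.

    start is the 0-based nucleotide position of the run, length is the number
    of trinucleotides in it, cardinality is the number of distinct ones.
    """
    seq = seq.upper()
    motifs = []
    run, run_start = [], frame

    def close_run():
        # Record the current run only if it is long enough.
        if len(run) >= min_length:
            motifs.append((run_start, len(run), len(set(run))))

    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in X_CODE:
            if not run:
                run_start = i
            run.append(codon)
        else:
            close_run()
            run = []
    close_run()  # a run may extend to the end of the sequence
    return motifs


if __name__ == "__main__":
    # Codons in frame 0: AAC GAT GTC CTG TTT (the last one is not in X).
    for frame in range(3):
        print(frame, x_motifs("AACGATGTCCTGTTT", frame=frame))
```

Running the scan in all three frames, as in the example, is what allows the number of motifs in the reading frame to be compared with the two shifted frames.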