Results 1 - 20 of 30
1.
RNA ; 25(12): 1714-1730, 2019 12.
Article in English | MEDLINE | ID: mdl-31506380

ABSTRACT

The origin of the genetic code remains enigmatic five decades after it was elucidated, although there is growing evidence that the code coevolved progressively with the ribosome. A number of primordial codes were proposed as ancestors of the modern genetic code, including comma-free codes such as the RRY, RNY, or GNC codes (R = G or A, Y = C or T, N = any nucleotide), and the X circular code, an error-correcting code that also allows identification and maintenance of the reading frame. It was demonstrated previously that motifs of the X circular code are significantly enriched in the protein-coding genes of most organisms, from bacteria to eukaryotes. Here, we show that imprints of this code also exist in the ribosomal RNA (rRNA). In a large-scale study involving 133 organisms representative of the three domains of life, we identified 32 universal X motifs that are conserved in the rRNA of >90% of the organisms. Intriguingly, most of the universal X motifs are located in rRNA regions involved in important ribosome functions, notably in the peptidyl transferase center and the decoding center that form the original "proto-ribosome." Building on the existing accretion models for ribosome evolution, we propose that error-correcting circular codes represented an important step in the emergence of the modern genetic code. Thus, circular codes would have allowed the simultaneous coding of amino acids and synchronization of the reading frame in primitive translation systems, prior to the emergence of more sophisticated start codon recognition and translation initiation mechanisms.


Subject(s)
Evolution, Molecular , Genetic Code , Nucleotide Motifs , Protein Biosynthesis , Ribosomes/genetics , Ribosomes/metabolism , Models, Biological , Models, Molecular , Molecular Conformation , Nucleic Acid Conformation , RNA, Ribosomal/chemistry , RNA, Ribosomal/genetics , Ribosomes/chemistry , Structure-Activity Relationship
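The comma-free property mentioned in the abstract of entry 1 can be checked mechanically. The Python sketch below builds the RRY, RNY and GNC codes from the nucleotide classes given in that abstract (R = G or A, Y = C or T, N = any nucleotide) and tests that no codeword appears in a shifted frame of two concatenated codewords. The function name `is_comma_free` is an illustrative choice and does not come from the paper.

```python
from itertools import product

def is_comma_free(code):
    """Return True if no codeword occurs in frame +1 or +2 of any
    concatenation of two codewords (the comma-free property)."""
    code = set(code)
    for first, second in product(code, repeat=2):
        pair = first + second
        # pair[1:4] is the frame shifted by one nucleotide,
        # pair[2:5] the frame shifted by two nucleotides.
        if pair[1:4] in code or pair[2:5] in code:
            return False
    return True

# Nucleotide classes as defined in the abstract.
R, Y, N = "AG", "CT", "ACGT"

RNY = {r + n + y for r, n, y in product(R, N, Y)}
RRY = {r1 + r2 + y for r1, r2, y in product(R, R, Y)}
GNC = {"G" + n + "C" for n in N}

for name, code in (("RNY", RNY), ("RRY", RRY), ("GNC", GNC)):
    print(name, len(code), is_comma_free(code))
```

In the RNY code every codeword starts with a purine and ends with a pyrimidine, so a shifted window either ends with a purine (frame +1) or starts with a pyrimidine (frame +2) and can never be a codeword.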
2.
Nucleic Acids Res ; 47(D1): D411-D418, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30380106

ABSTRACT

OrthoInspector is one of the leading software suites for orthology relations inference. In this paper, we describe a major redesign of the OrthoInspector online resource along with a significant increase in the number of species: 4753 organisms are now covered across the three domains of life, making OrthoInspector the most exhaustive orthology resource to date in terms of covered species (excluding viruses). The new website integrates original data exploration and visualization tools in an ergonomic interface. Distributions of protein orthologs are represented by heatmaps summarizing their evolutionary histories, and proteins with similar profiles can be directly accessed. Two novel tools have been implemented for comparative genomics: a phylogenetic profile search that can be used to find proteins with a specific presence-absence profile and investigate their functions and, inversely, a GO profiling tool aimed at deciphering evolutionary histories of molecular functions, processes or cell components. In addition to the re-designed website, the OrthoInspector resource now provides a REST interface for programmatic access. OrthoInspector 3.0 is available at http://lbgi.fr/orthoinspectorv3.


Subject(s)
Databases, Genetic , Genomics , Algorithms , Bacteria/genetics , Classification , Eukaryota/genetics , Evolution, Molecular , Forecasting , Gene Ontology , Internet , Phylogeny , Proteome , Sequence Homology, Nucleic Acid , Software , Species Specificity
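The phylogenetic profile search described in the abstract of entry 2 (finding proteins with a specific presence-absence profile) can be illustrated with a small sketch. This is not the OrthoInspector implementation or its REST interface; the function `search_profile`, the protein identifiers and the species names are all hypothetical and serve only to show the idea of filtering by a presence-absence pattern.

```python
def search_profile(profiles, present, absent):
    """Return the IDs of proteins that have an ortholog in every species
    of `present` and in none of the species of `absent`."""
    present, absent = set(present), set(absent)
    return sorted(
        protein_id
        for protein_id, species in profiles.items()
        if present <= species and not (absent & species)
    )

# Hypothetical toy data: protein -> species with a predicted ortholog.
profiles = {
    "protA": {"human", "mouse", "yeast"},
    "protB": {"human", "mouse"},
    "protC": {"human", "yeast", "ecoli"},
}

hits = search_profile(profiles, present={"human", "mouse"}, absent={"ecoli"})
print(hits)
```

Representing each profile as a set of species keeps the query down to two set operations: a subset test for the required species and an intersection test for the excluded ones.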
3.
Mol Biol Evol ; 34(8): 2016-2034, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28460059

ABSTRACT

Cilia (flagella) are important eukaryotic organelles, present in the Last Eukaryotic Common Ancestor, and are involved in cell motility and integration of extracellular signals. Ciliary dysfunction causes a class of genetic diseases, known as ciliopathies; however, current knowledge of the underlying mechanisms is still limited and a better characterization of genes is needed. As cilia have been lost independently several times during evolution and they are subject to important functional variation between species, ciliary genes can be investigated through comparative genomics. We performed phylogenetic profiling by predicting orthologs of human protein-coding genes in 100 eukaryotic species. The analysis integrated three independent methods to predict a consensus set of 274 ciliary genes, including 87 new promising candidates. A fine-grained analysis of the phylogenetic profiles allowed a partitioning of ciliary genes into modules with distinct evolutionary histories and ciliary functions (assembly, movement, centriole, etc.) and thus propagation of potential annotations to previously undocumented genes. The cilia/basal body localization was experimentally confirmed for five of these previously unannotated proteins (LRRC23, LRRC34, TEX9, WDR27, and BIVM), validating the relevance of our approach. Furthermore, our multi-level analysis sheds light on the core gene sets retained in gamete-only flagellates or Ecdysozoa for instance. By combining gene-centric and species-oriented analyses, this work reveals new ciliary and ciliopathy gene candidates and provides clues about the evolution of ciliary processes in the eukaryotic domain. Additionally, the positive and negative reference gene sets and the phylogenetic profile of human genes constructed during this study can be exploited in future work.


Subject(s)
Cilia/genetics , Ciliopathies/genetics , Animals , Cell Movement/genetics , Cilia/metabolism , Ciliopathies/metabolism , Databases, Nucleic Acid , Eukaryota , Eukaryotic Cells , Evolution, Molecular , Flagella/genetics , Flagella/metabolism , Genomics , Humans , Phylogeny , Sequence Analysis, DNA/methods
4.
Hum Mutat ; 38(10): 1316-1324, 2017 10.
Article in English | MEDLINE | ID: mdl-28608363

ABSTRACT

Numerous mutations in each of the mitochondrial aminoacyl-tRNA synthetases (aaRSs) have been implicated in human diseases. The mutations are autosomal and recessive and lead mainly to neurological disorders, although with pleiotropic effects. The processes and interactions that drive the etiology of the disorders associated with mitochondrial aaRSs (mt-aaRSs) are far from understood. The complexity of the clinical, genetic, and structural data requires concerted, interdisciplinary efforts to understand the molecular biology of these disorders. Toward this goal, we designed MiSynPat, a comprehensive knowledge base together with an ergonomic Web server designed to organize and access all pertinent information (sequences, multiple sequence alignments, structures, disease descriptions, mutation characteristics, original literature) on the disease-linked human mt-aaRSs. With MiSynPat, a user can also evaluate the impact of a possible mutation on sequence-conservation-structure in order to foster the links between basic and clinical researchers and to facilitate future diagnosis. The proposed integrated view, coupled with research on disease-related mt-aaRSs, will help to reveal new functions for these enzymes and to open new vistas in the molecular biology of the cell. The purpose of MiSynPat, freely available at http://misynpat.org, is to constitute a reference and a converging resource for scientists and clinicians.


Subject(s)
Amino Acyl-tRNA Synthetases/genetics , Databases, Genetic , Mitochondria/enzymology , Mutation/genetics , Amino Acid Sequence , Amino Acyl-tRNA Synthetases/chemistry , Evolution, Molecular , Genetic Diseases, Inborn/genetics , Humans , Mitochondria/genetics , Molecular Structure , Protein Conformation
5.
J Med Internet Res ; 19(6): e212, 2017 06 16.
Article in English | MEDLINE | ID: mdl-28623182

ABSTRACT

BACKGROUND: The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction. OBJECTIVE: MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data. METHODS: MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization. RESULTS: MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user's specific interests and provides an efficient way to share information with collaborators. Furthermore, the user's behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents. CONCLUSIONS: We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data. 
MyGeneFriends is available at lbgi.fr/mygenefriends.


Subject(s)
Genetic Diseases, Inborn/genetics , Genetic Testing/methods , Social Networking , Telemedicine/statistics & numerical data , Friends , Humans , Research Personnel
6.
Bioinformatics ; 31(3): 447-8, 2015 Feb 01.
Article in English | MEDLINE | ID: mdl-25273105

ABSTRACT

SUMMARY: We previously developed OrthoInspector, a package incorporating an original algorithm for the detection of orthology and inparalogy relations between different species. We have added new functionalities to the package. While its original algorithm was not modified, performing similar orthology predictions, we facilitated the prediction of very large databases (thousands of proteomes), refurbished its graphical interface, added new visualization tools for comparative genomics/protein family analysis and facilitated its deployment in a network environment. Finally, we have released three online databases of precomputed orthology relationships. AVAILABILITY: Package and databases are freely available at http://lbgi.fr/orthoinspector with all major browsers supported. CONTACT: odile.lecompte@unistra.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computer Graphics , Databases, Factual , Proteomics/methods , Sequence Analysis, Protein/methods , Software , Humans , Molecular Sequence Annotation , Phylogeny
7.
Bioinformatics ; 29(20): 2643-4, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-23929031

ABSTRACT

SUMMARY: We present PARSEC (PAtteRn Search and Contextualization), a new open source platform for guided discovery, allowing localization and biological characterization of short genomic sites in entire eukaryotic genomes. PARSEC can search for a sequence or a degenerated pattern. The retrieved set of genomic sites can be characterized in terms of (i) conservation in model organisms, (ii) genomic context (proximity to genes) and (iii) function of neighboring genes. These modules allow the user to explore, visualize, filter and extract biological knowledge from a set of short genomic regions such as transcription factor binding sites. AVAILABILITY: Web site implemented in Java, JavaScript and C++, with all major browsers supported. Freely available at lbgi.fr/parsec. Source code is freely available at sourceforge.net/projects/genomicparsec.


Subject(s)
Genomics/methods , Algorithms , Genome , Humans , Internet , Nonlinear Dynamics , Programming Languages , Software
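The abstract of entry 7 states that PARSEC can search for a sequence or a degenerate pattern. A common way to do this, sketched below in Python, is to translate IUPAC ambiguity codes into a regular expression. This is not PARSEC's own code (the abstract says the site is implemented in Java, JavaScript and C++); the function `find_sites` and the example sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(pattern, sequence):
    """Return (start, matched_text) for every occurrence of a degenerate
    pattern on the forward strand, including overlapping occurrences."""
    body = "".join(IUPAC[symbol] for symbol in pattern.upper())
    # A lookahead with a capture group reports overlapping matches.
    regex = re.compile("(?=(" + body + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]

# Hypothetical example: a TATA-box-like degenerate pattern.
sites = find_sites("TATAWAW", "GGTATAAATCCTATATATGG")
print(sites)
```

Positions are 0-based. A whole-genome tool would also scan the reverse complement and then attach the context described in the abstract (conservation, nearby genes), which this sketch leaves out.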
8.
Nucleic Acids Res ; 40(Web Server issue): W71-5, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22641855

ABSTRACT

A major challenge in the post-genomic era is a better understanding of how human genetic alterations involved in disease affect the gene products. The KD4v (Comprehensible Knowledge Discovery System for Missense Variant) server characterizes and predicts the phenotypic effects (deleterious/neutral) of missense variants. The server provides a set of rules learned by Inductive Logic Programming (ILP) on a set of missense variants described by conservation, physico-chemical, functional and 3D structure predicates. These rules are interpretable by non-expert humans and are used to accurately predict the deleterious/neutral status of an unknown mutation. The web server is available at http://decrypthon.igbmc.fr/kd4v.


Subject(s)
Disease/genetics , Mutation, Missense , Polymorphism, Single Nucleotide , Software , Genetic Association Studies , Humans , Internet , Knowledge Bases , Phenotype , Proteins/chemistry , Proteins/genetics
9.
Genomics ; 101(3): 178-86, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23147676

ABSTRACT

TFIIH is a eukaryotic complex composed of two subcomplexes, the CAK (Cdk activating kinase) and the core-TFIIH. The core-TFIIH, composed of seven subunits (XPB, XPD, P62, P52, P44, P34, and P8), plays a crucial role in transcription and repair. Here, we performed an extended sequence analysis to establish the accurate phylogenetic distribution of the core-TFIIH in 63 eukaryotic organisms. In spite of the high conservation of the seven subunits at the sequence and genomic levels, the non-enzymatic P8, P34, P52 and P62 are absent from one or a few unicellular species. To gain insight into their respective roles, we undertook a comparative genomic analysis of the whole proteome to identify the gene sets sharing similar presence/absence patterns. While little information was inferred for P8 and P62, our studies confirm the known role of P52 in repair and suggest for the first time the implication of the core TFIIH in mRNA splicing via P34.


Subject(s)
Evolution, Molecular , Multiprotein Complexes/genetics , Phylogeny , Transcription Factor TFIIH/genetics , Animals , Cyclin-Dependent Kinases/genetics , DNA-Binding Proteins , Humans , Protein Subunits/genetics , Transcription, Genetic
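The abstract of entry 9 describes identifying gene sets that share similar presence/absence patterns with a subunit of interest. The abstract does not say which similarity measure was used; the Python sketch below uses the Jaccard index purely as an assumption, and the gene names other than P34, the species labels and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two species sets: shared species / all species."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def rank_similar(target, profiles):
    """Rank all other genes by similarity of their presence profile to
    that of `target`, most similar first (ties broken by name)."""
    scores = [
        (name, jaccard(profiles[target], species))
        for name, species in profiles.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))

# Hypothetical toy profiles: gene -> species in which it is present.
profiles = {
    "P34": {"sp1", "sp2", "sp3"},
    "geneX": {"sp1", "sp2", "sp3"},
    "geneY": {"sp1", "sp2", "sp3", "sp4"},
    "geneZ": {"sp4"},
}

ranking = rank_similar("P34", profiles)
print(ranking)
```

Genes at the top of such a ranking are present and absent in the same species as the query, which is the kind of co-occurrence signal the abstract uses to suggest shared function.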
10.
BMC Genomics ; 13: 297, 2012 Jul 02.
Article in English | MEDLINE | ID: mdl-22748146

ABSTRACT

BACKGROUND: Membrane trafficking involves the complex regulation of the intracellular localization of proteins and lipids and is required for metabolic uptake, cell growth and development. Different trafficking pathways passing through the endosomes are coordinated by the ENTH/ANTH/VHS adaptor protein superfamily. The endosomes are crucial for eukaryotes since the acquisition of the endomembrane system was a central process in eukaryogenesis. RESULTS: Our in silico analysis of this ENTH/ANTH/VHS superfamily, consisting of proteins gathered from 84 complete genomes representative of the different eukaryotic taxa, revealed that the genomic distribution of this superfamily discriminates Fungi and Metazoa from Plantae and Protists. Next, in a four-way genome-wide comparison, we showed that this discriminative feature is observed not only for other membrane trafficking effectors, but also for proteins involved in metabolism and in cytokinesis, suggesting that metabolism, cytokinesis and intracellular trafficking pathways co-evolved. Moreover, some of the proteins identified were implicated in multiple functions, in either trafficking and metabolism or trafficking and cytokinesis, suggesting that membrane trafficking is central to this co-evolution process. CONCLUSIONS: Our study suggests that membrane trafficking and compartmentalization were not only key features for the emergence of eukaryotic cells but also drove the separation of the eukaryotes into the different taxa.


Subject(s)
Cell Membrane/metabolism , Genomics/methods , Protein Transport/physiology , Proteins/metabolism , Biological Evolution , Cytokinesis/physiology , Phylogeny , Proteins/chemistry , Proteins/classification
11.
Hum Mol Genet ; 19(2): 250-61, 2010 Jan 15.
Article in English | MEDLINE | ID: mdl-19843539

ABSTRACT

Rod-derived Cone Viability Factor (RdCVF) is a trophic factor with therapeutic potential for the treatment of retinitis pigmentosa, a retinal disease that commonly results in blindness. RdCVF is encoded by Nucleoredoxin-like 1 (Nxnl1), a gene homologous with the family of thioredoxins that participate in the defense against oxidative stress. RdCVF expression is lost after rod degeneration in the first phase of retinitis pigmentosa, and this loss has been implicated in the more clinically significant secondary cone degeneration that often occurs. Here, we describe a study of the Nxnl1 promoter using an approach that combines promoter and transcriptomic analysis. By transfection of selected candidate transcription factors, chosen based upon their expression pattern, we identified the homeodomain proteins CHX10/VSX2, VSX1 and PAX4, as well as the zinc finger protein SP3, as factors that can stimulate both the mouse and human Nxnl1 promoter. In addition, CHX10/VSX2 binds to the Nxnl1 promoter in vivo. Since CHX10/VSX2 is expressed predominantly in the inner retina, this finding motivated us to demonstrate that RdCVF is expressed in the inner as well as the outer retina. Interestingly, the loss of rods in the rd1 mouse, a model of retinitis pigmentosa, is associated with decreased expression of RdCVF by inner retinal cells as well as by rods. Based upon these results, we propose an alternative therapeutic strategy aimed at recapitulating RdCVF expression in the inner retina, where cell loss is not significant, to prevent secondary cone death and central vision loss in patients suffering from retinitis pigmentosa.


Subject(s)
Eye Proteins/genetics , Genes, Homeobox , Homeodomain Proteins/metabolism , Promoter Regions, Genetic , Retina/metabolism , Thioredoxins/genetics , Transcription Factors/metabolism , Animals , Eye Proteins/metabolism , Gene Expression Regulation , Homeodomain Proteins/genetics , Humans , Mice , Mice, Inbred BALB C , Mice, Knockout , Protein Binding , Retinitis Pigmentosa/genetics , Retinitis Pigmentosa/metabolism , Thioredoxins/metabolism , Transcription Factors/genetics
12.
BMC Genomics ; 12: 530, 2011 Oct 28.
Article in English | MEDLINE | ID: mdl-22034982

ABSTRACT

BACKGROUND: The deep-sea hydrothermal vent mussel Bathymodiolus azoricus harbors thiotrophic and methanotrophic symbiotic bacteria in its gills. While the symbiotic relationship between this hydrothermal mussel and these chemoautotrophic bacteria has been described, the molecular processes involved in the cross-talk between symbionts and host, in the maintenance of the symbiosis, in the influence of environmental parameters on gene expression, and in transcriptome variation across individuals remain poorly understood. In an attempt to understand how, and to what extent, this double symbiosis affects host gene expression, we used a transcriptomic approach to identify genes potentially regulated by symbiont characteristics, environmental conditions or both. This study was done on mussels from two contrasting populations. RESULTS: Subtractive libraries allowed the identification of about 1000 genes putatively regulated by symbiosis and/or environmental factors. Microarray analysis showed that 120 genes (3.5% of all genes) were differentially expressed between the Menez Gwen (MG) and Rainbow (Rb) vent fields. The total number of regulated genes in mussels harboring a high versus a low symbiont content did not differ significantly. With regard to the impact of symbiont content, only 1% of all genes were regulated by thiotrophic (SOX) and methanotrophic (MOX) bacteria content in MG mussels whereas 5.6% were regulated in mussels collected at Rb. MOX symbionts also impacted a higher proportion of genes than SOX in both vent fields. When host transcriptome expression was analyzed with respect to symbiont gene expression, it was related to symbiont quantity in each field. CONCLUSIONS: Our study has produced a preliminary description of a transcriptomic response in a hydrothermal vent mussel host of both thiotrophic and methanotrophic symbiotic bacteria. This model can help to identify genes involved in the maintenance of symbiosis or regulated by environmental parameters.
Our results provide evidence of symbiont effect on transcriptome regulation, with differences related to type of symbiont, even though the relative percentage of genes involved remains limited. Differences observed between the vent sites indicate that environment strongly influences transcriptome regulation and impacts both activity and relative abundance of each symbiont. Among all these genes, those participating in recognition, the immune system, oxidative stress, and energy metabolism constitute new promising targets for extended studies on symbiosis and the effect of environmental parameters on the symbiotic relationships in B. azoricus.


Subject(s)
Environment , Gene Expression Regulation , Mytilidae/genetics , Symbiosis/physiology , Animals , Bacteria/enzymology , Bacteria/metabolism , Gene Library , Gills/microbiology , Methanol/metabolism , Mixed Function Oxygenases/genetics , Mixed Function Oxygenases/metabolism , Sulfate Adenylyltransferase/genetics , Sulfate Adenylyltransferase/metabolism , Sulfhydryl Compounds/metabolism , Transcriptome
13.
Biosystems ; 203: 104368, 2021 May.
Article in English | MEDLINE | ID: mdl-33567309

ABSTRACT

The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X and called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examined recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density ds(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. 
Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.


Subject(s)
Codon/genetics , Gene Expression Regulation/genetics , Nucleotide Motifs/genetics , Genetic Code/genetics , Reading Frames , Ribosomes
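The density ds(X) defined in the abstract of entry 13 (number of X motifs per kilobase in a gene sequence s) can be computed as in the Python sketch below. Several points are assumptions, not statements from the abstract: the 20 trinucleotides are the X code as usually listed following Arquès and Michel (1996) and should be checked against that source; an X motif is taken to be a maximal run of at least `min_codons` consecutive X codons; the threshold of 4 codons is arbitrary; and only frame 0 is scanned.

```python
# Assumed listing of the X circular code (Arquès and Michel, 1996).
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

def x_motifs(seq, min_codons=4):
    """Return (start, end) nucleotide spans of maximal runs of at least
    `min_codons` consecutive X codons, read in frame 0."""
    seq = seq.upper()
    n_codons = len(seq) // 3
    motifs, run_start = [], None
    # One extra iteration closes a run that reaches the end of the sequence.
    for i in range(n_codons + 1):
        in_x = i < n_codons and seq[3 * i:3 * i + 3] in X_CODE
        if in_x and run_start is None:
            run_start = i
        elif not in_x and run_start is not None:
            if i - run_start >= min_codons:
                motifs.append((3 * run_start, 3 * i))
            run_start = None
    return motifs

def x_motif_density(seq, min_codons=4):
    """ds(X): number of X motifs per kilobase of sequence."""
    if not seq:
        return 0.0
    return 1000.0 * len(x_motifs(seq, min_codons)) / len(seq)

# Hypothetical 18-nt example: start codon, four X codons, stop codon.
example = "ATG" + "GAAGACGAGGAT" + "TGA"
print(x_motifs(example), x_motif_density(example))
```

Changing `min_codons` changes the density, so values are comparable only between analyses that use the same motif definition.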
14.
Genome Biol Evol ; 13(1)2021 01 07.
Article in English | MEDLINE | ID: mdl-33211099

ABSTRACT

In the multiomics era, comparative genomics studies based on gene repertoire comparison are increasingly used to investigate evolutionary histories of species, to study genotype-phenotype relations, species adaptation to various environments, or to predict gene function using phylogenetic profiling. However, comparisons of orthologs have highlighted the prevalence of sequence plasticity among species, showing the benefits of combining protein and subprotein levels of analysis to allow for a more comprehensive study of genotype/phenotype correlations. In this article, we introduce a new approach called BLUR (BLAST Unexpected Ranking), capable of detecting genotype divergence or specialization between two related clades at different levels: gain/loss of proteins but also of subprotein regions. These regions can correspond to known domains, uncharacterized regions, or even small motifs. Our method was created to allow two types of research strategies: 1) the comparison of two groups of species with no previous knowledge, with the aim of predicting phenotype differences or specializations between close species or 2) the study of specific phenotypes by comparing species that present the phenotype of interest with species that do not. We designed a website to facilitate the use of BLUR with a possibility of in-depth analysis of the results with various tools, such as functional enrichments, protein-protein interaction networks, and multiple sequence alignments. We applied our method to the study of two different biological pathways and to the comparison of several groups of close species, all with very promising results. BLUR is freely available at http://lbgi.fr/blur/.


Subject(s)
Evolution, Molecular , Genomics/methods , Proteins/genetics , Proteome/genetics , Proteome/metabolism , Animals , Armadillo Domain Proteins , Bacteria , Conserved Sequence/genetics , Fungi , Genotype , Humans , Phenotype , Phylogeny , Sequence Alignment , Sequence Analysis , Software
15.
Nucleic Acids Res ; 35(Database issue): D815-22, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17135190

ABSTRACT

Peroxisomes are essential organelles of eukaryotic origin, ubiquitously distributed in cells and organisms, playing key roles in lipid and antioxidant metabolism. Loss or malfunction of peroxisomes causes more than 20 fatal inherited conditions. We have created a peroxisomal database (http://www.peroxisomeDB.org) that includes the complete peroxisomal proteome of Homo sapiens and Saccharomyces cerevisiae, by gathering, updating and integrating the available genetic and functional information on peroxisomal genes. PeroxisomeDB is structured in interrelated sections 'Genes', 'Functions', 'Metabolic pathways' and 'Diseases', that include hyperlinks to selected features of NCBI, ENSEMBL and UCSC databases. We have designed graphical depictions of the main peroxisomal metabolic routes and have included updated flow charts for diagnosis. Precomputed BLAST, PSI-BLAST, multiple sequence alignment (MUSCLE) and phylogenetic trees are provided to assist in direct multispecies comparison to study evolutionarily conserved functions and pathways. Highlights of the PeroxisomeDB include new tools developed for facilitating (i) identification of novel peroxisomal proteins, by means of identifying proteins carrying peroxisome targeting signal (PTS) motifs, (ii) detection of peroxisomes in silico, particularly useful for screening the deluge of newly sequenced genomes. PeroxisomeDB should contribute to the systematic characterization of the peroxisomal proteome and facilitate systems biology approaches on the organelle.


Subject(s)
Databases, Protein , Peroxisomal Disorders/genetics , Peroxisomes/metabolism , Proteome/genetics , Proteome/physiology , Animals , Genomics , Humans , Internet , Mice , Protein Sorting Signals , Proteome/chemistry , Rats , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/physiology , Software , User-Computer Interface
16.
Biosystems ; 175: 57-74, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30367916

ABSTRACT

A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses (Michel, 2015, 2017; Arquès and Michel, 1996). This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code (Arquès and Michel, 1996). Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the reading frame in genes. In a recent study of the X motifs in the complete genome of the yeast, Saccharomyces cerevisiae, it was shown that they are significantly enriched in the reading frame of the genes (protein-coding regions) of the genome (Michel et al., 2017). It was suggested that these X motifs may be evolutionary relics of a primitive code originally used for gene translation. The aim of this paper is to address two questions: are X motifs conserved during evolution? and do they continue to play a functional role in the processes of genome decoding and protein production? In a large scale analysis involving complete genomes from four mammals and nine different yeast species, we highlight specific evolutionary pressures on the X motifs in the genes of all the genomes, and identify important new properties of X motif conservation at the level of the encoded amino acids. We then compare the occurrence of X motifs with existing experimental data concerning protein expression and protein production, and report a significant correlation between the number of X motifs in a gene and increased protein abundance. In a general way, this work suggests that motifs from circular codes, i.e. motifs having the property of reading frame retrieval, may represent functional elements located within the coding regions of extant genomes.


Subject(s)
Algorithms , Eukaryota/genetics , Evolution, Molecular , Genetic Code , Genome , Models, Genetic , Nucleotide Motifs , Animals , Base Sequence , Eukaryota/physiology , Sequence Homology
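Two properties stated in the abstract of entry 16 lend themselves to a quick check: that X is self-complementary, and that X trinucleotides occur preferentially in the reading frame compared with the two shifted frames. The Python sketch below verifies the first on the set itself and measures the second on a sequence. The listing of X is an assumption (the set as usually given following Arquès and Michel, 1996; the abstract does not enumerate it), and the function names and the 12-nt example are hypothetical.

```python
# Assumed listing of the X circular code (Arquès and Michel, 1996).
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]

def is_self_complementary(code):
    """True if the reverse complement of every codeword is a codeword."""
    return {reverse_complement(codon) for codon in code} == set(code)

def x_fraction_by_frame(seq):
    """Fraction of trinucleotides belonging to X in frames 0, 1 and 2."""
    seq = seq.upper()
    fractions = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        hits = sum(codon in X_CODE for codon in codons)
        fractions.append(hits / len(codons) if codons else 0.0)
    return fractions

print(is_self_complementary(X_CODE))
# Hypothetical sequence made of four X codons read in frame 0.
print(x_fraction_by_frame("GAAGACGAGGAT"))
```

On real genes the contrast between frames is statistical, not absolute; the toy sequence is built entirely from X codons only to make the frame effect visible.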
17.
BMC Genomics ; 9: 208, 2008 May 05.
Article in English | MEDLINE | ID: mdl-18457592

ABSTRACT

BACKGROUND: The retina is a multi-layered sensory tissue that lines the back of the eye and acts at the interface between incoming light and visual perception. Its main function is to capture photons and convert them into electrical impulses that travel along the optic nerve to the brain, where they are turned into images. It consists of neurons, nourishing blood vessels and different cell types, of which neural cells predominate. Defects in any of these cells can lead to a variety of retinal diseases, including age-related macular degeneration, retinitis pigmentosa, Leber congenital amaurosis and glaucoma. Recent progress in genomics and microarray technology provides extensive opportunities to examine alterations in retinal gene expression profiles during development and disease. However, there is no specific database that deals with retinal gene expression profiling. In this context we have built RETINOBASE, a dedicated microarray database for retina.
DESCRIPTION: RETINOBASE is a microarray relational database, analysis and visualization system that allows simple yet powerful queries to retrieve information about gene expression in retina. It provides access to gene expression meta-data and offers significant insights into gene networks in retina, resulting in better hypothesis framing for biological problems that can subsequently be tested in the laboratory. Public and proprietary data are automatically analyzed with three distinct methods (RMA, dChip and MAS5), then clustered using two different K-means methods and one mixture-model method. Thus, RETINOBASE provides a framework to compare these methods and to optimize the retinal data analysis. RETINOBASE has three different modules, "Gene Information", "Raw Data System Analysis" and "Fold change system Analysis", that are interconnected in a relational schema, allowing efficient retrieval and cross comparison of data. Currently, RETINOBASE contains datasets from 28 different microarray experiments performed in 5 different model systems: Drosophila, zebrafish, rat, mouse and human. The database is supported by a platform that is designed to easily integrate new functionalities and is also frequently updated.
CONCLUSION: The results obtained from various biological scenarios can be visualized, compared and downloaded. The results of a case study are presented that highlight the utility of RETINOBASE. Overall, RETINOBASE provides efficient access to the global expression profiling of retinal genes from different organisms under various conditions.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Internet , Retina , Database Management Systems , Information Storage and Retrieval
18.
Nucleic Acids Res ; 34(Database issue): D338-43, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381882

ABSTRACT

Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequences (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided, as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from newly sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95% and 82%, respectively) of our program and the efficiency of primer determination.
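
One of the symptoms named in this abstract, an in-frame stop codon inside a coding sequence, can be illustrated with a minimal Python sketch. This is not the similarity-based approach used to build the ICDS database; it only scans the annotated reading frame for premature stops. The function name `find_inframe_stops` and the use of the standard stop codons (TAA, TAG, TGA) are assumptions made for illustration.

```python
# Hypothetical helper, not part of the ICDS database software.
# Assumes the standard genetic code and that the sequence starts in frame 0.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_inframe_stops(cds: str) -> list[int]:
    """Return 0-based nucleotide offsets of stop codons found in frame 0
    before the last complete codon (the expected terminator)."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    hits = []
    # Skip the final complete codon, which is normally the real stop.
    for i in range(n_codons - 1):
        codon = seq[3 * i:3 * i + 3]
        if codon in STOP_CODONS:
            hits.append(3 * i)
    return hits


if __name__ == "__main__":
    # ATG AAA TAG GGC TAA: the TAG at offset 6 interrupts the frame.
    print(find_inframe_stops("ATGAAATAGGGCTAA"))
```

A sequence flagged this way is only a candidate: distinguishing a genuine pseudogene from a sequencing error requires comparison with homologues or re-sequencing, as the abstract describes.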


Subject(s)
Codon, Terminator , Databases, Genetic , Frameshift Mutation , Genome, Archaeal , Genome, Bacterial , Bacillus/genetics , Genomics , Internet , Mycobacterium smegmatis/genetics , Sequence Homology, Amino Acid , User-Computer Interface
19.
BMC Bioinformatics ; 8: 62, 2007 Feb 23.
Article in English | MEDLINE | ID: mdl-17319945

ABSTRACT

BACKGROUND: The post-genomic era is characterised by a torrent of biological information flooding the public databases. As a direct consequence, similarity searches starting with a single query sequence frequently lead to the identification of hundreds, or even thousands, of potential homologues. The huge volume of data renders the subsequent structural, functional and evolutionary analyses very difficult. It is therefore essential to develop new strategies for efficient sampling of this large sequence space, in order to reduce the number of sequences to be processed. At the same time, it is important to retain the most pertinent sequences for structural and functional studies.
RESULTS: An exhaustive analysis on a large-scale test set (284 protein families) was performed to compare the efficiency of four different sampling methods aimed at selecting the most pertinent sequences. These four methods sample the proteins detected by BlastP searches and can be divided into two categories: two customisable methods where the user defines either the maximal number or the percentage of sequences to be selected; two automatic methods in which the number of sequences selected is determined by the program. We focused our analysis on the potential information content of the sampled sets of sequences using multiple alignment of complete sequences as the main validation tool. The study considered two criteria: the total number of sequences in BlastP and their associated E-values. The subsequent analyses investigated the influence of the sampling methods on the E-value distributions, the sequence coverage, the final multiple alignment quality and the active site characterisation at various residue conservation thresholds as a function of these criteria.
CONCLUSION: The comparative analysis of the four sampling methods allows us to propose a suitable sampling strategy that significantly reduces the number of homologous sequences required for alignment, while at the same time maintaining the relevant information concerning the active site residues.
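
The two customisable strategies mentioned in the RESULTS (a user-defined maximal number or percentage of sequences) can be sketched as follows. The abstract does not say how sequences are chosen within those limits, so this sketch assumes evenly spaced picks from a hit list already ranked by E-value; the function names `sample_by_count` and `sample_by_percent` are hypothetical.

```python
import math

# Illustrative only: even spacing over the ranked hit list is an assumption,
# not the selection rule evaluated in the study.


def sample_by_count(hits: list, max_n: int) -> list:
    """Keep at most max_n hits, evenly spaced across the ranked list."""
    if max_n <= 0:
        return []
    if len(hits) <= max_n:
        return list(hits)
    if max_n == 1:
        return [hits[0]]
    # step > 1 here, so the rounded indices are always distinct.
    step = (len(hits) - 1) / (max_n - 1)
    return [hits[round(i * step)] for i in range(max_n)]


def sample_by_percent(hits: list, percent: float) -> list:
    """Keep roughly `percent` % of the hits (at least one if any exist)."""
    target = max(1, math.ceil(len(hits) * percent / 100))
    return sample_by_count(hits, target)


if __name__ == "__main__":
    ranked_hits = [f"hit{i:02d}" for i in range(10)]  # best E-value first
    print(sample_by_count(ranked_hits, 4))
    print(sample_by_percent(ranked_hits, 30))
```

Even spacing keeps both the closest and the most distant homologues in the sample, which is one simple way to preserve diversity for a multiple alignment; other rules (for example, binning by E-value) would fit the same interface.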


Subject(s)
Algorithms , Databases, Protein , Information Storage and Retrieval/methods , Proteins/chemistry , Proteins/metabolism , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Conserved Sequence , Database Management Systems , Molecular Sequence Data , Sequence Homology, Amino Acid , Structure-Activity Relationship
20.
Life (Basel) ; 7(4), 2017 Dec 03.
Article in English | MEDLINE | ID: mdl-29207500

ABSTRACT

A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. 
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
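
The frame comparison described in this abstract can be sketched in Python under stated assumptions. The 20 trinucleotides of X are not listed in the abstract; the set below is the one reported in the circular-code literature (Arquès and Michel, 1996). Here an X motif is taken to be a maximal run of consecutive X trinucleotides in a given frame, and the minimum run length of 4 is an arbitrary threshold for illustration, not the study's selection criteria. The function name `x_motifs` is hypothetical.

```python
# The set X as reported in the literature; not given in the abstract itself.
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def x_motifs(seq: str, frame: int = 0, min_trinucleotides: int = 4):
    """Return (start, end) offsets of maximal runs of consecutive X
    trinucleotides read in `frame` (0, 1 or 2); `end` is exclusive."""
    seq = seq.upper()
    motifs = []
    run_start = None
    pos = frame
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in X_CODE:
            if run_start is None:
                run_start = pos
        else:
            if run_start is not None and (pos - run_start) // 3 >= min_trinucleotides:
                motifs.append((run_start, pos))
            run_start = None
        pos += 3
    # Close a run that extends to the end of the sequence.
    if run_start is not None and (pos - run_start) // 3 >= min_trinucleotides:
        motifs.append((run_start, pos))
    return motifs


if __name__ == "__main__":
    gene = "GAAGACGAGGATAAA"
    for frame in range(3):
        print(frame, x_motifs(gene, frame))
```

Running the function on each of the three frames of a gene and comparing the counts gives the kind of reading-frame versus shifted-frame comparison the abstract reports; a statistical test against random codes, as in the study, would still be needed to judge significance.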
