Results 1 - 20 of 49
1.
Nature ; 530(7590): 331-5, 2016 Feb 18.
Article in English | MEDLINE | ID: mdl-26814964

ABSTRACT

Seagrasses colonized the sea on at least three independent occasions to form the basis of one of the most productive and widespread coastal ecosystems on the planet. Here we report the genome of Zostera marina (L.), the first, to our knowledge, marine angiosperm to be fully sequenced. This reveals unique insights into the genomic losses and gains involved in achieving the structural and physiological adaptations required for its marine lifestyle, arguably the most severe habitat shift ever accomplished by flowering plants. Key angiosperm innovations that were lost include the entire repertoire of stomatal genes, genes involved in the synthesis of terpenoids and ethylene signalling, and genes for ultraviolet protection and phytochromes for far-red sensing. Seagrasses have also regained functions enabling them to adjust to full salinity. Their cell walls contain all of the polysaccharides typical of land plants, but also contain polyanionic, low-methylated pectins and sulfated galactans, a feature shared with the cell walls of all macroalgae and that is important for ion homoeostasis, nutrient uptake and O2/CO2 exchange through leaf epidermal cells. The Z. marina genome resource will markedly advance a wide range of functional ecological studies from adaptation of marine ecosystems under climate warming, to unravelling the mechanisms of osmoregulation under high salinities that may further inform our understanding of the evolution of salt tolerance in crop plants.


Subject(s)
Adaptation, Physiological/genetics , Evolution, Molecular , Genome, Plant/genetics , Seawater , Zosteraceae/genetics , Acclimatization/genetics , Cell Wall/chemistry , Ethylenes/biosynthesis , Gene Duplication , Genes, Plant/genetics , Metabolic Networks and Pathways , Molecular Sequence Data , Oceans and Seas , Osmoregulation/genetics , Phylogeny , Plant Leaves/metabolism , Plant Stomata/genetics , Pollen/metabolism , Salinity , Salt Tolerance/genetics , Seaweed/genetics , Terpenes/metabolism
2.
BMC Biol ; 19(1): 1, 2021 01 06.
Article in English | MEDLINE | ID: mdl-33407428

ABSTRACT

BACKGROUND: Dinoflagellates are aquatic protists particularly widespread in the oceans worldwide. Some are responsible for toxic blooms while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~3 to 250 Gb), with gene organization and gene expression patterns very different from closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization. RESULTS: We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technology (ONT) MinION long-read sequencing approaches. We found that a small number of transposable elements, short introns and intergenic regions, and a limited number of gene families together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes encoded by unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large portion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid, potential loss of a mitochondrial genome and functions. CONCLUSION: These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.


Subject(s)
Biological Evolution , DNA, Protozoan/analysis , Dinoflagellida/cytology , Dinoflagellida/genetics , Organelles/physiology , Protozoan Proteins/analysis , Base Sequence , Evolution, Molecular , Introns/physiology
3.
Nature ; 479(7374): 487-92, 2011 Nov 23.
Article in English | MEDLINE | ID: mdl-22113690

ABSTRACT

The spider mite Tetranychus urticae is a cosmopolitan agricultural pest with an extensive host plant range and an extreme record of pesticide resistance. Here we present the completely sequenced and annotated spider mite genome, representing the first complete chelicerate genome. At 90 megabases T. urticae has the smallest sequenced arthropod genome. Compared with other arthropods, the spider mite genome shows unique changes in the hormonal environment and organization of the Hox complex, and also reveals evolutionary innovation of silk production. We find strong signatures of polyphagy and detoxification in gene families associated with feeding on different hosts and in new gene families acquired by lateral gene transfer. Deep transcriptome analysis of mites feeding on different plants shows how this pest responds to a changing host environment. The T. urticae genome thus offers new insights into arthropod evolution and plant-herbivore interactions, and provides unique opportunities for developing novel plant protection strategies.


Subject(s)
Adaptation, Physiological/genetics , Genome/genetics , Herbivory/genetics , Tetranychidae/genetics , Tetranychidae/physiology , Adaptation, Physiological/physiology , Animals , Ecdysterone/analogs & derivatives , Ecdysterone/genetics , Evolution, Molecular , Fibroins/genetics , Gene Expression Regulation , Gene Transfer, Horizontal/genetics , Genes, Homeobox/genetics , Genomics , Herbivory/physiology , Molecular Sequence Data , Molting/genetics , Multigene Family/genetics , Nanostructures/chemistry , Plants/parasitology , Silk/biosynthesis , Silk/chemistry , Transcriptome/genetics
5.
Nature ; 465(7298): 617-21, 2010 Jun 03.
Article in English | MEDLINE | ID: mdl-20520714

ABSTRACT

Brown algae (Phaeophyceae) are complex photosynthetic organisms with a very different evolutionary history to green plants, to which they are only distantly related. These seaweeds are the dominant species in rocky coastal ecosystems and they exhibit many interesting adaptations to these, often harsh, environments. Brown algae are also one of only a small number of eukaryotic lineages that have evolved complex multicellularity (Fig. 1). We report the 214 million base pair (Mbp) genome sequence of the filamentous seaweed Ectocarpus siliculosus (Dillwyn) Lyngbye, a model organism for brown algae, closely related to the kelps (Fig. 1). Genome features such as the presence of an extended set of light-harvesting and pigment biosynthesis genes and new metabolic processes such as halide metabolism help explain the ability of this organism to cope with the highly variable tidal environment. The evolution of multicellularity in this lineage is correlated with the presence of a rich array of signal transduction genes. Of particular interest is the presence of a family of receptor kinases, as the independent evolution of related molecules has been linked with the emergence of multicellularity in both the animal and green plant lineages. The Ectocarpus genome sequence represents an important step towards developing this organism as a model species, providing the possibility to combine genomic and genetic approaches to explore these and other aspects of brown algal biology further.


Subject(s)
Algal Proteins/genetics , Biological Evolution , Genome/genetics , Phaeophyceae/cytology , Phaeophyceae/genetics , Animals , Eukaryota , Evolution, Molecular , Molecular Sequence Data , Phaeophyceae/metabolism , Phylogeny , Pigments, Biological/biosynthesis , Signal Transduction/genetics
6.
Proc Natl Acad Sci U S A ; 108(22): 9166-71, 2011 May 31.
Article in English | MEDLINE | ID: mdl-21536894

ABSTRACT

Rust fungi are some of the most devastating pathogens of crop plants. They are obligate biotrophs, which extract nutrients only from living plant tissues and cannot grow apart from their hosts. Their lifestyle has slowed the dissection of molecular mechanisms underlying host invasion and avoidance or suppression of plant innate immunity. We sequenced the 101-Mb genome of Melampsora larici-populina, the causal agent of poplar leaf rust, and the 89-Mb genome of Puccinia graminis f. sp. tritici, the causal agent of wheat and barley stem rust. We then compared the 16,399 predicted proteins of M. larici-populina with the 17,773 predicted proteins of P. graminis f. sp. tritici. Genomic features related to their obligate biotrophic lifestyle include expanded lineage-specific gene families, a large repertoire of effector-like small secreted proteins, impaired nitrogen and sulfur assimilation pathways, and expanded families of amino acid and oligopeptide membrane transporters. The dramatic up-regulation of transcripts coding for small secreted proteins, secreted hydrolytic enzymes, and transporters in planta suggests that they play a role in host infection and nutrient acquisition. Some of these genomic hallmarks are mirrored in the genomes of other microbial eukaryotes that have independently evolved to infect plants, indicating convergent adaptation to a biotrophic existence inside plant cells.


Subject(s)
Basidiomycota/genetics , Fungi/genetics , Triticum/microbiology , Gene Expression Profiling , Genes, Fungal , Genome , Genome, Fungal , Models, Genetic , Nitrates/chemistry , Oligonucleotide Array Sequence Analysis , Phylogeny , Plant Diseases/microbiology , Plant Leaves/microbiology , Sequence Analysis, DNA , Sulfates/chemistry
7.
Mol Biol Evol ; 29(2): 849-59, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21998273

ABSTRACT

The number of introns varies considerably among different organisms. This can be explained by the differences in the rates of intron gain and loss. Two factors that are likely to influence these rates are selection for or against introns and the mutation rate that generates the novel intron or the intronless copy. Although it has been speculated that stronger selection for a compact genome might result in a higher rate of intron loss and a lower rate of intron gain, clear evidence is lacking, and the role of selection in determining these rates has not been established. Here, we studied the gain and loss of introns in the two closely related species Arabidopsis thaliana and A. lyrata as it was recently shown that A. thaliana has been undergoing a faster genome reduction driven by selection. We found that A. thaliana has lost six times more introns than A. lyrata since the divergence of the two species but gained very few introns. We suggest that stronger selection for genome reduction probably resulted in the much higher intron loss rate in A. thaliana, although further analysis is required as we could not find evidence that the loss rate increased in A. thaliana as opposed to having decreased in A. lyrata compared with the rate in the common ancestor. We also examined the pattern of the intron gains and losses to better understand the mechanisms by which they occur. Microsimilarity was detected between the splice sites of several gained and lost introns, suggesting that nonhomologous end joining repair of double-strand breaks might be a common pathway not only for intron gain but also for intron loss.


Subject(s)
Arabidopsis/genetics , Genome Size , Genomic Instability , Introns/genetics , DNA Breaks, Double-Stranded , DNA Repair , Evolution, Molecular , Genome, Plant , Models, Genetic , Mutation , Mutation Rate , Selection, Genetic
8.
Plant Biotechnol J ; 11(5): 605-17, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23433242

ABSTRACT

Despite current advances in next-generation sequencing data analysis procedures, de novo assembly of a reference sequence required for SNP discovery and expression analysis is still a major challenge in genetically uncharacterized, highly heterozygous species. High levels of polymorphism inherent to outbreeding crop species hamper De Bruijn Graph-based de novo assembly algorithms, causing transcript fragmentation and the redundant assembly of allelic contigs. If multiple genotypes are sequenced to study genetic diversity, primary de novo assembly is best performed per genotype to limit the level of polymorphism and avoid transcript fragmentation. Here, we propose an Orthology Guided Assembly procedure that first uses sequence similarity (tBLASTn) to proteins of a model species to select allelic and fragmented contigs from all genotypes and then performs CAP3 clustering on a gene-by-gene basis. Thus, we simultaneously annotate putative orthologues for each protein of the model species, resolve allelic redundancy and fragmentation and create a de novo transcript sequence representing the consensus of all alleles present in the sequenced genotypes. We demonstrate the procedure using RNA-seq data from 14 genotypes of Lolium perenne to generate a reference transcriptome for gene discovery and translational research, to reveal the transcriptome-wide distribution and density of SNPs in an outbreeding crop and to illustrate the effect of polymorphisms on the assembly procedure. The results presented here illustrate that constructing a non-redundant reference sequence is essential for comparative genomics, orthology-based annotation and candidate gene selection but also for read mapping and subsequent polymorphism discovery and/or read count-based gene expression analysis.


Subject(s)
Computational Biology/methods , Crops, Agricultural/genetics , Genetic Variation , Heterozygote , Lolium/genetics , Transcriptome/genetics , Gene Expression Regulation, Plant , Open Reading Frames/genetics , Phylogeny , Polymorphism, Single Nucleotide/genetics , Reference Standards , Sequence Analysis, DNA
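
Illustrative note on entry 8: the abstract describes selecting contigs from all genotypes by similarity to model-species proteins and then clustering them gene by gene. The grouping step alone might be sketched as below. This is a minimal sketch and not the authors' pipeline: the function name, the tuple layout `(contig_id, protein_id, bit_score)`, and the example identifiers are all hypothetical, and the subsequent CAP3 clustering is not shown.

```python
from collections import defaultdict


def group_contigs_by_best_hit(hits):
    """Assign each contig to the model-species protein it matches best.

    hits: iterable of (contig_id, protein_id, bit_score) tuples, e.g. parsed
    from tabular similarity-search output (assumed layout, not a real format).
    Returns {protein_id: sorted list of contig_ids}.
    """
    best = {}
    for contig, protein, score in hits:
        # Keep only the highest-scoring protein for every contig.
        if contig not in best or score > best[contig][1]:
            best[contig] = (protein, score)

    groups = defaultdict(list)
    for contig, (protein, _score) in best.items():
        groups[protein].append(contig)
    return {protein: sorted(contigs) for protein, contigs in groups.items()}


# Hypothetical hits for contigs from two genotypes (g1, g2).
example_hits = [
    ("g1_c1", "P1", 250.0),
    ("g1_c2", "P1", 180.0),
    ("g2_c1", "P1", 240.0),
    ("g2_c1", "P2", 90.0),   # weaker secondary hit, ignored
    ("g2_c7", "P2", 300.0),
]
groups = group_contigs_by_best_hit(example_hits)
```

Each resulting group collects allelic and fragmented contigs for one putative orthologue, which is the unit a per-gene clustering step would then operate on.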
9.
Mol Plant Microbe Interact ; 25(3): 279-93, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22046958

ABSTRACT

The obligate biotrophic rust fungus Melampsora larici-populina is the most devastating and widespread pathogen of poplars. Studies over recent years have identified various small secreted proteins (SSP) from plant biotrophic filamentous pathogens and have highlighted their role as effectors in host-pathogen interactions. The recent analysis of the M. larici-populina genome sequence has revealed the presence of 1,184 SSP-encoding genes in this rust fungus. In the present study, the expression and evolutionary dynamics of these SSP were investigated to pinpoint the arsenal of putative effectors that could be involved in the interaction between the rust fungus and poplar. Similarity with effectors previously described in Melampsora spp., richness in cysteines, and organization in large families were extensively detailed and discussed. Positive selection analyses conducted over clusters of paralogous genes revealed fast-evolving candidate effectors. Transcript profiling of selected M. larici-populina SSP showed a timely coordinated expression during leaf infection, and the accumulation of four candidate effectors in distinct rust infection structures was demonstrated by immunolocalization. This integrated and multifaceted approach helps to prioritize candidate effector genes for functional studies.


Subject(s)
Basidiomycota/genetics , Fungal Proteins/genetics , Plant Diseases/microbiology , Populus/microbiology , Biological Evolution , Fungal Proteins/metabolism , Gene Expression Profiling , Genes, Fungal/genetics , Host-Pathogen Interactions , Molecular Sequence Annotation , Multigene Family/genetics , Oligonucleotide Array Sequence Analysis , Plant Leaves/microbiology , RNA, Fungal/genetics , Time Factors
10.
New Phytol ; 194(4): 1001-1013, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22463738

ABSTRACT

Parasitism and saprotrophic wood decay are two fungal strategies fundamental for succession and nutrient cycling in forest ecosystems. An opportunity to assess the trade-off between these strategies is provided by the forest pathogen and wood decayer Heterobasidion annosum sensu lato. We report the annotated genome sequence and transcript profiling, as well as the quantitative trait loci mapping, of one member of the species complex: H. irregulare. Quantitative trait loci critical for pathogenicity, and rich in transposable elements, orphan and secreted genes, were identified. A wide range of cellulose-degrading enzymes are expressed during wood decay. By contrast, pathogenic interaction between H. irregulare and pine engages fewer carbohydrate-active enzymes, but involves an increase in pectinolytic enzymes, transcription modules for oxidative stress and secondary metabolite production. Our results show a trade-off in terms of constrained carbohydrate decomposition and membrane transport capacity during interaction with living hosts. Our findings establish that saprotrophic wood decay and necrotrophic parasitism involve two distinct, yet overlapping, processes.


Subject(s)
Basidiomycota/genetics , Genome, Fungal , Host-Pathogen Interactions , Trees/microbiology , Wood/microbiology , Chromosome Mapping , Gene Expression Profiling , Molecular Sequence Data , Quantitative Trait Loci
11.
BMC Genomics ; 12: 368, 2011 Jul 18.
Article in English | MEDLINE | ID: mdl-21767361

ABSTRACT

BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most abundant source of genetic variation among individuals of a species. New genotyping technologies allow examining hundreds to thousands of SNPs in a single reaction for a wide range of applications such as genetic diversity analysis, linkage mapping, fine QTL mapping, association studies, marker-assisted or genome-wide selection. In this paper, we evaluated the potential of highly-multiplexed SNP genotyping for genetic mapping in maritime pine (Pinus pinaster Ait.), the main conifer used for commercial plantation in southwestern Europe. RESULTS: We designed a custom GoldenGate assay for 1,536 SNPs detected through the resequencing of gene fragments (707 in vitro SNPs/Indels) and from Sanger-derived Expressed Sequence Tags assembled into a unigene set (829 in silico SNPs/Indels). Offspring from three-generation outbred (G2) and inbred (F2) pedigrees were genotyped. The success rate of the assay was 63.6% and 74.8% for in silico and in vitro SNPs, respectively. A genotyping error rate of 0.4% was further estimated from segregating data of SNPs belonging to the same gene. Overall, 394 SNPs were available for mapping. A total of 287 SNPs were integrated with previously mapped markers in the G2 parental maps, while 179 SNPs were localized on the map generated from the analysis of the F2 progeny. Based on 98 markers segregating in both pedigrees, we were able to generate a consensus map comprising 357 SNPs from 292 different loci. Finally, the analysis of sequence homology between mapped markers and their orthologs in a Pinus taeda linkage map made it possible to align the 12 linkage groups of both species. CONCLUSIONS: Our results show that the GoldenGate assay can be used successfully for high-throughput SNP genotyping in maritime pine, a conifer species that has a genome seven times the size of the human genome. This SNP-array will be extended thanks to recent sequencing efforts using new generation sequencing technologies and will include SNPs from comparative orthologous sequences that were identified in the present study, providing a wider collection of anchor points for comparative genomics among the conifers.


Subject(s)
Pinus taeda/genetics , Pinus/genetics , Polymorphism, Single Nucleotide , Chromosome Mapping , Expressed Sequence Tags , Genotype , Oligonucleotide Array Sequence Analysis , Pedigree
13.
Curr Opin Plant Biol ; 10(2): 199-203, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17289424

ABSTRACT

Annotation of the first few complete plant genomes has revealed that plants have many genes. For Arabidopsis, over 26,500 gene loci have been predicted, whereas for rice, the number adds up to 41,000. Recent analysis of the poplar genome suggests more than 45,000 genes, and partial sequence data from Medicago and Lotus also suggest that these plants contain more than 40,000 genes. Nevertheless, estimations suggest that ancestral angiosperms had no more than 12,000-14,000 genes. One explanation for the large increase in gene number during angiosperm evolution is gene duplication. It has been shown previously that the retention of duplicates following small- and large-scale duplication events in plants is substantial. Taking into account the function of genes that have been duplicated, we are now beginning to understand why many plant genes might have been retained, and how their retention might be linked to the typical lifestyle of plants.


Subject(s)
Genes, Plant , Open Reading Frames/genetics
14.
BMC Genomics ; 10: 288, 2009 Jun 29.
Article in English | MEDLINE | ID: mdl-19563678

ABSTRACT

BACKGROUND: Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. RESULTS: In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively. CONCLUSION: We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Computational Biology/methods , Protein Interaction Mapping/methods , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Cluster Analysis , Gene Expression Profiling , Proteomics , Sequence Analysis, Protein
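
Illustrative note on entry 14: the abstract combines orthology-based transfer of known interactions with functional association data used as a filter. A minimal sketch of that general idea follows. It is an assumption-laden illustration, not the published method: the function names, the evidence sets, and all protein identifiers are hypothetical, and the real study weighs several evidence types rather than applying a single yes/no filter.

```python
def predict_by_orthology(known_interactions, orthologs):
    """Transfer interactions from a source species to a target species.

    known_interactions: iterable of (a, b) pairs in the source species.
    orthologs: dict mapping a source protein to a set of target proteins.
    Returns a set of unordered target pairs stored as sorted tuples.
    """
    predicted = set()
    for a, b in known_interactions:
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                if target_a != target_b:
                    predicted.add(tuple(sorted((target_a, target_b))))
    return predicted


def filter_by_support(pairs, *evidence_sets):
    """Keep pairs that appear in at least one functional-evidence set."""
    return {p for p in pairs if any(p in ev for ev in evidence_sets)}


# Hypothetical source-species interactions and ortholog mapping.
source_ppi = [("Y1", "Y2"), ("Y2", "Y3")]
ortholog_map = {"Y1": {"A1"}, "Y2": {"A2", "A3"}, "Y3": set()}

candidates = predict_by_orthology(source_ppi, ortholog_map)

# Hypothetical supporting evidence (e.g. co-expressed or co-localized pairs).
coexpressed = {("A1", "A2")}
colocalized = set()
reliable = filter_by_support(candidates, coexpressed, colocalized)
```

Storing each pair as a sorted tuple treats interactions as undirected, so the same pair reached from different source interactions is counted once.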
15.
Bioinformatics ; 24(13): i24-31, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586720

ABSTRACT

MOTIVATION: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet a common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work. RESULTS: Comparing the average structural profile based on base stacking energy of transcribed, promoter and intergenic sequences demonstrates that the core promoter has unique features that cannot be found in other sequences. We show that unsupervised clustering by using self-organizing maps can clearly distinguish between the structural profiles of promoter sequences and other genomic sequences. An implementation of this promoter prediction program, called ProSOM, is available and has been compared with the state-of-the-art. We propose an objective, accurate and biologically sound validation scheme for core promoter predictors. ProSOM performs at least as well as the software currently available, but our technique is more balanced in terms of the number of predicted sites and the number of false predictions, resulting in a better all-round performance. Additional tests on the ENCODE regions of the human genome show that 98% of all predictions made by ProSOM can be associated with transcriptionally active regions, which demonstrates the high precision. AVAILABILITY: Predictions for the human genome, the validation datasets and the program (ProSOM) are available upon request.


Subject(s)
Artificial Intelligence , Chromosome Mapping/methods , Cluster Analysis , Genome, Human/genetics , Promoter Regions, Genetic/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Humans , Molecular Sequence Data , Pattern Recognition, Automated/methods , Software
16.
Microb Cell Fact ; 8: 53, 2009 Oct 16.
Article in English | MEDLINE | ID: mdl-19835590

ABSTRACT

The first genome sequences of the important yeast protein production host Pichia pastoris have been released into the public domain this spring. In order to provide the scientific community easy and versatile access to the sequence, two websites have been set up as a resource for genomic sequence, gene and protein information for P. pastoris: a GBrowse-based genome browser at http://www.pichiagenome.org and a genome portal with gene annotation and browsing functionality at http://bioinformatics.psb.ugent.be/webtools/bogas. Both websites offer information on gene annotation and function, regulation and structure. In addition, a wiki-based platform allows all users to create additional information on genes, proteins, physiology and other items of P. pastoris research, so that the Pichia community can benefit from exchange of knowledge, data and materials.


Subject(s)
Databases, Genetic , Pichia/genetics , Genome, Fungal , Software
17.
Bioinformatics ; 23(4): 414-20, 2007 Feb 15.
Article in English | MEDLINE | ID: mdl-17204465

ABSTRACT

MOTIVATION: Prediction of the coding potential for stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites, such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences relates to the structure of coding sequences, which are organized in codons, and to their biased usage. For statistical reasons, the longer the sequences, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons might be small and hard to distinguish based on coding potential. RESULTS: Here, we present novel approaches that specifically aim at a better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that (1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, (2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths and (3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes. SUPPLEMENTARY DATA: http://bioinformatics.psb.ugent.be/.


Subject(s)
DNA, Bacterial/genetics , DNA, Fungal/genetics , DNA, Plant/genetics , Exons/genetics , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Vertebrates/genetics , Algorithms , Animals , Artificial Intelligence , Base Sequence , Molecular Sequence Data , Open Reading Frames/genetics
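
Illustrative note on entry 17: the abstract explains that coding sequences are organized in codons with biased usage, and that this bias is harder to detect in short sequences. One simple way to see why is a codon-usage log-odds score, sketched below. This is a basic baseline under stated assumptions and is neither the authors' combined-feature method nor the Markov models they compare against; the function names, the pseudocount, and the toy training sequences are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(sequences, pseudocount=1.0):
    """Estimate in-frame codon frequencies with additive smoothing."""
    counts = Counter()
    for seq in sequences:
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_SET:  # skip codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def coding_score(seq, coding_freq, noncoding_freq):
    """Sum of per-codon log-odds; positive values favour 'coding'."""
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in CODON_SET:
            score += math.log(coding_freq[codon] / noncoding_freq[codon])
    return score


# Toy training data, invented purely for illustration.
coding_freq = codon_frequencies(["ATGGCTGCTGCT"])
noncoding_freq = codon_frequencies(["ATATATATATAT"])

score_coding_like = coding_score("GCTGCT", coding_freq, noncoding_freq)
score_noncoding_like = coding_score("ATATAT", coding_freq, noncoding_freq)
```

Because the score is a sum over codons, a short fragment contributes only a few terms, so its total stays close to zero and is easily swamped by noise. That is the statistical limitation the abstract refers to.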
18.
Nucleic Acids Res ; 33(13): 4255-64, 2005.
Article in English | MEDLINE | ID: mdl-16049029

ABSTRACT

DNA encodes at least two independent levels of functional information. The first level is for encoding proteins and sequence targets for DNA-binding factors, while the second one is contained in the physical and structural properties of the DNA molecule itself. Although the physical and structural properties are ultimately determined by the nucleotide sequence itself, the cell exploits these properties in a way in which the sequence itself plays no role other than to support or facilitate certain spatial structures. In this work, we focus on these structural properties, comparing them between different organisms and assessing their ability to describe the core promoter. We prove the existence of distinct types of core promoters, based on a clustering of their structural profiles. These results indicate that the structural profiles are much conserved within plants (Arabidopsis and rice) and animals (human and mouse), but differ considerably between plants and animals. Furthermore, we demonstrate that these structural profiles can be an alternative way of describing the core promoter, in addition to more classical motif or IUPAC-based approaches. Using the structural profiles as discriminatory elements to separate promoter regions from non-promoter regions, reliable models can be built to identify core-promoter regions using a strictly computational approach.


Subject(s)
Genome, Plant , Genomics/methods , Promoter Regions, Genetic , Animals , Arabidopsis/genetics , Computational Biology/methods , DNA/chemistry , Humans , Mice , Nucleic Acid Conformation , Oryza/genetics
19.
Nucleic Acids Res ; 33(Database issue): D641-6, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608279

ABSTRACT

Genomic projects heavily depend on genome annotations and are limited by the current deficiencies in the published predictions of gene structure and function. It follows that improved annotation will allow better data mining of genomes, and more secure planning and design of experiments. The purpose of the GeneFarm project is to obtain homogeneous, reliable, documented and traceable annotations for Arabidopsis nuclear genes and gene products, and to enter them into an added-value database. This re-annotation project is being performed exhaustively on every member of each gene family. Performing a family-wide annotation makes the task easier and more efficient than a gene-by-gene approach since many features obtained for one gene can be extrapolated to some or all the other genes of a family. A complete annotation procedure based on the most efficient prediction tools available is being used by 16 partner laboratories, each contributing annotated families from its field of expertise. A database, named GeneFarm, and an associated user-friendly interface to query the annotations have been developed. More than 3000 genes distributed over 300 families have been annotated and are available at http://genoplante-info.infobiogen.fr/Genefarm/. Furthermore, collaboration with the Swiss Institute of Bioinformatics is underway to integrate the GeneFarm data into the protein knowledgebase Swiss-Prot.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Databases, Genetic , Genes, Plant , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/physiology , Philosophy , Systems Integration , User-Computer Interface
20.
Nucleic Acids Res ; 30(19): 4103-17, 2002 Oct 01.
Article in English | MEDLINE | ID: mdl-12364589

ABSTRACT

While the genomes of many organisms have been sequenced over the last few years, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed that try to address one part of this problem, which consists of locating the genes along a genome. This paper reviews the existing approaches to predicting genes in eukaryotic genomes and underlines their intrinsic advantages and limitations. The main mathematical models and computational algorithms adopted are also briefly described and the resulting software classified according to both the method and the type of evidence used. Finally, the several difficulties and pitfalls encountered by the programs are detailed, showing that improvements are needed and that new directions must be considered.


Subject(s)
Computational Biology/methods , Genome , Algorithms , Alternative Splicing/genetics , Animals , Expressed Sequence Tags , Genes/genetics , Humans , Sequence Alignment/methods