Results 1 - 20 of 49
1.
Nature ; 530(7590): 331-5, 2016 Feb 18.
Article in English | MEDLINE | ID: mdl-26814964

ABSTRACT

Seagrasses colonized the sea on at least three independent occasions to form the basis of one of the most productive and widespread coastal ecosystems on the planet. Here we report the genome of Zostera marina (L.), the first, to our knowledge, marine angiosperm to be fully sequenced. This reveals unique insights into the genomic losses and gains involved in achieving the structural and physiological adaptations required for its marine lifestyle, arguably the most severe habitat shift ever accomplished by flowering plants. Key angiosperm innovations that were lost include the entire repertoire of stomatal genes, genes involved in the synthesis of terpenoids and ethylene signalling, and genes for ultraviolet protection and phytochromes for far-red sensing. Seagrasses have also regained functions enabling them to adjust to full salinity. Their cell walls contain all of the polysaccharides typical of land plants, but also contain polyanionic, low-methylated pectins and sulfated galactans, a feature shared with the cell walls of all macroalgae and that is important for ion homoeostasis, nutrient uptake and O2/CO2 exchange through leaf epidermal cells. The Z. marina genome resource will markedly advance a wide range of functional ecological studies from adaptation of marine ecosystems under climate warming, to unravelling the mechanisms of osmoregulation under high salinities that may further inform our understanding of the evolution of salt tolerance in crop plants.


Subjects
Adaptation, Physiological/genetics , Evolution, Molecular , Genome, Plant/genetics , Seawater , Zosteraceae/genetics , Acclimatization/genetics , Cell Wall/chemistry , Ethylenes/biosynthesis , Gene Duplication , Genes, Plant/genetics , Metabolic Networks and Pathways , Molecular Sequence Data , Oceans and Seas , Osmoregulation/genetics , Phylogeny , Plant Leaves/metabolism , Plant Stomata/genetics , Pollen/metabolism , Salinity , Salt Tolerance/genetics , Seaweed/genetics , Terpenes/metabolism
2.
BMC Biol ; 19(1): 1, 2021 01 06.
Article in English | MEDLINE | ID: mdl-33407428

ABSTRACT

BACKGROUND: Dinoflagellates are aquatic protists particularly widespread in the oceans worldwide. Some are responsible for toxic blooms while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~3 to 250 Gb), with gene organization and gene expression patterns very different from closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization. RESULTS: We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technologies (ONT) MinION long-read sequencing approaches. We found that a small number of transposable elements, along with short introns and intergenic regions and a limited number of gene families, together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes encoded by unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large portion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid, potential loss of a mitochondrial genome and functions. CONCLUSION: These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.


Subjects
Biological Evolution , DNA, Protozoan/analysis , Dinoflagellida/cytology , Dinoflagellida/genetics , Organelles/physiology , Protozoan Proteins/analysis , Base Sequence , Evolution, Molecular , Introns/physiology
3.
Nature ; 479(7374): 487-92, 2011 Nov 23.
Article in English | MEDLINE | ID: mdl-22113690

ABSTRACT

The spider mite Tetranychus urticae is a cosmopolitan agricultural pest with an extensive host plant range and an extreme record of pesticide resistance. Here we present the completely sequenced and annotated spider mite genome, representing the first complete chelicerate genome. At 90 megabases T. urticae has the smallest sequenced arthropod genome. Compared with other arthropods, the spider mite genome shows unique changes in the hormonal environment and organization of the Hox complex, and also reveals evolutionary innovation of silk production. We find strong signatures of polyphagy and detoxification in gene families associated with feeding on different hosts and in new gene families acquired by lateral gene transfer. Deep transcriptome analysis of mites feeding on different plants shows how this pest responds to a changing host environment. The T. urticae genome thus offers new insights into arthropod evolution and plant-herbivore interactions, and provides unique opportunities for developing novel plant protection strategies.


Subjects
Adaptation, Physiological/genetics , Genome/genetics , Herbivory/genetics , Tetranychidae/genetics , Tetranychidae/physiology , Adaptation, Physiological/physiology , Animals , Ecdysterone/analogs & derivatives , Ecdysterone/genetics , Evolution, Molecular , Fibroins/genetics , Gene Expression Regulation , Gene Transfer, Horizontal/genetics , Genes, Homeobox/genetics , Genomics , Herbivory/physiology , Molecular Sequence Data , Molting/genetics , Multigene Family/genetics , Nanostructures/chemistry , Plants/parasitology , Silk/biosynthesis , Silk/chemistry , Transcriptome/genetics
5.
Nature ; 465(7298): 617-21, 2010 Jun 03.
Article in English | MEDLINE | ID: mdl-20520714

ABSTRACT

Brown algae (Phaeophyceae) are complex photosynthetic organisms with a very different evolutionary history to green plants, to which they are only distantly related. These seaweeds are the dominant species in rocky coastal ecosystems and they exhibit many interesting adaptations to these, often harsh, environments. Brown algae are also one of only a small number of eukaryotic lineages that have evolved complex multicellularity (Fig. 1). We report the 214 million base pair (Mbp) genome sequence of the filamentous seaweed Ectocarpus siliculosus (Dillwyn) Lyngbye, a model organism for brown algae, closely related to the kelps (Fig. 1). Genome features such as the presence of an extended set of light-harvesting and pigment biosynthesis genes and new metabolic processes such as halide metabolism help explain the ability of this organism to cope with the highly variable tidal environment. The evolution of multicellularity in this lineage is correlated with the presence of a rich array of signal transduction genes. Of particular interest is the presence of a family of receptor kinases, as the independent evolution of related molecules has been linked with the emergence of multicellularity in both the animal and green plant lineages. The Ectocarpus genome sequence represents an important step towards developing this organism as a model species, providing the possibility to combine genomic and genetic approaches to explore these and other aspects of brown algal biology further.


Subjects
Algal Proteins/genetics , Biological Evolution , Genome/genetics , Phaeophyceae/cytology , Phaeophyceae/genetics , Animals , Eukaryota , Evolution, Molecular , Molecular Sequence Data , Phaeophyceae/metabolism , Phylogeny , Pigments, Biological/biosynthesis , Signal Transduction/genetics
6.
Proc Natl Acad Sci U S A ; 108(22): 9166-71, 2011 May 31.
Article in English | MEDLINE | ID: mdl-21536894

ABSTRACT

Rust fungi are some of the most devastating pathogens of crop plants. They are obligate biotrophs, which extract nutrients only from living plant tissues and cannot grow apart from their hosts. Their lifestyle has slowed the dissection of molecular mechanisms underlying host invasion and avoidance or suppression of plant innate immunity. We sequenced the 101-Mb genome of Melampsora larici-populina, the causal agent of poplar leaf rust, and the 89-Mb genome of Puccinia graminis f. sp. tritici, the causal agent of wheat and barley stem rust. We then compared the 16,399 predicted proteins of M. larici-populina with the 17,773 predicted proteins of P. graminis f. sp. tritici. Genomic features related to their obligate biotrophic lifestyle include expanded lineage-specific gene families, a large repertoire of effector-like small secreted proteins, impaired nitrogen and sulfur assimilation pathways, and expanded families of amino acid and oligopeptide membrane transporters. The dramatic up-regulation of transcripts coding for small secreted proteins, secreted hydrolytic enzymes, and transporters in planta suggests that they play a role in host infection and nutrient acquisition. Some of these genomic hallmarks are mirrored in the genomes of other microbial eukaryotes that have independently evolved to infect plants, indicating convergent adaptation to a biotrophic existence inside plant cells.


Subjects
Basidiomycota/genetics , Fungi/genetics , Triticum/microbiology , Gene Expression Profiling , Genes, Fungal , Genome , Genome, Fungal , Models, Genetic , Nitrates/chemistry , Oligonucleotide Array Sequence Analysis , Phylogeny , Plant Diseases/microbiology , Plant Leaves/microbiology , Sequence Analysis, DNA , Sulfates/chemistry
7.
Mol Biol Evol ; 29(2): 849-59, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21998273

ABSTRACT

The number of introns varies considerably among different organisms. This can be explained by the differences in the rates of intron gain and loss. Two factors that are likely to influence these rates are selection for or against introns and the mutation rate that generates the novel intron or the intronless copy. Although it has been speculated that stronger selection for a compact genome might result in a higher rate of intron loss and a lower rate of intron gain, clear evidence is lacking, and the role of selection in determining these rates has not been established. Here, we studied the gain and loss of introns in the two closely related species Arabidopsis thaliana and A. lyrata as it was recently shown that A. thaliana has been undergoing a faster genome reduction driven by selection. We found that A. thaliana has lost six times more introns than A. lyrata since the divergence of the two species but gained very few introns. We suggest that stronger selection for genome reduction probably resulted in the much higher intron loss rate in A. thaliana, although further analysis is required as we could not find evidence that the loss rate increased in A. thaliana as opposed to having decreased in A. lyrata compared with the rate in the common ancestor. We also examined the pattern of the intron gains and losses to better understand the mechanisms by which they occur. Microsimilarity was detected between the splice sites of several gained and lost introns, suggesting that nonhomologous end joining repair of double-strand breaks might be a common pathway not only for intron gain but also for intron loss.


Subjects
Arabidopsis/genetics , Genome Size , Genomic Instability , Introns/genetics , DNA Breaks, Double-Stranded , DNA Repair , Evolution, Molecular , Genome, Plant , Models, Genetic , Mutation , Mutation Rate , Selection, Genetic
8.
Plant Biotechnol J ; 11(5): 605-17, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23433242

ABSTRACT

Despite current advances in next-generation sequencing data analysis procedures, de novo assembly of a reference sequence required for SNP discovery and expression analysis is still a major challenge in genetically uncharacterized, highly heterozygous species. High levels of polymorphism inherent to outbreeding crop species hamper De Bruijn Graph-based de novo assembly algorithms, causing transcript fragmentation and the redundant assembly of allelic contigs. If multiple genotypes are sequenced to study genetic diversity, primary de novo assembly is best performed per genotype to limit the level of polymorphism and avoid transcript fragmentation. Here, we propose an Orthology Guided Assembly procedure that first uses sequence similarity (tBLASTn) to proteins of a model species to select allelic and fragmented contigs from all genotypes and then performs CAP3 clustering on a gene-by-gene basis. Thus, we simultaneously annotate putative orthologues for each protein of the model species, resolve allelic redundancy and fragmentation and create a de novo transcript sequence representing the consensus of all alleles present in the sequenced genotypes. We demonstrate the procedure using RNA-seq data from 14 genotypes of Lolium perenne to generate a reference transcriptome for gene discovery and translational research, to reveal the transcriptome-wide distribution and density of SNPs in an outbreeding crop and to illustrate the effect of polymorphisms on the assembly procedure. The results presented here illustrate that constructing a non-redundant reference sequence is essential for comparative genomics, orthology-based annotation and candidate gene selection but also for read mapping and subsequent polymorphism discovery and/or read count-based gene expression analysis.


Subjects
Computational Biology/methods , Crops, Agricultural/genetics , Genetic Variation , Heterozygote , Lolium/genetics , Transcriptome/genetics , Gene Expression Regulation, Plant , Open Reading Frames/genetics , Phylogeny , Polymorphism, Single Nucleotide/genetics , Reference Standards , Sequence Analysis, DNA
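
The first step of the Orthology Guided Assembly procedure described in this abstract, selecting contigs by similarity to model-species proteins before gene-by-gene clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: the best-hit rule, the E-value cutoff, the tuple layout and all identifiers below are assumptions made for illustration, and the parsing of tBLASTn output and the CAP3 step are not shown.

```python
from collections import defaultdict


def group_contigs_by_ortholog(hits, max_evalue=1e-5):
    """Assign each contig to the model-species protein of its best hit.

    hits: iterable of (protein_id, contig_id, evalue, bitscore) tuples,
          e.g. parsed from tabular tBLASTn output (hypothetical layout).
    max_evalue: illustrative cutoff, not a value taken from the paper.
    Returns {protein_id: sorted list of contig_ids}.
    """
    best = {}  # contig_id -> (bitscore, protein_id) of the best hit so far
    for protein_id, contig_id, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard weak similarity
        if contig_id not in best or bitscore > best[contig_id][0]:
            best[contig_id] = (bitscore, protein_id)

    groups = defaultdict(list)
    for contig_id, (_, protein_id) in best.items():
        groups[protein_id].append(contig_id)
    return {protein: sorted(contigs) for protein, contigs in groups.items()}


# Toy hits: contigs from two genotypes (g1, g2) against two proteins.
example_hits = [
    ("P1", "g1_c1", 1e-50, 200.0),
    ("P1", "g2_c7", 1e-40, 180.0),
    ("P2", "g1_c1", 1e-10, 60.0),   # weaker second hit for g1_c1
    ("P2", "g1_c3", 1e-30, 150.0),
    ("P2", "g2_c9", 0.1, 20.0),     # fails the E-value cutoff
]
groups = group_contigs_by_ortholog(example_hits)
```

Each resulting group holds the allelic and fragmented contigs for one putative orthologue; in the procedure described above, such a group would then be clustered on its own to yield one consensus transcript.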
9.
Mol Plant Microbe Interact ; 25(3): 279-93, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22046958

ABSTRACT

The obligate biotrophic rust fungus Melampsora larici-populina is the most devastating and widespread pathogen of poplars. Studies over recent years have identified various small secreted proteins (SSP) from plant biotrophic filamentous pathogens and have highlighted their role as effectors in host-pathogen interactions. The recent analysis of the M. larici-populina genome sequence has revealed the presence of 1,184 SSP-encoding genes in this rust fungus. In the present study, the expression and evolutionary dynamics of these SSP were investigated to pinpoint the arsenal of putative effectors that could be involved in the interaction between the rust fungus and poplar. Similarity with effectors previously described in Melampsora spp., richness in cysteines, and organization in large families were extensively detailed and discussed. Positive selection analyses conducted over clusters of paralogous genes revealed fast-evolving candidate effectors. Transcript profiling of selected M. larici-populina SSP showed a timely coordinated expression during leaf infection, and the accumulation of four candidate effectors in distinct rust infection structures was demonstrated by immunolocalization. This integrated and multifaceted approach helps to prioritize candidate effector genes for functional studies.


Subjects
Basidiomycota/genetics , Fungal Proteins/genetics , Plant Diseases/microbiology , Populus/microbiology , Biological Evolution , Fungal Proteins/metabolism , Gene Expression Profiling , Genes, Fungal/genetics , Host-Pathogen Interactions , Molecular Sequence Annotation , Multigene Family/genetics , Oligonucleotide Array Sequence Analysis , Plant Leaves/microbiology , RNA, Fungal/genetics , Time Factors
10.
New Phytol ; 194(4): 1001-1013, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22463738

ABSTRACT

Parasitism and saprotrophic wood decay are two fungal strategies fundamental for succession and nutrient cycling in forest ecosystems. An opportunity to assess the trade-off between these strategies is provided by the forest pathogen and wood decayer Heterobasidion annosum sensu lato. We report the annotated genome sequence and transcript profiling, as well as the quantitative trait loci mapping, of one member of the species complex: H. irregulare. Quantitative trait loci critical for pathogenicity, and rich in transposable elements, orphan and secreted genes, were identified. A wide range of cellulose-degrading enzymes are expressed during wood decay. By contrast, pathogenic interaction between H. irregulare and pine engages fewer carbohydrate-active enzymes, but involves an increase in pectinolytic enzymes, transcription modules for oxidative stress and secondary metabolite production. Our results show a trade-off in terms of constrained carbohydrate decomposition and membrane transport capacity during interaction with living hosts. Our findings establish that saprotrophic wood decay and necrotrophic parasitism involve two distinct, yet overlapping, processes.


Subjects
Basidiomycota/genetics , Genome, Fungal , Host-Pathogen Interactions , Trees/microbiology , Wood/microbiology , Chromosome Mapping , Gene Expression Profiling , Molecular Sequence Data , Quantitative Trait Loci
11.
BMC Genomics ; 12: 368, 2011 Jul 18.
Article in English | MEDLINE | ID: mdl-21767361

ABSTRACT

BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most abundant source of genetic variation among individuals of a species. New genotyping technologies allow examining hundreds to thousands of SNPs in a single reaction for a wide range of applications such as genetic diversity analysis, linkage mapping, fine QTL mapping, association studies, marker-assisted or genome-wide selection. In this paper, we evaluated the potential of highly-multiplexed SNP genotyping for genetic mapping in maritime pine (Pinus pinaster Ait.), the main conifer used for commercial plantation in southwestern Europe. RESULTS: We designed a custom GoldenGate assay for 1,536 SNPs detected through the resequencing of gene fragments (707 in vitro SNPs/Indels) and from Sanger-derived Expressed Sequence Tags assembled into a unigene set (829 in silico SNPs/Indels). Offspring from three-generation outbred (G2) and inbred (F2) pedigrees were genotyped. The success rate of the assay was 63.6% and 74.8% for in silico and in vitro SNPs, respectively. A genotyping error rate of 0.4% was further estimated from segregating data of SNPs belonging to the same gene. Overall, 394 SNPs were available for mapping. A total of 287 SNPs were integrated with previously mapped markers in the G2 parental maps, while 179 SNPs were localized on the map generated from the analysis of the F2 progeny. Based on 98 markers segregating in both pedigrees, we were able to generate a consensus map comprising 357 SNPs from 292 different loci. Finally, the analysis of sequence homology between mapped markers and their orthologs in a Pinus taeda linkage map made it possible to align the 12 linkage groups of both species. CONCLUSIONS: Our results show that the GoldenGate assay can be used successfully for high-throughput SNP genotyping in maritime pine, a conifer species that has a genome seven times the size of the human genome. This SNP-array will be extended thanks to recent sequencing efforts using new generation sequencing technologies and will include SNPs from comparative orthologous sequences that were identified in the present study, providing a wider collection of anchor points for comparative genomics among the conifers.


Subjects
Pinus taeda/genetics , Pinus/genetics , Polymorphism, Single Nucleotide , Chromosome Mapping , Expressed Sequence Tags , Genotype , Oligonucleotide Array Sequence Analysis , Pedigree
13.
Curr Opin Plant Biol ; 10(2): 199-203, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17289424

ABSTRACT

Annotation of the first few complete plant genomes has revealed that plants have many genes. For Arabidopsis, over 26,500 gene loci have been predicted, whereas for rice, the number adds up to 41,000. Recent analysis of the poplar genome suggests more than 45,000 genes, and partial sequence data from Medicago and Lotus also suggest that these plants contain more than 40,000 genes. Nevertheless, estimations suggest that ancestral angiosperms had no more than 12,000-14,000 genes. One explanation for the large increase in gene number during angiosperm evolution is gene duplication. It has been shown previously that the retention of duplicates following small- and large-scale duplication events in plants is substantial. Taking into account the function of genes that have been duplicated, we are now beginning to understand why many plant genes might have been retained, and how their retention might be linked to the typical lifestyle of plants.


Subjects
Genes, Plant , Open Reading Frames/genetics
14.
BMC Genomics ; 10: 288, 2009 Jun 29.
Article in English | MEDLINE | ID: mdl-19563678

ABSTRACT

BACKGROUND: Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. RESULTS: In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively. CONCLUSION: We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.


Subjects
Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Computational Biology/methods , Protein Interaction Mapping/methods , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Cluster Analysis , Gene Expression Profiling , Proteomics , Sequence Analysis, Protein
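
The general idea in this abstract, transferring known interactions to a target species through orthologous relations and then filtering them with functional association data, can be sketched as follows. This is a simplified illustration and not the authors' method: it uses co-expression as the only filter, the 0.5 correlation threshold is arbitrary, and every identifier in the toy data is hypothetical.

```python
def predict_interologs(source_ppis, orthologs):
    """Transfer interactions from a source species to a target species.

    source_ppis: iterable of (a, b) protein pairs known to interact
                 in the source species.
    orthologs:   dict mapping a source protein to a set of target proteins.
    Returns a set of sorted (target_a, target_b) tuples.
    """
    predicted = set()
    for a, b in source_ppis:
        for ta in orthologs.get(a, ()):
            for tb in orthologs.get(b, ()):
                # Sort so that (x, y) and (y, x) count as the same pair.
                predicted.add(tuple(sorted((ta, tb))))
    return predicted


def filter_by_coexpression(pairs, coexpression, min_corr=0.5):
    """Keep pairs whose co-expression correlation reaches min_corr.

    coexpression: dict keyed by sorted (a, b) tuples; pairs without
                  a value are treated as unsupported and dropped.
    """
    return {p for p in pairs if coexpression.get(p, 0.0) >= min_corr}


# Toy data with made-up identifiers.
yeast_ppis = [("yA", "yB"), ("yB", "yC")]
yeast_to_target = {"yA": {"AT1"}, "yB": {"AT2", "AT3"}}  # yC has no ortholog
candidates = predict_interologs(yeast_ppis, yeast_to_target)
supported = filter_by_coexpression(
    candidates, {("AT1", "AT2"): 0.8, ("AT1", "AT3"): 0.1}
)
```

In this toy run the interaction yB-yC yields no prediction because yC has no ortholog, and only the candidate pair with co-expression support survives the filter. Localization and biological process annotations, which the abstract also names as indicators, could be applied as further filters of the same shape.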
15.
Bioinformatics ; 24(13): i24-31, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586720

ABSTRACT

MOTIVATION: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet a common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work. RESULTS: Comparing the average structural profile based on base stacking energy of transcribed, promoter and intergenic sequences demonstrates that the core promoter has unique features that cannot be found in other sequences. We show that unsupervised clustering by using self-organizing maps can clearly distinguish between the structural profiles of promoter sequences and other genomic sequences. An implementation of this promoter prediction program, called ProSOM, is available and has been compared with the state-of-the-art. We propose an objective, accurate and biologically sound validation scheme for core promoter predictors. ProSOM performs at least as well as the software currently available, but our technique is more balanced in terms of the number of predicted sites and the number of false predictions, resulting in a better all-round performance. Additional tests on the ENCODE regions of the human genome show that 98% of all predictions made by ProSOM can be associated with transcriptionally active regions, which demonstrates the high precision. AVAILABILITY: Predictions for the human genome, the validation datasets and the program (ProSOM) are available upon request.


Subjects
Artificial Intelligence , Chromosome Mapping/methods , Cluster Analysis , Genome, Human/genetics , Promoter Regions, Genetic/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Humans , Molecular Sequence Data , Pattern Recognition, Automated/methods , Software
16.
Microb Cell Fact ; 8: 53, 2009 Oct 16.
Article in English | MEDLINE | ID: mdl-19835590

ABSTRACT

The first genome sequences of the important yeast protein production host Pichia pastoris have been released into the public domain this spring. In order to provide the scientific community easy and versatile access to the sequence, two websites have been set up as a resource for genomic sequence, gene and protein information for P. pastoris: a GBrowse-based genome browser at http://www.pichiagenome.org and a genome portal with gene annotation and browsing functionality at http://bioinformatics.psb.ugent.be/webtools/bogas. Both websites offer information on gene annotation and function, regulation and structure. In addition, a wiki-based platform allows all users to add information on genes, proteins, physiology and other items of P. pastoris research, so that the Pichia community can benefit from the exchange of knowledge, data and materials.


Subjects
Databases, Genetic , Pichia/genetics , Genome, Fungal , Software
17.
Bioinformatics ; 23(4): 414-20, 2007 Feb 15.
Article in English | MEDLINE | ID: mdl-17204465

ABSTRACT

MOTIVATION: Prediction of the coding potential for stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites, such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences relates to the structure of coding sequences, which are organized in codons, and to the biased usage of those codons. For statistical reasons, the longer the sequences, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons might be small and hard to distinguish based on coding potential. RESULTS: Here, we present novel approaches that specifically aim at a better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that (1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, (2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths and (3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes. SUPPLEMENTARY DATA: http://bioinformatics.psb.ugent.be/.


Subjects
DNA, Bacterial/genetics , DNA, Fungal/genetics , DNA, Plant/genetics , Exons/genetics , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Vertebrates/genetics , Algorithms , Animals , Artificial Intelligence , Base Sequence , Molecular Sequence Data , Open Reading Frames/genetics
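
The Markov models that this abstract uses as its baseline can be illustrated with a deliberately small sketch: two first-order Markov chains, one trained on coding and one on non-coding sequence, compared through a log-odds score. This shows only the general baseline technique, not the combined-feature method the paper proposes. Models used in practice are typically of higher order and reading-frame specific; the toy training sequences and the function names here are assumptions, and input is assumed to be uppercase A, C, G and T only.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate log P(next | prev) for a first-order Markov chain.

    A pseudocount is added to every transition so that unseen
    transitions keep a non-zero probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0

    model = {}
    for prev in ALPHABET:
        total = sum(counts[prev][n] for n in ALPHABET)
        total += pseudocount * len(ALPHABET)
        model[prev] = {
            n: math.log((counts[prev][n] + pseudocount) / total)
            for n in ALPHABET
        }
    return model


def log_odds(seq, coding_model, noncoding_model):
    """Sum of per-transition log-likelihood ratios (coding vs non-coding).

    Positive scores favour the coding model, negative scores the
    non-coding model.
    """
    return sum(
        coding_model[prev][nxt] - noncoding_model[prev][nxt]
        for prev, nxt in zip(seq, seq[1:])
    )


# Toy training data, chosen only to make the two models differ.
coding = train_markov(["ATGGCCGCCGCC"])
noncoding = train_markov(["ATATATATATAT"])
score_gc = log_odds("GCCGCC", coding, noncoding)  # resembles the coding set
score_at = log_odds("ATATAT", coding, noncoding)  # resembles the non-coding set
```

Because the score is a sum over transitions, short fragments contribute few terms and give noisy scores, which is consistent with the abstract's remark that codon bias is easier to detect in longer sequences.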
18.
Nucleic Acids Res ; 33(13): 4255-64, 2005.
Article in English | MEDLINE | ID: mdl-16049029

ABSTRACT

DNA encodes at least two independent levels of functional information. The first level is for encoding proteins and sequence targets for DNA-binding factors, while the second one is contained in the physical and structural properties of the DNA molecule itself. Although the physical and structural properties are ultimately determined by the nucleotide sequence itself, the cell exploits these properties in a way in which the sequence itself plays no role other than to support or facilitate certain spatial structures. In this work, we focus on these structural properties, comparing them between different organisms and assessing their ability to describe the core promoter. We prove the existence of distinct types of core promoters, based on a clustering of their structural profiles. These results indicate that the structural profiles are highly conserved within plants (Arabidopsis and rice) and animals (human and mouse), but differ considerably between plants and animals. Furthermore, we demonstrate that these structural profiles can be an alternative way of describing the core promoter, in addition to more classical motif or IUPAC-based approaches. Using the structural profiles as discriminatory elements to separate promoter regions from non-promoter regions, reliable models can be built to identify core-promoter regions using a strictly computational approach.


Subjects
Genome, Plant , Genomics/methods , Promoter Regions, Genetic , Animals , Arabidopsis/genetics , Computational Biology/methods , DNA/chemistry , Humans , Mice , Nucleic Acid Conformation , Oryza/genetics
19.
Nucleic Acids Res ; 33(Database issue): D641-6, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608279

ABSTRACT

Genomic projects heavily depend on genome annotations and are limited by the current deficiencies in the published predictions of gene structure and function. It follows that improved annotation will allow better data mining of genomes, and more secure planning and design of experiments. The purpose of the GeneFarm project is to obtain homogeneous, reliable, documented and traceable annotations for Arabidopsis nuclear genes and gene products, and to enter them into an added-value database. This re-annotation project is being performed exhaustively on every member of each gene family. Performing a family-wide annotation makes the task easier and more efficient than a gene-by-gene approach since many features obtained for one gene can be extrapolated to some or all the other genes of a family. A complete annotation procedure based on the most efficient prediction tools available is being used by 16 partner laboratories, each contributing annotated families from its field of expertise. A database, named GeneFarm, and an associated user-friendly interface to query the annotations have been developed. More than 3000 genes distributed over 300 families have been annotated and are available at http://genoplante-info.infobiogen.fr/Genefarm/. Furthermore, collaboration with the Swiss Institute of Bioinformatics is underway to integrate the GeneFarm data into the protein knowledgebase Swiss-Prot.


Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Databases, Genetic , Genes, Plant , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/physiology , Philosophy , Systems Integration , User-Computer Interface
20.
Nucleic Acids Res ; 30(19): 4103-17, 2002 Oct 01.
Article in English | MEDLINE | ID: mdl-12364589

ABSTRACT

While the genomes of many organisms have been sequenced over the last few years, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed that try to address one part of this problem, which consists of locating the genes along a genome. This paper reviews the existing approaches to predicting genes in eukaryotic genomes and underlines their intrinsic advantages and limitations. The main mathematical models and computational algorithms adopted are also briefly described and the resulting software classified according to both the method and the type of evidence used. Finally, the several difficulties and pitfalls encountered by the programs are detailed, showing that improvements are needed and that new directions must be considered.


Subjects
Computational Biology/methods , Genome , Algorithms , Alternative Splicing/genetics , Animals , Expressed Sequence Tags , Genes/genetics , Humans , Sequence Alignment/methods