ABSTRACT
In The Institute for Genomic Research Rice Genome Annotation project (http://rice.tigr.org), we have continued to update the rice genome sequence with new data and improve the quality of the annotation. In our current release of annotation (Release 4.0; January 12, 2006), we have identified 42,653 non-transposable element-related genes encoding 49,472 gene models as a result of the detection of alternative splicing. We have refined our identification methods for transposable element-related genes, resulting in 13,237 genes that are related to transposable elements. Through incorporation of multiple transcript and proteomic expression data sets, we have been able to annotate 24,799 genes (31,739 gene models), representing approximately 50% of the total gene models, as expressed in the rice genome. All structural and functional annotation is viewable through our Rice Genome Browser, which currently supports 59 tracks. Enhanced data access is available through web interfaces, FTP downloads and a Data Extractor tool developed to support discrete dataset downloads.
Subjects
Databases, Genetic; Genome, Plant; Oryza/genetics; DNA Transposable Elements; DNA, Complementary/chemistry; Expressed Sequence Tags/chemistry; Gene Expression; Internet; Oryza/metabolism; Proteomics; User-Computer Interface
ABSTRACT
BACKGROUND: The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs) for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale. RESULTS: All available ESTs and Expressed Transcripts (ETs), 449,224 sequences for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana), were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55-81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28-58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. 16-19% of the transcripts within the six Solanaceae gene indices did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices. CONCLUSION: Results from this genome scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.
Subjects
Genes, Plant; Genome, Plant; Solanaceae/genetics; Conserved Sequence; DNA, Complementary/metabolism; Expressed Sequence Tags; Gene Expression Profiling; Gene Expression Regulation, Plant; Gene Library; Genome; Internet; Models, Genetic; Nucleic Acid Hybridization; Oligonucleotide Array Sequence Analysis; Phenotype; Phylogeny; Plant Proteins; RNA, Messenger/metabolism; Sequence Analysis, DNA; Software
ABSTRACT
Fusarium verticillioides (teleomorph Gibberella moniliformis) is a pathogen of maize worldwide and produces fumonisins, a family of mycotoxins that have been associated with several animal diseases as well as cancer in humans. In this study, we sought to identify fungal genes that affect fumonisin production and/or the plant-fungal interaction. We generated over 87,000 expressed sequence tags from nine different cDNA libraries that correspond to 11,119 unique sequences and are estimated to represent 80% of the genomic complement of genes. A comparative analysis of the libraries showed that all 15 genes in the fumonisin gene cluster were differentially expressed. In addition, nine candidate fumonisin regulatory genes and a number of genes that may play a role in plant-fungal interaction were identified. Analysis of over 700 FUM gene transcripts from five different libraries provided evidence for transcripts with unspliced introns and spliced introns with alternative 3' splice sites. The abundance of the alternative splice forms and the frequency with which they were found for genes involved in the biosynthesis of a single family of metabolites as well as their differential expression suggest they may have a biological function. Finally, analysis of an EST that aligns to genomic sequence between FUM12 and FUM13 provided evidence for a previously unidentified gene (FUM20) in the FUM gene cluster.
Subjects
Expressed Sequence Tags; Fumonisins/metabolism; Fusarium/genetics; Gene Expression Profiling; Gene Library; Genes, Fungal; Amino Acid Sequence; Base Sequence; DNA, Fungal/chemistry; DNA, Fungal/genetics; Fusarium/metabolism; Gene Expression Regulation, Fungal; Genes, Regulator; Introns; Molecular Sequence Data; RNA Processing, Post-Transcriptional; RNA, Fungal/genetics; RNA, Messenger/genetics; Sequence Analysis, DNA
ABSTRACT
The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
Subjects
Computational Biology/methods; Database Management Systems; Databases, Genetic; Expressed Sequence Tags; Information Storage and Retrieval/methods; Sequence Analysis, DNA/methods; Base Sequence; Molecular Sequence Data
ABSTRACT
TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.
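The clustering step described above, grouping sequences by pairwise similarity before per-cluster assembly, can be illustrated with a minimal sketch. It assumes single-linkage (transitive closure) grouping over precomputed pairwise hits and an arbitrary identity cutoff; the function name, the 0.95 threshold, and the toy hits are hypothetical and are not taken from TGICL itself. The assembly step is not shown.

```python
from collections import defaultdict


def cluster_by_similarity(seq_ids, hits, min_identity=0.95):
    """Group sequence IDs by transitive closure over pairwise hits.

    hits is an iterable of (id_a, id_b, identity) tuples. The cutoff is an
    illustrative assumption, not a documented TGICL default.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Walk to the set representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in hits:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for s in seq_ids:
        clusters[find(s)].append(s)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: est1-est2-est3 are linked by high-identity hits; est4/est5 are not.
seq_ids = ["est1", "est2", "est3", "est4", "est5"]
hits = [("est1", "est2", 0.98), ("est2", "est3", 0.97), ("est4", "est5", 0.80)]
clusters = cluster_by_similarity(seq_ids, hits)
print(clusters)
```

Each resulting cluster would then be passed separately to an assembler, which keeps the assembly problems small and independent and is what makes the work easy to distribute across CPUs.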
Subjects
Database Management Systems; Databases, Nucleic Acid; Expressed Sequence Tags; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Cluster Analysis; Gene Expression Regulation/genetics; Sequence Homology; Software
ABSTRACT
Comparative genomics promises to rapidly accelerate the identification and functional classification of biologically important human genes. We developed the TIGR Orthologous Gene Alignment (TOGA;
Subjects
Eukaryotic Cells; Genes/genetics; Sequence Alignment/methods; Algorithms; Animals; Cattle; Computational Biology/methods; Consensus Sequence/genetics; Databases, Genetic; Eukaryotic Cells/chemistry; Eukaryotic Cells/metabolism; Genome, Human; Humans; Mice; Phylogeny; Rats; Sequence Homology, Nucleic Acid
ABSTRACT
The cultivated potato (Solanum tuberosum) shares similar biology with other members of the Solanaceae, yet has features unique within the family, such as modified stems (stolons) that develop into edible tubers. To better understand potato biology, we have undertaken a survey of the potato transcriptome using expressed sequence tags (ESTs) from diverse tissues. A total of 61,940 ESTs were generated from aerial tissues, below-ground tissues, and tissues challenged with the late-blight pathogen (Phytophthora infestans). Clustering and assembly of these ESTs resulted in a total of 19,892 unique sequences with 8,741 tentative consensus sequences and 11,151 singleton ESTs. We were able to identify a putative function for 43.7% of these sequences. A number of sequences (48) were expressed throughout the libraries sampled, representing constitutively expressed sequences. Other sequences (13,068, 21%) were uniquely expressed and were detected only in a single library. Using hierarchical and k-means clustering of the EST sequences, we were able to correlate changes in gene expression with major physiological events in potato biology. Using pairwise comparisons of tuber-related tissues, we were able to associate genes with tuber initiation, dormancy, and sprouting. We were also able to identify a number of characterized as well as novel sequences that were unique to the incompatible interaction with the late-blight pathogen, thereby providing a foundation for further understanding the mechanism of resistance.
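The distinction drawn above between sequences detected in every library and sequences detected in only one can be sketched as a simple presence/absence test on per-library EST counts. The function name, the sequence identifiers, the library names, and the counts below are hypothetical illustrations, not data from the study, and the authors' actual procedure may differ.

```python
def classify_by_library(counts):
    """Split sequences into constitutive and library-specific sets.

    counts maps sequence ID -> {library name: EST count}. A sequence is
    'constitutive' if it has at least one EST in every library seen in the
    input, and 'library-specific' if it has ESTs in exactly one library.
    """
    libraries = set()
    for per_library in counts.values():
        libraries.update(per_library)

    constitutive = []
    library_specific = {}
    for seq_id, per_library in sorted(counts.items()):
        present = sorted(lib for lib, n in per_library.items() if n > 0)
        if len(present) == len(libraries):
            constitutive.append(seq_id)
        elif len(present) == 1:
            library_specific[seq_id] = present[0]
    return constitutive, library_specific


# Hypothetical counts for three sequences across three libraries.
counts = {
    "TC1": {"leaf": 3, "tuber": 2, "stolon": 1},
    "TC2": {"tuber": 5},
    "TC3": {"leaf": 1, "stolon": 4},
}
constitutive, library_specific = classify_by_library(counts)
print(constitutive, library_specific)
```

A presence/absence test like this is sensitive to sampling depth: a sequence seen in a single library may simply be rare elsewhere, so such calls are best treated as candidates rather than conclusions.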