RESUMO
Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception creating one of the world's most comprehensive genomic resources and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.
Assuntos
Bases de Dados Genéticas , Genômica , Internet , Software , Animais , Biologia Computacional , Genoma Bacteriano/genética , Genoma Fúngico/genética , Genoma de Planta/genética , Plantas/classificação , Plantas/genética , Vertebrados/classificação , Vertebrados/genéticaRESUMO
Gramene (http://www.gramene.org), a knowledgebase founded on comparative functional analyses of genomic and pathway data for model plants and major crops, supports agricultural researchers worldwide. The resource is committed to open access and reproducible science based on the FAIR data principles. Since the last NAR update, we made nine releases; doubled the genome portal's content; expanded curated genes, pathways and expression sets; and implemented the Domain Informational Vocabulary Extraction (DIVE) algorithm for extracting gene function information from publications. The current release, #63 (October 2020), hosts 93 reference genomes-over 3.9 million genes in 122 947 families with orthologous and paralogous classifications. Plant Reactome portrays pathway networks using a combination of manual biocuration in rice (320 reference pathways) and orthology-based projections to 106 species. The Reactome platform facilitates comparison between reference and projected pathways, gene expression analyses and overlays of gene-gene interactions. Gramene integrates ontology-based protein structure-function annotation; information on genetic, epigenetic, expression, and phenotypic diversity; and gene functional annotations extracted from plant-focused journals using DIVE. We train plant researchers in biocuration of genes and pathways; host curated maize gene structures as tracks in the maize genome browser; and integrate curated rice genes and pathways in the Plant Reactome.
Assuntos
Bases de Dados Genéticas , Regulação da Expressão Gênica de Plantas , Genoma de Planta , Genômica/métodos , Proteínas de Plantas/genética , Plantas/genética , Produtos Agrícolas , Elementos de DNA Transponíveis , Duplicação Gênica , Ontologia Genética , Redes Reguladoras de Genes , Internet , Bases de Conhecimento , Redes e Vias Metabólicas , Anotação de Sequência Molecular , Oryza/genética , Oryza/metabolismo , Proteínas de Plantas/metabolismo , Plantas/classificação , Plantas/metabolismo , Poliploidia , Mapeamento de Interação de Proteínas , Software , Zea mays/genética , Zea mays/metabolismoRESUMO
High-throughput phenotyping systems are powerful, dramatically changing our ability to document, measure, and detect biological phenomena. Here, we describe a cost-effective combination of a custom-built imaging platform and deep-learning-based computer vision pipeline. A minimal version of the maize (Zea mays) ear scanner was built with low-cost and readily available parts. The scanner rotates a maize ear while a digital camera captures a video of the surface of the ear, which is then digitally flattened into a two-dimensional projection. Segregating GFP and anthocyanin kernel phenotypes are clearly distinguishable in ear projections and can be manually annotated and analyzed using image analysis software. Increased throughput was attained by designing and implementing an automated kernel counting system using transfer learning and a deep learning object detection model. The computer vision model was able to rapidly assess over 390 000 kernels, identifying male-specific transmission defects across a wide range of GFP-marked mutant alleles. This includes a previously undescribed defect putatively associated with mutation of Zm00001d002824, a gene predicted to encode a vacuolar processing enzyme. Thus, by using this system, the quantification of transmission data and other ear and kernel phenotypes can be accelerated and scaled to generate large datasets for robust analyses.
Assuntos
Sementes/anatomia & histologia , Zea mays/anatomia & histologia , Análise Custo-Benefício , Conjuntos de Dados como Assunto , Aprendizado Profundo , Ensaios de Triagem em Larga Escala/economia , Ensaios de Triagem em Larga Escala/instrumentação , Ensaios de Triagem em Larga Escala/métodos , Fenótipo , Sementes/classificação , Gravação em Vídeo/métodos , Zea mays/classificaçãoRESUMO
Plant Reactome (https://plantreactome.gramene.org) is an open-source, comparative plant pathway knowledgebase of the Gramene project. It uses Oryza sativa (rice) as a reference species for manual curation of pathways and extends pathway knowledge to another 82 plant species via gene-orthology projection using the Reactome data model and framework. It currently hosts 298 reference pathways, including metabolic and transport pathways, transcriptional networks, hormone signaling pathways, and plant developmental processes. In addition to browsing plant pathways, users can upload and analyze their omics data, such as the gene-expression data, and overlay curated or experimental gene-gene interaction data to extend pathway knowledge. The curation team actively engages researchers and students on gene and pathway curation by offering workshops and online tutorials. The Plant Reactome supports, implements and collaborates with the wider community to make data and tools related to genes, genomes, and pathways Findable, Accessible, Interoperable and Re-usable (FAIR).
Assuntos
Biologia Computacional/métodos , Bases de Dados Genéticas , Genômica , Metabolômica , Plantas/genética , Plantas/metabolismo , Proteômica , Redes Reguladoras de Genes , Genômica/métodos , Humanos , Redes e Vias Metabólicas , Metabolômica/métodos , Proteômica/métodos , Transdução de Sinais , NavegadorRESUMO
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.
Assuntos
Biologia Computacional/métodos , Bases de Dados Genéticas , Variação Genética , Genoma Bacteriano , Genoma Fúngico , Genoma de Planta , Algoritmos , Animais , Caenorhabditis elegans/genética , Genômica , Internet , Anotação de Sequência Molecular , Fenótipo , Plantas/genética , Valores de Referência , Software , Interface Usuário-ComputadorRESUMO
The Planteome project (http://www.planteome.org) provides a suite of reference and species-specific ontologies for plants and annotations to genes and phenotypes. Ontologies serve as common standards for semantic integration of a large and growing corpus of plant genomics, phenomics and genetics data. The reference ontologies include the Plant Ontology, Plant Trait Ontology and the Plant Experimental Conditions Ontology developed by the Planteome project, along with the Gene Ontology, Chemical Entities of Biological Interest, Phenotype and Attribute Ontology, and others. The project also provides access to species-specific Crop Ontologies developed by various plant breeding and research communities from around the world. We provide integrated data on plant traits, phenotypes, and gene function and expression from 95 plant taxa, annotated with reference ontology terms. The Planteome project is developing a plant gene annotation platform; Planteome Noctua, to facilitate community engagement. All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing, tracking revisions and new requests. The annotated data are freely accessible from the ontology browser (http://browser.planteome.org/amigo) and our data repository.
Assuntos
Bases de Dados Genéticas , Genoma de Planta , Plantas/genética , Produtos Agrícolas/genética , Curadoria de Dados , Regulação da Expressão Gênica de Plantas , Ontologia Genética , Anotação de Sequência Molecular , Fenótipo , Software , Interface Usuário-ComputadorRESUMO
Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions.
Assuntos
Bases de Dados Genéticas , Animais , Perfilação da Expressão Gênica , Humanos , Mamíferos/genética , Mamíferos/metabolismo , Análise de Sequência com Séries de Oligonucleotídeos , Plantas/genética , Plantas/metabolismo , Proteômica , Análise de Sequência de RNA , Especificidade da Espécie , Interface Usuário-ComputadorRESUMO
Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations. Gramene's Plant Reactome provides a knowledgebase of cellular-level plant pathway networks. Specifically, it uses curated rice reference pathways to derive pathway projections for an additional 66 species based on gene orthology, and facilitates display of gene expression, gene-gene interactions, and user-defined omics data in the context of these pathways. As a community portal, Gramene integrates best-of-class software and infrastructure components including the Ensembl genome browser, Reactome pathway browser, and Expression Atlas widgets, and undergoes periodic data and software upgrades. Via powerful, intuitive search interfaces, users can easily query across various portals and interactively analyze search results by clicking on diverse features such as genomic context, highly augmented gene trees, gene expression anatomograms, associated pathways, and external informatics resources. All data in Gramene are accessible through both visual and programmatic interfaces.
Assuntos
Bases de Dados Genéticas , Regulação da Expressão Gênica de Plantas , Genômica/métodos , Bases de Conhecimento , Plantas/genética , Epigênese Genética , Ontologia Genética , Pesquisa em Genética , Variação Genética , Genoma de Planta , Redes e Vias Metabólicas/genética , Anotação de Sequência Molecular , Plantas/metabolismo , Software , Interface Usuário-ComputadorRESUMO
Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX.
Assuntos
Biologia Computacional/métodos , Bases de Dados Genéticas , Plantas/genética , Plantas/metabolismo , Ferramenta de Busca , Genômica/métodos , Redes e Vias Metabólicas , Transdução de Sinais , Biologia de Sistemas/métodos , Interface Usuário-Computador , NavegadorRESUMO
Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to â¼ 200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.
Assuntos
Bases de Dados Genéticas , Genoma de Planta , Plantas/metabolismo , Expressão Gênica , Variação Genética , Genômica , Internet , Redes e Vias Metabólicas , Anotação de Sequência Molecular , Plantas/genéticaRESUMO
Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
Assuntos
Bases de Dados Genéticas , Genoma de Planta , Genômica , Produtos Agrícolas/genética , Variação Genética , Internet , Redes e Vias Metabólicas/genética , Anotação de Sequência Molecular , Plantas/genética , Plantas/metabolismoRESUMO
The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary ('ontology') of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.
Assuntos
Genoma de Planta , Genômica/métodos , Plantas/anatomia & histologia , Plantas/genética , Software , Alquil e Aril Transferases/genética , Bases de Dados Genéticas , Flores/genética , Internet , Anotação de Sequência Molecular , Família Multigênica , Fenótipo , Folhas de Planta/anatomia & histologia , Proteínas de Plantas/genéticaRESUMO
Chia (Salvia hispanica L.) is one of the most popular nutrition-rich foods and pseudocereal crops of the family Lamiaceae. Chia seeds are a rich source of proteins, polyunsaturated fatty acids (PUFAs), dietary fibers, and antioxidants. In this study, we present the assembly of the chia reference genome, which spans 303.6 Mb and encodes 48,090 annotated protein-coding genes. Our analysis revealed that ~42% of the chia genome harbors repetitive content, and identified ~3 million single nucleotide polymorphisms (SNPs) and 15,380 simple sequence repeat (SSR) marker sites. By investigating the chia transcriptome, we discovered that ~44% of the genes undergo alternative splicing with a higher frequency of intron retention events. Additionally, we identified chia genes associated with important nutrient content and quality traits, such as the biosynthesis of PUFAs and seed mucilage fiber (dietary fiber) polysaccharides. Notably, this is the first report of in-silico annotation of a plant genome for protein-derived small bioactive peptides (biopeptides) associated with improving human health. To facilitate further research and translational applications of this valuable orphan crop, we have developed the Salvia genomics database (SalviaGDB), accessible at https://salviagdb.org.
RESUMO
PREMISE OF THE STUDY: Bio-ontologies are essential tools for accessing and analyzing the rapidly growing pool of plant genomic and phenomic data. Ontologies provide structured vocabularies to support consistent aggregation of data and a semantic framework for automated analyses and reasoning. They are a key component of the semantic web. METHODS: This paper provides background on what bio-ontologies are, why they are relevant to botany, and the principles of ontology development. It includes an overview of ontologies and related resources that are relevant to plant science, with a detailed description of the Plant Ontology (PO). We discuss the challenges of building an ontology that covers all green plants (Viridiplantae). KEY RESULTS: Ontologies can advance plant science in four keys areas: (1) comparative genetics, genomics, phenomics, and development; (2) taxonomy and systematics; (3) semantic applications; and (4) education. CONCLUSIONS: Bio-ontologies offer a flexible framework for comparative plant biology, based on common botanical understanding. As genomic and phenomic data become available for more species, we anticipate that the annotation of data with ontology terms will become less centralized, while at the same time, the need for cross-species queries will become more common, causing more researchers in plant science to turn to ontologies.
Assuntos
Biologia Computacional/métodos , Plantas/genética , Botânica/métodos , Interpretação Estatística de Dados , Sistemas de Gerenciamento de Base de Dados , Bases de Dados Factuais , Genoma de Planta/genética , Genômica , Anotação de Sequência Molecular , Fenótipo , Plantas/anatomia & histologia , Plantas/classificação , Semântica , Terminologia como Assunto , Vocabulário ControladoRESUMO
Plant Reactome (https://plantreactome.gramene.org) and PubChem ( https://pubchem.ncbi.nlm.nih.gov ) are two reference data portals and resources for curated plant pathways, small molecules, metabolites, gene products, and macromolecular interactions. Plant Reactome knowledgebase, a conceptual plant pathway network, is built by biocuration and integrating (bio)chemical entities, gene products, and macromolecular interactions. It provides manually curated pathways for the reference species Oryza sativa (rice) and gene orthology-based projections that extend pathway knowledge to 106 plant species. Currently, it hosts 320 reference pathways for plant metabolism, hormone signaling, transport, genetic regulation, plant organ development and differentiation, and biotic and abiotic stress responses. In addition to the pathway browsing and search functions, the Plant Reactome provides the analysis tools for pathway comparison between reference and projected species, pathway enrichment in gene expression data, and overlay of gene-gene interaction data on pathways. PubChem, a popular reference database of (bio)chemical entities, provides information on small molecules and other types of chemical entities, such as siRNAs, miRNAs, lipids, carbohydrates, and chemically modified nucleotides. The data in PubChem is collected from hundreds of data sources, including Plant Reactome. This chapter provides a brief overview of the Plant Reactome and the PubChem knowledgebases, their association to other public resources providing accessory information, and how users can readily access the contents.
Assuntos
Bases de Conhecimento , Redes e Vias Metabólicas , Bases de Dados Factuais , Plantas/genética , Plantas/metabolismo , Proteínas/metabolismoRESUMO
Biocuration plays a crucial role in building databases and complex systems-level platforms required for processing, annotating and analyzing 'Big Data' in biology. However, biocuration efforts cannot keep pace with a dramatic increase in the production of omics data; this presents one of the bottlenecks in genomics. In two pathway curation jamborees, Plant Reactome curators tested strategies for introducing researchers to pathway curation tools, harnessing biologists' expertise in curating plant pathways and developing a network of community biocurators. We summarize the strategy, workflow and outcomes of these exercises, and discuss the role of community biocuration in advancing databases and genomic resources.
Assuntos
Curadoria de Dados/métodos , Bases de Dados Genéticas , Redes Reguladoras de Genes/genética , Genômica/métodos , Big Data , Mineração de Dados , Genes de Plantas/genética , Fluxo de TrabalhoRESUMO
Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationships to enrich the annotation of genomic data and provides tools to perform powerful comparative analyses across a wide spectrum of plant species. It consists of an integrated portal for querying, visualizing and analyzing data for 44 plant reference genomes, genetic variation data sets for 12 species, expression data for 16 species, curated rice pathways and orthology-based pathway projections for 66 plant species including various crops. Here we briefly describe the functions and uses of the Gramene database.
RESUMO
BACKGROUND: Large quantities of digital images are now generated for biological collections, including those developed in projects premised on the high-throughput screening of genome-phenome experiments. These images often carry annotations on taxonomy and observable features, such as anatomical structures and phenotype variations often recorded in response to the environmental factors under which the organisms were sampled. At present, most of these annotations are described in free text, may involve limited use of non-standard vocabularies, and rarely specify precise coordinates of features on the image plane such that a computer vision algorithm could identify, extract and annotate them. Therefore, researchers and curators need a tool that can identify and demarcate features in an image plane and allow their annotation with semantically contextual ontology terms. Such a tool would generate data useful for inter and intra-specific comparison and encourage the integration of curation standards. In the future, quality annotated image segments may provide training data sets for developing machine learning applications for automated image annotation. RESULTS: We developed a novel image segmentation and annotation software application, "Annotation of Image Segments with Ontologies" (AISO). The tool enables researchers and curators to delineate portions of an image into multiple highlighted segments and annotate them with an ontology-based controlled vocabulary. AISO is a freely available Java-based desktop application and runs on multiple platforms. It can be downloaded at http://www.plantontology.org/software/AISO. CONCLUSIONS: AISO enables curators and researchers to annotate digital images with ontology terms in a manner which ensures the future computational value of the annotated images. We foresee uses for such data-encoded image annotations in biological data mining, machine learning, predictive annotation, semantic inference, and comparative analyses.
RESUMO
BACKGROUND: Triticum monococcum (2n) is a close ancestor of T. urartu, the A-genome progenitor of cultivated hexaploid wheat, and is therefore a useful model for the study of components regulating photomorphogenesis in diploid wheat. In order to develop genetic and genomic resources for such a study, we constructed genome-wide transcriptomes of two Triticum monococcum subspecies, the wild winter wheat T. monococcum ssp. aegilopoides (accession G3116) and the domesticated spring wheat T. monococcum ssp. monococcum (accession DV92) by generating de novo assemblies of RNA-Seq data derived from both etiolated and green seedlings. PRINCIPAL FINDINGS: The de novo transcriptome assemblies of DV92 and G3116 represent 120,911 and 117,969 transcripts, respectively. We successfully mapped â¼90% of these transcripts from each accession to barley and â¼95% of the transcripts to T. urartu genomes. However, only â¼77% transcripts mapped to the annotated barley genes and â¼85% transcripts mapped to the annotated T. urartu genes. Differential gene expression analyses revealed 22% more light up-regulated and 35% more light down-regulated transcripts in the G3116 transcriptome compared to DV92. The DV92 and G3116 mRNA sequence reads aligned against the reference barley genome led to the identification of â¼500,000 single nucleotide polymorphism (SNP) and â¼22,000 simple sequence repeat (SSR) sites. CONCLUSIONS: De novo transcriptome assemblies of two accessions of the diploid wheat T. monococcum provide new empirical transcriptome references for improving Triticeae genome annotations, and insights into transcriptional programming during photomorphogenesis. The SNP and SSR sites identified in our analysis provide additional resources for the development of molecular markers.
Assuntos
Diploide , Transcriptoma/genética , Triticum/genética , Genoma de Planta/genética , Plântula/genéticaRESUMO
BACKGROUND: Next-generation sequencing and 'omics' platforms are used extensively in plant biology research to unravel new genomes and study their interactions with abiotic and biotic agents in the growth environment. Despite the availability of a large and growing number of genomic data sets, there are only limited resources providing highly-curated and up-to-date metabolic and regulatory networks for plant pathways. RESULTS: Using PathVisio, a pathway editor tool associated with WikiPathways, we created a gene interaction network of 430 rice (Oryza sativa) genes involved in the seed development process by curating interactions reported in the published literature. We then applied an InParanoid-based homology search to these genes and used the resulting gene clusters to identify 351 Arabidopsis thaliana genes. Using this list of homologous genes, we constructed a seed development network in Arabidopsis by processing the gene list and the rice network through a Perl utility software called Pathway GeneSWAPPER developed by us. In order to demonstrate the utility of these networks in generating testable hypotheses and preliminary analysis prior to more in-depth downstream analysis, we used the expression viewer and statistical analysis features of PathVisio to analyze publicly-available and published microarray gene expression data sets on diurnal photoperiod response and the seed development time course to discover patterns of coexpressed genes found in the rice and Arabidopsis seed development networks. These seed development networks described herein, along with other plant pathways and networks, are freely available on the plant pathways portal at WikiPathways (http://plants.wikipathways.org). CONCLUSION: In collaboration with the WikiPathways project we present a community curation and analysis platform for plant biologists where registered users can freely create, edit, share and monitor pathways supported by published literature. We describe the curation and annotation of a seed development network in rice, and the projection of a similar, gene homology-based network in Arabidopsis. We also demonstrate the utility of the Pathway GeneSWAPPER (PGS) application in saving valuable time and labor when a reference network in one species compiled in GPML format is used to project a similar network in another species based on gene homology.