ABSTRACT
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline integrates experimental and reference data from multiple providers into a single resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), with data updates made available four times a year.
Subjects
Computational Biology/methods , Databases, Genetic , Epigenome , Molecular Sequence Annotation , Algorithms , Animals , Computer Graphics , Databases, Protein , Genetic Variation , Genome-Wide Association Study , Genomics , Histones/metabolism , Humans , Imaging, Three-Dimensional , Internet , Ligands , Search Engine , Software , Species Specificity , Transcriptome , User-Computer Interface , Web Browser
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.
Subjects
Computational Biology/methods , Databases, Genetic , Genetic Variation , Genome, Bacterial , Genome, Fungal , Genome, Plant , Algorithms , Animals , Caenorhabditis elegans/genetics , Genomics , Internet , Molecular Sequence Annotation , Phenotype , Plants/genetics , Reference Values , Software , User-Computer Interface
ABSTRACT
The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as through a RESTful web service, a Perl application programming interface and data files for download.
Subjects
Databases, Genetic , Genome/genetics , Genomics , Vertebrates/genetics , Animals , Computational Biology/trends , Humans , Mice , Molecular Sequence Annotation , Software
ABSTRACT
The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
Subjects
Databases, Genetic , Datasets as Topic , Genome , Information Dissemination , Animals , Epigenomics , Genome, Human , Genome-Wide Association Study , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Vertebrates/genetics , Web Browser
ABSTRACT
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensures uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
Subjects
Computational Biology/methods , Databases, Genetic , Genomics/methods , Search Engine , Software , Web Browser , Animals , Data Mining , Evolution, Molecular , Gene Expression Regulation , Genetic Variation , Genome, Human , Humans , Molecular Sequence Annotation , Species Specificity , Vertebrates
ABSTRACT
The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate access to genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third-party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally, we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.
Subjects
Databases, Genetic , Genomics , Molecular Sequence Annotation , Animals , Genes , Genetic Variation , Humans , Internet , Mice , Proteins/genetics , Rats , Regulatory Sequences, Nucleic Acid , Software
ABSTRACT
Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
Subjects
Databases, Nucleic Acid , Genomics , Animals , Epigenesis, Genetic , Genetic Variation , Genome, Human , Humans , Internet , Mice , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , Software
ABSTRACT
Cells associated with veins of petioles of C3 tobacco possess high activities of the decarboxylase enzymes required in C4 photosynthesis. It is not clear whether this is the case in other C3 species, nor whether these enzymes provide precursors for specific biosynthetic pathways. Here, we investigate the activity of C4 acid decarboxylases in the mid-vein of Arabidopsis, identify regulatory regions sufficient for this activity, and determine the impact of removing individual isoforms of each protein on mid-vein metabolite profiles. This showed that radiolabelled malate and bicarbonate fed to the xylem stream were incorporated into soluble and insoluble material in the mid-vein of Arabidopsis leaves. Compared with the leaf lamina, mid-veins possessed high activities of NADP-dependent malic enzyme (NADP-ME), NAD-dependent malic enzyme (NAD-ME) and phosphoenolpyruvate carboxykinase (PEPCK). Transcripts derived from both NAD-ME, one PCK and two of the four NADP-ME genes were detectable in these veinal cells. The promoters of each decarboxylase gene were sufficient for expression in mid-veins. Analysis of insertional mutants revealed that cytosolic NADP-ME2 is responsible for 80% of NADP-ME activity in mid-veins. Removing individual decarboxylases affected the abundance of amino acids derived from pyruvate and phosphoenolpyruvate. Reducing cytosolic NADP-ME activity preferentially affected the sugar content, whereas abolishing NAD-ME affected both the amino acid and the glucosamine content of mid-veins.
Subjects
Amino Acids/metabolism , Arabidopsis/enzymology , Arabidopsis/metabolism , Carbohydrate Metabolism/physiology , Photosynthesis/physiology , Arabidopsis/genetics , Carbohydrate Metabolism/genetics , Carbon Radioisotopes/metabolism , Chromatography, Thin Layer , Malate Dehydrogenase/genetics , Malate Dehydrogenase/physiology , Malates/metabolism , Mutagenesis, Insertional , Phosphoenolpyruvate Carboxylase/genetics , Phosphoenolpyruvate Carboxylase/physiology , Photosynthesis/genetics , Reverse Transcriptase Polymerase Chain Reaction , Xylem
ABSTRACT
Cells associated with veins of C3 species often contain significant amounts of chlorophyll, and radiotracer analysis shows that carbon present in the transpiration stream may be used for photosynthesis in these cells. It is not clear whether CO2 is also supplied to these cells close to veins via stomata, nor whether this veinal photosynthesis supplies carbon skeletons to particular metabolic pathways. In addition, it has not been possible to determine whether photosynthesis in cells close to veins of C3 plants is quantitatively important for growth or fitness. To investigate the role of photosynthesis in cells in and around the veins of C3 plants, we have trans-activated a hairpin construct to the chlorophyll synthase gene (CS) using an Arabidopsis thaliana enhancer trap line specific to veins. CS is responsible for addition of the phytol chain to the tetrapyrrole head group of chlorophyll, and, as a result of cell-specific trans-activation of the hairpin to CS, chlorophyll accumulation is reduced around veins. We use these plants to show that, under steady-state conditions, the extent to which CO2 is supplied to cells close to veins via stomata is limited. Fixation by minor veins of CO2 supplied to the xylem stream and the amount of specific metabolites associated with carbohydrate metabolism and the shikimate pathway were all reduced. In addition, the abundance of transcripts encoding components of pathways that generate phosphoenolpyruvate was altered. Leaf senescence, growth rate and seed size were all reduced in the lines with lower photosynthetic ability in veins and in cells close to veins.
Subjects
Arabidopsis/physiology , Chlorophyll/biosynthesis , Photosynthesis , Shikimic Acid/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Carbohydrate Metabolism , Carbon Dioxide/metabolism , Carbon-Oxygen Ligases/genetics , Carbon-Oxygen Ligases/metabolism , Gene Expression Profiling , Gene Expression Regulation, Plant , Plants, Genetically Modified/genetics , Plants, Genetically Modified/metabolism , Plants, Genetically Modified/physiology , RNA Interference , RNA, Plant/metabolism
ABSTRACT
Tree disease epidemics are a global problem, impacting food security, biodiversity and national economies. The potential for conservation and breeding in trees is hampered by complex genomes and long lifecycles, with most species lacking genomic resources. The European Ash tree Fraxinus excelsior is being devastated by the fungal pathogen Hymenoscyphus fraxineus, which causes ash dieback disease. Taking this system as an example and utilizing Associative Transcriptomics for the first time in a plant pathology study, we discovered gene sequence and gene expression variants across a genetic diversity panel scored for disease symptoms and identified markers strongly associated with canopy damage in infected trees. Using these markers we predicted phenotypes in a test panel of unrelated trees, successfully identifying individuals with a low level of susceptibility to the disease. Co-expression analysis suggested that pre-priming of defence responses may underlie reduced susceptibility to ash dieback.