Results 1 - 20 of 28
1.
Nat Commun ; 10(1): 2373, 2019 05 30.
Article in English | MEDLINE | ID: mdl-31147538

ABSTRACT

We aimed to develop an efficient, flexible and scalable approach to diagnostic genome-wide sequence analysis of genetically heterogeneous clinical presentations. Here we present G2P (www.ebi.ac.uk/gene2phenotype) as an online system to establish, curate and distribute datasets for diagnostic variant filtering via association of allelic requirement and mutational consequence at a defined locus with phenotypic terms, confidence level and evidence links. An extension to Ensembl Variant Effect Predictor (VEP), VEP-G2P, was used to filter both disease-associated and control whole exome sequence (WES) with Developmental Disorders G2P (G2PDD; 2044 entries). VEP-G2PDD shows a sensitivity/precision of 97.3%/33% for de novo and 81.6%/22.7% for inherited pathogenic genotypes, respectively. Many of the missing genotypes are likely false-positive pathogenic assignments. The expected number and discriminative features of background genotypes are defined using control WES. Using only human genetic data, VEP-G2P performs well compared to other freely-available diagnostic systems and future phenotypic matching capabilities should further enhance performance.


Subjects
Developmental Disabilities/genetics, Exome Sequencing, Genetic Testing, Human Genome, Alleles, Genotype, Humans, Molecular Diagnostic Techniques, Mutation, Phenotype, DNA Sequence Analysis, Whole Genome Sequencing
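
Note on the metrics quoted in entry 1: sensitivity and precision are conventionally defined as TP/(TP+FN) and TP/(TP+FP). The Python sketch below assumes those standard definitions apply to the reported figures. The function names and the counts are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly pathogenic genotypes that the filter retains."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of retained genotypes that are truly pathogenic."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (NOT from the paper): 90 pathogenic genotypes retained,
# 10 pathogenic genotypes missed, 210 non-pathogenic genotypes retained.
example_tp, example_fn, example_fp = 90, 10, 210

print(f"sensitivity = {sensitivity(example_tp, example_fn):.1%}")
print(f"precision   = {precision(example_tp, example_fp):.1%}")
```

A filter can therefore combine high sensitivity with modest precision, which is the pattern the abstract reports: few pathogenic genotypes are missed, but many retained genotypes still need review.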
2.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-30576484

ABSTRACT

The major goal of sequencing humans and many other species is to understand the link between genomic variation, phenotype and disease. There are numerous valuable and well-established variation resources, but collating and making sense of non-homogeneous, often large-scale data sets from disparate sources remains a challenge. Without a systematic catalogue of these data and appropriate query and annotation tools, understanding the genome sequence of an individual and assessing their disease risk is impossible. In Ensembl, we substantially solve this problem: we develop methods to facilitate data integration and broad access; aggregate information in a consistent manner and make it available in a variety of standard formats, both visually and programmatically; build analysis pipelines to compare variants to comprehensive genomic annotation sets; and make all tools and data publicly available.


Subjects
Database Management Systems, Genetic Databases, Genomics/methods, Molecular Sequence Annotation/methods, Algorithms, Humans, DNA Sequence Analysis, User-Computer Interface
3.
Nat Commun ; 9(1): 4128, 2018 10 08.
Article in English | MEDLINE | ID: mdl-30297836

ABSTRACT

Selecting the most appropriate protein sequences is critical for precision drug design. Here we describe Haplosaurus, a bioinformatic tool for computation of protein haplotypes. Haplosaurus computes protein haplotypes from pre-existing chromosomally-phased genomic variation data. Integration into the Ensembl resource provides rapid and detailed protein haplotype retrieval. Using Haplosaurus, we build a database of unique protein haplotypes from the 1000 Genomes dataset reflecting real-world protein sequence variability and their prevalence. For one in seven genes, their most common protein haplotype differs from the reference sequence and a similar number differs on their most common haplotype between human populations. Three case studies show how knowledge of the range of commonly encountered protein forms predicted in populations leads to insights into therapeutic efficacy. Haplosaurus and its associated database are expected to find broad applications in many disciplines using protein sequences and to be particularly impactful for therapeutics design.


Subjects
Computational Biology/methods, Drug Design, Haplotypes, Precision Medicine/methods, Proteins/genetics, Computer-Aided Design, Human Genome/genetics, Genomics/methods, Humans, Proteome/genetics, Reproducibility of Results, Software
4.
Nucleic Acids Res ; 46(D1): D754-D761, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29155950

ABSTRACT

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.


Subjects
Genetic Databases, Datasets as Topic, Genome, Information Dissemination, Animals, Epigenomics, Human Genome, Genome-Wide Association Study, Genomics, High-Throughput Nucleotide Sequencing, Humans, Molecular Sequence Annotation, Vertebrates/genetics, Web Browser
5.
Nucleic Acids Res ; 45(D1): D635-D642, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899575

ABSTRACT

Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.


Subjects
Computational Biology/methods, Genetic Databases, Genomics/methods, Search Engine, Software, Web Browser, Animals, Data Mining, Molecular Evolution, Gene Expression Regulation, Genetic Variation, Human Genome, Humans, Molecular Sequence Annotation, Species Specificity, Vertebrates
6.
Genome Biol ; 17(1): 122, 2016 06 06.
Article in English | MEDLINE | ID: mdl-27268795

ABSTRACT

The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.


Subjects
Genetic Variation, Molecular Sequence Annotation/methods, Software, Computational Biology, Nucleic Acid Databases, Genomics, Humans, Internet
7.
Nucleic Acids Res ; 44(D1): D710-6, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26687719

ABSTRACT

The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.


Subjects
Genetic Databases, Genomics, Molecular Sequence Annotation, Animals, Genes, Genetic Variation, Humans, Internet, Mice, Proteins/genetics, Rats, Nucleic Acid Regulatory Sequences, Software
8.
Bioinformatics ; 31(1): 143-5, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25236461

ABSTRACT

MOTIVATION: We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language. AVAILABILITY AND IMPLEMENTATION: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest.


Subjects
Computational Biology/methods, Factual Databases, Programming Languages, Software, Genetic Variation, Genomics, Humans
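
Note on entry 8: the language-independent access the abstract describes amounts to issuing an HTTP request and parsing the JSON reply. The Python sketch below uses only the standard library. The helper names `build_url` and `fetch_json` are hypothetical, and the HTTPS form of the server address is an assumption (the abstract gives http://rest.ensembl.org). The `lookup/id` endpoint and its `expand` parameter follow the public REST documentation rather than the abstract itself.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

# Assumption: HTTPS variant of the address given in the abstract.
BASE_URL = "https://rest.ensembl.org"


def build_url(endpoint: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Join the server address, an endpoint path and optional query parameters."""
    url = BASE_URL + "/" + endpoint.lstrip("/")
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(endpoint: str,
               params: Optional[Dict[str, Any]] = None,
               timeout: float = 30.0) -> Any:
    """Send a GET request asking for JSON and return the decoded reply."""
    request = urllib.request.Request(
        build_url(endpoint, params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With network access, a call such as `fetch_json("lookup/id/ENSG00000157764")` would return a dictionary describing that gene; the identifier is only an example. Keeping URL construction separate from the network call makes the request logic easy to check offline.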
9.
Nucleic Acids Res ; 43(Database issue): D662-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25352552

ABSTRACT

Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.


Subjects
Nucleic Acid Databases, Genomics, Animals, Genetic Epigenesis, Genetic Variation, Human Genome, Humans, Internet, Mice, Molecular Sequence Annotation, Nucleic Acid Regulatory Sequences, Software
10.
Elife ; 3: e02626, 2014 Oct 03.
Article in English | MEDLINE | ID: mdl-25279814

ABSTRACT

As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.


Subjects
Liver/metabolism, Mammals/metabolism, Signal Transduction, Transcription Factors/metabolism, Animals, Blood Coagulation/genetics, Chromatin Immunoprecipitation, Gene Regulatory Networks, Genome-Wide Association Study, Genomics, Humans, Lipid Metabolism/genetics, Male, Molecular Sequence Annotation, Organ Specificity, Phylogeny, Single Nucleotide Polymorphism/genetics, Protein Binding, Nucleic Acid Regulatory Sequences/genetics, Species Specificity
11.
Nucleic Acids Res ; 42(Database issue): D749-55, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24316576

ABSTRACT

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.


Subjects
Genetic Databases, Genomics, Animals, Chordata/genetics, Genetic Variation, Humans, Internet, Mice, Molecular Sequence Annotation, Phenotype, Rats
12.
Nucleic Acids Res ; 41(Database issue): D48-55, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203987

ABSTRACT

The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.


Subjects
Genetic Databases, Genomics, Animals, Gene Expression Regulation, Genetic Variation, Humans, Internet, Mice, Molecular Sequence Annotation, Rats, Software, Zebrafish/genetics
13.
Nucleic Acids Res ; 40(Database issue): D84-90, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22086963

ABSTRACT

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.


Subjects
Genetic Databases, Genomics, Animals, Gene Expression Regulation, Genetic Variation, Humans, Mice, Molecular Sequence Annotation, Rats
15.
Nucleic Acids Res ; 39(Database issue): D800-6, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21045057

ABSTRACT

The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.


Subjects
Genetic Databases, Genomics, Animals, Genetic Variation, Humans, Mice, Molecular Sequence Annotation, Rats, Nucleic Acid Regulatory Sequences, Software, Zebrafish/genetics
16.
Bioinformatics ; 26(16): 2069-70, 2010 Aug 15.
Article in English | MEDLINE | ID: mdl-20562413

ABSTRACT

SUMMARY: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and API interface can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species. AVAILABILITY: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website at http://www.ensembl.org/. The Ensembl API (http://www.ensembl.org/info/docs/api/api_installation.html for installation instructions) is open source software.


Subjects
Genetic Variation, Genomics, Single Nucleotide Polymorphism, Software, Internet
17.
BMC Genomics ; 11: 293, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20459805

ABSTRACT

BACKGROUND: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. DESCRIPTION: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. CONCLUSIONS: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.


Subjects
Genetic Databases, Genetic Variation, Genomics/methods, Algorithms, Animals, Base Sequence, Cattle, Genotype, Humans, Internet, Linkage Disequilibrium, Mice, Phenotype, Phylogeny, Single Nucleotide Polymorphism, Rats, DNA Sequence Analysis, User-Computer Interface
18.
BMC Bioinformatics ; 11: 238, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20459810

ABSTRACT

BACKGROUND: Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and manipulation of such data necessitating the redesign of existing genome-wide bioinformatics resources. RESULTS: Ensembl has created a database and software library to support data storage, analysis and access to the existing and emerging variation data from large mammalian and vertebrate genomes. These tools scale to thousands of individual genome sequences and are integrated into the Ensembl infrastructure for genome annotation and visualisation. The database and software system is easily expanded to integrate both public and non-public data sources in the context of an Ensembl software installation and is already being used outside of the Ensembl project in a number of database and application environments. CONCLUSIONS: Ensembl's powerful, flexible and open source infrastructure for the management of variation, genotyping and resequencing data is freely available at http://www.ensembl.org.


Subjects
Factual Databases, Genomics/methods, Genotype, DNA Sequence Analysis/methods, Genome, Phenotype
19.
Genome Res ; 20(6): 791-803, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20430781

ABSTRACT

The spontaneously hypertensive rat (SHR) is the most widely studied animal model of hypertension. Scores of SHR quantitative trait loci (QTLs) have been mapped for hypertension and other phenotypes. We have sequenced the SHR/OlaIpcv genome at 10.7-fold coverage by paired-end sequencing on the Illumina platform. We identified 3.6 million high-quality single nucleotide polymorphisms (SNPs) between the SHR/OlaIpcv and Brown Norway (BN) reference genome, with a high rate of validation (sensitivity 96.3%-98.0% and specificity 99%-100%). We also identified 343,243 short indels between the SHR/OlaIpcv and reference genomes. These SNPs and indels resulted in 161 gain or loss of stop codons and 629 frameshifts compared with the BN reference sequence. We also identified 13,438 larger deletions that result in complete or partial absence of 107 genes in the SHR/OlaIpcv genome compared with the BN reference and 588 copy number variants (CNVs) that overlap with the gene regions of 688 genes. Genomic regions containing genes whose expression had been previously mapped as cis-regulated expression quantitative trait loci (eQTLs) were significantly enriched with SNPs, short indels, and larger deletions, suggesting that some of these variants have functional effects on gene expression. Genes that were affected by major alterations in their coding sequence were highly enriched for genes related to ion transport, transport, and plasma membrane localization, providing insights into the likely molecular and cellular basis of hypertension and other phenotypes specific to the SHR strain. This near complete catalog of genomic differences between two extensively studied rat strains provides the starting point for complete elucidation, at the molecular level, of the physiological and pathophysiological phenotypic differences between individuals from these strains.


Subjects
Hypertension/genetics, Animals, Terminator Codon, Gene Dosage, Single Nucleotide Polymorphism, Quantitative Trait Loci, Rats, Inbred SHR Rats, Genetic Transcription
20.
Genome Med ; 2(4): 24, 2010 Apr 15.
Article in English | MEDLINE | ID: mdl-20398331

ABSTRACT

As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site: http://www.lrg-sequence.org.
