1.
Nucleic Acids Res ; 46(D1): D754-D761, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29155950

ABSTRACT

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.


Subject(s)
Databases, Genetic , Datasets as Topic , Genome , Information Dissemination , Animals , Epigenomics , Genome, Human , Genome-Wide Association Study , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Vertebrates/genetics , Web Browser
2.
Nucleic Acids Res ; 45(D1): D635-D642, 2017 Jan 04.
Article in English | MEDLINE | ID: mdl-27899575

ABSTRACT

Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genomics/methods , Search Engine , Software , Web Browser , Animals , Data Mining , Evolution, Molecular , Gene Expression Regulation , Genetic Variation , Genome, Human , Humans , Molecular Sequence Annotation , Species Specificity , Vertebrates
3.
Nucleic Acids Res ; 44(D1): D710-6, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26687719

ABSTRACT

The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.


Subject(s)
Databases, Genetic , Genomics , Molecular Sequence Annotation , Animals , Genes , Genetic Variation , Humans , Internet , Mice , Proteins/genetics , Rats , Regulatory Sequences, Nucleic Acid , Software
4.
Nucleic Acids Res ; 43(Database issue): D662-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25352552

ABSTRACT

Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.


Subject(s)
Databases, Nucleic Acid , Genomics , Animals , Epigenesis, Genetic , Genetic Variation , Genome, Human , Humans , Internet , Mice , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , Software
5.
Bioinformatics ; 31(1): 143-5, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25236461

ABSTRACT

MOTIVATION: We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language. AVAILABILITY AND IMPLEMENTATION: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest.
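
As an illustration of the access pattern this abstract describes, the following is a minimal sketch of a client that requests JSON or FASTA from the REST server. It assumes the endpoint paths `lookup/id/`, `sequence/id/` and `vep/:species/id/` and the media types `application/json` and `text/x-fasta`; none of these are named in the abstract, and the helper names `build_request` and `fetch` are hypothetical. The gene identifier in the comments is an illustrative value only.

```python
import json
import urllib.request

# Base URL as given in the abstract; the live server may redirect to HTTPS.
ENSEMBL_REST = "http://rest.ensembl.org"

# Media types for the two formats the abstract names (assumed values).
FORMATS = {"json": "application/json", "fasta": "text/x-fasta"}


def build_request(path, fmt="json", base=ENSEMBL_REST):
    """Build (but do not send) a GET request for a REST path."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Content-Type": FORMATS[fmt]})


def fetch(path, fmt="json", base=ENSEMBL_REST, timeout=30):
    """Send the request; return parsed JSON, or the raw text for FASTA."""
    req = build_request(path, fmt, base)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if fmt == "json" else body


# Example calls (require network access, so they are left commented out):
#   fetch("lookup/id/ENSG00000157764")                # gene record as JSON
#   fetch("sequence/id/ENSG00000157764", fmt="fasta") # sequence as FASTA
#   fetch("vep/human/id/rs2108622")                   # variant consequences
```

Separating request construction from sending keeps the URL and header logic checkable without touching the network, and the same two functions can be ported to any language with an HTTP client, which is the point the abstract makes about language independence.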


Subject(s)
Computational Biology/methods , Databases, Factual , Programming Languages , Software , Genetic Variation , Genomics , Humans
6.
Nature ; 464(7289): 757-62, 2010 Apr 01.
Article in English | MEDLINE | ID: mdl-20360741

ABSTRACT

The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken, the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.


Subject(s)
Finches/genetics , Genome/genetics , 3' Untranslated Regions/genetics , Animals , Auditory Perception/genetics , Brain/physiology , Chickens/genetics , Evolution, Molecular , Female , Finches/physiology , Gene Duplication , Gene Regulatory Networks/genetics , Male , MicroRNAs/genetics , Models, Animal , Multigene Family/genetics , Retroelements/genetics , Sex Chromosomes/genetics , Terminal Repeat Sequences/genetics , Transcription, Genetic/genetics , Vocalization, Animal/physiology
7.
Nucleic Acids Res ; 42(Database issue): D749-55, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24316576

ABSTRACT

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.


Subject(s)
Databases, Genetic , Genomics , Animals , Chordata/genetics , Genetic Variation , Humans , Internet , Mice , Molecular Sequence Annotation , Phenotype , Rats
8.
Nat Genet ; 39(7): 827-9, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17558408

ABSTRACT

We tested 310,605 SNPs for association in 778 individuals with celiac disease and 1,422 controls. Outside the HLA region, the most significant finding (rs13119723; P = 2.0 × 10^-7) was in the KIAA1109-TENR-IL2-IL21 linkage disequilibrium block. We independently confirmed association in two further collections (strongest association at rs6822844, 24 kb 5' of IL21; meta-analysis P = 1.3 × 10^-14, odds ratio = 0.63), suggesting that genetic variation in this region predisposes to celiac disease.


Subject(s)
Celiac Disease/genetics , Genetic Predisposition to Disease , Genetic Variation , Genome, Human , Interleukin-2/genetics , Interleukins/genetics , Animals , Chromosomes, Human, Pair 4/genetics , Humans , Linkage Disequilibrium , Mice , Polymorphism, Single Nucleotide , Risk Factors
9.
Nucleic Acids Res ; 41(Database issue): D48-55, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203987

ABSTRACT

The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.


Subject(s)
Databases, Genetic , Genomics , Animals , Gene Expression Regulation , Genetic Variation , Humans , Internet , Mice , Molecular Sequence Annotation , Rats , Software , Zebrafish/genetics
10.
Nucleic Acids Res ; 40(Database issue): D84-90, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22086963

ABSTRACT

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.


Subject(s)
Databases, Genetic , Genomics , Animals , Gene Expression Regulation , Genetic Variation , Humans , Mice , Molecular Sequence Annotation , Rats
11.
Genome Res ; 20(6): 791-803, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20430781

ABSTRACT

The spontaneously hypertensive rat (SHR) is the most widely studied animal model of hypertension. Scores of SHR quantitative trait loci (QTLs) have been mapped for hypertension and other phenotypes. We have sequenced the SHR/OlaIpcv genome at 10.7-fold coverage by paired-end sequencing on the Illumina platform. We identified 3.6 million high-quality single nucleotide polymorphisms (SNPs) between the SHR/OlaIpcv and Brown Norway (BN) reference genome, with a high rate of validation (sensitivity 96.3%-98.0% and specificity 99%-100%). We also identified 343,243 short indels between the SHR/OlaIpcv and reference genomes. These SNPs and indels resulted in 161 gain or loss of stop codons and 629 frameshifts compared with the BN reference sequence. We also identified 13,438 larger deletions that result in complete or partial absence of 107 genes in the SHR/OlaIpcv genome compared with the BN reference and 588 copy number variants (CNVs) that overlap with the gene regions of 688 genes. Genomic regions containing genes whose expression had been previously mapped as cis-regulated expression quantitative trait loci (eQTLs) were significantly enriched with SNPs, short indels, and larger deletions, suggesting that some of these variants have functional effects on gene expression. Genes that were affected by major alterations in their coding sequence were highly enriched for genes related to ion transport, transport, and plasma membrane localization, providing insights into the likely molecular and cellular basis of hypertension and other phenotypes specific to the SHR strain. This near complete catalog of genomic differences between two extensively studied rat strains provides the starting point for complete elucidation, at the molecular level, of the physiological and pathophysiological phenotypic differences between individuals from these strains.


Subject(s)
Hypertension/genetics , Animals , Codon, Terminator , Gene Dosage , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Rats , Rats, Inbred SHR , Transcription, Genetic
12.
Nucleic Acids Res ; 39(Database issue): D800-6, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21045057

ABSTRACT

The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.


Subject(s)
Databases, Genetic , Genomics , Animals , Genetic Variation , Humans , Mice , Molecular Sequence Annotation , Rats , Regulatory Sequences, Nucleic Acid , Software , Zebrafish/genetics
13.
Hum Mol Genet ; 19(13): 2539-53, 2010 Jul 01.
Article in English | MEDLINE | ID: mdl-20211853

ABSTRACT

We describe a novel approach for evaluating SNP genotypes of a genome-wide association scan to identify "ethnic outlier" subjects whose ethnicity is different or admixed compared to most other subjects in the genotyped sample set. Each ethnic outlier is detected by counting a genomic excess of "rare" heterozygotes and/or homozygotes whose frequencies are low (<1%) within genotypes of the sample set being evaluated. This method also enables simple and striking visualization of non-Caucasian chromosomal DNA segments interspersed within the chromosomes of ethnically admixed individuals. We show that this visualization of the mosaic structure of admixed human chromosomes gives results similar to another visualization method (SABER) but with much less computational time and burden. We also show that other methods for detecting ethnic outliers are enhanced by evaluating only genomic regions of visualized admixture rather than diluting outlier ancestry by evaluating the entire genome considered in aggregate. We have validated our method in the Wellcome Trust Case Control Consortium (WTCCC) study of 17,000 subjects as well as in HapMap subjects and simulated outliers of known ethnicity and admixture. The method's ability to precisely delineate chromosomal segments of non-Caucasian ethnicity has enabled us to demonstrate previously unreported non-Caucasian admixture in two HapMap Caucasian parents and in a number of WTCCC subjects. Its sensitive detection of ethnic outliers and simple visual discrimination of discrete chromosomal segments of different ethnicity implies that this method of rare heterozygotes and homozygotes (RHH) is likely to have diverse and important applications in humans and other species.
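
The counting step described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes genotypes are coded as 0, 1 or 2 copies of one allele (None for a missing call), treats any genotype class whose frequency at a SNP falls below the cutoff as "rare", and leaves out how an "excess" count is turned into an outlier call. The function name `rare_genotype_counts` and the toy data are hypothetical.

```python
from collections import Counter


def rare_genotype_counts(genotypes, max_freq=0.01):
    """Count, per subject, the genotypes that are rare in the sample set.

    genotypes[s][m] is subject s's genotype at SNP m, coded as the number
    of copies of one allele (0, 1 or 2), or None if the call is missing.
    """
    counts = [0] * len(genotypes)
    n_snps = len(genotypes[0]) if genotypes else 0
    for m in range(n_snps):
        column = [row[m] for row in genotypes]
        called = [g for g in column if g is not None]
        if not called:
            continue  # no usable calls at this SNP
        freq = Counter(called)
        for s, g in enumerate(column):
            # A genotype is "rare" if few subjects in the set share it.
            if g is not None and freq[g] / len(called) < max_freq:
                counts[s] += 1
    return counts


# Toy data: five subjects, four SNPs. With so few subjects the cutoff is
# raised to 25% so that a genotype seen once (1/5 = 20%) counts as rare.
toy = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 2, 1, 1],
]
toy_counts = rare_genotype_counts(toy, max_freq=0.25)
print(toy_counts)  # [0, 0, 0, 0, 3]
```

The last subject carries a rare genotype at three of the four SNPs while everyone else carries none, which is the kind of genome-wide excess the abstract uses to single out a subject. With real data the default 1% cutoff from the abstract would apply.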


Subject(s)
Chromosomes, Human , Genome, Human , Genome-Wide Association Study/methods , Genotype , Polymorphism, Single Nucleotide , White People/genetics , Algorithms , Genetic Markers , Genetics, Population , Humans , Linkage Disequilibrium , Models, Genetic
14.
Nucleic Acids Res ; 38(Database issue): D557-62, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19906699

ABSTRACT

Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Access to Information , Animals , Computational Biology/trends , Databases, Protein , Genetic Variation , Genomics/methods , Humans , Information Storage and Retrieval/methods , Internet , Protein Structure, Tertiary , Software , Species Specificity
15.
PLoS Genet ; 5(3): e1000433, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19300499

ABSTRACT

We report the first genome-wide association study (GWAS) whose sample size (1,053 Swedish subjects) is sufficiently powered to detect genome-wide significance (p < 1.5 × 10^-7) for polymorphisms that modestly alter therapeutic warfarin dose. The anticoagulant drug warfarin is widely prescribed for reducing the risk of stroke, thrombosis, pulmonary embolism, and coronary malfunction. However, Caucasians vary widely (20-fold) in the dose needed for therapeutic anticoagulation, and hence prescribed doses may be too low (risking serious illness) or too high (risking severe bleeding). Prior work established that approximately 30% of the dose variance is explained by single nucleotide polymorphisms (SNPs) in the warfarin drug target VKORC1 and another approximately 12% by two non-synonymous SNPs (*2, *3) in the cytochrome P450 warfarin-metabolizing gene CYP2C9. We initially tested each of 325,997 GWAS SNPs for association with warfarin dose by univariate regression and found the strongest statistical signals (p < 10^-78) at SNPs clustering near VKORC1 and the second lowest p-values (p < 10^-31) emanating from CYP2C9. No other SNPs approached genome-wide significance. To enhance detection of weaker effects, we conducted multiple regression adjusting for known influences on warfarin dose (VKORC1, CYP2C9, age, gender) and identified a single SNP (rs2108622) with genome-wide significance (p = 8.3 × 10^-10) that alters protein coding of the CYP4F2 gene. We confirmed this result in 588 additional Swedish patients (p < 0.0029) and, during our investigation, a second group provided independent confirmation from a scan of warfarin-metabolizing genes. We also thoroughly investigated copy number variations, haplotypes, and imputed SNPs, but found no additional highly significant warfarin associations. We present power analysis of our GWAS that is generalizable to other studies, and conclude we had 80% power to detect genome-wide significance for common causative variants or markers explaining at least 1.5% of dose variance. These GWAS results provide further impetus for conducting large-scale trials assessing patient benefit from genotype-based forecasting of warfarin dose.
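
The abstract does not say how its genome-wide significance threshold was derived. As a worked check, the short sketch below shows that a Bonferroni correction at a family-wise error rate of 0.05 (an assumption, not stated in the abstract) across the 325,997 SNPs tested gives a value consistent with the threshold quoted. The helper name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


N_SNPS = 325_997   # number of GWAS SNPs tested, from the abstract
ALPHA = 0.05       # assumed family-wise error rate (not in the abstract)

threshold = bonferroni_threshold(ALPHA, N_SNPS)
print(f"{threshold:.2e}")  # 1.53e-07

# The CYP4F2 SNP reported in the abstract clears this threshold.
P_RS2108622 = 8.3e-10
cyp4f2_is_significant = P_RS2108622 < threshold
print(cyp4f2_is_significant)  # True
```

Under this assumption the computed value rounds to the abstract's 1.5 × 10^-7, and the reported p-value for rs2108622 falls well below it.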


Subject(s)
Aryl Hydrocarbon Hydroxylases/genetics , Cytochrome P-450 Enzyme System/genetics , Genome-Wide Association Study , Mixed Function Oxygenases/genetics , Pharmacogenetics/methods , Polymorphism, Single Nucleotide , Warfarin/administration & dosage , Cytochrome P-450 CYP2C9 , Cytochrome P450 Family 4 , Humans , Metabolism/genetics , Sweden , Vitamin K Epoxide Reductases , Warfarin/metabolism
16.
Bioinformatics ; 26(16): 2069-70, 2010 Aug 15.
Article in English | MEDLINE | ID: mdl-20562413

ABSTRACT

SUMMARY: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and API interface can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species. AVAILABILITY: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website at http://www.ensembl.org/. The Ensembl API (http://www.ensembl.org/info/docs/api/api_installation.html for installation instructions) is open source software.


Subject(s)
Genetic Variation , Genomics , Polymorphism, Single Nucleotide , Software , Internet
17.
BMC Bioinformatics ; 11: 238, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20459810

ABSTRACT

BACKGROUND: Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and manipulation of such data necessitating the redesign of existing genome-wide bioinformatics resources. RESULTS: Ensembl has created a database and software library to support data storage, analysis and access to the existing and emerging variation data from large mammalian and vertebrate genomes. These tools scale to thousands of individual genome sequences and are integrated into the Ensembl infrastructure for genome annotation and visualisation. The database and software system is easily expanded to integrate both public and non-public data sources in the context of an Ensembl software installation and is already being used outside of the Ensembl project in a number of database and application environments. CONCLUSIONS: Ensembl's powerful, flexible and open source infrastructure for the management of variation, genotyping and resequencing data is freely available at http://www.ensembl.org.


Subject(s)
Databases, Factual , Genomics/methods , Genotype , Sequence Analysis, DNA/methods , Genome , Phenotype
18.
BMC Genomics ; 11: 293, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20459805

ABSTRACT

BACKGROUND: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. DESCRIPTION: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. CONCLUSIONS: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.


Subject(s)
Databases, Genetic , Genetic Variation , Genomics/methods , Algorithms , Animals , Base Sequence , Cattle , Genotype , Humans , Internet , Linkage Disequilibrium , Mice , Phenotype , Phylogeny , Polymorphism, Single Nucleotide , Rats , Sequence Analysis, DNA , User-Computer Interface
20.
Nat Commun ; 10(1): 2373, 2019 May 30.
Article in English | MEDLINE | ID: mdl-31147538

ABSTRACT

We aimed to develop an efficient, flexible and scalable approach to diagnostic genome-wide sequence analysis of genetically heterogeneous clinical presentations. Here we present G2P (www.ebi.ac.uk/gene2phenotype) as an online system to establish, curate and distribute datasets for diagnostic variant filtering via association of allelic requirement and mutational consequence at a defined locus with phenotypic terms, confidence level and evidence links. An extension to Ensembl Variant Effect Predictor (VEP), VEP-G2P was used to filter both disease-associated and control whole exome sequence (WES) with Developmental Disorders G2P (G2PDD; 2044 entries). VEP-G2PDD shows a sensitivity/precision of 97.3%/33% for de novo and 81.6%/22.7% for inherited pathogenic genotypes respectively. Many of the missing genotypes are likely false-positive pathogenic assignments. The expected number and discriminative features of background genotypes are defined using control WES. Using only human genetic data VEP-G2P performs well compared to other freely-available diagnostic systems and future phenotypic matching capabilities should further enhance performance.
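
For readers less familiar with the two metrics quoted in this abstract, the sketch below gives their standard definitions. The counts are made-up toy values chosen for round numbers; they are not data from the study, and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly pathogenic genotypes that the filter retains."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of retained genotypes that are truly pathogenic."""
    return true_pos / (true_pos + false_pos)


# Toy counts only (hypothetical, not from the paper).
TP, FN, FP = 40, 10, 60

toy_sensitivity = sensitivity(TP, FN)
toy_precision = precision(TP, FP)
print(toy_sensitivity)  # 0.8
print(toy_precision)    # 0.4
```

A filter can therefore recover almost every known pathogenic genotype (high sensitivity) while still passing many background genotypes (lower precision), which is the pattern the abstract reports.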


Subject(s)
Developmental Disabilities/genetics , Exome Sequencing , Genetic Testing , Genome, Human , Alleles , Genotype , Humans , Molecular Diagnostic Techniques , Mutation , Phenotype , Sequence Analysis, DNA , Whole Genome Sequencing