Results 1 - 14 of 14
1.
Nature ; 604(7905): 310-315, 2022 04.
Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.


Subject(s)
Computational Biology , Databases, Genetic , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States
2.
Mol Genet Genomic Med ; 9(12): e1786, 2021 12.
Article in English | MEDLINE | ID: mdl-34435752

ABSTRACT

BACKGROUND: Variant interpretation is dependent on transcript annotation and remains time consuming and challenging. There are major obstacles for historical data reuse and for interpretation of new variants. First, both RefSeq and Ensembl/GENCODE produce transcript sets in common use, but there is currently no easy way to translate between the two. Second, the resources often used for variant interpretation (e.g. ClinVar, gnomAD, UniProt) do not use the same transcript set, nor default transcript or protein sequence. METHOD: Ensembl ran a survey in 2018 to sample attitudes to choosing one default transcript per locus, and to gather data on reference sequences used by the scientific community. This was publicised on the Ensembl and UCSC genome browsers, by email and on social media. RESULTS: The survey had 788 responses from 32 different countries, the results of which we report here. CONCLUSIONS: We present our roadmap to create an effective default set of transcripts for resources, and for reporting interpretation of clinical variants.


Subject(s)
Biomarkers , Computational Biology , Genomics , RNA, Messenger/genetics , Animals , Computational Biology/methods , Databases, Genetic , Genomics/methods , Humans , Software , Web Browser
3.
Am J Hum Genet ; 108(6): 1083-1094, 2021 06 03.
Article in English | MEDLINE | ID: mdl-34022131

ABSTRACT

Clinical genetic testing of protein-coding regions identifies a likely causative variant in only around half of developmental disorder (DD) cases. The contribution of regulatory variation in non-coding regions to rare disease, including DD, remains very poorly understood. We screened 9,858 probands from the Deciphering Developmental Disorders (DDD) study for de novo mutations in the 5' untranslated regions (5' UTRs) of genes within which variants have previously been shown to cause DD through a dominant haploinsufficient mechanism. We identified four single-nucleotide variants and two copy-number variants upstream of MEF2C in a total of ten individual probands. We developed multiple bespoke and orthogonal experimental approaches to demonstrate that these variants cause DD through three distinct loss-of-function mechanisms, disrupting transcription, translation, and/or protein function. These non-coding region variants represent 23% of likely diagnoses identified in MEF2C in the DDD cohort, but these would all be missed in standard clinical genetics approaches. Nonetheless, these variants are readily detectable in exome sequence data, with 30.7% of 5' UTR bases across all genes well covered in the DDD dataset. Our analyses show that non-coding variants upstream of genes within which coding variants are known to cause DD are an important cause of severe disease and demonstrate that analyzing 5' UTRs can increase diagnostic yield. We also show how non-coding variants can help inform both the disease-causing mechanism underlying protein-coding variants and dosage tolerance of the gene.


Subject(s)
5' Untranslated Regions , Developmental Disabilities/etiology , Genetic Predisposition to Disease , Loss of Function Mutation , Child , Cohort Studies , DNA Copy Number Variations , Developmental Disabilities/pathology , Humans , MEF2 Transcription Factors/genetics , Exome Sequencing
4.
Nature ; 583(7814): 96-102, 2020 07.
Article in English | MEDLINE | ID: mdl-32581362

ABSTRACT

Most patients with rare diseases do not receive a molecular diagnosis and the aetiological variants and causative genes for more than half such disorders remain to be discovered (ref. 1). Here we used whole-genome sequencing (WGS) in a national health system to streamline diagnosis and to discover unknown aetiological variants in the coding and non-coding regions of the genome. We generated WGS data for 13,037 participants, of whom 9,802 had a rare disease, and provided a genetic diagnosis to 1,138 of the 7,065 extensively phenotyped participants. We identified 95 Mendelian associations between genes and rare diseases, of which 11 have been discovered since 2015 and at least 79 are confirmed to be aetiological. By generating WGS data of UK Biobank participants (ref. 2), we found that rare alleles can explain the presence of some individuals in the tails of a quantitative trait for red blood cells. Finally, we identified four novel non-coding variants that cause disease through the disruption of transcription of ARPC1B, GATA1, LRBA and MPL. Our study demonstrates a synergy by using WGS for diagnosis and aetiological discovery in routine healthcare.


Subject(s)
Internationality , National Health Programs , Rare Diseases/diagnosis , Rare Diseases/genetics , Whole Genome Sequencing , Actin-Related Protein 2-3 Complex/genetics , Adaptor Proteins, Signal Transducing/genetics , Alleles , Databases, Factual , Erythrocytes/metabolism , GATA1 Transcription Factor/genetics , Humans , Phenotype , Quantitative Trait Loci , Receptors, Thrombopoietin/genetics , State Medicine , United Kingdom
6.
Nucleic Acids Res ; 47(D1): D1005-D1012, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30445434

ABSTRACT

The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10^9 individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.


Subject(s)
Databases, Genetic , Genome-Wide Association Study , Disease/genetics , Genetic Variation , Humans , Microarray Analysis , Publications , Software , User-Computer Interface
7.
Nucleic Acids Res ; 47(D1): D745-D751, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30407521

ABSTRACT

The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as through a RESTful web service, a Perl application programming interface and as data files for download.


Subject(s)
Databases, Genetic , Genome/genetics , Genomics , Vertebrates/genetics , Animals , Computational Biology/trends , Humans , Mice , Molecular Sequence Annotation , Software
8.
Genome Biol ; 19(1): 21, 2018 02 15.
Article in English | MEDLINE | ID: mdl-29448949

ABSTRACT

The accurate description of ancestry is essential to interpret, access, and integrate human genomics data, and to ensure that these benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the representation of ancestry information. Here we describe a framework for the accurate and standardized description of sample ancestry, and validate it by application to the NHGRI-EBI GWAS Catalog. We confirm known biases and gaps in diversity, and find that African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations. It is our hope that widespread adoption of this framework will lead to improved analysis, interpretation, and integration of human genomics data.


Subject(s)
Genome-Wide Association Study/standards , Genomics/standards , Genetic Variation , Humans , Racial Groups
9.
Epigenetics ; 13(2): 117-121, 2018.
Article in English | MEDLINE | ID: mdl-27911167

ABSTRACT

The analysis of DNA methylation has become routine in the pipeline for diagnosis of imprinting disorders, with many publications reporting aberrant methylation associated with imprinted differentially methylated regions (DMRs). However, comparisons between these studies are routinely hampered by the lack of consistency in reporting sites of methylation evaluated. To avoid confusion surrounding nomenclature, special care is needed to communicate results accurately, especially between scientists and other health care professionals. Within the European Network for Human Congenital Imprinting Disorders we have discussed these issues and designed a nomenclature for naming imprinted DMRs as well as for reporting methylation values. We apply these recommendations for imprinted DMRs that are commonly assayed in clinical laboratories and show how they support standardized database submission. The recommendations are in line with existing recommendations, most importantly the Human Genome Variation Society nomenclature, and should facilitate accurate reporting and data exchange among laboratories and thereby help to avoid future confusion.


Subject(s)
DNA Methylation , Epigenomics/standards , Genomic Imprinting , Terminology as Topic , Animals , Humans , Polymorphism, Genetic , Practice Guidelines as Topic
10.
Nucleic Acids Res ; 45(D1): D896-D901, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899670

ABSTRACT

The NHGRI-EBI GWAS Catalog has provided data from published genome-wide association studies since 2008. In 2015, the database was redesigned and relocated to EMBL-EBI. The new infrastructure includes a new graphical user interface (www.ebi.ac.uk/gwas/), ontology supported search functionality and an improved curation interface. These developments have improved the data release frequency by increasing automation of curation and providing scaling improvements. The range of available Catalog data has also been extended with structured ancestry and recruitment information added for all studies. The infrastructure improvements also support scaling for larger arrays, exome and sequencing studies, allowing the Catalog to adapt to the needs of evolving study design, genotyping technologies and user needs in the future.


Subject(s)
Databases, Nucleic Acid , Genome-Wide Association Study/methods , Software , Data Mining , Genomics/methods , Humans , Molecular Sequence Annotation , National Human Genome Research Institute (U.S.) , United States , User-Computer Interface , Web Browser
11.
Nucleic Acids Res ; 42(Database issue): D1001-6, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24316577

ABSTRACT

The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS Catalog) provides a publicly available manually curated collection of published GWAS assaying at least 100,000 single-nucleotide polymorphisms (SNPs) and all SNP-trait associations with P < 1 × 10^-5. The Catalog includes 1751 curated publications of 11 912 SNPs. In addition to the SNP-trait association data, the Catalog also publishes a quarterly diagram of all SNP-trait associations mapped to the SNPs' chromosomal locations. The Catalog can be accessed via a tabular web interface, via a dynamic visualization on the human karyotype, as a downloadable tab-delimited file and as an OWL knowledge base. This article presents a number of recent improvements to the Catalog, including novel ways for users to interact with the Catalog and changes to the curation infrastructure.


Subject(s)
Databases, Nucleic Acid , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Gene Ontology , Genome, Human , Humans , Internet , Karyotype
12.
Nucleic Acids Res ; 42(Database issue): D873-8, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24285302

ABSTRACT

Locus Reference Genomic (LRG; http://www.lrg-sequence.org/) records contain internationally recognized stable reference sequences designed specifically for reporting clinically relevant sequence variants. Each LRG is contained within a single file consisting of a stable 'fixed' section and a regularly updated 'updatable' section. The fixed section contains stable genomic DNA sequence for a genomic region, essential transcripts and proteins for variant reporting and an exon numbering system. The updatable section contains mapping information, annotation of all transcripts and overlapping genes in the region and legacy exon and amino acid numbering systems. LRGs provide a stable framework that is vital for reporting variants, according to Human Genome Variation Society (HGVS) conventions, in genomic DNA, transcript or protein coordinates. To enable translation of information between LRG and genomic coordinates, LRGs include mapping to the human genome assembly. LRGs are compiled and maintained by the National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). LRG reference sequences are selected in collaboration with the diagnostic and research communities, locus-specific database curators and mutation consortia. Currently >700 LRGs have been created, of which >400 are publicly available. The aim is to create an LRG for every locus with clinical implications.


Subject(s)
Databases, Genetic , Genetic Variation , Genome, Human , Exons , Genetic Loci , Genomics/standards , Humans , Internet , Proteins/genetics , RNA, Messenger/chemistry , Reference Standards
13.
J Neurosci ; 28(41): 10200-5, 2008 Oct 08.
Article in English | MEDLINE | ID: mdl-18842880

ABSTRACT

Fragile X syndrome (FXS) is the most common form of hereditary mental retardation. FXS patients have a deficit for the fragile X mental retardation protein (FMRP) that results in abnormal neuronal dendritic spine morphology and behavioral phenotypes, including sleep abnormalities. In a Drosophila model of FXS, flies lacking the dfmr1 protein (dFMRP) have abnormal circadian rhythms apparently as a result of altered clock output. In this study, we present biochemical and genetic evidence that dFMRP interacts with a known clock output component, the LARK RNA-binding protein. Our studies demonstrate physical interactions between dFMRP and LARK, that the two proteins are present in a complex in vivo, and that LARK promotes the stability of dFMRP. Furthermore, we show genetic interactions between the corresponding genes indicating that dFMRP and LARK function together to regulate eye development and circadian behavior.


Subject(s)
Behavior, Animal/physiology , Circadian Rhythm/physiology , Drosophila Proteins/metabolism , Drosophila/physiology , Eye/growth & development , Fragile X Mental Retardation Protein/metabolism , RNA-Binding Proteins/metabolism , Animals , Circadian Rhythm/genetics , Disease Models, Animal , Drosophila/growth & development , Drosophila Proteins/genetics , Fragile X Mental Retardation Protein/genetics , Fragile X Syndrome , Larva/metabolism , RNA-Binding Proteins/genetics
14.
Neuron ; 34(6): 961-72, 2002 Jun 13.
Article in English | MEDLINE | ID: mdl-12086643

ABSTRACT

Mental retardation is a pervasive societal problem, 25 times more common than blindness for example. Fragile X syndrome, the most common form of inherited mental retardation, is caused by mutations in the FMR1 gene. Fragile X patients display neurite morphology defects in the brain, suggesting that this may be the basis of their mental retardation. Drosophila contains a single homolog of FMR1, dfxr (also called dfmr1). We analyzed the role of dfxr in neurite development in three distinct neuronal classes. We find that DFXR is required for normal neurite extension, guidance, and branching. dfxr mutants also display strong eclosion failure and circadian rhythm defects. Interestingly, distinct neuronal cell types show different phenotypes, suggesting that dfxr differentially regulates diverse targets in the brain.


Subject(s)
Brain/physiology , Drosophila Proteins/physiology , Fragile X Syndrome/genetics , Nerve Tissue Proteins/physiology , Neurons/cytology , RNA-Binding Proteins , Amino Acid Sequence , Animals , Brain/pathology , Circadian Rhythm/genetics , Circadian Rhythm/physiology , Fragile X Mental Retardation Protein , Molecular Sequence Data , Motor Activity/genetics , Motor Activity/physiology , Mutation , Nerve Tissue Proteins/deficiency , Nerve Tissue Proteins/genetics , Neuroglia/metabolism , Neuroglia/pathology , Neuroglia/physiology , Neurons/pathology , Neurons/physiology , Sequence Homology, Amino Acid