1.
Cell Metab ; 35(4): 695-710.e6, 2023 04 04.
Article in English | MEDLINE | ID: mdl-36963395

ABSTRACT

Associations between human genetic variation and clinical phenotypes have become a foundation of biomedical research. Most repositories of these data seek to be disease-agnostic and therefore lack disease-focused views. The Type 2 Diabetes Knowledge Portal (T2DKP) is a public resource of genetic datasets and genomic annotations dedicated to type 2 diabetes (T2D) and related traits. Here, we seek to make the T2DKP more accessible to prospective users and more useful to existing users. First, we evaluate the T2DKP's comprehensiveness by comparing its datasets with those of other repositories. Second, we describe how researchers unfamiliar with human genetic data can begin using and correctly interpreting them via the T2DKP. Third, we describe how existing users can extend their current workflows to use the full suite of tools offered by the T2DKP. We finally discuss the lessons offered by the T2DKP toward the goal of democratizing access to complex disease genetic results.


Subject(s)
Diabetes Mellitus, Type 2 , Humans , Diabetes Mellitus, Type 2/genetics , Access to Information , Prospective Studies , Genomics/methods , Phenotype
2.
Nucleic Acids Res ; 51(D1): D977-D985, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350656

ABSTRACT

The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.


Subject(s)
Genome-Wide Association Study , Knowledge Bases , Animals , Humans , Mice , DNA Copy Number Variations , National Human Genome Research Institute (U.S.) , Phenotype , Polymorphism, Single Nucleotide , Software , United States
3.
Nature ; 604(7905): 310-315, 2022 04.
Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.


Subject(s)
Computational Biology , Databases, Genetic , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States
4.
Cell Genom ; 1(1)2021 Oct 13.
Article in English | MEDLINE | ID: mdl-34870259

ABSTRACT

Genome sequencing has recently become a viable genotyping technology for use in genome-wide association studies (GWASs), offering the potential to analyze a broader range of genome-wide variation, including rare variants. To survey current standards, we assessed the content and quality of reporting of statistical methods, analyses, results, and datasets in 167 exome- or genome-wide-sequencing-based GWAS publications published from 2014 to 2020; 81% of publications included tests of aggregate association across multiple variants, with multiple test models frequently used. We observed a lack of standardized terms and incomplete reporting of datasets, particularly for variants analyzed in aggregate tests. We also find a lower frequency of sharing of summary statistics compared with array-based GWASs. Reporting standards and increased data sharing are required to ensure sequencing-based association study data are findable, interoperable, accessible, and reusable (FAIR). To support that, we recommend adopting the standard terminology of sequencing-based GWAS (seqGWAS). Further, we recommend that single-variant analyses be reported following the same standards and conventions as standard array-based GWASs and be shared in the GWAS Catalog. We also provide initial recommended standards for aggregate analyses metadata and summary statistics.

5.
Mol Genet Genomic Med ; 9(12): e1786, 2021 12.
Article in English | MEDLINE | ID: mdl-34435752

ABSTRACT

BACKGROUND: Variant interpretation is dependent on transcript annotation and remains time consuming and challenging. There are major obstacles for historical data reuse and for interpretation of new variants. First, both RefSeq and Ensembl/GENCODE produce transcript sets in common use, but there is currently no easy way to translate between the two. Second, the resources often used for variant interpretation (e.g. ClinVar, gnomAD, UniProt) do not use the same transcript set, nor default transcript or protein sequence. METHOD: Ensembl ran a survey in 2018 to sample attitudes to choosing one default transcript per locus, and to gather data on reference sequences used by the scientific community. This was publicised on the Ensembl and UCSC genome browsers, by email and on social media. RESULTS: The survey had 788 responses from 32 different countries, the results of which we report here. CONCLUSIONS: We present our roadmap to create an effective default set of transcripts for resources, and for reporting interpretation of clinical variants.


Subject(s)
Biomarkers , Computational Biology , Genomics , RNA, Messenger/genetics , Animals , Computational Biology/methods , Databases, Genetic , Genomics/methods , Humans , Software , Web Browser
7.
Cell Genom ; 1(1)2021 Oct 13.
Article in English | MEDLINE | ID: mdl-36082306

ABSTRACT

Genome-wide association studies (GWASs) have enabled robust mapping of complex traits in humans. The open sharing of GWAS summary statistics (SumStats) is essential in facilitating the larger meta-analyses needed for increased power in resolving the genetic basis of disease. However, most GWAS SumStats are not readily accessible because of limited sharing and a lack of defined standards. With the aim of increasing the availability, quality, and utility of GWAS SumStats, the National Human Genome Research Institute-European Bioinformatics Institute (NHGRI-EBI) GWAS Catalog organized a community workshop to address the standards, infrastructure, and incentives required to promote and enable sharing. We evaluated the barriers to SumStats sharing, both technological and sociological, and developed an action plan to address those challenges and ensure that SumStats and study metadata are findable, accessible, interoperable, and reusable (FAIR). We encourage early deposition of datasets in the GWAS Catalog as the recognized central repository. We recommend standard requirements for reporting elements and formats for SumStats and accompanying metadata as guidelines for community standards and a basis for submission to the GWAS Catalog. Finally, we provide recommendations to enable, promote, and incentivize broader data sharing, standards and FAIRness in order to advance genomic medicine.

8.
Nucleic Acids Res ; 49(D1): D884-D891, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33137190

ABSTRACT

The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Genomics/methods , SARS-CoV-2/genetics , Vertebrates/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Humans , Internet , Molecular Sequence Annotation/methods , Pandemics , Vertebrates/classification
9.
Nucleic Acids Res ; 48(D1): D682-D688, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31691826

ABSTRACT

Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), with data updates made available four times a year.


Subject(s)
Computational Biology/methods , Databases, Genetic , Epigenome , Molecular Sequence Annotation , Algorithms , Animals , Computer Graphics , Databases, Protein , Genetic Variation , Genome-Wide Association Study , Genomics , Histones/metabolism , Humans , Imaging, Three-Dimensional , Internet , Ligands , Search Engine , Software , Species Specificity , Transcriptome , User-Computer Interface , Web Browser
10.
Nucleic Acids Res ; 47(D1): D745-D751, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30407521

ABSTRACT

The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as through a RESTful web service, a Perl application programming interface and as data files for download.


Subject(s)
Databases, Genetic , Genome/genetics , Genomics , Vertebrates/genetics , Animals , Computational Biology/trends , Humans , Mice , Molecular Sequence Annotation , Software
11.
Nucleic Acids Res ; 47(D1): D1005-D1012, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30445434

ABSTRACT

The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10⁹ individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.


Subject(s)
Databases, Genetic , Genome-Wide Association Study , Disease/genetics , Genetic Variation , Humans , Microarray Analysis , Publications , Software , User-Computer Interface
12.
Genet Med ; 21(4): 837-849, 2019 04.
Article in English | MEDLINE | ID: mdl-30206421

ABSTRACT

PURPOSE: Variants in IQSEC2, escaping X inactivation, cause X-linked intellectual disability with frequent epilepsy in males and females. We aimed to investigate sex-specific differences. METHODS: We collected the data of 37 unpublished patients (18 males and 19 females) with IQSEC2 pathogenic variants and 5 individuals with variants of unknown significance and reviewed published variants. We compared variant types and phenotypes in males and females and performed an analysis of IQSEC2 isoforms. RESULTS: IQSEC2 pathogenic variants mainly led to premature truncation and were scattered throughout the longest brain-specific isoform, encoding the synaptic IQSEC2/BRAG1 protein. Variants occurred de novo in females but were either de novo (2/3) or inherited (1/3) in males, with missense variants being predominantly inherited. Developmental delay and intellectual disability were overall more severe in males than in females. Likewise, seizures were more frequently observed and intractable, and started earlier in males than in females. No correlation was observed between the age at seizure onset and severity of intellectual disability or resistance to antiepileptic treatments. CONCLUSION: This study provides a comprehensive overview of IQSEC2-related encephalopathy in males and females, and suggests that an accurate dosage of IQSEC2 at the synapse is crucial during normal brain development.


Subject(s)
Brain Diseases/genetics , Guanine Nucleotide Exchange Factors/genetics , Intellectual Disability/genetics , Seizures/genetics , Brain/growth & development , Brain/metabolism , Brain Diseases/epidemiology , Brain Diseases/physiopathology , Female , Humans , Infant , Infant, Newborn , Intellectual Disability/epidemiology , Intellectual Disability/physiopathology , Male , Mutation , Pedigree , Phenotype , Protein Isoforms/genetics , Seizures/epidemiology , Seizures/physiopathology , Sex Characteristics
14.
PLoS Comput Biol ; 14(8): e1006390, 2018 08.
Article in English | MEDLINE | ID: mdl-30102703

ABSTRACT

Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine-learning assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.


Subject(s)
Data Curation/methods , Information Storage and Retrieval/methods , Data Curation/statistics & numerical data , Databases, Genetic , Databases, Protein , Deep Learning , Genomics , Knowledge Bases , Machine Learning , Publications
15.
Neuron ; 99(4): 768-780.e3, 2018 08 22.
Article in English | MEDLINE | ID: mdl-30057203

ABSTRACT

Drosophila NonA and its mammalian ortholog NONO are members of the Drosophila behavior and human splicing (DBHS) family. NONO also has a strong circadian connection: it associates with the circadian repressor protein PERIOD (PER) and contributes to circadian timekeeping. Here, we investigate NonA, which is required for proper levels of evening locomotor activity as well as a normal free-running period in Drosophila. NonA is associated with the positive transcription factor CLOCK/CYCLE (CLK/CYC), interacts directly with complexin (cpx) pre-mRNA, and upregulates gene expression, including the gene cpx. Downregulation of cpx expression in circadian neurons phenocopies NonA downregulation, whereas cpx overexpression rescues the nonA RNAi phenotypes, indicating that cpx is an important NonA target gene. As the cpx protein contributes to proper neurotransmitter and neuropeptide release in response to calcium, these results and others indicate that this control is important for the normal circadian regulation of locomotor activity.


Subject(s)
Adaptor Proteins, Vesicular Transport/biosynthesis , Circadian Clocks/physiology , Circadian Rhythm/physiology , Drosophila Proteins/biosynthesis , Locomotion/physiology , Nerve Tissue Proteins/biosynthesis , Nuclear Proteins/biosynthesis , Adaptor Proteins, Vesicular Transport/genetics , Animals , Animals, Genetically Modified , Drosophila , Drosophila Proteins/genetics , Male , Nerve Tissue Proteins/genetics , Nuclear Proteins/genetics
16.
Genome Biol ; 19(1): 21, 2018 02 15.
Article in English | MEDLINE | ID: mdl-29448949

ABSTRACT

The accurate description of ancestry is essential to interpret, access, and integrate human genomics data, and to ensure that these benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the representation of ancestry information. Here we describe a framework for the accurate and standardized description of sample ancestry, and validate it by application to the NHGRI-EBI GWAS Catalog. We confirm known biases and gaps in diversity, and find that African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations. It is our hope that widespread adoption of this framework will lead to improved analysis, interpretation, and integration of human genomics data.


Subject(s)
Genome-Wide Association Study/standards , Genomics/standards , Genetic Variation , Humans , Racial Groups
17.
Nucleic Acids Res ; 45(D1): D896-D901, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899670

ABSTRACT

The NHGRI-EBI GWAS Catalog has provided data from published genome-wide association studies since 2008. In 2015, the database was redesigned and relocated to EMBL-EBI. The new infrastructure includes a new graphical user interface (www.ebi.ac.uk/gwas/), ontology supported search functionality and an improved curation interface. These developments have improved the data release frequency by increasing automation of curation and providing scaling improvements. The range of available Catalog data has also been extended with structured ancestry and recruitment information added for all studies. The infrastructure improvements also support scaling for larger arrays, exome and sequencing studies, allowing the Catalog to adapt to the needs of evolving study design, genotyping technologies and user needs in the future.


Subject(s)
Databases, Nucleic Acid , Genome-Wide Association Study/methods , Software , Data Mining , Genomics/methods , Humans , Molecular Sequence Annotation , National Human Genome Research Institute (U.S.) , United States , User-Computer Interface , Web Browser
19.
Cell ; 165(3): 742-53, 2016 Apr 21.
Article in English | MEDLINE | ID: mdl-27040499

ABSTRACT

RNA transcripts are bound and regulated by RNA-binding proteins (RBPs). Current methods for identifying in vivo targets of an RBP are imperfect and not amenable to examining small numbers of cells. To address these issues, we developed TRIBE (targets of RNA-binding proteins identified by editing), a technique that couples an RBP to the catalytic domain of the Drosophila RNA-editing enzyme ADAR and expresses the fusion protein in vivo. RBP targets are marked with novel RNA editing events and identified by sequencing RNA. We have used TRIBE to identify the targets of three RBPs (Hrp48, dFMR1, and NonA). TRIBE compares favorably to other methods, including CLIP, and we have identified RBP targets from as few as 150 specific fly neurons. TRIBE can be performed without an antibody and in small numbers of specific cells.


Subject(s)
Adenosine Deaminase/metabolism , Drosophila Proteins/metabolism , Drosophila melanogaster/enzymology , Genetic Techniques , RNA Editing , 3' Untranslated Regions , Animals , Heterogeneous-Nuclear Ribonucleoproteins/metabolism , RNA-Binding Proteins
20.
Neuron ; 74(3): 543-56, 2012 May 10.
Article in English | MEDLINE | ID: mdl-22578505

ABSTRACT

It is currently unclear whether the GluN2 subtype influences NMDA receptor (NMDAR) excitotoxicity. We report that the toxicity of NMDAR-mediated Ca(2+) influx is differentially controlled by the cytoplasmic C-terminal domains of GluN2B (CTD(2B)) and GluN2A (CTD(2A)). Studying the effects of acute expression of GluN2A/2B-based chimeric subunits with reciprocal exchanges of their CTDs revealed that CTD(2B) enhances NMDAR toxicity, compared to CTD(2A). Furthermore, the vulnerability of forebrain neurons in vitro and in vivo to NMDAR-dependent Ca(2+) influx is lowered by replacing the CTD of GluN2B with that of GluN2A by targeted exon exchange in a mouse knockin model. Mechanistically, CTD(2B) exhibits stronger physical/functional coupling to the PSD-95-nNOS pathway, which suppresses protective CREB activation. Dependence of NMDAR excitotoxicity on the GluN2 CTD subtype can be overcome by inducing high levels of NMDAR activity. Thus, the identity (2A versus 2B) of the GluN2 CTD controls the toxicity dose-response to episodes of NMDAR activity.


Subject(s)
N-Methylaspartate/pharmacology , Neurons/drug effects , Neurons/physiology , Neurotoxins/pharmacology , Receptors, N-Methyl-D-Aspartate/metabolism , Animals , Calcium/metabolism , Cells, Cultured , Disks Large Homolog 4 Protein , Dizocilpine Maleate/pharmacology , Dose-Response Relationship, Drug , Electric Stimulation , Embryo, Mammalian , Glial Fibrillary Acidic Protein/metabolism , Green Fluorescent Proteins/genetics , Guanylate Kinases/metabolism , Hippocampus/cytology , Membrane Potentials/drug effects , Membrane Potentials/genetics , Membrane Proteins/metabolism , Mice , Mice, Transgenic , Models, Biological , Patch-Clamp Techniques , Protein Structure, Tertiary/genetics , Protein Structure, Tertiary/physiology , Rats , Receptors, N-Methyl-D-Aspartate/genetics , Transfection