ABSTRACT
RNA transcripts are bound and regulated by RNA-binding proteins (RBPs). Current methods for identifying in vivo targets of an RBP are imperfect and not amenable to examining small numbers of cells. To address these issues, we developed TRIBE (targets of RNA-binding proteins identified by editing), a technique that couples an RBP to the catalytic domain of the Drosophila RNA-editing enzyme ADAR and expresses the fusion protein in vivo. RBP targets are marked with novel RNA editing events and identified by sequencing RNA. We have used TRIBE to identify the targets of three RBPs (Hrp48, dFMR1, and NonA). TRIBE compares favorably to other methods, including CLIP, and we have identified RBP targets from as few as 150 specific fly neurons. TRIBE can be performed without an antibody and in small numbers of specific cells.
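TRIBE marks targets by editing. ADAR deaminates adenosine to inosine, and sequencing reads inosine as guanosine, so targets appear as A-to-G mismatches in the fusion-expressing sample that are absent from a control. The Python sketch below shows that comparison under stated assumptions. Per-site base counts are taken to be already strand-aware (reported on the transcript strand). The input type `SiteCounts` and the depth and editing-fraction thresholds are hypothetical. This is an illustration, not the authors' published pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SiteCounts:
    """Strand-aware base counts at one transcript position (hypothetical input format)."""
    gene: str
    position: int
    ref_base: str
    counts: Dict[str, int] = field(default_factory=dict)

    def depth(self) -> int:
        # Only A and G reads are informative at a reference A.
        return self.counts.get("A", 0) + self.counts.get("G", 0)

    def edit_fraction(self) -> float:
        # ADAR converts A to inosine, which sequencing reads as G.
        d = self.depth()
        return self.counts.get("G", 0) / d if d else 0.0


Key = Tuple[str, int]  # (gene, position)


def call_tribe_targets(fusion: Dict[Key, SiteCounts],
                       control: Dict[Key, SiteCounts],
                       min_reads: int = 20,
                       min_edit: float = 0.10,
                       max_control_edit: float = 0.0) -> List[str]:
    """Genes with A-to-G edits in the RBP-ADAR sample that are absent from the control.

    Thresholds are illustrative assumptions, not published TRIBE parameters.
    """
    targets = set()
    for key, site in fusion.items():
        if site.ref_base != "A":
            continue
        if site.depth() < min_reads or site.edit_fraction() < min_edit:
            continue
        ctrl = control.get(key)
        # Require control coverage so that "absent in control" is actually observed.
        if ctrl is None or ctrl.depth() < min_reads:
            continue
        # Sites already edited in the control (e.g. by endogenous ADAR) are not novel.
        if ctrl.edit_fraction() > max_control_edit:
            continue
        targets.add(site.gene)
    return sorted(targets)
```

In a toy example, `geneX` carries 10 G reads out of 40 in the fusion sample and none in the control, so it is reported. A site that is already partially edited in the control is rejected.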
Subjects
Adenosine Deaminase/metabolism; Drosophila Proteins/metabolism; Drosophila melanogaster/enzymology; Genetic Techniques; RNA Editing; 3' Untranslated Regions; Animals; Heterogeneous-Nuclear Ribonucleoproteins/metabolism; RNA-Binding Proteins
ABSTRACT
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.
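For illustration, the MANE requirement that an Ensembl/GENCODE transcript and its RefSeq counterpart match exactly in exonic sequence can be approximated by comparing exon coordinates on the same genome assembly. The Python sketch below indexes RefSeq models by (chromosome, strand, sorted exons) and looks up each Ensembl/GENCODE model. The `Transcript` type and the IDs in the example are hypothetical. Identical coordinates imply identical genomic exon sequence. A real pipeline would also need to compare the transcript sequences themselves, because a transcript record's sequence can differ from the reference genome.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


@dataclass(frozen=True)
class Transcript:
    transcript_id: str  # hypothetical IDs in the example below
    chrom: str
    strand: str         # "+" or "-"
    exons: Tuple[Exon, ...]


def exon_signature(t: Transcript) -> tuple:
    # Sorting makes the key independent of the order in which exons are listed.
    return (t.chrom, t.strand, tuple(sorted(t.exons)))


def match_transcripts(ensembl: List[Transcript],
                      refseq: List[Transcript]) -> Dict[str, List[str]]:
    """Map each Ensembl/GENCODE transcript ID to RefSeq IDs with identical exon structure."""
    index: Dict[tuple, List[str]] = {}
    for t in refseq:
        index.setdefault(exon_signature(t), []).append(t.transcript_id)
    return {t.transcript_id: index.get(exon_signature(t), []) for t in ensembl}
```

A single base difference at any exon boundary, including in the UTRs, breaks the match. This mirrors the strictness of the "exact match" criterion.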
Subjects
Computational Biology; Databases, Genetic; Genomics; Genome; Humans; Information Dissemination; Molecular Sequence Annotation; National Library of Medicine (U.S.); United States
ABSTRACT
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
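An ensemble PGS of the kind described here can be pictured as an elastic net regression of the phenotype on the per-method scores. The pure-Python sketch below fits that model by cyclic coordinate descent on standardized PGS columns. It is an illustration, not the prspipe implementation. The penalty settings `alpha` and `l1_ratio` are placeholders that would in practice be tuned (for example by cross-validation). Covariates, binary outcomes and the biobank-specific details of the study are not modeled.

```python
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to population SD 1 (assumes a non-constant column)."""
    n = len(values)
    mu = sum(values) / n
    sd = (sum((v - mu) ** 2 for v in values) / n) ** 0.5
    return [(v - mu) / sd for v in values]


def soft_threshold(z: float, gamma: float) -> float:
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(columns: List[List[float]], y: Sequence[float],
                    alpha: float = 0.01, l1_ratio: float = 0.5,
                    n_sweeps: int = 500) -> List[float]:
    """Cyclic coordinate descent minimizing
        (1/2n)||y - Xb||^2 + alpha * (l1_ratio*||b||_1 + (1 - l1_ratio)/2 * ||b||_2^2),
    where the columns of X are already standardized and y is centered internally.
    """
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual for b = 0
    b = [0.0] * len(columns)
    for _ in range(n_sweeps):
        for j, xj in enumerate(columns):
            # (1/n) x_j^T (resid + x_j b_j) simplifies because (1/n) x_j^T x_j = 1.
            rho = sum(x * r for x, r in zip(xj, resid)) / n + b[j]
            new_bj = soft_threshold(rho, alpha * l1_ratio) / (1.0 + alpha * (1.0 - l1_ratio))
            delta = new_bj - b[j]
            if delta:
                resid = [r - x * delta for x, r in zip(xj, resid)]
                b[j] = new_bj
    return b


def ensemble_score(columns: List[List[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of standardized per-method PGS for each individual."""
    return [sum(w * col[i] for w, col in zip(weights, columns))
            for i in range(len(columns[0]))]
```

With `alpha=0` the fit reduces to ordinary least squares, and a large `alpha` shrinks every weight to zero. A gain of 5.0% in β corresponds to a ratio β_ensemble / β_single of 1.05.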
Subjects
Biological Specimen Banks; Genome-Wide Association Study; Multifactorial Inheritance; Humans; Multifactorial Inheritance/genetics; Phenotype; Diabetes Mellitus, Type 1/genetics; Polymorphism, Single Nucleotide; Machine Learning
ABSTRACT
The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.
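At its simplest, submission validation checks that a summary-statistics file has the expected columns and plausible values. The Python sketch below does this for a tab-separated file. The column names are modeled on the GWAS-SSF standard format as understood here and should be checked against the official specification. The specific rules (allowed allele characters, positive positions, p-values in [0, 1]) are illustrative assumptions, not the Catalog validator's actual rule set.

```python
import csv
import io
import re
from typing import List

# Assumed column names, modeled on the GWAS-SSF standard; verify against the official spec.
REQUIRED_COLUMNS = ["chromosome", "base_pair_location", "effect_allele", "other_allele",
                    "standard_error", "effect_allele_frequency", "p_value"]
EFFECT_COLUMNS = ["beta", "odds_ratio", "hazard_ratio"]  # at least one expected
ALLELE_PATTERN = re.compile(r"^[ACGT]+$")


def validate_sumstats(tsv_text: str) -> List[str]:
    """Return error messages for a tab-separated summary-statistics file (empty = passed)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    errors = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if not any(c in header for c in EFFECT_COLUMNS):
        errors.append("missing effect column: one of " + ", ".join(EFFECT_COLUMNS))
    if errors:
        return errors  # row checks are meaningless without the expected columns
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in ("effect_allele", "other_allele"):
            allele = row.get(col) or ""
            if not ALLELE_PATTERN.match(allele):
                errors.append(f"line {line_no}: invalid {col} {allele!r}")
        position = row.get("base_pair_location") or ""
        if not (position.isdigit() and int(position) > 0):
            errors.append(f"line {line_no}: base_pair_location must be a positive integer")
        try:
            p_value = float(row.get("p_value") or "")
        except ValueError:
            errors.append(f"line {line_no}: p_value is not numeric")
            continue
        if not 0.0 <= p_value <= 1.0:
            errors.append(f"line {line_no}: p_value outside [0, 1]")
    return errors
```

Collecting all errors, rather than stopping at the first, lets a submitter fix a file in one pass.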
Subjects
Genome-Wide Association Study; Knowledge Bases; Animals; Humans; Mice; DNA Copy Number Variations; National Human Genome Research Institute (U.S.); Phenotype; Polymorphism, Single Nucleotide; Software; United States
ABSTRACT
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
Subjects
Computational Biology/methods; Databases, Nucleic Acid; Genomics/methods; SARS-CoV-2/genetics; Vertebrates/genetics; Animals; COVID-19/epidemiology; COVID-19/virology; Humans; Internet; Molecular Sequence Annotation/methods; Pandemics; Vertebrates/classification
ABSTRACT
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), and data updates are released four times a year.
Subjects
Computational Biology/methods; Databases, Genetic; Epigenome; Molecular Sequence Annotation; Algorithms; Animals; Computer Graphics; Databases, Protein; Genetic Variation; Genome-Wide Association Study; Genomics; Histones/metabolism; Humans; Imaging, Three-Dimensional; Internet; Ligands; Search Engine; Software; Species Specificity; Transcriptome; User-Computer Interface; Web Browser
ABSTRACT
The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10⁹ individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ~90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.
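The RESTful API supports programmatic queries by study accession or by variant. The Python sketch below builds request URLs and fetches JSON with the standard library. The base path and the `studies` and `singleNucleotidePolymorphisms` resource names reflect the API as documented around the time of this publication. They are assumptions to verify against the current GWAS Catalog documentation, which may have moved to a newer API version.

```python
import json
import re
import urllib.request

# Assumed base path of the GWAS Catalog REST API; check the current documentation.
GWAS_REST = "https://www.ebi.ac.uk/gwas/rest/api"


def study_url(accession: str) -> str:
    """URL for one study, e.g. GCST000001."""
    if not re.fullmatch(r"GCST\d+", accession):
        raise ValueError(f"not a GWAS Catalog study accession: {accession!r}")
    return f"{GWAS_REST}/studies/{accession}"


def variant_url(rs_id: str) -> str:
    """URL for one variant by rsID."""
    if not re.fullmatch(r"rs\d+", rs_id):
        raise ValueError(f"not an rsID: {rs_id!r}")
    return f"{GWAS_REST}/singleNucleotidePolymorphisms/{rs_id}"


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and decode its JSON body (network access required)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Validating identifiers before building the URL keeps malformed input from turning into confusing HTTP errors in a high-throughput loop.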
Subjects
Databases, Genetic; Genome-Wide Association Study; Disease/genetics; Genetic Variation; Humans; Microarray Analysis; Publications; Software; User-Computer Interface
ABSTRACT
The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website, through a RESTful web service and a Perl application programming interface, and as data files for download.
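The RESTful web service can be queried with nothing more than an HTTP client. The Python sketch below builds URLs for the `lookup/id` and `lookup/symbol` endpoints of the Ensembl REST server and requests JSON by setting a `Content-Type` header, as the Ensembl REST examples do. Only URL construction is exercised offline. Endpoint behavior should be confirmed against the live documentation.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_id_url(stable_id: str, expand: bool = False) -> str:
    """URL for a stable-ID lookup; expand=True also requests child features."""
    url = f"{ENSEMBL_REST}/lookup/id/{urllib.parse.quote(stable_id)}"
    return url + "?expand=1" if expand else url


def lookup_symbol_url(species: str, symbol: str) -> str:
    """URL for a gene-symbol lookup within one species."""
    return (f"{ENSEMBL_REST}/lookup/symbol/"
            f"{urllib.parse.quote(species)}/{urllib.parse.quote(symbol)}")


def get_json(url: str, timeout: float = 30.0):
    """GET a URL from the REST server and decode the JSON response (network required)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `get_json(lookup_symbol_url("homo_sapiens", "BRCA2"))` would return the gene record as a dictionary, assuming network access.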
Subjects
Databases, Genetic; Genome/genetics; Genomics; Vertebrates/genetics; Animals; Computational Biology/trends; Humans; Mice; Molecular Sequence Annotation; Software
ABSTRACT
This Article was originally published under Nature Research's License to Publish, but has now been made available under a CC BY 4.0 license. The PDF and HTML versions of the Article have been modified accordingly.
ABSTRACT
PURPOSE: Variants in IQSEC2, a gene that escapes X inactivation, cause X-linked intellectual disability with frequent epilepsy in males and females. We aimed to investigate sex-specific differences. METHODS: We collected the data of 37 unpublished patients (18 males and 19 females) with IQSEC2 pathogenic variants and 5 individuals with variants of unknown significance and reviewed published variants. We compared variant types and phenotypes in males and females and performed an analysis of IQSEC2 isoforms. RESULTS: IQSEC2 pathogenic variants mainly led to premature truncation and were scattered throughout the longest brain-specific isoform, encoding the synaptic IQSEC2/BRAG1 protein. Variants occurred de novo in females but were either de novo (2/3) or inherited (1/3) in males, with missense variants being predominantly inherited. Developmental delay and intellectual disability were overall more severe in males than in females. Likewise, seizures were more frequently observed and intractable, and started earlier in males than in females. No correlation was observed between the age at seizure onset and severity of intellectual disability or resistance to antiepileptic treatments. CONCLUSION: This study provides a comprehensive overview of IQSEC2-related encephalopathy in males and females, and suggests that an accurate dosage of IQSEC2 at the synapse is crucial during normal brain development.
Subjects
Brain Diseases/genetics; Guanine Nucleotide Exchange Factors/genetics; Intellectual Disability/genetics; Seizures/genetics; Brain/growth & development; Brain/metabolism; Brain Diseases/epidemiology; Brain Diseases/physiopathology; Female; Humans; Infant; Infant, Newborn; Intellectual Disability/epidemiology; Intellectual Disability/physiopathology; Male; Mutation; Pedigree; Phenotype; Protein Isoforms/genetics; Seizures/epidemiology; Seizures/physiopathology; Sex Characteristics
ABSTRACT
Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine-learning-assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.
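Triage performance here is measured by the share of retrieved publications that are relevant (precision) and the share of relevant publications that are retrieved (recall). The reported 1.81-fold and 2.99-fold gains are ratios of precisions. The Python sketch below ranks publications by a classifier score and computes precision and recall over the top k. The score values, publication IDs and cutoff are hypothetical, and the CNN itself is not reproduced.

```python
from typing import Dict, Iterable, List, Tuple


def rank_by_score(scores: Dict[str, float]) -> List[str]:
    """Order publication IDs by descending classifier score (ties broken by ID)."""
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def precision_recall_at_k(ranked: List[str], relevant: Iterable[str],
                          k: int) -> Tuple[float, float]:
    """Precision and recall of the top-k ranked IDs against a set of relevant IDs."""
    relevant_set = set(relevant)
    top = ranked[:k]
    hits = sum(1 for pid in top if pid in relevant_set)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

A fold improvement such as the reported 1.81 is then the model's precision divided by the query-based precision, both measured against the same curated set.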
Subjects
Data Curation/methods; Information Storage and Retrieval/methods; Data Curation/statistics & numerical data; Databases, Genetic; Databases, Protein; Deep Learning; Genomics; Knowledge Bases; Machine Learning; Publications
ABSTRACT
The NHGRI-EBI GWAS Catalog has provided data from published genome-wide association studies since 2008. In 2015, the database was redesigned and relocated to EMBL-EBI. The new infrastructure includes a new graphical user interface (www.ebi.ac.uk/gwas/), ontology-supported search functionality and an improved curation interface. These developments have improved the data release frequency by increasing automation of curation and providing scaling improvements. The range of available Catalog data has also been extended with structured ancestry and recruitment information added for all studies. The infrastructure improvements also support scaling for larger arrays, exome and sequencing studies, allowing the Catalog to adapt to evolving study designs, genotyping technologies and user needs in the future.
Subjects
Databases, Nucleic Acid; Genome-Wide Association Study/methods; Software; Data Mining; Genomics/methods; Humans; Molecular Sequence Annotation; National Human Genome Research Institute (U.S.); United States; User-Computer Interface; Web Browser
ABSTRACT
BACKGROUND: It is unknown whether gastrointestinal dysbiosis in diarrheic calves causes disease or is a consequence of the disease. OBJECTIVES: Describe the fecal microbiota of calves before, during, and after recovering from diarrhea. ANIMALS: Fifteen female Holstein calves aged 0 to 21 days from a single farm. Seven calves remained healthy throughout the study, and 8 developed diarrhea on Day 14. METHODS: Longitudinal cohort study. Microbiota composition was characterized by amplifying the V4 region of the 16S rRNA gene. RESULTS: Diversity (Shannon index) increased with age in healthy and diarrheic calves from Day 3 to 21, but diarrheic calves had a lower diversity on the day diarrhea was first observed (Day 14). By Day 21, diversity increased in calves that recovered from diarrhea and was not significantly different from that of their healthy counterparts (P > .05). Weighted UniFrac distance showed significant differences in the fecal microbiota between diarrheic and healthy calves at Day 14 of age (PERMANOVA, P < .05), but not before or after diarrhea (PERMANOVA, P > .05). Lactobacillus, Clostridium Sensu Stricto 1, and Collinsella were differentially abundant on Day 10 in calves that developed diarrhea on Day 14 (P < .05). CONCLUSION AND CLINICAL IMPORTANCE: The fecal microbiota of healthy and diarrheic calves evolved similarly during the first 10 days of age but differed significantly on the day of onset of diarrhea. Enrichment of Lactobacillus, Clostridium Sensu Stricto 1, and Collinsella before diarrhea onset could have contributed to the development of diarrhea.
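The Shannon index reported here is H = −Σ pᵢ log pᵢ, where pᵢ is the relative abundance of taxon i in a sample. A minimal Python version is below. The logarithm base is a parameter because tools differ (natural log versus log₂), and the abstract does not state which base was used.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int], base: float = math.e) -> float:
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:  # absent taxa contribute nothing (p * log p -> 0 as p -> 0)
            p = c / total
            h -= p * math.log(p, base)
    return h
```

H is 0 for a sample dominated by a single taxon and grows as reads are spread more evenly across more taxa. A drop in H on the day of diarrhea onset therefore indicates a less even, less rich community.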
ABSTRACT
Polygenic scores (PGS) have transformed human genetic research and have multiple potential clinical applications, including risk stratification for disease prevention and prediction of treatment response. Here, we present a series of recent enhancements to the PGS Catalog (www.PGSCatalog.org), the largest findable, accessible, interoperable, and reusable (FAIR) repository of PGS. These include expansions in data content and ancestral diversity as well as the addition of new features. We further present the PGS Catalog Calculator (pgsc_calc, https://github.com/PGScatalog/pgsc_calc), an open-source, scalable and portable pipeline to reproducibly calculate PGS that securely democratizes equitable PGS applications by implementing genetic ancestry estimation and score normalization using reference data. With the PGS Catalog & calculator users can now quantify an individual's genetic predisposition for hundreds of common diseases and clinically relevant traits. Taken together, these updates and tools facilitate the next generation of PGS, thus lowering barriers to the clinical studies necessary to identify where PGS may be integrated into clinical practice.
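At its core, a PGS is a sum over variants of the effect-allele dosage multiplied by the published effect weight. Normalization then expresses that sum relative to a reference population. The Python sketch below computes a raw score from a scoring table of (variant, effect allele, weight) rows and converts it to a z-score against reference scores. It is a simplified illustration, not pgsc_calc. The variant IDs and weights are hypothetical, missing variants are simply skipped, and the genetic-ancestry adjustment that pgsc_calc performs is not shown.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

# One scoring-table row: (variant_id, effect_allele, effect_weight).
ScoreRow = Tuple[str, str, float]


def raw_pgs(scoring: List[ScoreRow], genotype: Dict[str, Tuple[str, str]]) -> float:
    """Sum of weight * effect-allele dosage (0, 1 or 2) over the scored variants."""
    total = 0.0
    for variant_id, effect_allele, weight in scoring:
        alleles = genotype.get(variant_id)
        if alleles is None:
            continue  # variant not genotyped: skipped in this sketch
        dosage = sum(1 for allele in alleles if allele == effect_allele)
        total += weight * dosage
    return total


def z_normalize(score: float, reference_scores: Sequence[float]) -> float:
    """Express a score in standard deviations from the reference-panel mean."""
    sd = pstdev(reference_scores)
    if sd == 0:
        raise ValueError("reference scores have zero variance")
    return (score - mean(reference_scores)) / sd
```

Raw sums are not comparable across scores with different numbers of variants or weight scales. Normalizing against a reference distribution is what makes an individual's value interpretable.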
ABSTRACT
Associations between human genetic variation and clinical phenotypes have become a foundation of biomedical research. Most repositories of these data seek to be disease-agnostic and therefore lack disease-focused views. The Type 2 Diabetes Knowledge Portal (T2DKP) is a public resource of genetic datasets and genomic annotations dedicated to type 2 diabetes (T2D) and related traits. Here, we seek to make the T2DKP more accessible to prospective users and more useful to existing users. First, we evaluate the T2DKP's comprehensiveness by comparing its datasets with those of other repositories. Second, we describe how researchers unfamiliar with human genetic data can begin using and correctly interpreting them via the T2DKP. Third, we describe how existing users can extend their current workflows to use the full suite of tools offered by the T2DKP. We finally discuss the lessons offered by the T2DKP toward the goal of democratizing access to complex disease genetic results.
Subjects
Diabetes Mellitus, Type 2; Humans; Diabetes Mellitus, Type 2/genetics; Access to Information; Prospective Studies; Genomics/methods; Phenotype
ABSTRACT
Pituitary adenylate cyclase-activating peptide (PACAP) is a neuroprotective peptide which exerts its effects mainly through the cAMP-protein kinase A (PKA) pathway. Here, we show that in cortical neurons, PACAP-induced PKA signaling exerts a major part of its neuroprotective effects indirectly, by triggering action potential (AP) firing. Treatment of cortical neurons with PACAP induces a rapid and sustained PKA-dependent increase in AP firing and associated intracellular Ca(2+) transients, which are essential for the anti-apoptotic actions of PACAP. Transient exposure to PACAP induces long-lasting neuroprotection in the face of apoptotic insults which is reliant on AP firing and the activation of cAMP response element (CRE) binding protein (CREB)-mediated gene expression. Although direct, activity-independent PKA signaling is sufficient to trigger phosphorylation on CREB's activating serine-133 site, this is insufficient for activation of CREB-mediated gene expression. Full activation is dependent on CREB-regulated transcription co-activator 1 (CRTC1), whose PACAP-induced nuclear import is dependent on firing activity-dependent calcineurin signaling. Over-expression of CRTC1 is sufficient to rescue PACAP-induced CRE-mediated gene expression in the face of activity-blockade, while dominant negative CRTC1 interferes with PACAP-induced, CREB-mediated neuroprotection. Thus, the enhancement of AP firing may play a significant role in the neuroprotective actions of PACAP and other adenylate cyclase-coupled ligands.
Subjects
Neuroprotective Agents; Pituitary Adenylate Cyclase-Activating Polypeptide/pharmacology; Signal Transduction/drug effects; Transcription Factors/physiology; Action Potentials/physiology; Animals; Apoptosis/drug effects; Blotting, Western; Calcineurin/physiology; Calcium/metabolism; Cell Death/drug effects; Cell Nucleus/drug effects; Cell Nucleus/metabolism; Cells, Cultured; Cerebral Cortex/cytology; Cerebral Cortex/physiology; Culture Media; Cyclic AMP Response Element-Binding Protein/physiology; Cyclic AMP-Dependent Protein Kinases/physiology; Electrophysiological Phenomena; Patch-Clamp Techniques; Phosphorylation; Rats; Rats, Sprague-Dawley; Staurosporine/antagonists & inhibitors; Staurosporine/toxicity; Transfection
ABSTRACT
BACKGROUND: Variant interpretation is dependent on transcript annotation and remains time-consuming and challenging. There are major obstacles to historical data reuse and to the interpretation of new variants. First, both RefSeq and Ensembl/GENCODE produce transcript sets in common use, but there is currently no easy way to translate between the two. Second, the resources often used for variant interpretation (e.g. ClinVar, gnomAD, UniProt) do not use the same transcript set, nor the same default transcript or protein sequence. METHOD: Ensembl ran a survey in 2018 to sample attitudes to choosing one default transcript per locus, and to gather data on reference sequences used by the scientific community. This was publicised on the Ensembl and UCSC genome browsers, by email and on social media. RESULTS: The survey had 788 responses from 32 different countries, the results of which we report here. CONCLUSIONS: We present our roadmap to create an effective default set of transcripts for resources, and for reporting interpretation of clinical variants.
Subjects
Biomarkers; Computational Biology; Genomics; RNA, Messenger/genetics; Animals; Computational Biology/methods; Databases, Genetic; Genomics/methods; Humans; Software; Web Browser
ABSTRACT
Genome sequencing has recently become a viable genotyping technology for use in genome-wide association studies (GWASs), offering the potential to analyze a broader range of genome-wide variation, including rare variants. To survey current standards, we assessed the content and quality of reporting of statistical methods, analyses, results, and datasets in 167 exome- or genome-wide-sequencing-based GWAS publications published from 2014 to 2020; 81% of publications included tests of aggregate association across multiple variants, with multiple test models frequently used. We observed a lack of standardized terms and incomplete reporting of datasets, particularly for variants analyzed in aggregate tests. We also found a lower frequency of sharing of summary statistics compared with array-based GWASs. Reporting standards and increased data sharing are required to ensure sequencing-based association study data are findable, accessible, interoperable, and reusable (FAIR). To support that, we recommend adopting the standard terminology of sequencing-based GWAS (seqGWAS). Further, we recommend that single-variant analyses be reported following the same standards and conventions as standard array-based GWASs and be shared in the GWAS Catalog. We also provide initial recommended standards for aggregate analyses metadata and summary statistics.
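Aggregate (gene- or region-based) tests first collapse several rare variants into a single per-sample value. The Python sketch below shows a simple burden-style collapse. Variants with minor allele frequency below a threshold are kept, and each sample's minor-allele counts are summed (or reduced to a carrier indicator). The threshold, input layout and `collapse` options are illustrative assumptions. The resulting score would then be tested for association in a regression model, which is not shown, and other aggregate tests (for example variance-component tests) work differently.

```python
from typing import List


def burden_scores(genotypes: List[List[int]], maf_threshold: float = 0.01,
                  collapse: str = "count") -> List[int]:
    """Collapse rare variants into one value per sample.

    genotypes: one list per variant of alternate-allele dosages (0, 1, 2), one entry per sample.
    collapse: "count" sums minor-allele counts; "any" returns a 0/1 carrier indicator.
    """
    if not genotypes:
        raise ValueError("no variants supplied")
    n_samples = len(genotypes[0])
    scores = [0] * n_samples
    for dosages in genotypes:
        alt_freq = sum(dosages) / (2 * n_samples)
        maf = min(alt_freq, 1 - alt_freq)
        if maf == 0 or maf >= maf_threshold:
            continue  # skip monomorphic and common variants
        # Count the minor allele, which is the reference allele when alt_freq > 0.5.
        minor_counts = dosages if alt_freq <= 0.5 else [2 - d for d in dosages]
        for i, count in enumerate(minor_counts):
            scores[i] += count
    if collapse == "any":
        return [1 if s > 0 else 0 for s in scores]
    if collapse != "count":
        raise ValueError(f"unknown collapse mode: {collapse!r}")
    return scores
```

Which variants qualify for the aggregate (here, the MAF cutoff) changes the result. This is one reason the abstract calls for reporting the variants analyzed in aggregate tests.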
ABSTRACT
Genome-wide association studies (GWASs) have enabled robust mapping of complex traits in humans. The open sharing of GWAS summary statistics (SumStats) is essential in facilitating the larger meta-analyses needed for increased power in resolving the genetic basis of disease. However, most GWAS SumStats are not readily accessible because of limited sharing and a lack of defined standards. With the aim of increasing the availability, quality, and utility of GWAS SumStats, the National Human Genome Research Institute-European Bioinformatics Institute (NHGRI-EBI) GWAS Catalog organized a community workshop to address the standards, infrastructure, and incentives required to promote and enable sharing. We evaluated the barriers to SumStats sharing, both technological and sociological, and developed an action plan to address those challenges and ensure that SumStats and study metadata are findable, accessible, interoperable, and reusable (FAIR). We encourage early deposition of datasets in the GWAS Catalog as the recognized central repository. We recommend standard requirements for reporting elements and formats for SumStats and accompanying metadata as guidelines for community standards and a basis for submission to the GWAS Catalog. Finally, we provide recommendations to enable, promote, and incentivize broader data sharing, standards and FAIRness in order to advance genomic medicine.