ABSTRACT
Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype Tissue Expression v.8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.
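The abstract does not spell out how allelic imbalance was turned into an NMD efficiency score, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes that, at a heterozygous PTV, reads carrying the variant allele are depleted when NMD is active, and it tests the observed read counts against an expected 50/50 allelic ratio. The function names `alt_allele_fraction` and `binomial_two_sided_p` are hypothetical.

```python
from math import comb


def alt_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of RNA-seq reads at a heterozygous site that carry the PTV allele.

    Values well below 0.5 are consistent with degradation of the PTV transcript.
    (Assumption: a simple read-count ratio; the source does not give the metric.)
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def binomial_two_sided_p(alt_reads: int, total_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probability of every outcome that is no more likely than the
    observed one under the null allelic ratio p_null.
    """
    probs = [
        comb(total_reads, k) * p_null**k * (1 - p_null) ** (total_reads - k)
        for k in range(total_reads + 1)
    ]
    observed = probs[alt_reads]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    tail = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical counts, for illustration only: 30 reference reads, 10 PTV reads.
fraction = alt_allele_fraction(30, 10)
p_value = binomial_two_sided_p(10, 40)
```

A real analysis would also need to handle reference mapping bias, low coverage, and multiple testing, none of which this sketch addresses.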
Subjects
Nonsense Codon/genetics, Gene Expression Regulation, Inborn Genetic Diseases/pathology, Genetic Variation, Mutation, Nonsense-Mediated mRNA Decay, Messenger RNA/genetics, Gene Frequency, Inborn Genetic Diseases/genetics, Humans
ABSTRACT
Centromeres play an essential function in cell division by specifying the site of kinetochore formation on each chromosome for mitotic spindle attachment. Centromeres are defined epigenetically by the histone H3 variant Centromere Protein A (Cenpa). Cenpa nucleosomes maintain the centromere by designating the site for new Cenpa assembly after dilution by replication. Vertebrate centromeres assemble on tandem arrays of repetitive sequences, but the function of repeat DNA in centromere formation has been challenging to dissect due to the difficulty in manipulating centromeres in cells. Xenopus laevis egg extracts assemble centromeres in vitro, providing a system for studying centromeric DNA functions. However, centromeric sequences in Xenopus laevis have not been extensively characterized. In this study, we combine Cenpa ChIP-seq with a k-mer based analysis approach to identify the Xenopus laevis centromere repeat sequences. By in situ hybridization, we show that Xenopus laevis centromeres contain diverse repeat sequences, and we map the centromere position on each Xenopus laevis chromosome using the distribution of centromere-enriched k-mers. Our identification of Xenopus laevis centromere sequences enables previously unapproachable centromere genomic studies. Our approach should be broadly applicable for the analysis of centromere and other repetitive sequences in any organism.
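As a rough illustration of what a k-mer based enrichment analysis can look like, the sketch below counts k-mers in ChIP reads and in control (input) reads and reports a normalized enrichment ratio for each k-mer. This is an assumption about the general technique, not the authors' implementation: the function names, the pseudocount, and the normalization by total k-mer count are all hypothetical choices, and reverse complements are not merged.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer across a collection of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_enrichment(chip_reads, input_reads, k, pseudocount=1.0):
    """Return {kmer: enrichment ratio} of ChIP over input.

    Each count is offset by a pseudocount (to avoid division by zero) and
    scaled by the total number of k-mers observed in its library.
    """
    chip = count_kmers(chip_reads, k)
    ctrl = count_kmers(input_reads, k)
    chip_total = sum(chip.values()) or 1
    ctrl_total = sum(ctrl.values()) or 1
    return {
        kmer: ((chip[kmer] + pseudocount) / chip_total)
        / ((ctrl[kmer] + pseudocount) / ctrl_total)
        for kmer in chip.keys() | ctrl.keys()
    }


# Toy sequences, for illustration only.
enrichment = kmer_enrichment(["AAAA"], ["AAAT"], k=3)
```

K-mers with high ratios would be candidates for centromere-enriched sequence; mapping their positions along an assembly is a separate step not shown here.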
Subjects
Centromere, Nucleosomes, Animals, Centromere/genetics, Centromere Protein A/genetics, Centromere Protein A/metabolism, Chromatin/genetics, Chromatin/metabolism, Nucleosomes/genetics, Nucleosomes/metabolism, Nucleic Acid Repetitive Sequences, Xenopus laevis/genetics, Xenopus laevis/metabolism
ABSTRACT
Multiple studies have confirmed the contribution of rare de novo copy number variations to the risk for autism spectrum disorders. But whereas de novo single nucleotide variants have been identified in affected individuals, their contribution to risk has yet to be clarified. Specifically, the frequency and distribution of these mutations have not been well characterized in matched unaffected controls, and such data are vital to the interpretation of de novo coding mutations observed in probands. Here we show, using whole-exome sequencing of 928 individuals, including 200 phenotypically discordant sibling pairs, that highly disruptive (nonsense and splice-site) de novo mutations in brain-expressed genes are associated with autism spectrum disorders and carry large effects. On the basis of mutation rates in unaffected individuals, we demonstrate that multiple independent de novo single nucleotide variants in the same gene among unrelated probands reliably identifies risk alleles, providing a clear path forward for gene discovery. Among a total of 279 identified de novo coding mutations, there is a single instance in probands, and none in siblings, in which two independent nonsense variants disrupt the same gene, SCN2A (sodium channel, voltage-gated, type II, α subunit), a result that is highly unlikely by chance.
Subjects
Autistic Disorder/genetics, Exome/genetics, Exons/genetics, Genetic Predisposition to Disease/genetics, Mutation/genetics, Nerve Tissue Proteins/genetics, Sodium Channels/genetics, Alleles, Nonsense Codon/genetics, Genetic Heterogeneity, Humans, NAV1.2 Voltage-Gated Sodium Channel, RNA Splice Sites/genetics, Siblings
ABSTRACT
It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene [1]. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches [2-5]. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases [6-8]. This includes muscle biopsies from patients with undiagnosed rare muscle disorders [6,9], and cultured fibroblasts from patients with mitochondrial disorders [7]. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.
Subjects
Rare Diseases/genetics, Acid Ceramidase/genetics, Case-Control Studies, Child, Preschool Child, Cohort Studies, Female, Genetic Variation, Humans, Male, Genetic Models, Mutation, Oxidoreductases Acting on CH-CH Group Donors/genetics, Potassium Channels/genetics, RNA/blood, RNA/genetics, RNA Splicing/genetics, Rare Diseases/blood, RNA Sequence Analysis, Exome Sequencing
ABSTRACT
RNA is a critical component of chromatin in eukaryotes, both as a product of transcription, and as an essential constituent of ribonucleoprotein complexes that regulate both local and global chromatin states. Here, we present a proximity ligation and sequencing method called Chromatin-Associated RNA sequencing (ChAR-seq) that maps all RNA-to-DNA contacts across the genome. Using Drosophila cells, we show that ChAR-seq provides unbiased, de novo identification of targets of chromatin-bound RNAs including nascent transcripts, chromosome-specific dosage compensation ncRNAs, and genome-wide trans-associated RNAs involved in co-transcriptional RNA processing.
Subjects
Chromatin/metabolism, Drosophila melanogaster/genetics, Drosophila melanogaster/metabolism, High-Throughput Nucleotide Sequencing/methods, RNA/metabolism, Animals, Chromatin/genetics, DNA/genetics, DNA/metabolism, Genetic Dosage Compensation, Drosophila Proteins/genetics, Drosophila Proteins/metabolism, Female, Male, RNA/genetics, Transcription Factors/genetics, Transcription Factors/metabolism, Genetic Transcription
ABSTRACT
Analysis of de novo CNVs (dnCNVs) from the full Simons Simplex Collection (SSC) (N = 2,591 families) replicates prior findings of strong association with autism spectrum disorders (ASDs) and confirms six risk loci (1q21.1, 3q29, 7q11.23, 16p11.2, 15q11.2-13, and 22q11.2). The addition of published CNV data from the Autism Genome Project (AGP) and exome sequencing data from the SSC and the Autism Sequencing Consortium (ASC) shows that genes within small de novo deletions, but not within large dnCNVs, significantly overlap the high-effect risk genes identified by sequencing. Alternatively, large dnCNVs are found likely to contain multiple modest-effect risk genes. Overall, we find strong evidence that de novo mutations are associated with ASD apart from the risk for intellectual disability. Extending the transmission and de novo association test (TADA) to include small de novo deletions reveals 71 ASD risk loci, including 6 CNV regions (noted above) and 65 risk genes (FDR ≤ 0.1).
Subjects
Autism Spectrum Disorder/diagnosis, Autism Spectrum Disorder/genetics, Genetic Loci/genetics, Genetic Variation/genetics, Protein Interaction Maps/genetics, Female, Humans, Male
ABSTRACT
Whole-exome sequencing (WES) studies have demonstrated the contribution of de novo loss-of-function single-nucleotide variants (SNVs) to autism spectrum disorder (ASD). However, challenges in the reliable detection of de novo insertions and deletions (indels) have limited inclusion of these variants in prior analyses. By applying a robust indel detection method to WES data from 787 ASD families (2,963 individuals), we demonstrate that de novo frameshift indels contribute to ASD risk (OR = 1.6; 95% CI = 1.0-2.7; p = 0.03), are more common in female probands (p = 0.02), are enriched among genes encoding FMRP targets (p = 6 × 10⁻⁹), and arise predominantly on the paternal chromosome (p < 0.001). On the basis of mutation rates in probands versus unaffected siblings, we conclude that de novo frameshift indels contribute to risk in approximately 3% of individuals with ASD. Finally, by observing clustering of mutations in unrelated probands, we uncover two ASD-associated genes: KMT2E (MLL5), a chromatin regulator, and RIMS1, a regulator of synaptic vesicle release.