ABSTRACT
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
Subjects
Exome/genetics , Essential Genes/genetics , Genetic Variation/genetics , Human Genome/genetics , Adult , Brain/metabolism , Cardiovascular Diseases/genetics , Cohort Studies , Genetic Databases , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Loss of Function Mutation/genetics , Male , Mutation Rate , Proprotein Convertase 9/genetics , Messenger RNA/genetics , Reproducibility of Results , Exome Sequencing , Whole Genome Sequencing
ABSTRACT
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
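The idea of a multi-tissue expression outlier can be sketched in a few lines of Python. This is only an illustration: the function names, the input layout, the use of a median Z-score across tissues, and the threshold of 2 are assumptions made for the example, not details stated in the abstract.

```python
from statistics import mean, median, stdev


def zscores(values):
    """Standardize values to mean 0 and unit sample standard deviation."""
    mu, sd = mean(values), stdev(values)  # sd == 0 would divide by zero
    return [(v - mu) / sd for v in values]


def multi_tissue_outliers(expr_by_tissue, threshold=2.0):
    """Flag individuals whose median Z-score across tissues is extreme.

    expr_by_tissue: {tissue: {individual: expression}} for a single gene
    (hypothetical layout). The threshold is an assumed cutoff.
    """
    per_individual = {}
    for expr in expr_by_tissue.values():
        ids = list(expr)
        for ind, z in zip(ids, zscores([expr[i] for i in ids])):
            per_individual.setdefault(ind, []).append(z)
    medians = {ind: median(zs) for ind, zs in per_individual.items()}
    return {ind: m for ind, m in medians.items() if abs(m) >= threshold}
```

A negative median Z-score in the result would correspond to an underexpression outlier and a positive one to an overexpression outlier.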
Subjects
Gene Expression Profiling , Genetic Variation/genetics , Organ Specificity/genetics , Bayes Theorem , Female , Human Genome/genetics , Genomics , Genotype , Humans , Male , Genetic Models , RNA Sequence Analysis
ABSTRACT
Rationale: Common genetic variants have been associated with idiopathic pulmonary fibrosis (IPF). Objectives: To determine functional relevance of the 10 IPF-associated common genetic variants we previously identified. Methods: We performed expression quantitative trait loci (eQTL) and methylation quantitative trait loci (mQTL) mapping, followed by co-localization of eQTL and mQTL with genetic association signals and functional validation by luciferase reporter assays. Illumina multi-ethnic genotyping arrays, mRNA sequencing, and Illumina 850k methylation arrays were performed on lung tissue of participants with IPF (234 RNA and 345 DNA samples) and non-diseased controls (188 RNA and 202 DNA samples). Measurements and Main Results: Focusing on genetic variants within 10 IPF-associated genetic loci, we identified 27 eQTLs in controls and 24 eQTLs in cases (false-discovery-rate-adjusted P < 0.05). Among these signals, we identified associations of lead variants rs35705950 with expression of MUC5B and rs2076295 with expression of DSP in both cases and controls. mQTL analysis identified CpGs in gene bodies of MUC5B (cg17589883) and DSP (cg08964675) associated with the lead variants in these two loci. We also demonstrated strong co-localization of eQTL/mQTL and genetic signal in MUC5B (rs35705950) and DSP (rs2076295). Functional validation of the mQTL in MUC5B using luciferase reporter assays demonstrates that the CpG resides within a putative internal repressor element. Conclusions: We have established a relationship of the common IPF genetic risk variants rs35705950 and rs2076295 with respective changes in MUC5B and DSP expression and methylation. These results provide additional evidence that both MUC5B and DSP are involved in the etiology of IPF.
Subjects
Idiopathic Pulmonary Fibrosis , Humans , DNA , DNA Methylation/genetics , Gene Expression , Genetic Predisposition to Disease/genetics , Idiopathic Pulmonary Fibrosis/genetics , Mucin-5B/genetics , Quantitative Trait Loci/genetics , RNA
Subjects
CRISPR-Cas Systems , Cell- and Tissue-Based Therapy , Gene Editing , Hematopoietic Stem Cell Transplantation , Humans , Gene Editing/methods , CRISPR-Associated Protein 9 , Cell- and Tissue-Based Therapy/adverse effects , Cell- and Tissue-Based Therapy/methods , CD34 Antigens , Hematopoietic Stem Cells , Hematopoietic Stem Cell Transplantation/methods , Postoperative Complications/genetics , Postoperative Complications/prevention & control
ABSTRACT
ATP synthase, H+ transporting, mitochondrial F1 complex, δ subunit (ATP5F1D; formerly ATP5D) is a subunit of mitochondrial ATP synthase and plays an important role in coupling proton translocation and ATP production. Here, we describe two individuals, each with homozygous missense variants in ATP5F1D, who presented with episodic lethargy, metabolic acidosis, 3-methylglutaconic aciduria, and hyperammonemia. Subject 1, homozygous for c.245C>T (p.Pro82Leu), presented with recurrent metabolic decompensation starting in the neonatal period, and subject 2, homozygous for c.317T>G (p.Val106Gly), presented with acute encephalopathy in childhood. Cultured skin fibroblasts from these individuals exhibited impaired assembly of F1FO ATP synthase and subsequent reduced complex V activity. Cells from subject 1 also exhibited a significant decrease in mitochondrial cristae. Knockdown of Drosophila ATPsynδ, the ATP5F1D homolog, in developing eyes and brains caused a near complete loss of the fly head, a phenotype that was fully rescued by wild-type human ATP5F1D. In contrast, expression of the ATP5F1D c.245C>T and c.317T>G variants rescued the head-size phenotype but recapitulated the eye and antennae defects seen in other genetic models of mitochondrial oxidative phosphorylation deficiency. Our data establish c.245C>T (p.Pro82Leu) and c.317T>G (p.Val106Gly) in ATP5F1D as pathogenic variants leading to a Mendelian mitochondrial disease featuring episodic metabolic decompensation.
Subjects
Alleles , Metabolic Diseases/genetics , Mitochondrial Proton-Translocating ATPases/genetics , Mutation/genetics , Protein Subunits/genetics , Amino Acid Sequence , Base Sequence , Child , Preschool Child , Female , Humans , Infant , Newborn Infant , Loss of Function Mutation/genetics , Male , Mitochondria/metabolism , Mitochondria/ultrastructure , Mitochondrial Proton-Translocating ATPases/chemistry , Protein Subunits/chemistry
ABSTRACT
The X Chromosome, with its unique mode of inheritance, contributes to differences between the sexes at a molecular level, including sex-specific gene expression and sex-specific impact of genetic variation. Improving our understanding of these differences offers to elucidate the molecular mechanisms underlying sex-specific traits and diseases. However, to date, most studies have either ignored the X Chromosome or had insufficient power to test for the sex-specific impact of genetic variation. By analyzing whole blood transcriptomes of 922 individuals, we have conducted the first large-scale, genome-wide analysis of the impact of both sex and genetic variation on patterns of gene expression, including comparison between the X Chromosome and autosomes. We identified a depletion of expression quantitative trait loci (eQTL) on the X Chromosome, especially among genes under high selective constraint. In contrast, we discovered an enrichment of sex-specific regulatory variants on the X Chromosome. To resolve the molecular mechanisms underlying such effects, we generated chromatin accessibility data through ATAC-sequencing to connect sex-specific chromatin accessibility to sex-specific patterns of expression and regulatory variation. As sex-specific regulatory variants discovered in our study can inform sex differences in heritable disease prevalence, we integrated our data with genome-wide association study data for multiple immune traits identifying several traits with significant sex biases in genetic susceptibilities. Together, our study provides genome-wide insight into how genetic variation, the X Chromosome, and sex shape human gene regulation and disease.
Subjects
Human X Chromosomes/genetics , Transcriptome , Female , Gene Expression Profiling , Gene Expression Regulation , Genetic Predisposition to Disease , Human Genome , Humans , Male , Single Nucleotide Polymorphism , Quantitative Trait Loci , Sex Characteristics
ABSTRACT
Purpose: Current clinical genomics assays primarily utilize short-read sequencing (SRS), but SRS has limited ability to evaluate repetitive regions and structural variants. Long-read sequencing (LRS) has complementary strengths, and we aimed to determine whether LRS could offer a means to identify overlooked genetic variation in patients undiagnosed by SRS. Methods: We performed low-coverage genome LRS to identify structural variants in a patient who presented with multiple neoplasia and cardiac myxomata, in whom the results of targeted clinical testing and genome SRS were negative. Results: This LRS approach yielded 6,971 deletions and 6,821 insertions > 50 bp. Filtering for variants that are absent in an unrelated control and overlap a disease gene coding exon identified three deletions and three insertions. One of these, a heterozygous 2,184 bp deletion, overlaps the first coding exon of PRKAR1A, which is implicated in autosomal dominant Carney complex. RNA sequencing demonstrated decreased PRKAR1A expression. The deletion was classified as pathogenic based on guidelines for interpretation of sequence variants. Conclusion: This first successful application of genome LRS to identify a pathogenic variant in a patient suggests that LRS has significant potential for the identification of disease-causing structural variation. Larger studies will ultimately be required to evaluate the potential clinical utility of LRS.
Subjects
Genetic Association Studies , Inborn Genetic Diseases/diagnosis , Inborn Genetic Diseases/genetics , Genetic Predisposition to Disease , Genetic Variation , Human Genome , Genomics , DNA Sequence Analysis , Child , Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/genetics , Echocardiography , Genomics/methods , Humans , Male , Phenotype , DNA Sequence Analysis/methods , Sequence Deletion
ABSTRACT
At least 15% of disease-causing mutations affect mRNA splicing. Many splicing mutations are missed in a clinical setting due to limitations of in silico prediction algorithms or their location in noncoding regions. Whole-transcriptome sequencing is a promising new tool to identify these mutations; however, it will be a challenge to obtain disease-relevant tissue for RNA. Here, we describe an individual with a sporadic atypical spinal muscular atrophy, in whom clinical DNA sequencing reported one pathogenic ASAH1 mutation (c.458A>G; p.Tyr153Cys). Transcriptome sequencing on patient leukocytes identified a highly significant and atypical ASAH1 isoform not explained by c.458A>G (p < 10⁻¹⁶). Subsequent Sanger sequencing identified the splice mutation responsible for the isoform (c.504A>C; p.Lys168Asn) and provided a molecular diagnosis of autosomal-recessive spinal muscular atrophy with progressive myoclonic epilepsy. Our findings demonstrate the utility of RNA sequencing from blood to identify splice-impacting disease mutations for nonhematological conditions, providing a diagnosis for these otherwise unsolved patients.
Subjects
Acid Ceramidase/genetics , Spinal Muscular Atrophy/blood , Progressive Myoclonic Epilepsies/blood , RNA Splicing/genetics , Acid Ceramidase/blood , Preschool Child , Humans , Male , Spinal Muscular Atrophy/complications , Spinal Muscular Atrophy/genetics , Mutation , Progressive Myoclonic Epilepsies/complications , Progressive Myoclonic Epilepsies/genetics , Molecular Pathology , DNA Sequence Analysis , Transcriptome/genetics
ABSTRACT
Whole-genome and exome sequencing in human populations has revealed the tolerance of each gene for loss-of-function variation. By understanding this tolerance, it has become increasingly possible to identify genes that would make safe therapeutic targets and to identify rare genetic risk factors and phenotypes at the scale of individual genomes. To date, the vast majority of surveyed loss-of-function variants are in protein-coding regions of the genome mainly due to the focus on these regions by exome-based sequencing projects and their relative ease of interpretability. As whole-genome sequencing becomes more prevalent, new strategies will be required to uncover impactful variation in non-coding regions of the genome where the architecture of genome function is more complex. In this review, we investigate recent studies of loss-of-function variation and emerging approaches for interpreting whole-genome sequencing data to identify rare and impactful non-coding loss-of-function variants.
Subjects
Intergenic DNA/genetics , Genetic Variation , Human Genome , Exome/genetics , Population Genetics , Genomics , Humans
ABSTRACT
Hundreds of thousands of genetic variants have been reported to cause severe monogenic diseases, but the probability that a variant carrier develops the disease (termed penetrance) is unknown for virtually all of them. Additionally, the clinical utility of common polygenic variation remains uncertain. Using exome sequencing from 77,184 adult individuals (38,618 multi-ancestral individuals from a type 2 diabetes case-control study and 38,566 participants from the UK Biobank, for whom genotype array data were also available), we apply clinical standard-of-care gene variant curation for eight monogenic metabolic conditions. Rare variants causing monogenic diabetes and dyslipidemias display effect sizes significantly larger than the top 1% of the corresponding polygenic scores. Nevertheless, penetrance estimates for monogenic variant carriers average 60% or lower for most conditions. We assess epidemiologic and genetic factors contributing to risk prediction in monogenic variant carriers, demonstrating that inclusion of polygenic variation significantly improves biomarker estimation for two monogenic dyslipidemias.
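Penetrance, as defined in this abstract, is the fraction of variant carriers who develop the disease. A minimal Python sketch of that estimate follows. The Wilson score interval is a standard choice added here for illustration; the abstract does not say how uncertainty was quantified, and the function names are hypothetical.

```python
from math import sqrt


def penetrance(affected_carriers, total_carriers):
    """Point estimate: share of variant carriers who have the disease."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    return affected_carriers / total_carriers


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z=1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

For example, with made-up counts of 30 affected among 50 carriers, `penetrance(30, 50)` gives 0.6, and `wilson_interval(30, 50)` gives a range around that estimate.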
Subjects
Type 2 Diabetes Mellitus/genetics , Dyslipidemias/genetics , Genetic Predisposition to Disease/genetics , Adult , Population Biological Variation , Biomarkers/metabolism , Type 2 Diabetes Mellitus/metabolism , Dyslipidemias/metabolism , Exome/genetics , Genotype , Humans , Multifactorial Inheritance , Penetrance , Risk Assessment
ABSTRACT
G-quadruplex motifs in the RNA play significant roles in key cellular processes and human disease. While sequences capable of forming G-quadruplexes in the pre-mRNA are involved in regulation of polyadenylation and splicing events in mammalian transcripts, the G-quadruplex motifs in the UTRs may help regulate mRNA expression. GRSDB2 is a second-generation database containing information on the composition and distribution of putative Quadruplex-forming G-Rich Sequences (QGRS) mapped in approximately 29 000 eukaryotic pre-mRNA sequences, many of which are alternatively processed. The data stored in the GRSDB2 is based on computational analysis of NCBI Entrez Gene entries with the help of an improved version of the QGRS Mapper program. The database allows complex queries with a wide variety of parameters, including Gene Ontology terms. The data is displayed in a variety of formats with several additional computational capabilities. We have also developed a new database, GRS_UTRdb, containing information on the composition and distribution patterns of putative QGRS in the 5'- and 3'-UTRs of eukaryotic mRNA sequences. The goal of these experiments has been to build freely accessible resources for exploring the role of G-quadruplex structure in regulation of gene expression at post-transcriptional level. The databases can be accessed at the G-Quadruplex Resource Site at: http://bioinformatics.ramapo.edu/GQRS/.
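As a rough illustration of what mapping putative QGRS involves, the Python sketch below scans an RNA sequence for a simplified motif: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This pattern and the function name are assumptions made for the example. It is not the algorithm or scoring scheme of the QGRS Mapper program, whose parameters are not given in the abstract.

```python
import re

# Simplified, assumed motif (not QGRS Mapper's definition): four G-runs of
# length >= 3 separated by loops of 1-7 nt. The lookahead lets overlapping
# candidates be reported from every start position.
QGRS_PATTERN = re.compile(
    r"(?=(G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}))"
)


def find_putative_qgrs(sequence):
    """Return (start, motif) pairs for candidate G-quadruplex motifs."""
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input
    return [(m.start(), m.group(1)) for m in QGRS_PATTERN.finditer(rna)]
```

Start positions are 0-based offsets into the input sequence.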
Subjects
Nucleic Acid Databases , G-Quadruplexes , RNA Precursors/chemistry , Messenger RNA/chemistry , Untranslated Regions/chemistry , Alternative Splicing , Animals , Humans , Internet , Rats , User-Computer Interface
ABSTRACT
It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene [1]. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches [2-5]. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases [6-8]. This includes muscle biopsies from patients with undiagnosed rare muscle disorders [6,9], and cultured fibroblasts from patients with mitochondrial disorders [7]. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.
Subjects
Rare Diseases/genetics , Acid Ceramidase/genetics , Case-Control Studies , Child , Preschool Child , Cohort Studies , Female , Genetic Variation , Humans , Male , Genetic Models , Mutation , Oxidoreductases Acting on CH-CH Group Donors/genetics , Potassium Channels/genetics , RNA/blood , RNA/genetics , RNA Splicing/genetics , Rare Diseases/blood , RNA Sequence Analysis , Exome Sequencing
ABSTRACT
Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.
Subjects
Gene Expression Profiling/methods , Genetic Variation , Genome-Wide Association Study/methods , Quantitative Trait Loci/genetics , Alternative Splicing , Chromosome Mapping , Family Health , Female , Genetic Predisposition to Disease/genetics , Population Genetics , Genotype , Humans , Italy , Male , Single Nucleotide Polymorphism , Transcription Initiation Site
ABSTRACT
The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of direct-to-consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar-designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowdsource the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes Project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar-designated pathogenic variant that required further evaluation.
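The kind of filtering described here, keeping variants that fall in an ACMG-listed gene and carry a pathogenic ClinVar designation, can be sketched in Python as follows. This is not PATH-SCAN's implementation: the record layout, field names, and function name are hypothetical, the gene set is a small subset chosen for illustration, and the variant records are made up.

```python
# Small illustrative subset of genes from the ACMG incidental-findings
# guidelines; the complete list should be taken from the guidelines.
ACMG_GENES = {"BRCA1", "BRCA2", "TP53"}


def flag_reportable(variants, acmg_genes=ACMG_GENES):
    """Keep variants in an ACMG-listed gene that ClinVar marks pathogenic.

    Each variant is a dict with hypothetical keys "id", "gene" and
    "clinvar_significance".
    """
    return [
        v for v in variants
        if v["gene"] in acmg_genes
        and v["clinvar_significance"].lower() == "pathogenic"
    ]


# Made-up records for illustration only; these are not real variants.
example_variants = [
    {"id": "var1", "gene": "BRCA1", "clinvar_significance": "Pathogenic"},
    {"id": "var2", "gene": "BRCA1", "clinvar_significance": "Benign"},
    {"id": "var3", "gene": "CFTR", "clinvar_significance": "Pathogenic"},
]
```

With these records, only `var1` passes both conditions: `var2` is benign and `var3` lies outside the gene set.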