Results 1 - 20 of 344
1.
Proc Natl Acad Sci U S A ; 117(46): 28930-28938, 2020 Nov 17.
Article in English | MEDLINE | ID: mdl-33139556

ABSTRACT

Common genetic variants interact with environmental factors to impact risk of heritable diseases. A notable example of this is a single-nucleotide variant in the Solute Carrier Family 39 Member 8 (SLC39A8) gene encoding the missense variant A391T, which is associated with a variety of traits ranging from Parkinson's disease and neuropsychiatric disease to cardiovascular and metabolic diseases and Crohn's disease. The remarkable extent of pleiotropy exhibited by SLC39A8 A391T raises key questions regarding how a single coding variant can contribute to this diversity of clinical outcomes and what is the mechanistic basis for this pleiotropy. Here, we generate a murine model for the Slc39a8 A391T allele and demonstrate that these mice exhibit Mn deficiency in the colon associated with impaired intestinal barrier function and epithelial glycocalyx disruption. Consequently, Slc39a8 A391T mice exhibit increased sensitivity to epithelial injury and pathological inflammation in the colon. Taken together, our results link a genetic variant with a dietary trace element to shed light on a tissue-specific mechanism of disease risk based on impaired intestinal barrier integrity.

3.
Eur J Hum Genet ; 2020 Oct 27.
Article in English | MEDLINE | ID: mdl-33110245

ABSTRACT

Multivariate methods are known to increase the statistical power to detect associations in the case of shared genetic basis between phenotypes. They have, however, lacked essential analytic tools to follow-up and understand the biology underlying these associations. We developed a novel computational workflow for multivariate GWAS follow-up analyses, including fine-mapping and identification of the subset of traits driving associations (driver traits). Many follow-up tools require univariate regression coefficients which are lacking from multivariate results. Our method overcomes this problem by using Canonical Correlation Analysis to turn each multivariate association into its optimal univariate Linear Combination Phenotype (LCP). This enables an LCP-GWAS, which in turn generates the statistics required for follow-up analyses. We implemented our method on 12 highly correlated inflammatory biomarkers in a Finnish population-based study. Altogether, we identified 11 associations, four of which (F5, ABO, C1orf140 and PDGFRB) were not detected by biomarker-specific analyses. Fine-mapping identified 19 signals within the 11 loci and driver trait analysis determined the traits contributing to the associations. A phenome-wide association study on the 19 representative variants from the signals in 176,899 individuals from the FinnGen study revealed 53 disease associations (p < 1 × 10⁻⁴). Several reported pQTLs in the 11 loci provided orthogonal evidence for the biologically relevant functions of the representative variants. Our novel multivariate analysis workflow provides a powerful addition to standard univariate GWAS analyses by enabling multivariate GWAS follow-up and thus promoting the advancement of powerful multivariate methods in genomics.
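
The idea of collapsing several traits into one Linear Combination Phenotype can be illustrated with a minimal sketch. It assumes individual-level data and a single variant, in which case the canonical weights for the traits are proportional to Cov(Y)⁻¹ Cov(Y, g). The abstract does not describe the authors' implementation, so the function names and the toy data below are hypothetical, and the weights are only defined up to a scale factor.

```python
def lcp_weights(traits, genotype):
    """Trait weights for a single-variant CCA (illustrative sketch only).

    traits: one row per individual, each row holding k trait values.
    genotype: allele dosages, one per individual.
    Solves Cov(Y) a = Cov(Y, g) by Gaussian elimination.
    """
    n, k = len(traits), len(traits[0])
    mean_y = [sum(row[j] for row in traits) / n for j in range(k)]
    mean_g = sum(genotype) / n
    yc = [[row[j] - mean_y[j] for j in range(k)] for row in traits]
    gc = [g - mean_g for g in genotype]

    # Augmented matrix [Cov(Y) | Cov(Y, g)]
    aug = [
        [sum(r[i] * r[j] for r in yc) / (n - 1) for j in range(k)]
        + [sum(r[i] * g for r, g in zip(yc, gc)) / (n - 1)]
        for i in range(k)
    ]

    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("trait covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution
    weights = [0.0] * k
    for i in reversed(range(k)):
        known = sum(aug[i][j] * weights[j] for j in range(i + 1, k))
        weights[i] = (aug[i][k] - known) / aug[i][i]
    return weights


def linear_combination_phenotype(traits, weights):
    """Apply the weights to obtain one univariate phenotype per individual."""
    return [sum(w * y for w, y in zip(weights, row)) for row in traits]


# Hypothetical toy data: the genotype equals trait 1, so trait 1 gets all the weight.
toy_traits = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 2], [2, 0]]
toy_genotype = [0, 1, 2, 1, 0, 2]
toy_weights = lcp_weights(toy_traits, toy_genotype)
toy_lcp = linear_combination_phenotype(toy_traits, toy_weights)
```

The resulting single phenotype could then be regressed on the variant in an ordinary univariate analysis, which is the step that yields the coefficients follow-up tools expect.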

5.
Epilepsia ; 2020 Oct 14.
Article in English | MEDLINE | ID: mdl-33090489

ABSTRACT

Focal epilepsy (FE) is clinically highly heterogeneous. It has been shown recently that not only rare but also a subset of common genetic variants confer risk for FE. The relatively modest power of genetic studies in FE suggests a high genetic heterogeneity of FE when grouped as one disorder. We hypothesize that the clinical heterogeneity of FE is correlated with genetic heterogeneity on a common risk variant level. To test the hypothesis, we used an FE polygenic risk score "FE-PRS" that combines small effect sizes of thousands of common variants from the largest FE-GWAS (genome-wide association study) into a single measure. We grouped 414 individuals with FE according to common clinical features into subgroups, either by one feature at a time or by all features combined in a cluster analysis. We examined their association with FE-PRS compared to 20 435 matched population controls and observed heterogeneous FE-PRS burden among the subgroups. The highest phenotypic variance explained by FE-PRS was identified in a cluster analysis-defined FE subgroup where all individuals had unknown etiologies and psychiatric comorbidities, and the majority had early onset seizures. Our results indicate that genetic factors associated with FE have differential burden among FE subtypes. Future studies using better-powered FE-PRS might have clinical utility.
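
A polygenic risk score of the kind described here is, in its simplest form, a weighted sum of allele dosages. The sketch below assumes per-variant effect sizes taken from a GWAS and dosages coded 0, 1 or 2; the function name and the example values are hypothetical and do not come from the study.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of (risk-allele dosage x per-variant effect size) for one individual."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


# Hypothetical individual with three variants
example_score = polygenic_risk_score([0, 1, 2], [0.10, -0.20, 0.05])
```

In practice the score is usually standardized against a reference population before subgroups are compared with controls.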

6.
Proc Natl Acad Sci U S A ; 117(45): 28201-28211, 2020 Nov 10.
Article in English | MEDLINE | ID: mdl-33106425

ABSTRACT

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.

7.
Nature ; 586(7831): 769-775, 2020 10.
Article in English | MEDLINE | ID: mdl-33057200

ABSTRACT

Myeloproliferative neoplasms (MPNs) are blood cancers that are characterized by the excessive production of mature myeloid cells and arise from the acquisition of somatic driver mutations in haematopoietic stem cells (HSCs). Epidemiological studies indicate a substantial heritable component of MPNs that is among the highest known for cancers [1]. However, only a limited number of genetic risk loci have been identified, and the underlying biological mechanisms that lead to the acquisition of MPNs remain unclear. Here, by conducting a large-scale genome-wide association study (3,797 cases and 1,152,977 controls), we identify 17 MPN risk loci (P < 5.0 × 10⁻⁸), 7 of which have not been previously reported. We find that there is a shared genetic architecture between MPN risk and several haematopoietic traits from distinct lineages; that there is an enrichment for MPN risk variants within accessible chromatin of HSCs; and that increased MPN risk is associated with longer telomere length in leukocytes and other clonal haematopoietic states, collectively suggesting that MPN risk is associated with the function and self-renewal of HSCs. We use gene mapping to identify modulators of HSC biology linked to MPN risk, and show through targeted variant-to-function assays that CHEK2 and GFI1B have roles in altering the function of HSCs to confer disease risk. Overall, our results reveal a previously unappreciated mechanism for inherited MPN risk through the modulation of HSC function.

8.
Sci Rep ; 10(1): 15205, 2020 09 16.
Article in English | MEDLINE | ID: mdl-32938993

ABSTRACT

Psychogenic nonepileptic seizures (PNES) are diagnosed in approximately 30% of patients referred to tertiary care epilepsy centers. Little is known about the molecular pathology of PNES, much less about possible underlying genetic factors. We generated whole-exome sequencing and whole-genome genotyping data to identify rare, pathogenic (P) or likely pathogenic (LP) variants in 102 individuals with PNES and 448 individuals with focal (FE) or generalized (GE) epilepsy. Variants were classified for all individuals based on the ACMG-AMP 2015 guidelines. For research purposes only, we considered genes associated with neurological or psychiatric disorders as candidate genes for PNES. We observe in this first genetic investigation of PNES that six (5.88%) individuals with PNES without coexistent epilepsy carry P/LP variants (deletions at 10q11.22-q11.23, 10q23.1-q23.2, distal 16p11.2, and 17p13.3, and nonsynonymous variants in NSD1 and GABRA5). Notably, the burden of P/LP variants among the individuals with PNES was not significantly different from the burden observed in the individuals with FE (3.05%) or GE (1.82%) (PNES vs. FE vs. GE (3 × 2 χ²), P = 0.30; PNES vs. epilepsy (2 × 2 χ²), P = 0.14). The presence of variants in genes associated with monogenic forms of neurological and psychiatric disorders in individuals with PNES shows that genetic factors are likely to play a role in PNES or its comorbidities in a subset of individuals. Future large-scale genetic research studies are needed to further corroborate these interesting findings in PNES.
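
The two χ² comparisons can be sketched with a plain Pearson test on a contingency table. The abstract gives only the PNES counts (6 of 102) and the FE and GE percentages, so the FE and GE counts below (12 of 393 and 1 of 55) are assumptions inferred from 3.05%, 1.82% and the total of 448. The absence of a continuity correction is likewise an assumption. Under these assumptions the p-values should land close to the reported 0.30 and 0.14.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_p_value(stat, dof):
    """Upper-tail p-value; closed forms exist for 1 and 2 degrees of freedom."""
    if dof == 1:
        return math.erfc(math.sqrt(stat / 2))
    if dof == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


# Rows: [carriers of P/LP variants, non-carriers]
# PNES counts are from the abstract; FE and GE counts are ASSUMED (inferred from percentages).
three_by_two = [[6, 96], [12, 381], [1, 54]]   # PNES, FE, GE
two_by_two = [[6, 96], [13, 435]]              # PNES, all epilepsy

p_three_groups = chi2_p_value(*pearson_chi2(three_by_two))
p_two_groups = chi2_p_value(*pearson_chi2(two_by_two))
```

With expected carrier counts this small, an exact test would be a common alternative; the sketch only mirrors the χ² tests named in the abstract.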

9.
Sci Transl Med ; 12(556)2020 Aug 12.
Article in English | MEDLINE | ID: mdl-32801145

ABSTRACT

Malfunctions of voltage-gated sodium and calcium channels (encoded by SCNxA and CACNA1x family genes, respectively) have been associated with severe neurologic, psychiatric, cardiac, and other diseases. Altered channel activity is frequently grouped into gain or loss of ion channel function (GOF or LOF, respectively) that often corresponds not only to clinical disease manifestations but also to differences in drug response. Experimental studies of channel function are therefore important, but laborious and usually focus only on a few variants at a time. On the basis of known gene-disease mechanisms of 19 different diseases, we inferred LOF (n = 518) and GOF (n = 309) likely pathogenic variants from the disease phenotypes of variant carriers. By training a machine learning model on sequence- and structure-based features, we predicted LOF or GOF effects [area under the receiver operating characteristics curve (ROC) = 0.85] of likely pathogenic missense variants. Our LOF versus GOF prediction corresponded to molecular LOF versus GOF effects for 87 functionally tested variants in SCN1/2/8A and CACNA1I (ROC = 0.73) and was validated in exome-wide data from 21,703 cases and 128,957 controls. We showed respective regional clustering of inferred LOF and GOF nucleotide variants across the alignment of the entire gene family, suggesting shared pathomechanisms in the SCNxA/CACNA1x family genes.
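
The area under the ROC curve quoted above can be read as the probability that a randomly chosen variant of one class receives a higher score than a randomly chosen variant of the other. A minimal sketch of that pairwise definition follows; the label coding (1 for GOF, 0 for LOF) and the example scores are hypothetical and unrelated to the study's model.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: 1 = GOF, 0 = LOF
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise loop is quadratic in the number of variants, which is acceptable for a few hundred variants; a rank-based formula gives the same value for larger sets.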

10.
Sci Rep ; 10(1): 13162, 2020 08 04.
Article in English | MEDLINE | ID: mdl-32753748

ABSTRACT

A common missense variant in SLC39A8 is convincingly associated with schizophrenia and several additional phenotypes. Homozygous loss-of-function mutations in SLC39A8 result in undetectable serum manganese (Mn) and a Congenital Disorder of Glycosylation (CDG) due to the exquisite sensitivity of glycosyltransferases to Mn concentration. Here, we identified several Mn-related changes in human carriers of the common SLC39A8 missense allele. Analysis of structural brain MRI scans showed a dose-dependent change in the ratio of T2w to T1w signal in several regions. Comprehensive trace element analysis confirmed a specific reduction of only serum Mn, and plasma protein N-glycome profiling revealed reduced complexity and branching. N-glycome profiling from two individuals with SLC39A8-CDG showed similar but more severe alterations in branching that improved with Mn supplementation, suggesting that the common variant exists on a spectrum of hypofunction with potential for reversibility. Characterizing the functional impact of this variant will enhance our understanding of schizophrenia pathogenesis and identify novel therapeutic targets and biomarkers.

11.
Gut ; 2020 Jul 10.
Article in English | MEDLINE | ID: mdl-32651235

ABSTRACT

OBJECTIVE: Both the gut microbiome and host genetics are known to play significant roles in the pathogenesis of IBD. However, the interaction between these two factors and its implications in the aetiology of IBD remain underexplored. Here, we report on the influence of host genetics on the gut microbiome in IBD.

DESIGN: To evaluate the impact of host genetics on the gut microbiota of patients with IBD, we combined whole exome sequencing of the host genome and whole genome shotgun sequencing of 1464 faecal samples from 525 patients with IBD and 939 population-based controls. We followed a four-step analysis: (1) exome-wide microbial quantitative trait loci (mbQTL) analyses, (2) a targeted approach focusing on IBD-associated genomic regions and protein truncating variants (PTVs, minor allele frequency (MAF) >5%), (3) gene-based burden tests on PTVs with MAF <5% and exome copy number variations (CNVs) with site frequency <1%, (4) joint analysis of both cohorts to identify the interactions between disease and host genetics.

RESULTS: We identified 12 mbQTLs, including variants in the IBD-associated genes IL17REL, MYRF, SEC16A and WDR78. For example, the decrease of the pathway acetyl-coenzyme A biosynthesis, which is involved in short chain fatty acids production, was associated with variants in the gene MYRF (false discovery rate <0.05). Changes in functional pathways involved in the metabolic potential were also observed in participants carrying rare PTVs or CNVs in CYP2D6, GPR151 and CD160 genes. These genes are known for their function in the immune system. Moreover, interaction analyses confirmed previously known IBD disease-specific mbQTLs in TNFSF15.

CONCLUSION: This study highlights that both common and rare genetic variants affecting the immune system are key factors in shaping the gut microbiota in the context of IBD and points towards potential mechanisms for disease treatment.

12.
Brain ; 143(7): 2106-2118, 2020 Jul 01.
Article in English | MEDLINE | ID: mdl-32568404

ABSTRACT

Cytogenic testing is routinely applied in most neurological centres for severe paediatric epilepsies. However, which characteristics of copy number variants (CNVs) confer most epilepsy risk and which epilepsy subtypes carry the most CNV burden, have not been explored on a genome-wide scale. Here, we present the largest CNV investigation in epilepsy to date with 10 712 European epilepsy cases and 6746 ancestry-matched controls. Patients with genetic generalized epilepsy, lesional focal epilepsy, non-acquired focal epilepsy, and developmental and epileptic encephalopathy were included. All samples were processed with the same technology and analysis pipeline. All investigated epilepsy types, including lesional focal epilepsy patients, showed an increase in CNV burden in at least one tested category compared to controls. However, we observed striking differences in CNV burden across epilepsy types and investigated CNV categories. Genetic generalized epilepsy patients have the highest CNV burden in all categories tested, followed by developmental and epileptic encephalopathy patients. Both epilepsy types also show association for deletions covering genes intolerant for truncating variants. Genome-wide CNV breakpoint association showed not only significant loci for genetic generalized and developmental and epileptic encephalopathy patients but also for lesional focal epilepsy patients. With a 34-fold risk for developing genetic generalized epilepsy, we show for the first time that the established epilepsy-associated 15q13.3 deletion represents the strongest risk CNV for genetic generalized epilepsy across the whole genome. Using the human interactome, we examined the largest connected component of the genes overlapped by CNVs in the four epilepsy types. We observed that genetic generalized epilepsy and non-acquired focal epilepsy formed disease modules. In summary, we show that in all common epilepsy types, 1.5-3% of patients carry epilepsy-associated CNVs. The characteristics of risk CNVs vary tremendously across and within epilepsy types. Thus, we advocate genome-wide genomic testing to identify all disease-associated types of CNVs.

13.
Mol Psychiatry ; 2020 May 06.
Article in English | MEDLINE | ID: mdl-32377000

ABSTRACT

Advances in genomics are opening new windows into the biology of schizophrenia. Though common variants individually have small effects on disease risk, GWAS provide a powerful opportunity to explore pathways and mechanisms contributing to pathophysiology. Here, we highlight an underappreciated biological theme emerging from GWAS: the role of glycosylation in schizophrenia. The strongest coding variant in schizophrenia GWAS is a missense mutation in the manganese transporter SLC39A8, which is associated with altered glycosylation patterns in humans. Furthermore, variants near several genes encoding glycosylation enzymes are unambiguously associated with schizophrenia: FUT9, MAN2A1, TMTC1, GALNT10, and B3GAT1. Here, we summarize the known biological functions, target substrates, and expression patterns of these enzymes as a primer for future studies. We also highlight a subset of schizophrenia-associated proteins critically modified by glycosylation including glutamate receptors, voltage-gated calcium channels, the dopamine D2 receptor, and complement glycoproteins. We hypothesize that common genetic variants alter brain glycosylation and play a fundamental role in the development of schizophrenia. Leveraging these findings will advance our mechanistic understanding of disease and may provide novel avenues for treatment development.

14.
PLoS Genet ; 16(5): e1008682, 2020 05.
Article in English | MEDLINE | ID: mdl-32369491

ABSTRACT

Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data from UK Biobank (n = 337,151 unrelated White British individuals) and FinnGen (n = 176,899) to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection against glaucoma. Through rare protein-altering variant association analysis, we find a missense variant in ANGPTL7 in UK Biobank (rs28991009, p.Gln175His, MAF = 0.8%, genotyped in 82,253 individuals with measured IOP and an independent set of 4,238 glaucoma patients and 250,660 controls) that significantly lowers IOP (β = -0.53 and -0.67 mmHg for heterozygotes, -3.40 and -2.37 mmHg for homozygotes, P = 5.96 × 10⁻⁹ and 1.07 × 10⁻¹³ for corneal compensated and Goldmann-correlated IOP, respectively) and is associated with 34% reduced risk of glaucoma (P = 0.0062). In FinnGen, we identify an ANGPTL7 missense variant at a greater than 50-fold increased frequency in Finland compared with other populations (rs147660927, p.Arg220Cys, MAF Finland = 4.3%), which was genotyped in 6,537 glaucoma patients and 170,362 controls and is associated with a 29% lower glaucoma risk (P = 1.9 × 10⁻¹² for all glaucoma types and also protection against its subtypes including exfoliation, primary open-angle, and primary angle-closure). We further find three rarer variants in UK Biobank, including a protein-truncating variant, which confer a strong composite lowering of IOP (P = 0.0012 and 0.24 for Goldmann-correlated and corneal compensated IOP, respectively), suggesting the protective mechanism likely resides in the loss of interaction or function. Our results support inhibition or down-regulation of ANGPTL7 as a therapeutic strategy for glaucoma.


Subjects
Angiopoietin-Like Proteins/genetics, Glaucoma/genetics, Glaucoma/prevention & control, Intraocular Pressure/genetics, Single Nucleotide Polymorphism, Adult, Aged, Aged 80 and over, Biological Specimen Banks/statistics & numerical data, Case-Control Studies, Cohort Studies, Female, Finland/epidemiology, Gene Frequency, Genetic Predisposition to Disease, Population Genetics, Genome-Wide Association Study, Glaucoma/epidemiology, Humans, Loss of Function Mutation/genetics, Male, Middle Aged, Missense Mutation, United Kingdom/epidemiology

15.
Nucleic Acids Res ; 48(W1): W132-W139, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32402084

ABSTRACT

Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like 'Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?', or 'Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?' are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and is displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.


Subjects
Missense Mutation, Protein Conformation, Software, Humans, Internet, Proteins/chemistry, Proteins/genetics

16.
Nat Genet ; 52(6): 634-639, 2020 06.
Article in English | MEDLINE | ID: mdl-32424355

ABSTRACT

With very large sample sizes, biobanks provide an exciting opportunity to identify genetic components of complex traits. To analyze rare variants, region-based multiple-variant aggregate tests are commonly used to increase power for association tests. However, because of the substantial computational cost, existing region-based tests cannot analyze hundreds of thousands of samples while accounting for confounders such as population stratification and sample relatedness. Here we propose a scalable generalized mixed-model region-based association test, SAIGE-GENE, that is applicable to exome-wide and genome-wide region-based analysis for hundreds of thousands of samples and can account for unbalanced case-control ratios for binary traits. Through extensive simulation studies and analysis of the HUNT study with 69,716 Norwegian samples and the UK Biobank data with 408,910 White British samples, we show that SAIGE-GENE can efficiently analyze large-sample data (N > 400,000) with type I error rates well controlled.


Subjects
Biological Specimen Banks/statistics & numerical data, Case-Control Studies, Exome, Linear Models, Genetic Markers, Humans, HDL Lipoproteins/genetics, Genetic Models, Multifactorial Inheritance, Norway, United Kingdom, Waist-Hip Ratio

17.
Nature ; 581(7809): 444-451, 2020 05.
Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.


Subjects
Disease/genetics, Genetic Variation, Medical Genetics/standards, Population Genetics/standards, Human Genome/genetics, Continental Population Groups/genetics, Female, Genetic Testing, Genotyping Techniques, Humans, Male, Middle Aged, Mutation, Single Nucleotide Polymorphism/genetics, Reference Standards, Genetic Selection, Whole Genome Sequencing

18.
Nature ; 581(7809): 459-464, 2020 05.
Article in English | MEDLINE | ID: mdl-32461653

ABSTRACT

Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound heterozygous 'knockout' humans will await sample sizes that are approximately 1,000 times those presently available, unless recruitment focuses on consanguineous individuals. Third, automated variant annotation and filtering are powerful, but manual curation remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype efforts. Our results provide a roadmap for human knockout studies and should guide the interpretation of loss-of-function variants in drug development.


Subjects
Essential Genes/drug effects, Essential Genes/genetics, Loss of Function Mutation/genetics, Molecular Targeted Therapy, Artifacts, Automation, Consanguinity, Exons/genetics, Gain of Function Mutation/genetics, Gene Frequency, Gene Knockdown Techniques, Heterozygote, Homozygote, Humans, Huntingtin Protein/genetics, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics, Neurodegenerative Diseases/genetics, Prion Proteins/genetics, Reproducibility of Results, Sample Size, tau Proteins/genetics

19.
Nature ; 581(7809): 452-458, 2020 05.
Article in English | MEDLINE | ID: mdl-32461655

ABSTRACT

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD) [1], we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project [2] and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.
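
The 'proportion expressed across transcripts' metric can be illustrated with a small sketch. The abstract does not give the formula, so the version below rests on an assumption: within one tissue, the metric is the summed expression of the transcripts that carry the variant's annotation divided by the summed expression of all transcripts of the gene. The function name, transcript identifiers and TPM values are hypothetical.

```python
def proportion_expressed(transcript_tpm, annotated_transcripts):
    """Share of a gene's expression contributed by transcripts carrying the annotation.

    transcript_tpm: mapping of transcript id -> expression (e.g. TPM) in one tissue.
    annotated_transcripts: ids of transcripts on which the variant has the annotation.
    Returns None when the gene is not expressed in the tissue.
    """
    total = sum(transcript_tpm.values())
    if total == 0:
        return None
    affected = sum(
        tpm for tx, tpm in transcript_tpm.items() if tx in annotated_transcripts
    )
    return affected / total


# Hypothetical gene with two isoforms; the variant only hits the minor isoform.
example_value = proportion_expressed({"tx_major": 8.0, "tx_minor": 2.0}, {"tx_minor"})
```

A low value flags a variant that falls in a weakly expressed region, which is the kind of pLoF call the abstract describes filtering out.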


Subjects
Disease/genetics, Haploinsufficiency/genetics, Loss of Function Mutation/genetics, Molecular Sequence Annotation, Genetic Transcription, Transcriptome/genetics, Autism Spectrum Disorder/genetics, Datasets as Topic, Developmental Disabilities/genetics, Exons/genetics, Female, Genotype, Humans, Intellectual Disability/genetics, Male, Molecular Sequence Annotation/standards, Poisson Distribution, Messenger RNA/analysis, Messenger RNA/genetics, Rare Diseases/diagnosis, Rare Diseases/genetics, Reproducibility of Results, Whole Exome Sequencing