ABSTRACT
Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.
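The burden-test idea that the abstract builds on can be illustrated with a small sketch. This is not the COAST implementation: the three annotation classes, their weights, the toy allele counts, and the phenotype values are all assumptions made for illustration, and only a single weighted-burden component is shown (no SKAT component, no covariates, no combination of models).

```python
from statistics import mean

# Hypothetical severity weights, ordered from least to most deleterious.
# The class names and the values 1, 2, 3 are assumptions, not taken from the abstract.
SEVERITY_WEIGHTS = {
    "benign_missense": 1.0,
    "damaging_missense": 2.0,
    "protein_truncating": 3.0,
}


def burden_score(allele_counts):
    """Collapse one subject's per-class rare-allele counts into a weighted score."""
    return sum(SEVERITY_WEIGHTS[cls] * n for cls, n in allele_counts.items())


def ols_slope(x, y):
    """Least-squares slope of y on x (no covariates; illustration only)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data: four subjects with made-up allele counts and phenotype values.
subjects = [
    {"benign_missense": 0, "damaging_missense": 0, "protein_truncating": 0},
    {"benign_missense": 1, "damaging_missense": 0, "protein_truncating": 0},
    {"benign_missense": 0, "damaging_missense": 1, "protein_truncating": 0},
    {"benign_missense": 0, "damaging_missense": 0, "protein_truncating": 1},
]
phenotype = [0.0, -0.5, -1.0, -1.5]

scores = [burden_score(s) for s in subjects]
slope = ols_slope(scores, phenotype)
print(scores, slope)
```

In this toy setting the phenotype falls by a constant amount per unit of weighted burden, which is the monotone, dose-response pattern an allelic series is defined by.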
Subject(s)
Genetic Variation , Lipids , Computer Simulation , Genetic Association Studies , Phenotype , Genome-Wide Association Study
ABSTRACT
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world¹. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
Subject(s)
Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
ABSTRACT
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19), a respiratory illness that can result in hospitalization or death. We used exome sequence data to investigate associations between rare genetic variants and seven COVID-19 outcomes in 586,157 individuals, including 20,952 with COVID-19. After accounting for multiple testing, we did not identify any clear associations with rare variants either exome wide or when specifically focusing on (1) 13 interferon pathway genes in which rare deleterious variants have been reported in individuals with severe COVID-19, (2) 281 genes located in susceptibility loci identified by the COVID-19 Host Genetics Initiative, or (3) 32 additional genes of immunologic relevance and/or therapeutic potential. Our analyses indicate there are no significant associations with rare protein-coding variants with detectable effect sizes at our current sample sizes. Analyses will be updated as additional data become available, and results are publicly available through the Regeneron Genetics Center COVID-19 Results Browser.
Subject(s)
COVID-19/diagnosis , COVID-19/genetics , Exome Sequencing , Exome/genetics , Genetic Predisposition to Disease , Hospitalization/statistics & numerical data , COVID-19/immunology , COVID-19/therapy , Female , Humans , Interferons/genetics , Male , Prognosis , SARS-CoV-2 , Sample Size
ABSTRACT
Serum alanine aminotransferase (ALT) and aspartate aminotransferase (AST) are biomarkers for liver health. Here we report the largest genome-wide association analysis to date of serum ALT and AST levels, in more than 388,000 people of European ancestry from the UK Biobank and DiscovEHR. Eleven million imputed markers with a minor allele frequency (MAF) ≥ 0.5% were analyzed. Overall, 300 ALT and 336 AST independent genome-wide significant associations were identified. Among them, 81 ALT and 61 AST associations are reported for the first time. A genome-wide interaction study identified 9 ALT and 12 AST independent associations significantly modified by body mass index (BMI), including several previously reported potential liver disease therapeutic targets, for example, PNPLA3, HSD17B13, and MARC1. While further work is necessary to understand the effect of ALT- and AST-associated variants on liver disease, the weighted burden of significant BMI-modified signals is significantly associated with liver disease outcomes. In summary, this study identifies genetic associations that offer an important step forward in understanding the genetic architecture of serum ALT and AST levels. Significant interactions between BMI and genetic loci not only highlight the important role of adiposity in liver damage but also shed light on the genetic etiology of liver disease in obese individuals.
Subject(s)
Alanine Transaminase/blood , Aspartate Aminotransferases/blood , Body Mass Index , Genome-Wide Association Study , Humans
ABSTRACT
Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications for downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, within DiscovEHR we identified ~66,000 close (first- and second-degree) relationships, involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships, given that DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees by using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ~1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations, including a tandem duplication that occurs in LDLR and causes familial hypercholesterolemia, through reconstructed pedigrees. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population-genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.
Subject(s)
Exome/genetics , Precision Medicine , Cohort Studies , Computer Simulation , Electronic Health Records , Exons/genetics , Family , Female , Genetics, Population , Geography , Heterozygote , Humans , Male , Mutation/genetics , Pedigree , Phenotype , Reproducibility of Results
ABSTRACT
Atrial fibrillation (AF) is a common cardiac arrhythmia and a major risk factor for stroke, heart failure, and premature death. The pathogenesis of AF remains poorly understood, which contributes to the current lack of highly effective treatments. To understand the genetic variation and biology underlying AF, we undertook a genome-wide association study (GWAS) of 6,337 AF individuals and 61,607 AF-free individuals from Norway, including replication in an additional 30,679 AF individuals and 278,895 AF-free individuals. Through genotyping and dense imputation mapping from whole-genome sequencing, we tested almost nine million genetic variants across the genome and identified seven risk loci, including two novel loci. One novel locus (lead single-nucleotide variant [SNV] rs12614435; p = 6.76 × 10⁻¹⁸) comprised intronic and several highly correlated missense variants situated in the I-, A-, and M-bands of titin, which is the largest protein in humans and responsible for the passive elasticity of heart and skeletal muscle. The other novel locus (lead SNV rs56202902; p = 1.54 × 10⁻¹¹) covered a large, gene-dense chromosome 1 region that has previously been linked to cardiac conduction. Pathway and functional enrichment analyses suggested that many AF-associated genetic variants act through a mechanism of impaired muscle cell differentiation and tissue formation during fetal heart development.
Subject(s)
Atrial Fibrillation/genetics , Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Heart/embryology , Regulatory Sequences, Nucleic Acid/genetics , Humans , Inheritance Patterns/genetics , Multifactorial Inheritance/genetics , Organ Specificity/genetics , Physical Chromosome Mapping , Quantitative Trait Loci/genetics , Reproducibility of Results , Risk Factors
ABSTRACT
BACKGROUND: Elucidation of the genetic factors underlying chronic liver disease may reveal new therapeutic targets. METHODS: We used exome sequence data and electronic health records from 46,544 participants in the DiscovEHR human genetics study to identify genetic variants associated with serum levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST). Variants that were replicated in three additional cohorts (12,527 persons) were evaluated for association with clinical diagnoses of chronic liver disease in DiscovEHR study participants and two independent cohorts (total of 37,173 persons) and with histopathological severity of liver disease in 2391 human liver samples. RESULTS: A splice variant (rs72613567:TA) in HSD17B13, encoding the hepatic lipid droplet protein hydroxysteroid 17-beta dehydrogenase 13, was associated with reduced levels of ALT (P=4.2×10⁻¹²) and AST (P=6.2×10⁻¹⁰). Among DiscovEHR study participants, this variant was associated with a reduced risk of alcoholic liver disease (by 42% [95% confidence interval {CI}, 20 to 58] among heterozygotes and by 53% [95% CI, 3 to 77] among homozygotes), nonalcoholic liver disease (by 17% [95% CI, 8 to 25] among heterozygotes and by 30% [95% CI, 13 to 43] among homozygotes), alcoholic cirrhosis (by 42% [95% CI, 14 to 61] among heterozygotes and by 73% [95% CI, 15 to 91] among homozygotes), and nonalcoholic cirrhosis (by 26% [95% CI, 7 to 40] among heterozygotes and by 49% [95% CI, 15 to 69] among homozygotes). Associations were confirmed in two independent cohorts. The rs72613567:TA variant was associated with a reduced risk of nonalcoholic steatohepatitis, but not steatosis, in human liver samples. The rs72613567:TA variant mitigated liver injury associated with the risk-increasing PNPLA3 p.I148M allele and resulted in an unstable and truncated protein with reduced enzymatic activity.
CONCLUSIONS: A loss-of-function variant in HSD17B13 was associated with a reduced risk of chronic liver disease and of progression from steatosis to steatohepatitis. (Funded by Regeneron Pharmaceuticals and others.).
Subject(s)
17-Hydroxysteroid Dehydrogenases/genetics , Fatty Liver/genetics , Genetic Predisposition to Disease , Liver Diseases/genetics , Loss of Function Mutation , 17-Hydroxysteroid Dehydrogenases/metabolism , Alanine Transaminase/blood , Aspartate Aminotransferases/blood , Biomarkers/blood , Chronic Disease , Disease Progression , Female , Genetic Variation , Genotype , Humans , Linear Models , Liver/pathology , Liver Diseases/pathology , Male , Sequence Analysis, RNA , Exome Sequencing
ABSTRACT
Schizophrenia is a common, chronic and debilitating neuropsychiatric syndrome affecting tens of millions of individuals worldwide. While rare genetic variants play a role in the etiology of schizophrenia, most of the currently explained liability is within common variation, suggesting that variation predating the human diaspora out of Africa harbors a large fraction of the common variant attributable heritability. However, common variant association studies in schizophrenia have concentrated mainly on cohorts of European descent. We describe genome-wide association studies of 6152 cases and 3918 controls of admixed African ancestry, and of 1234 cases and 3090 controls of Latino ancestry, representing the largest such study in these populations to date. Combining results from the samples with African ancestry with summary statistics from the Psychiatric Genomics Consortium (PGC) study of schizophrenia yielded seven newly genome-wide significant loci, and we identified an additional eight loci by incorporating the results from samples with Latino ancestry. Leveraging population differences in patterns of linkage disequilibrium, we achieve improved fine-mapping resolution at 22 previously reported and 4 newly significant loci. Polygenic risk score profiling revealed improved prediction based on trans-ancestry meta-analysis results for admixed African (Nagelkerke's R² = 0.032; liability R² = 0.017; P < 10⁻⁵²), Latino (Nagelkerke's R² = 0.089; liability R² = 0.021; P < 10⁻⁵⁸), and European individuals (Nagelkerke's R² = 0.089; liability R² = 0.037; P < 10⁻¹¹³), further highlighting the advantages of incorporating data from diverse human populations.
Subject(s)
Black People/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Hispanic or Latino/genetics , Schizophrenia/genetics , Female , Genetic Loci , Humans , Male , Polymorphism, Single Nucleotide/genetics
ABSTRACT
BACKGROUND: Loss-of-function variants in the angiopoietin-like 3 gene (ANGPTL3) have been associated with decreased plasma levels of triglycerides, low-density lipoprotein (LDL) cholesterol, and high-density lipoprotein (HDL) cholesterol. It is not known whether such variants or therapeutic antagonism of ANGPTL3 are associated with a reduced risk of atherosclerotic cardiovascular disease. METHODS: We sequenced the exons of ANGPTL3 in 58,335 participants in the DiscovEHR human genetics study. We performed tests of association for loss-of-function variants in ANGPTL3 with lipid levels and with coronary artery disease in 13,102 case patients and 40,430 controls from the DiscovEHR study, with follow-up studies involving 23,317 case patients and 107,166 controls from four population studies. We also tested the effects of a human monoclonal antibody, evinacumab, against Angptl3 in dyslipidemic mice and against ANGPTL3 in healthy human volunteers with elevated levels of triglycerides or LDL cholesterol. RESULTS: In the DiscovEHR study, participants with heterozygous loss-of-function variants in ANGPTL3 had significantly lower serum levels of triglycerides, HDL cholesterol, and LDL cholesterol than participants without these variants. Loss-of-function variants were found in 0.33% of case patients with coronary artery disease and in 0.45% of controls (adjusted odds ratio, 0.59; 95% confidence interval, 0.41 to 0.85; P=0.004). These results were confirmed in the follow-up studies. In dyslipidemic mice, inhibition of Angptl3 with evinacumab resulted in a greater decrease in atherosclerotic lesion area and necrotic content than a control antibody. In humans, evinacumab caused a dose-dependent placebo-adjusted reduction in fasting triglyceride levels of up to 76% and LDL cholesterol levels of up to 23%. 
CONCLUSIONS: Genetic and therapeutic antagonism of ANGPTL3 in humans and of Angptl3 in mice was associated with decreased levels of all three major lipid fractions and decreased odds of atherosclerotic cardiovascular disease. (Funded by Regeneron Pharmaceuticals and others; ClinicalTrials.gov number, NCT01749878.)
Subject(s)
Angiopoietins/antagonists & inhibitors , Antibodies, Monoclonal/administration & dosage , Atherosclerosis/drug therapy , Coronary Artery Disease/genetics , Dyslipidemias/drug therapy , Lipids/blood , Mutation , Aged , Angiopoietin-Like Protein 3 , Angiopoietin-like Proteins , Angiopoietins/genetics , Animals , Antibodies, Monoclonal/adverse effects , Antibodies, Monoclonal/pharmacology , Atherosclerosis/metabolism , Cardiovascular Diseases/prevention & control , Coronary Artery Disease/metabolism , Disease Models, Animal , Dose-Response Relationship, Drug , Double-Blind Method , Dyslipidemias/blood , Female , Humans , Lipid Metabolism/drug effects , Male , Mice , Mice, Inbred Strains , Middle Aged
ABSTRACT
Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.
Subject(s)
Multifactorial Inheritance/genetics , Mutation/genetics , Schizophrenia/genetics , Autistic Disorder/genetics , Calcium Channels/genetics , Cytoskeletal Proteins/genetics , DNA Copy Number Variations/genetics , Disks Large Homolog 4 Protein , Female , Fragile X Mental Retardation Protein/metabolism , Genome-Wide Association Study , Humans , Intellectual Disability/genetics , Intracellular Signaling Peptides and Proteins/genetics , Male , Membrane Proteins/genetics , Nerve Tissue Proteins/genetics , Receptors, N-Methyl-D-Aspartate/genetics
ABSTRACT
BACKGROUND: Higher-than-normal levels of circulating triglycerides are a risk factor for ischemic cardiovascular disease. Activation of lipoprotein lipase, an enzyme that is inhibited by angiopoietin-like 4 (ANGPTL4), has been shown to reduce levels of circulating triglycerides. METHODS: We sequenced the exons of ANGPTL4 in samples obtained from 42,930 participants of predominantly European ancestry in the DiscovEHR human genetics study. We performed tests of association between lipid levels and the missense E40K variant (which has been associated with reduced plasma triglyceride levels) and other inactivating mutations. We then tested for associations between coronary artery disease and the E40K variant and other inactivating mutations in 10,552 participants with coronary artery disease and 29,223 controls. We also tested the effect of a human monoclonal antibody against ANGPTL4 on lipid levels in mice and monkeys. RESULTS: We identified 1661 heterozygotes and 17 homozygotes for the E40K variant and 75 participants who had 13 other monoallelic inactivating mutations in ANGPTL4. The levels of triglycerides were 13% lower and the levels of high-density lipoprotein (HDL) cholesterol were 7% higher among carriers of the E40K variant than among noncarriers. Carriers of the E40K variant were also significantly less likely than noncarriers to have coronary artery disease (odds ratio, 0.81; 95% confidence interval, 0.70 to 0.92; P=0.002). K40 homozygotes had markedly lower levels of triglycerides and higher levels of HDL cholesterol than did heterozygotes. Carriers of other inactivating mutations also had lower triglyceride levels and higher HDL cholesterol levels and were less likely to have coronary artery disease than were noncarriers. Monoclonal antibody inhibition of Angptl4 in mice and monkeys reduced triglyceride levels.
CONCLUSIONS: Carriers of E40K and other inactivating mutations in ANGPTL4 had lower levels of triglycerides and a lower risk of coronary artery disease than did noncarriers. The inhibition of Angptl4 in mice and monkeys also resulted in corresponding reductions in these values. (Funded by Regeneron Pharmaceuticals.).
Subject(s)
Angiopoietins/genetics , Coronary Artery Disease/genetics , Gene Silencing , Mutation , Aged , Angiopoietin-Like Protein 4 , Angiopoietins/antagonists & inhibitors , Animals , Cholesterol/blood , Disease Models, Animal , Female , Heterozygote , Humans , Macaca mulatta , Male , Mice , Middle Aged , Risk Factors , Triglycerides/blood
ABSTRACT
It is well known that inbreeding increases the risk of recessive monogenic diseases, but it is less certain whether it contributes to the etiology of complex diseases such as schizophrenia. One way to estimate the effects of inbreeding is to examine the association between disease diagnosis and genome-wide autozygosity estimated using runs of homozygosity (ROH) in genome-wide single nucleotide polymorphism arrays. Using data for schizophrenia from the Psychiatric Genomics Consortium (n = 21,868), Keller et al. (2012) estimated that the odds of developing schizophrenia increased by approximately 17% for every additional percent of the genome that is autozygous (β = 16.1, CI(β) = [6.93, 25.7], Z = 3.44, p = 0.0006). Here we describe replication results from 22 independent schizophrenia case-control datasets from the Psychiatric Genomics Consortium (n = 39,830). Using the same ROH calling thresholds and procedures as Keller et al. (2012), we were unable to replicate the significant association between ROH burden and schizophrenia in the independent PGC phase II data, although the effect was in the predicted direction, and the combined (original + replication) dataset yielded an attenuated but significant relationship between Froh and schizophrenia (β = 4.86, CI(β) = [0.90, 8.83], Z = 2.40, p = 0.02). Since Keller et al. (2012), several studies reported inconsistent association of ROH burden with complex traits, particularly in case-control data. These conflicting results might suggest that the effects of autozygosity are confounded by various factors, such as socioeconomic status, education, urbanicity, and religiosity, which may be associated with both real inbreeding and the outcome measures of interest.
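The link between the reported coefficient of 16.1 and the "approximately 17%" figure can be checked with a few lines of arithmetic. The sketch below assumes the coefficient is a log-odds change per unit of Froh, with Froh expressed as a proportion of the genome (so one additional percent corresponds to an increase of 0.01). That reading is an assumption, chosen because it is consistent with the numbers quoted in the abstract; the function name is hypothetical.

```python
import math


def percent_odds_increase_per_percent_froh(beta):
    """Percent change in odds for a 0.01 increase in Froh.

    Assumes beta is a log-odds coefficient per unit Froh, with Froh
    measured as a proportion between 0 and 1.
    """
    odds_ratio = math.exp(beta * 0.01)
    return (odds_ratio - 1.0) * 100.0


original = percent_odds_increase_per_percent_froh(16.1)  # Keller et al. (2012) estimate
combined = percent_odds_increase_per_percent_froh(4.86)  # original + replication estimate

print(f"original: {original:.1f}% higher odds per 1% autozygosity")
print(f"combined: {combined:.1f}% higher odds per 1% autozygosity")
```

Under this assumption the original estimate corresponds to roughly 17% higher odds per additional percent of autozygosity, and the attenuated combined estimate to roughly 5%.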
Subject(s)
Consanguinity , Genome-Wide Association Study , Schizophrenia/genetics , Female , Genome, Human , Genomics , Homozygote , Humans , Male , Polymorphism, Single Nucleotide , Schizophrenia/epidemiology , Schizophrenia/pathology
ABSTRACT
Structural variation (SV) is a significant component of the genetic etiology of both neurodevelopmental and psychiatric disorders; however, routine guidelines for clinical genetic screening have been established only in the former category. Genome-wide chromosomal microarray (CMA) can detect genomic imbalances such as copy-number variants (CNVs), but balanced chromosomal abnormalities (BCAs) still require karyotyping for clinical detection. Moreover, submicroscopic BCAs and subarray threshold CNVs are intractable, or cryptic, to both CMA and karyotyping. Here, we performed whole-genome sequencing using large-insert jumping libraries to delineate both cytogenetically visible and cryptic SVs in a single test among 30 clinically referred youth representing a range of severe neuropsychiatric conditions. We detected 96 SVs per person on average that passed filtering criteria above our highest-confidence resolution (6,305 bp) and an additional 111 SVs per genome below this resolution. These SVs rearranged 3.8 Mb of genomic sequence and resulted in 42 putative loss-of-function (LoF) or gain-of-function mutations per person. We estimate that 80% of the LoF variants were cryptic to clinical CMA. We found myriad complex and cryptic rearrangements, including a "paired" duplication (360 kb, 169 kb) that flanks a 5.25 Mb inversion that appears in 7 additional cases from clinical CNV data among 47,562 individuals. Following convergent genomic profiling of these independent clinical CNV data, we interpreted three SVs to be of potential clinical significance. These data indicate that sequence-based delineation of the full SV mutational spectrum warrants exploration in youth referred for neuropsychiatric evaluation and clinical diagnostic SV screening more broadly.
Subject(s)
Age of Onset , Chromosome Aberrations , Chromosomes, Human/genetics , DNA Copy Number Variations/genetics , Mental Disorders/genetics , Neurodegenerative Diseases/genetics , Comparative Genomic Hybridization , Genome, Human , Humans , Mental Disorders/epidemiology , Microarray Analysis , Neurodegenerative Diseases/epidemiology , Phenotype , United States/epidemiology
ABSTRACT
MOTIVATION: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants. To address these issues, we developed a new algorithm--Copy number estimation using Lattice-Aligned Mixture Models (CLAMMS)--which is highly scalable and suitable for detecting CNVs across the whole allele frequency spectrum. RESULTS: In this note, we summarize the methods and intended use-case of CLAMMS, compare it to previous algorithms and briefly describe results of validation experiments. We evaluate the adherence of CNV calls from CLAMMS and four other algorithms to Mendelian inheritance patterns on a pedigree; we compare calls from CLAMMS and other algorithms to calls from SNP genotyping arrays for a set of 3164 samples; and we use TaqMan quantitative polymerase chain reaction to validate CNVs predicted by CLAMMS at 39 loci (95% of rare variants validate; across 19 common variant loci, the mean precision and recall are 99% and 94%, respectively). In the Supplementary Materials (available at the CLAMMS Github repository), we present our methods and validation results in greater detail. AVAILABILITY AND IMPLEMENTATION: https://github.com/rgcgithub/clamms (implemented in C). CONTACT: jeffrey.reid@regeneron.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Algorithms , DNA Copy Number Variations/genetics , Exome/genetics , Sequence Analysis, DNA/methods , Humans , Markov Chains , Reproducibility of Results
ABSTRACT
We performed whole-genome sequencing on an individual who had a sensory processing disorder, apraxia, and autism, from a family with variable psychiatric phenotypes. The proband harbored a maternally inherited balanced translocation (46,XY,t(11;14)(p12;p12)mat) that disrupted LRRC4C, a member of the highly specialized netrin G family of axon guidance molecules. The proband also inherited a paternally derived chromosomal inversion that disrupted DPP6, a potassium channel interacting protein. Copy Number (CN) analysis in 14,077 cases with neurodevelopmental disorders and 8,960 control subjects revealed that 60% of cases with exonic deletions in LRRC4C had a second clinically recognizable syndrome associated with variable clinical phenotypes, including 16p11.2, 1q44, and 2q33.1 CN syndromes, suggesting LRRC4C deletion variants may be modifiers of neurodevelopmental disorders. In vitro, functional assessments modeling patient deletions in LRRC4C suggest a negative regulatory role of these exons found in the untranslated region of LRRC4C, which has a single, terminal coding exon. These data suggest that the proband's autism may be due to the inheritance of disruptions in both DPP6 and LRRC4C, and may highlight the importance of the netrin G family and potassium channel interacting molecules in neurodevelopmental disorders. © 2016 Wiley Periodicals, Inc.
Subject(s)
Dipeptidyl-Peptidases and Tripeptidyl-Peptidases/genetics , Genetic Association Studies , Nerve Tissue Proteins/genetics , Neurodevelopmental Disorders/diagnosis , Neurodevelopmental Disorders/genetics , Phenotype , Potassium Channels/genetics , Receptors, Cell Surface/genetics , 5' Untranslated Regions , Adolescent , Adult , Apraxias/diagnosis , Apraxias/genetics , Autistic Disorder/diagnosis , Autistic Disorder/genetics , Child , Child, Preschool , Chromosome Breakpoints , Chromosome Inversion , Comparative Genomic Hybridization , DNA Copy Number Variations , Female , Gene Expression , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Karyotype , Male , Middle Aged , Multigene Family , Pedigree , Translocation, Genetic , Young Adult
ABSTRACT
Importance: The activity of lipoprotein lipase (LPL) is the rate-determining step in clearing triglyceride-rich lipoproteins from the circulation. Mutations that damage the LPL gene (LPL) lead to lifelong deficiency in enzymatic activity and can provide insight into the relationship of LPL to human disease. Objective: To determine whether rare and/or common variants in LPL are associated with early-onset coronary artery disease (CAD). Design, Setting, and Participants: In a cross-sectional study, LPL was sequenced in 10 CAD case-control cohorts of the multinational Myocardial Infarction Genetics Consortium and a nested CAD case-control cohort of the Geisinger Health System DiscovEHR cohort between 2010 and 2015. Common variants were genotyped in up to 305 699 individuals of the Global Lipids Genetics Consortium and up to 120 600 individuals of the CARDIoGRAM Exome Consortium between 2012 and 2014. Study-specific estimates were pooled via meta-analysis. Exposures: Rare damaging mutations in LPL included loss-of-function variants and missense variants annotated as pathogenic in a human genetics database or predicted to be damaging by computer prediction algorithms trained to identify mutations that impair protein function. Common variants in the LPL gene region included those independently associated with circulating triglyceride levels. Main Outcomes and Measures: Circulating lipid levels and CAD. Results: Among 46 891 individuals with LPL gene sequencing data available, the mean (SD) age was 50 (12.6) years and 51% were female. A total of 188 participants (0.40%; 95% CI, 0.35%-0.46%) carried a damaging mutation in LPL, including 105 of 32 646 control participants (0.32%) and 83 of 14 245 participants with early-onset CAD (0.58%).
Compared with 46 703 noncarriers, the 188 heterozygous carriers of an LPL damaging mutation displayed higher plasma triglyceride levels (19.6 mg/dL; 95% CI, 4.6-34.6 mg/dL) and higher odds of CAD (odds ratio = 1.84; 95% CI, 1.35-2.51; P < .001). An analysis of 6 common LPL variants resulted in an odds ratio for CAD of 1.51 (95% CI, 1.39-1.64; P = 1.1 × 10(-22)) per 1-SD increase in triglycerides. Conclusions and Relevance: The presence of rare damaging mutations in LPL was significantly associated with higher triglyceride levels and presence of coronary artery disease. However, further research is needed to assess whether there are causal mechanisms by which heterozygous lipoprotein lipase deficiency could lead to coronary artery disease.
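As a rough check on the carrier counts reported in the Results, the following sketch computes the overall carrier frequency and a crude (unadjusted, unstratified) odds ratio with a Wald confidence interval. The function name `odds_ratio_wald` and the choice of a Wald interval are assumptions made for illustration. The abstract's odds ratio of 1.84 comes from pooling study-specific estimates by meta-analysis, so the crude value computed here (about 1.8) is not expected to match it exactly.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald interval on the log scale.

    a: carriers among cases,    b: noncarriers among cases
    c: carriers among controls, d: noncarriers among controls
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - z * se_log)
    high = math.exp(math.log(estimate) + z * se_log)
    return estimate, low, high

# Counts taken from the abstract.
carriers_cases, n_cases = 83, 14245
carriers_controls, n_controls = 105, 32646

carrier_freq = (carriers_cases + carriers_controls) / (n_cases + n_controls)
crude_or, ci_low, ci_high = odds_ratio_wald(
    carriers_cases,
    n_cases - carriers_cases,
    carriers_controls,
    n_controls - carriers_controls,
)

print(f"carrier frequency: {carrier_freq:.2%}")
print(f"crude OR: {crude_or:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

The crude estimate ignores cohort membership and any covariates, which is one reason it can differ from a pooled, study-specific analysis.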
Subject(s)
Coronary Artery Disease/genetics , Lipoprotein Lipase/genetics , Mutation , Adult , Age of Onset , Case-Control Studies , Cross-Sectional Studies , Female , Genotype , Heterozygote , Humans , Lipoproteins/blood , Male , Middle Aged , Odds Ratio , Triglycerides/blood
ABSTRACT
Large and rare copy number variants (CNVs) at several loci have been shown to increase risk for schizophrenia. Aiming to discover novel susceptibility CNV loci, we analyzed 6882 cases and 11 255 controls genotyped on Illumina arrays, most of which have not been used for this purpose before. We identified genes enriched for rare exonic CNVs among cases, and then attempted to replicate the findings in an additional 14 568 cases and 15 274 controls. In a combined analysis of all samples, 12 distinct loci were enriched among cases with nominal levels of significance (P < 0.05); however, none would survive correction for multiple testing. These loci include recurrent deletions at 16p12.1, a locus previously associated with neurodevelopmental disorders (P = 0.0084 in the discovery sample and P = 0.023 in the replication sample). Other plausible candidates include non-recurrent deletions at the glutamate transporter gene SLC1A1, a CNV locus recently suggested to be involved in schizophrenia through linkage analysis, and duplications at 1p36.33 and CGNL1. A burden analysis of large (>500 kb), rare CNVs showed a 1.2% excess in cases after excluding known schizophrenia-associated loci, suggesting that additional susceptibility loci exist. However, even larger samples are required for their discovery.
Subject(s)
Chromosome Deletion , Chromosomes, Human, Pair 16 , Chromosomes, Human, Pair 1 , Cytoskeletal Proteins/genetics , Excitatory Amino Acid Transporter 3/genetics , Gene Duplication , Schizophrenia/genetics , DNA Copy Number Variations , Female , Gene Deletion , Gene Dosage , Genetic Association Studies , Genetic Predisposition to Disease , Genetic Variation , Humans , Male
ABSTRACT
Identifying rare, highly penetrant risk mutations may be an important step in dissecting the molecular etiology of schizophrenia. We conducted a gene-based analysis of large (>100 kb), rare copy-number variants (CNVs) in the Wellcome Trust Case Control Consortium 2 (WTCCC2) schizophrenia sample of 1564 cases and 1748 controls all from Ireland, and further extended the analysis to include an additional 5196 UK controls. We found association with duplications at chr20p12.2 (P = 0.007) and evidence of replication in large independent European schizophrenia (P = 0.052) and UK bipolar disorder case-control cohorts (P = 0.047). A combined analysis of Irish/UK subjects including additional psychosis cases (schizophrenia and bipolar disorder) identified 22 carriers in 11 707 cases and 10 carriers in 21 204 controls [meta-analysis Cochran-Mantel-Haenszel P-value = 2 × 10(-4); odds ratio (OR) = 11.3, 95% CI = 3.7, ∞]. Nineteen of the 22 cases and 8 of the 10 controls carried duplications starting at 9.68 Mb with similar breakpoints across samples. By haplotype analysis and sequencing, we identified a tandem ~149 kb duplication overlapping the gene p21 Protein-Activated Kinase 7 (PAK7, also called PAK5) which was in linkage disequilibrium with local haplotypes (P = 2.5 × 10(-21)), indicative of a single ancestral duplication event. We confirmed the breakpoints in 8/8 carriers tested and found co-segregation of the duplication with illness in two additional family members of one of the affected probands. We demonstrate that PAK7 is developmentally co-expressed with another known psychosis risk gene (DISC1) suggesting a potential molecular mechanism involving aberrant synapse development and plasticity.
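The combined analysis above reports a stratified (Cochran-Mantel-Haenszel) result. The sketch below shows the Mantel-Haenszel pooled odds ratio estimator, which combines 2 × 2 tables across strata. The function name `mantel_haenszel_or` is an assumption for illustration, and because the abstract does not give per-cohort counts, the example applies it only to the single pooled table (22 carriers in 11 707 cases, 10 carriers in 21 204 controls). With one stratum the estimator reduces to the crude odds ratio (about 4.0), which is not the stratified value of 11.3 reported in the abstract and should not be read as a reproduction of it.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    strata: iterable of (a, b, c, d) tuples, one per stratum, where
      a: carriers among cases,    b: noncarriers among cases
      c: carriers among controls, d: noncarriers among controls
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Single pooled table built from the counts in the abstract.
pooled_table = (22, 11707 - 22, 10, 21204 - 10)
crude_or = mantel_haenszel_or([pooled_table])
print(f"crude OR from pooled counts: {crude_or:.2f}")
```

Supplying one tuple per cohort, if those counts were available, would give a stratified estimate that accounts for differences in carrier frequency between cohorts.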
Subject(s)
Bipolar Disorder/genetics , Chromosome Duplication , Nerve Tissue Proteins/metabolism , Psychotic Disorders/genetics , Schizophrenia/genetics , p21-Activated Kinases/genetics , p21-Activated Kinases/metabolism , Bipolar Disorder/pathology , Case-Control Studies , Chromosome Breakpoints , DNA Copy Number Variations , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Male , Neuronal Plasticity , Psychotic Disorders/pathology , Schizophrenia/pathology , White People/genetics
ABSTRACT
SUMMARY: Here we present INRICH (INterval enRICHment analysis), a pathway-based genome-wide association analysis tool that tests for enriched association signals of predefined gene-sets across independent genomic intervals. INRICH has wide applicability, fast running time and, most importantly, robustness to potential genomic biases and confounding factors. Such factors, including varying gene size and single-nucleotide polymorphism density, linkage disequilibrium within and between genes and overlapping genes with similar annotations, are often not accounted for by existing gene-set enrichment methods. By using a genomic permutation procedure, we generate experiment-wide empirical significance values, corrected for the total number of sets tested, implicitly taking overlap of sets into account. By simulation we confirm a properly controlled type I error rate and reasonable power of INRICH under diverse parameter settings. As a proof of principle, we describe the application of INRICH on the NHGRI GWAS catalog. AVAILABILITY: A standalone C++ program, user manual and datasets can be freely downloaded from: http://atgu.mgh.harvard.edu/inrich/.
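The permutation idea described above can be illustrated with a heavily simplified sketch. It is not INRICH's algorithm: here the genome is assumed to be an ordered list of gene names, an interval is a contiguous run of genes, and each permutation re-places every interval at a random position while keeping its gene count. The names `count_hits` and `empirical_p`, the toy genome, and the gene set are all hypothetical. The sketch does not match on SNP density, does not prevent permuted intervals from overlapping, and does not apply the experiment-wide correction across gene sets that the abstract describes.

```python
import random

def count_hits(intervals, gene_set):
    """Number of intervals containing at least one gene from the set."""
    return sum(1 for genes in intervals if gene_set.intersection(genes))

def empirical_p(genome, intervals, gene_set, n_perm=1000, seed=0):
    """Empirical p-value for enrichment of gene_set among intervals."""
    rng = random.Random(seed)
    gene_set = set(gene_set)
    observed = count_hits(intervals, gene_set)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        permuted = []
        for genes in intervals:
            k = len(genes)  # preserve the number of genes per interval
            start = rng.randrange(len(genome) - k + 1)
            permuted.append(genome[start:start + k])
        if count_hits(permuted, gene_set) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)

# Toy data: 200 genes, three associated intervals, one gene set.
genome = [f"G{i}" for i in range(200)]
intervals = [genome[10:13], genome[50:52], genome[120:124]]
gene_set = {"G11", "G51", "G122", "G190"}

observed, p_value = empirical_p(genome, intervals, gene_set)
print(observed, p_value)
```

Preserving each interval's gene count during permutation is the part that guards against the gene-size and gene-density biases mentioned in the abstract; a fuller implementation would match on additional properties.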
Subject(s)
Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Software , Genes , Genomics/methods , Humans , Linkage Disequilibrium
ABSTRACT
SUMMARY: zCall is a variant caller specifically designed for calling rare single-nucleotide polymorphisms from array-based technology. This caller is implemented as a post-processing step after a default calling algorithm has been applied. The algorithm uses the intensity profile of the common allele homozygote cluster to define the location of the other two genotype clusters. We demonstrate improved detection of rare alleles when applying zCall to samples that have both Illumina Infinium HumanExome BeadChip and exome sequencing data available. AVAILABILITY: http://atguweb.mgh.harvard.edu/apps/zcall. CONTACT: bneale@broadinstitute.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
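To make the idea of using the common-allele homozygote cluster concrete, the sketch below derives an intensity threshold from that cluster and assigns genotypes by quadrant. It is a simplified illustration under stated assumptions, not zCall's implementation: the function names `y_threshold` and `call_genotype`, the z value of 7, the supplied `t_x` threshold, and the intensity values are all hypothetical. In particular, the threshold on the other axis is simply passed in here, whereas the abstract indicates that the locations of the other two clusters are defined from the common homozygote cluster's intensity profile.

```python
import statistics

def y_threshold(common_hom_y, z=7.0):
    """Threshold on the minor-allele intensity (y), set z standard
    deviations above the mean of the common homozygote cluster."""
    mean = statistics.fmean(common_hom_y)
    sd = statistics.stdev(common_hom_y)
    return mean + z * sd

def call_genotype(x, y, t_x, t_y):
    """Assign a genotype from (x, y) intensities by quadrant.

    A is the common allele (x axis), B the rare allele (y axis).
    """
    if x > t_x and y <= t_y:
        return "AA"
    if x > t_x and y > t_y:
        return "AB"
    if x <= t_x and y > t_y:
        return "BB"
    return "NC"  # no call: low intensity on both axes

# Hypothetical y intensities for samples already called AA.
common_hom_y = [0.02, 0.03, 0.025, 0.035, 0.03]
t_y = y_threshold(common_hom_y)
t_x = 0.2  # assumed threshold, supplied directly for this sketch

for x, y in [(1.0, 0.03), (0.9, 0.8), (0.05, 0.9), (0.05, 0.01)]:
    print((x, y), call_genotype(x, y, t_x, t_y))
```

A large z keeps the threshold well clear of the common homozygote cluster, so only samples with clearly elevated rare-allele intensity are recalled as carriers.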