ABSTRACT
High-throughput proteomics platforms measuring thousands of proteins in plasma combined with genomic and phenotypic information have the power to bridge the gap between the genome and diseases. Here we performed association studies of Olink Explore 3072 data generated by the UK Biobank Pharma Proteomics Project [1] on plasma samples from more than 50,000 UK Biobank participants with phenotypic and genotypic data, stratifying on British or Irish, African and South Asian ancestries. We compared the results with those of a SomaScan v4 study on plasma from 36,000 Icelandic people [2], for 1,514 of whom Olink data were also available. We found modest correlation between the two platforms. Although cis protein quantitative trait loci were detected for a similar absolute number of assays on the two platforms (2,101 on Olink versus 2,120 on SomaScan), the proportion of assays with such supporting evidence for assay performance was higher on the Olink platform (72% versus 43%). A considerable number of proteins had genomic associations that differed between the platforms. We provide examples where differences between platforms may influence conclusions drawn from the integration of protein levels with the study of diseases. We demonstrate how leveraging the diverse ancestries of participants in the UK Biobank helps to detect novel associations and refine genomic location. Our results show the value of the information provided by the two most commonly used high-throughput proteomics platforms and demonstrate the differences between them that at times provide useful complementarity.
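The two cis-pQTL proportions quoted above imply rough totals for the number of assays on each platform. A minimal sketch in Python, assuming each percentage is simply the cis-pQTL count divided by the platform's assay total; the percentages are rounded in the abstract, so the results are approximate, and the variable names are illustrative.

```python
# Counts and proportions as reported in the abstract above.
olink_cis, olink_share = 2101, 0.72
soma_cis, soma_share = 2120, 0.43

# Assumption: share = assays with a cis pQTL / total assays on the platform.
olink_total = olink_cis / olink_share
soma_total = soma_cis / soma_share

print(f"implied Olink assays:    ~{olink_total:.0f}")
print(f"implied SomaScan assays: ~{soma_total:.0f}")
```

This illustrates why similar absolute counts translate into different proportions: the implied SomaScan denominator is considerably larger.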
Subject(s)
Blood Proteins , Disease Susceptibility , Genomics , Genotype , Phenotype , Proteomics , Humans , Africa/ethnology , Asia, Southern/ethnology , Biological Specimen Banks , Blood Proteins/analysis , Blood Proteins/genetics , Datasets as Topic , Genome, Human/genetics , Iceland/ethnology , Ireland/ethnology , Plasma/chemistry , Proteome/analysis , Proteome/genetics , Proteomics/methods , Quantitative Trait Loci , United Kingdom
ABSTRACT
BACKGROUND: In 2021, the American College of Medical Genetics and Genomics (ACMG) recommended reporting actionable genotypes in 73 genes associated with diseases for which preventive or therapeutic measures are available. Evaluations of the association of actionable genotypes in these genes with life span are currently lacking. METHODS: We assessed the prevalence of coding and splice variants in genes on the ACMG Secondary Findings, version 3.0 (ACMG SF v3.0), list in the genomes of 57,933 Icelanders. We assigned pathogenicity to all reviewed variants using reported evidence in the ClinVar database, the frequency of variants, and their associations with disease to create a manually curated set of actionable genotypes (variants). We assessed the relationship between these genotypes and life span and further examined the specific causes of death among carriers. RESULTS: Through manual curation of 4405 sequence variants in the ACMG SF v3.0 genes, we identified 235 actionable genotypes in 53 genes. Of the 57,933 participants, 2306 (4.0%) carried at least one actionable genotype. We found shorter median survival among persons carrying actionable genotypes than among noncarriers. Specifically, we found that carrying an actionable genotype in a cancer gene was associated with survival that was 3 years shorter than that among noncarriers, with causes of death among carriers attributed primarily to cancer-related conditions. Furthermore, we found evidence of association between carrying an actionable genotype in certain genes in the cardiovascular disease group and a reduced life span. CONCLUSIONS: On the basis of the ACMG SF v3.0 guidelines, we found that approximately 1 in 25 Icelanders carried an actionable genotype and that carrying such a genotype was associated with a reduced life span. (Funded by deCODE Genetics-Amgen.).
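The headline prevalence can be checked directly from the counts reported above. A minimal sketch in Python; the variable names are illustrative and not taken from the study.

```python
# Counts as reported in the abstract above.
carriers = 2306
participants = 57_933

prevalence = carriers / participants  # fraction carrying at least one actionable genotype
one_in_n = participants / carriers    # the same figure in "1 in N" form

print(f"prevalence = {prevalence:.1%}")
print(f"about 1 in {one_in_n:.0f}")
```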
Subject(s)
Disease , Genomics , Longevity , Humans , Alleles , Genetic Testing , Genetic Variation , Genotype , Iceland/epidemiology , Longevity/genetics , Disease/genetics , Cardiovascular Diseases/genetics , Neoplasms/genetics
ABSTRACT
Autoimmune thyroid disease is the most common autoimmune disease and is highly heritable [1]. Here, by using a genome-wide association study of 30,234 cases and 725,172 controls from Iceland and the UK Biobank, we find 99 sequence variants at 93 loci, of which 84 variants are previously unreported [2-7]. A low-frequency (1.36%) intronic variant in FLT3 (rs76428106-C) has the largest effect on risk of autoimmune thyroid disease (odds ratio (OR) = 1.46, P = 2.37 × 10(-24)). rs76428106-C is also associated with systemic lupus erythematosus (OR = 1.90, P = 6.46 × 10(-4)), rheumatoid factor and/or anti-CCP-positive rheumatoid arthritis (OR = 1.41, P = 4.31 × 10(-4)) and coeliac disease (OR = 1.62, P = 1.20 × 10(-4)). FLT3 encodes fms-related tyrosine kinase 3, a receptor that regulates haematopoietic progenitor and dendritic cells. RNA sequencing revealed that rs76428106-C generates a cryptic splice site, which introduces a stop codon in 30% of transcripts that are predicted to encode a truncated protein, which lacks its tyrosine kinase domains. Each copy of rs76428106-C doubles the plasma levels of the FLT3 ligand. Activating somatic mutations in FLT3 are associated with acute myeloid leukaemia [8] with a poor prognosis and rs76428106-C also predisposes individuals to acute myeloid leukaemia (OR = 1.90, P = 5.40 × 10(-3)). Thus, a predicted loss-of-function germline mutation in FLT3 causes a reduction in full-length FLT3, with a compensatory increase in the levels of its ligand and an increased disease risk, similar to that of a gain-of-function mutation.
Subject(s)
Codon, Nonsense/genetics , Genetic Predisposition to Disease/genetics , Ligands , Mutation , Thyroiditis, Autoimmune/genetics , fms-Like Tyrosine Kinase 3/genetics , fms-Like Tyrosine Kinase 3/metabolism , Alleles , Autoimmune Diseases/genetics , Databases, Factual , Genome-Wide Association Study , Germ-Line Mutation , Humans , Iceland , Introns/genetics , Leukemia, Myeloid, Acute , Loss of Function Mutation , RNA Splice Sites/genetics , United Kingdom
ABSTRACT
BACKGROUND: During the current worldwide pandemic, coronavirus disease 2019 (Covid-19) was first diagnosed in Iceland at the end of February. However, data are limited on how SARS-CoV-2, the virus that causes Covid-19, enters and spreads in a population. METHODS: We targeted testing to persons living in Iceland who were at high risk for infection (mainly those who were symptomatic, had recently traveled to high-risk countries, or had contact with infected persons). We also carried out population screening using two strategies: issuing an open invitation to 10,797 persons and sending random invitations to 2283 persons. We sequenced SARS-CoV-2 from 643 samples. RESULTS: As of April 4, a total of 1221 of 9199 persons (13.3%) who were recruited for targeted testing had positive results for infection with SARS-CoV-2. Of those tested in the general population, 87 (0.8%) in the open-invitation screening and 13 (0.6%) in the random-population screening tested positive for the virus. In total, 6% of the population was screened. Most persons in the targeted-testing group who received positive tests early in the study had recently traveled internationally, in contrast to those who tested positive later in the study. Children under 10 years of age were less likely to receive a positive result than were persons 10 years of age or older, with percentages of 6.7% and 13.7%, respectively, for targeted testing; in the population screening, no child under 10 years of age had a positive result, as compared with 0.8% of those 10 years of age or older. Fewer females than males received positive results both in targeted testing (11.0% vs. 16.7%) and in population screening (0.6% vs. 0.9%). The haplotypes of the sequenced SARS-CoV-2 viruses were diverse and changed over time. The percentage of infected participants that was determined through population screening remained stable for the 20-day duration of screening. 
CONCLUSIONS: In a population-based study in Iceland, children under 10 years of age and females had a lower incidence of SARS-CoV-2 infection than adolescents or adults and males. The proportion of infected persons identified through population screening did not change substantially during the screening period, which was consistent with a beneficial effect of containment efforts. (Funded by deCODE Genetics-Amgen.).
Subject(s)
Coronavirus Infections/epidemiology , Epidemiological Monitoring , Pneumonia, Viral/epidemiology , Adolescent , Adult , Aged , Aged, 80 and over , Betacoronavirus/genetics , COVID-19 , Child , Child, Preschool , Contact Tracing , Female , Haplotypes , Humans , Iceland/epidemiology , Infant , Male , Mass Screening , Middle Aged , Pandemics , SARS-CoV-2 , Travel , Young Adult
ABSTRACT
The characterization of mutational processes that generate sequence diversity in the human genome is of paramount importance both to medical genetics and to evolutionary studies. To understand how the age and sex of transmitting parents affect de novo mutations, here we sequence 1,548 Icelanders, their parents, and, for a subset of 225, at least one child, to 35× genome-wide coverage. We find 108,778 de novo mutations, both single nucleotide polymorphisms and indels, and determine the parent of origin of 42,961. The number of de novo mutations from mothers increases by 0.37 per year of age (95% CI 0.32-0.43), a quarter of the 1.51 per year from fathers (95% CI 1.45-1.57). The number of clustered mutations increases faster with the mother's age than with the father's, and the genomic span of maternal de novo mutation clusters is greater than that of paternal ones. The types of de novo mutation from mothers change substantially with age, with a 0.26% (95% CI 0.19-0.33%) decrease in cytosine-phosphate-guanine to thymine-phosphate-guanine (CpG>TpG) de novo mutations and a 0.33% (95% CI 0.28-0.38%) increase in C>G de novo mutations per year, respectively. Remarkably, these age-related changes are not distributed uniformly across the genome. A striking example is a 20 megabase region on chromosome 8p, with a maternal C>G mutation rate that is up to 50-fold greater than the rest of the genome. The age-related accumulation of maternal non-crossover gene conversions also mostly occurs within these regions. Increased sequence diversity and linkage disequilibrium of C>G variants within regions affected by excess maternal mutations indicate that the underlying mutational process has persisted in humans for thousands of years. Moreover, the regional excess of C>G variation in humans is largely shared by chimpanzees, less by gorillas, and is almost absent from orangutans. 
This demonstrates that sequence diversity in humans results from evolving interactions between age, sex, mutation type, and genomic location.
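The "a quarter" comparison between maternal and paternal age effects follows from the two slopes reported above. A minimal sketch in Python, using only the point estimates; the helper function is illustrative and assumes a purely linear age effect.

```python
# Point estimates from the abstract above (de novo mutations per year of parental age).
maternal_per_year = 0.37
paternal_per_year = 1.51

ratio = maternal_per_year / paternal_per_year  # maternal slope relative to paternal slope


def extra_mutations(slope_per_year: float, extra_years: float) -> float:
    """Additional de novo mutations expected for a parent who is `extra_years` older,
    assuming the linear slope holds over that age range."""
    return slope_per_year * extra_years


print(f"maternal/paternal slope ratio = {ratio:.3f}")
print(f"10 extra maternal years: +{extra_mutations(maternal_per_year, 10):.1f}")
print(f"10 extra paternal years: +{extra_mutations(paternal_per_year, 10):.1f}")
```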
Subject(s)
Aging/genetics , Germ-Line Mutation/genetics , Maternal Age , Mutagenesis , Parents , Paternal Age , Adolescent , Adult , Aged , Animals , Child , Chromosomes, Human, Pair 8/genetics , Evolution, Molecular , Female , GC Rich Sequence , Genome, Human/genetics , Gorilla gorilla/genetics , Humans , INDEL Mutation , Iceland , Linkage Disequilibrium/genetics , Male , Middle Aged , Mutation Rate , Pan troglodytes/genetics , Polymorphism, Single Nucleotide , Pongo/genetics , Young Adult
ABSTRACT
Importance: Whether protein risk scores derived from a single plasma sample could be useful for risk assessment for atherosclerotic cardiovascular disease (ASCVD), in conjunction with clinical risk factors and polygenic risk scores, is uncertain. Objective: To develop protein risk scores for ASCVD risk prediction and compare them to clinical risk factors and polygenic risk scores in primary and secondary event populations. Design, Setting, and Participants: The primary analysis was a retrospective study of primary events among 13 540 individuals in Iceland (aged 40-75 years) with proteomics data and no history of major ASCVD events at recruitment (study duration, August 23, 2000 until October 26, 2006; follow-up through 2018). We also analyzed a secondary event population from a randomized, double-blind lipid-lowering clinical trial (2013-2016), consisting of individuals with stable ASCVD receiving statin therapy and for whom proteomic data were available for 6791 individuals. Exposures: Protein risk scores (based on 4963 plasma protein levels and developed in a training set in the primary event population); polygenic risk scores for coronary artery disease and stroke; and clinical risk factors that included age, sex, statin use, hypertension treatment, type 2 diabetes, body mass index, and smoking status at the time of plasma sampling. Main Outcomes and Measures: Outcomes were composites of myocardial infarction, stroke, and coronary heart disease death or cardiovascular death. Performance was evaluated using Cox survival models and measures of discrimination and reclassification that accounted for the competing risk of non-ASCVD death. Results: In the primary event population test set (4018 individuals [59.0% women]; 465 events; median follow-up, 15.8 years), the protein risk score had a hazard ratio (HR) of 1.93 per SD (95% CI, 1.75 to 2.13).
Addition of protein risk score and polygenic risk scores significantly increased the C index when added to a clinical risk factor model (C index change, 0.022 [95% CI, 0.007 to 0.038]). Addition of the protein risk score alone to a clinical risk factor model also led to a significantly increased C index (difference, 0.014 [95% CI, 0.002 to 0.028]). Among White individuals in the secondary event population (6307 participants; 432 events; median follow-up, 2.2 years), the protein risk score had an HR of 1.62 per SD (95% CI, 1.48 to 1.79) and significantly increased C index when added to a clinical risk factor model (C index change, 0.026 [95% CI, 0.011 to 0.042]). The protein risk score was significantly associated with major adverse cardiovascular events among individuals of African and Asian ancestries in the secondary event population. Conclusions and Relevance: A protein risk score was significantly associated with ASCVD events in primary and secondary event populations. When added to clinical risk factors, the protein risk score and polygenic risk score both provided statistically significant but modest improvement in discrimination.
Subject(s)
Atherosclerosis , Cardiovascular Diseases , Proteomics , Female , Humans , Male , Atherosclerosis/epidemiology , Atherosclerosis/genetics , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/epidemiology , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Retrospective Studies , Stroke , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/etiology , Cardiovascular Diseases/mortality , Cardiovascular Diseases/therapy , Risk Assessment , Adult , Middle Aged , Aged , Iceland/epidemiology , Randomized Controlled Trials as Topic
ABSTRACT
OBJECTIVES: To find causal genes for rheumatoid arthritis (RA) and its seropositive (RF and/or ACPA positive) and seronegative subsets. METHODS: We performed a genome-wide association study (GWAS) of 31 313 RA cases (68% seropositive) and ~1 million controls from Northwestern Europe. We searched for causal genes outside the HLA-locus through effect on coding, mRNA expression in several tissues and/or levels of plasma proteins (SomaScan) and did network analysis (Qiagen). RESULTS: We found 25 sequence variants for RA overall, 33 for seropositive and 2 for seronegative RA, altogether 37 sequence variants at 34 non-HLA loci, of which 15 are novel. Genomic, transcriptomic and proteomic analysis of these yielded 25 causal genes in seropositive RA and additional two overall. Most encode proteins in the network of interferon-alpha/beta and IL-12/23 that signal through the JAK/STAT-pathway. Highlighting those with largest effect on seropositive RA, a rare missense variant in STAT4 (rs140675301-A) that is independent of reported non-coding STAT4-variants, increases the risk of seropositive RA 2.27-fold (p=2.1×10(-9)), more than the rs2476601-A missense variant in PTPN22 (OR=1.59, p=1.3×10(-160)). STAT4 rs140675301-A replaces hydrophilic glutamic acid with hydrophobic valine (Glu128Val) in a conserved, surface-exposed loop. A stop-mutation (rs76428106-C) in FLT3 increases seropositive RA risk (OR=1.35, p=6.6×10(-11)). Independent missense variants in TYK2 (rs34536443-C, rs12720356-C, rs35018800-A, latter two novel) associate with decreased risk of seropositive RA (ORs=0.63-0.87, p=10(-9) to 10(-27)) and decreased plasma levels of interferon-alpha/beta receptor 1 that signals through TYK2/JAK1/STAT4. CONCLUSION: Sequence variants pointing to causal genes in the JAK/STAT pathway have largest effect on seropositive RA, while associations with seronegative RA remain scarce.
Subject(s)
Arthritis, Rheumatoid , Genome-Wide Association Study , Arthritis, Rheumatoid/genetics , Genetic Predisposition to Disease/genetics , Humans , Interferon-alpha , Janus Kinases/genetics , Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics , Proteomics , STAT Transcription Factors/genetics , Signal Transduction/genetics
ABSTRACT
Urine dipstick tests are widely used in routine medical care to diagnose kidney, urinary tract and metabolic diseases. Several environmental factors are known to affect the test results, whereas the effects of genetic diversity are largely unknown. We tested 32.5 million sequence variants for association with urinary biomarkers in a set of 150 274 Icelanders with urine dipstick measurements. We detected 20 association signals, of which 14 are novel, associating with at least one of five clinical entities defined by the urine dipstick: glucosuria, ketonuria, proteinuria, hematuria and urine pH. These include three independent glucosuria variants at SLC5A2, the gene encoding the sodium-dependent glucose transporter (SGLT2), a protein targeted pharmacologically to increase urinary glucose excretion in the treatment of diabetes. Two variants associating with proteinuria are in LRP2 and CUBN, encoding the co-transporters megalin and cubilin, respectively, that mediate proximal tubule protein uptake. One of the hematuria-associated variants is a rare, previously unreported 2.5 kb exonic deletion in COL4A3. Of the four signals associated with urine pH, we note that the pH-increasing alleles of two variants (POU2AF1, WDR72) associate significantly with increased risk of kidney stones. Our results reveal that genetic factors affect variability in urinary biomarkers, in both disease-dependent and disease-independent contexts.
Subject(s)
Biomarkers/analysis , Biomarkers/urine , Genetic Variation/genetics , Adult , Aged , Alleles , Female , Hematuria/genetics , Hematuria/urine , Humans , Hydrogen-Ion Concentration , Iceland , Ketosis/genetics , Ketosis/urine , Kidney/metabolism , Male , Middle Aged , Proteinuria/genetics , Proteinuria/urine , Sodium-Glucose Transporter 2/genetics , Whole Genome Sequencing/methods
ABSTRACT
Epidemiological and genetic association studies show that genetics play an important role in the attainment of education. Here, we investigate the effect of this genetic component on the reproductive history of 109,120 Icelanders and the consequent impact on the gene pool over time. We show that an educational attainment polygenic score, POLYEDU, constructed from results of a recent study is associated with delayed reproduction (P < 10(-100)) and fewer children overall. The effect is stronger for women and remains highly significant after adjusting for educational attainment. Based on 129,808 Icelanders born between 1910 and 1990, we find that the average POLYEDU has been declining at a rate of ~0.010 standard units per decade, which is substantial on an evolutionary timescale. Most importantly, because POLYEDU only captures a fraction of the overall underlying genetic component, the latter could be declining at a rate that is two to three times faster.
Subject(s)
Educational Status , Genetic Variation , Adolescent , Adult , Female , Fertility , Genome, Human , Genotype , Humans , Iceland , Intelligence , Male , Young Adult
ABSTRACT
IL-33 is a tissue-derived cytokine that induces and amplifies eosinophilic inflammation and has emerged as a promising new drug target for asthma and allergic disease. Common variants at IL33 and IL1RL1, encoding the IL-33 receptor ST2, associate with eosinophil counts and asthma. Through whole-genome sequencing and imputation into the Icelandic population, we found a rare variant in IL33 (NM_001199640:exon7:c.487-1G>C (rs146597587-C), allele frequency = 0.65%) that disrupts a canonical splice acceptor site before the last coding exon. It is also found at low frequency in European populations. rs146597587-C associates with lower eosinophil counts (β = -0.21 SD, P = 2.5×10(-16), N = 103,104), and reduced risk of asthma in Europeans (OR = 0.47; 95%CI: 0.32, 0.70, P = 1.8×10(-4), N cases = 6,465, N controls = 302,977). Heterozygotes have about 40% lower total IL33 mRNA expression than non-carriers and allele-specific analysis based on RNA sequencing and phased genotypes shows that only 20% of the total expression is from the mutated chromosome. In half of those transcripts the mutation causes retention of the last intron, predicted to result in a premature stop codon that leads to truncation of 66 amino acids. The truncated IL-33 has normal intracellular localization but neither binds IL-33R/ST2 nor activates ST2-expressing cells. Together these data demonstrate that rs146597587-C is a loss of function mutation and support the hypothesis that IL-33 haploinsufficiency protects against asthma.
Subject(s)
Asthma/genetics , Eosinophils/metabolism , Interleukin-33/genetics , Mutation , Adolescent , Adult , Aged , Aged, 80 and over , Alternative Splicing , Animals , Binding Sites , Biological Assay , Child , Child, Preschool , Denmark , Female , Gene Frequency , Genetic Predisposition to Disease , Genotype , Heterozygote , Humans , Iceland , Infant , Infant, Newborn , Introns , Male , Mice , Mice, Transgenic , Middle Aged , Netherlands , Young Adult
ABSTRACT
Common sequence variants at the haptoglobin gene (HP) have been associated with blood lipid levels. Through whole-genome sequencing of 8,453 Icelanders, we discovered a splice donor founder mutation in HP (NM_001126102.1:c.190+1G>C, minor allele frequency = 0.56%). This mutation occurs on the HP1 allele of the common copy number variant in HP and leads to a loss of function of HP1. It associates with lower levels of haptoglobin (P = 2.1 × 10(-54)), higher levels of non-high density lipoprotein cholesterol (β = 0.26 mmol/l, P = 2.6 × 10(-9)) and greater risk of coronary artery disease (odds ratio = 1.30, 95% confidence interval: 1.10-1.54, P = 0.0024). Through haplotype analysis and with RNA sequencing, we provide evidence of a causal relationship between one of the two haptoglobin isoforms, namely Hp1, and lower levels of non-HDL cholesterol. Furthermore, we show that the HP1 allele associates with various other quantitative biological traits.
Subject(s)
Coronary Artery Disease/genetics , Haptoglobins/genetics , Adult , Alleles , Base Sequence , Coronary Artery Disease/metabolism , DNA Copy Number Variations/genetics , Female , Gene Frequency/genetics , Genetic Association Studies/methods , Genetic Variation , Haptoglobins/metabolism , Humans , Iceland , Lipids/blood , Lipids/genetics , Lipoproteins/genetics , Male , Mutation , Odds Ratio , RNA Splice Sites/genetics , Risk Factors
ABSTRACT
BACKGROUND: Several sequence variants are known to have effects on serum levels of non-high-density lipoprotein (HDL) cholesterol that alter the risk of coronary artery disease. METHODS: We sequenced the genomes of 2636 Icelanders and found variants that we then imputed into the genomes of approximately 398,000 Icelanders. We tested for association between these imputed variants and non-HDL cholesterol levels in 119,146 samples. We then performed replication testing in two populations of European descent. We assessed the effects of an implicated loss-of-function variant on the risk of coronary artery disease in 42,524 case patients and 249,414 controls from five European ancestry populations. An augmented set of genomes was screened for additional loss-of-function variants in a target gene. We evaluated the effect of an implicated variant on protein stability. RESULTS: We found a rare noncoding 12-base-pair (bp) deletion (del12) in intron 4 of ASGR1, which encodes a subunit of the asialoglycoprotein receptor, a lectin that plays a role in the homeostasis of circulating glycoproteins. The del12 mutation activates a cryptic splice site, leading to a frameshift mutation and a premature stop codon that renders a truncated protein prone to degradation. Heterozygous carriers of the mutation (1 in 120 persons in our study population) had a lower level of non-HDL cholesterol than noncarriers, a difference of 15.3 mg per deciliter (0.40 mmol per liter) (P=1.0×10(-16)), and a lower risk of coronary artery disease (by 34%; 95% confidence interval, 21 to 45; P=4.0×10(-6)). In a larger set of sequenced samples from Icelanders, we found another loss-of-function ASGR1 variant (p.W158X, carried by 1 in 1850 persons) that was also associated with lower levels of non-HDL cholesterol (P=1.8×10(-3)). CONCLUSIONS: ASGR1 haploinsufficiency was associated with reduced levels of non-HDL cholesterol and a reduced risk of coronary artery disease. 
(Funded by the National Institutes of Health and others.).
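The non-HDL cholesterol difference above is given in two units, and the conversion between them can be checked directly. A minimal sketch in Python, assuming the standard molar mass of cholesterol (about 386.65 g/mol); the function name is illustrative.

```python
# Molar mass of cholesterol (C27H46O) in g/mol; an assumption not stated in the abstract.
CHOLESTEROL_G_PER_MOL = 386.65


def mg_per_dl_to_mmol_per_l(mg_per_dl: float) -> float:
    """Convert a cholesterol concentration from mg/dL to mmol/L."""
    mg_per_l = mg_per_dl * 10               # 1 L = 10 dL
    return mg_per_l / CHOLESTEROL_G_PER_MOL  # mg/L divided by g/mol gives mmol/L


difference_mmol = mg_per_dl_to_mmol_per_l(15.3)
print(f"15.3 mg/dL = {difference_mmol:.2f} mmol/L")
```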
Subject(s)
Asialoglycoprotein Receptor/genetics , Cholesterol/blood , Coronary Artery Disease/genetics , Haploinsufficiency , Adult , Aged , Aged, 80 and over , Base Sequence , Female , Genetic Predisposition to Disease , Humans , Iceland , Kaplan-Meier Estimate , Male , Middle Aged , Molecular Sequence Data , Myocardial Infarction/genetics , Risk , Sequence Analysis, DNA , White People/genetics
ABSTRACT
Clonal hematopoiesis (CH) arises when a substantial proportion of mature blood cells is derived from a single dominant hematopoietic stem cell lineage. Somatic mutations in candidate driver (CD) genes are thought to be responsible for at least some cases of CH. Using whole-genome sequencing of 11 262 Icelanders, we found 1403 cases of CH by using barcodes of mosaic somatic mutations in peripheral blood, whether or not they have a mutation in a CD gene. We find that CH is very common in the elderly, trending toward inevitability. We show that somatic mutations in TET2, DNMT3A, ASXL1, and PPM1D are associated with CH at high significance. However, known CD mutations were evident in only a fraction of CH cases. Nevertheless, the highly prevalent CH we detect associates with increased mortality rates, risk for hematological malignancy, smoking behavior, telomere length, Y-chromosome loss, and other phenotypic characteristics. Modeling suggests some CH cases could arise in the absence of CD mutations as a result of neutral drift acting on a small population of active hematopoietic stem cells. Finally, we find a germline deletion in intron 3 of the telomerase reverse transcriptase (TERT) gene that predisposes to CH (rs34002450; P = 7.4 × 10(-12); odds ratio, 1.37).
Subject(s)
DNA (Cytosine-5-)-Methyltransferases/genetics , DNA-Binding Proteins/genetics , Hematopoiesis , Hematopoietic Stem Cells/cytology , Mutation , Protein Phosphatase 2C/genetics , Proto-Oncogene Proteins/genetics , Repressor Proteins/genetics , Adult , Age Factors , Aged , Aged, 80 and over , Clone Cells , DNA Methyltransferase 3A , Dioxygenases , Female , Hematologic Neoplasms/epidemiology , Hematologic Neoplasms/genetics , Hematopoietic Stem Cells/metabolism , Humans , Male , Middle Aged , Risk Factors
ABSTRACT
Low bone mineral density (BMD) is used as a parameter of osteoporosis. Genome-wide association studies of BMD have hitherto focused on BMD as a quantitative trait, yielding common variants of small effects that contribute to the population diversity in BMD. Here we use BMD as a dichotomous trait, searching for variants that may have a direct effect on the risk of pathologically low BMD rather than on the regulation of BMD in the healthy population. Through whole-genome sequencing of Icelandic individuals, we found a rare nonsense mutation within the leucine-rich-repeat-containing G-protein-coupled receptor 4 (LGR4) gene (c.376C>T) that is strongly associated with low BMD, and with osteoporotic fractures. This mutation leads to termination of LGR4 at position 126 and fully disrupts its function. The c.376C>T mutation is also associated with electrolyte imbalance, late onset of menarche and reduced testosterone levels, as well as an increased risk of squamous cell carcinoma of the skin and biliary tract cancer. Interestingly, the phenotype of carriers of the c.376C>T mutation overlaps that of Lgr4 mutant mice.
Subject(s)
Biliary Tract Neoplasms/genetics , Bone Density/genetics , Carcinoma, Squamous Cell/genetics , Codon, Nonsense/genetics , Osteoporotic Fractures/genetics , Receptors, G-Protein-Coupled/genetics , Skin Neoplasms/genetics , Water-Electrolyte Imbalance/genetics , Animals , Australia , Denmark , Down-Regulation/genetics , Female , Heterozygote , Humans , Iceland , Male , Menarche/genetics , Mice , Mice, Knockout , Phenotype , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/deficiency , Receptors, G-Protein-Coupled/metabolism , Testosterone/analysis
ABSTRACT
Mutation of the DNA molecule is one of the most fundamental processes in biology. In this study, we use 283 parent-offspring trios to estimate the rate of mutation for both single nucleotide variants (SNVs) and short length variants (indels) in humans and examine the mutation process. We found 17812 SNVs, corresponding to a mutation rate of 1.29 × 10(-8) per position per generation (PPPG) and 1282 indels corresponding to a rate of 9.29 × 10(-10) PPPG. We estimate that around 3% of human de novo SNVs are part of a multi-nucleotide mutation (MNM), with 558 (3.1%) of mutations positioned less than 20kb from another mutation in the same individual (median distance of 525bp). The rate of de novo mutations is greater in late replicating regions (p = 8.29 × 10(-19)) and nearer recombination events (p = 0.0038) than elsewhere in the genome.
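The counts and rates above can be cross-checked against each other. A minimal sketch in Python, assuming the per-position rate is the mutation count divided by the number of trios, two transmitted haploid genomes per child, and the number of callable positions; the abstract does not state this denominator, so the formula and the function name are assumptions made for illustration.

```python
# Figures as reported in the abstract above.
trios = 283
snv_count, snv_rate = 17_812, 1.29e-8
indel_count, indel_rate = 1_282, 9.29e-10


def implied_positions(count: int, rate: float, n_trios: int) -> float:
    """Callable positions per haploid genome implied by a count and a rate,
    assuming rate = count / (n_trios * 2 * positions)."""
    return count / (rate * n_trios * 2)


snv_positions = implied_positions(snv_count, snv_rate, trios)
indel_positions = implied_positions(indel_count, indel_rate, trios)
clustered_fraction = 558 / snv_count  # SNVs within 20 kb of another in the same individual

print(f"SNVs per trio: {snv_count / trios:.1f}")
print(f"implied callable positions (SNVs):   {snv_positions:.3e}")
print(f"implied callable positions (indels): {indel_positions:.3e}")
print(f"clustered fraction: {clustered_fraction:.1%}")
```

Under this assumption the SNV and indel figures imply nearly the same denominator, which suggests the two reported rates were computed over the same set of positions.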
Subject(s)
Genome, Human , INDEL Mutation/genetics , Mutation Rate , DNA Mutational Analysis , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide/genetics , Recombination, Genetic/genetics
ABSTRACT
Transcriptional and splicing anomalies have been observed in intron 8 of the CASP8 gene (encoding procaspase-8) in association with cutaneous basal-cell carcinoma (BCC) and linked to a germline SNP rs700635. Here, we show that the rs700635[C] allele, which is associated with increased risk of BCC and breast cancer, is protective against prostate cancer [odds ratio (OR) = 0.91, P = 1.0 × 10(-6)]. rs700635[C] is also associated with failures to correctly splice out CASP8 intron 8 in breast and prostate tumours and in corresponding normal tissues. Investigation of rs700635[C] carriers revealed that they have a human-specific short interspersed element-variable number of tandem repeat-Alu (SINE-VNTR-Alu), subfamily-E retrotransposon (SVA-E) inserted into CASP8 intron 8. The SVA-E shows evidence of prior activity, because it has transduced some CASP8 sequences during subsequent retrotransposition events. Whole-genome sequence (WGS) data were used to tag the SVA-E with a surrogate SNP rs1035142[T] (r(2) = 0.999), which showed associations with both the splicing anomalies (P = 6.5 × 10(-32)) and with protection against prostate cancer (OR = 0.91, P = 3.8 × 10(-7)).
Subject(s)
Breast Neoplasms/genetics , Carcinoma, Basal Cell/genetics , Caspase 8/genetics , Prostatic Neoplasms/genetics , RNA Splicing , Retroelements , Skin Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Alleles , Base Sequence , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Carcinoma, Basal Cell/metabolism , Carcinoma, Basal Cell/pathology , Caspase 8/metabolism , Female , Genome-Wide Association Study , Humans , Introns , Male , Middle Aged , Molecular Sequence Data , Odds Ratio , Polymorphism, Single Nucleotide , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , Prostatic Neoplasms/prevention & control , Protective Factors , Skin Neoplasms/metabolism , Skin Neoplasms/pathology
ABSTRACT
Mutations generate sequence diversity and provide a substrate for selection. The rate of de novo mutations is therefore of major importance to evolution. Here we conduct a study of genome-wide mutation rates by sequencing the entire genomes of 78 Icelandic parent-offspring trios at high coverage. We show that in our samples, with an average father's age of 29.7, the average de novo mutation rate is 1.20 × 10(-8) per nucleotide per generation. Most notably, the diversity in mutation rate of single nucleotide polymorphisms is dominated by the age of the father at conception of the child. The effect is an increase of about two mutations per year. An exponential model estimates paternal mutations doubling every 16.5 years. After accounting for random Poisson variation, father's age is estimated to explain nearly all of the remaining variation in the de novo mutation counts. These observations shed light on the importance of the father's age on the risk of diseases such as schizophrenia and autism.
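The doubling time quoted for the exponential model can be converted into a per-year growth factor with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the only input taken from the abstract is the 16.5-year doubling time, and it says nothing about the absolute mutation counts fitted in the study.

```python
# Illustrative arithmetic only; function names are hypothetical.
DOUBLING_TIME_YEARS = 16.5  # doubling time stated in the abstract


def annual_growth_factor(doubling_time_years=DOUBLING_TIME_YEARS):
    """Multiplicative change per year implied by a given doubling time."""
    return 2.0 ** (1.0 / doubling_time_years)


def relative_paternal_mutations(age_delta_years,
                                doubling_time_years=DOUBLING_TIME_YEARS):
    """Ratio of expected paternal mutations between two fathers whose
    ages at conception differ by age_delta_years, under pure exponential
    growth with the given doubling time."""
    return 2.0 ** (age_delta_years / doubling_time_years)


if __name__ == "__main__":
    print(f"growth per year: {annual_growth_factor():.4f}")
    print(f"ratio over 10 years: {relative_paternal_mutations(10):.3f}")
```

Under this model a doubling every 16.5 years corresponds to an increase of roughly 4.3% per year of paternal age.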
Subject(s)
Autistic Disorder/genetics , Genetic Predisposition to Disease , Mutation Rate , Paternal Age , Schizophrenia/genetics , Adult , Autistic Disorder/epidemiology , Autistic Disorder/etiology , Chromosomes, Human/genetics , Female , Genome, Human/genetics , Humans , Iceland/epidemiology , Male , Middle Aged , Mothers , Ovum/metabolism , Pedigree , Polymorphism, Single Nucleotide/genetics , Risk Factors , Schizophrenia/epidemiology , Schizophrenia/etiology , Selection, Genetic/genetics , Sequence Analysis, DNA , Spermatozoa/metabolism , Young Adult
ABSTRACT
AIMS: Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia in man, causing substantial morbidity and mortality with a major worldwide public health impact. It is increasingly recognized as a highly heritable condition. This study aimed to determine genetic risk factors for early-onset AF. METHODS AND RESULTS: We sequenced the whole genomes of 8453 Icelanders and imputed genotypes of the 25.5 million sequence variants we discovered into 1799 Icelanders with early-onset AF (diagnosed before 60 years of age) and 337 453 controls. Each sequence variant was tested for association based on multiplicative and recessive inheritance models. We discovered a rare frameshift deletion in the myosin MYL4 gene (c.234delC) that associates with early-onset AF under a recessive mode of inheritance (allelic frequency = 0.58%). We found eight homozygous carriers of the mutation, all of whom had early-onset AF. Six of the homozygotes were diagnosed by the age of 30 and the remaining two in their 50s. Three of the homozygotes had received pacemaker implantations due to sick sinus syndrome, three had suffered an ischemic stroke, and one suffered sudden cardiac death. CONCLUSIONS: Through a population approach we found a loss of function mutation in the myosin gene MYL4 that, in the homozygous state, is completely penetrant for early-onset AF. The finding may provide novel mechanistic insight into the pathophysiology of this complex arrhythmia.
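As a rough order-of-magnitude check on the recessive finding, the stated allele frequency can be turned into an expected number of homozygotes. The sketch below assumes Hardy-Weinberg proportions and applies the frequency to the combined case and control counts from the abstract; both are assumptions made here for illustration, since the abstract does not say in which sample the frequency was estimated. The function name is hypothetical.

```python
# Back-of-the-envelope check under Hardy-Weinberg proportions.
# Assumption: the 0.58% allele frequency applies to the whole
# case + control set; the abstract does not state this.
ALLELE_FREQ = 0.0058      # allelic frequency from the abstract (0.58%)
N_CASES = 1799            # early-onset AF cases
N_CONTROLS = 337_453      # controls


def expected_homozygotes(allele_freq, n_individuals):
    """Expected homozygote count: q^2 * N under Hardy-Weinberg."""
    return allele_freq ** 2 * n_individuals


expected = expected_homozygotes(ALLELE_FREQ, N_CASES + N_CONTROLS)

if __name__ == "__main__":
    print(f"expected homozygous carriers: {expected:.1f}")
```

Under these assumptions the expectation is on the order of ten homozygotes, the same order of magnitude as the eight carriers reported. Rounding of the frequency, imputation, and relatedness within the population could all shift the figure, so it should not be read as a test of the result.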
Subject(s)
Atrial Fibrillation/genetics , Frameshift Mutation/genetics , Myosin Light Chains/genetics , Aged , Atrial Fibrillation/ethnology , Case-Control Studies , Death, Sudden, Cardiac/ethnology , Death, Sudden, Cardiac/etiology , Female , Gene Deletion , Genes, Recessive/genetics , Genome-Wide Association Study/methods , Heterozygote , Homozygote , Humans , Iceland/ethnology , Male , Middle Aged , Pedigree , Risk Factors , Sarcomeres , Sequence Alignment/methods , Sick Sinus Syndrome/ethnology , Sick Sinus Syndrome/genetics , Stroke/ethnology , Stroke/genetics
ABSTRACT
SUMMARY: Large resequencing projects require a significant amount of storage for raw sequences, as well as alignment files. Because the raw sequences are redundant once the alignment has been generated, it is possible to keep only the alignment files. We present BamHash, a checksum-based method to ensure that the read pairs in FASTQ files match exactly the read pairs stored in BAM files, regardless of the ordering of reads. BamHash can be used to verify the integrity of the stored files and to discover any discrepancies. Thus, BamHash can be used to determine whether it is safe to delete the FASTQ files storing raw sequencing reads after alignment, without loss of data. AVAILABILITY AND IMPLEMENTATION: The software is implemented in C++, GPL-licensed and available at https://github.com/DecodeGenetics/BamHash CONTACT: pmelsted@hi.is.
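The idea of a checksum that ignores read order can be illustrated with a short sketch. This is not BamHash's code: the hash function (BLAKE2b from Python's standard library), the field separator, the 64-bit modular sum, and all names below are assumptions chosen for illustration. Each read is hashed independently and the hashes are added modulo 2^64, so any permutation of the same reads yields the same total.

```python
import hashlib

MASK64 = (1 << 64) - 1  # keep the running sum within 64 bits


def record_hash(name, seq, qual):
    """Hash one read (name, bases, qualities) to a 64-bit integer.
    BLAKE2b and the NUL separator are illustrative choices."""
    payload = "\x00".join((name, seq, qual)).encode("ascii")
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def order_independent_checksum(reads):
    """Return (sum of per-read hashes mod 2**64, number of reads).
    Addition is commutative, so read order does not affect the result."""
    total = 0
    count = 0
    for name, seq, qual in reads:
        total = (total + record_hash(name, seq, qual)) & MASK64
        count += 1
    return total, count


# Hypothetical reads standing in for FASTQ and BAM contents.
fastq_reads = [
    ("read1/1", "ACGTACGT", "IIIIIIII"),
    ("read1/2", "TTGCAAGC", "IIIIHHHH"),
    ("read2/1", "GGGTTTAA", "HHHHIIII"),
]
bam_reads = list(reversed(fastq_reads))  # same reads, different order

if __name__ == "__main__":
    same = (order_independent_checksum(fastq_reads)
            == order_independent_checksum(bam_reads))
    print("checksums match:", same)
```

Comparing the read count alongside the sum is a cheap extra guard. A summed checksum of this kind detects accidental differences between files but is not designed to resist deliberately constructed collisions.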