2.
Commun Biol ; 6(1): 703, 2023 07 10.
Article in English | MEDLINE | ID: mdl-37430141

ABSTRACT

Urticaria is a skin disorder characterized by outbreaks of raised pruritic wheals. In order to identify sequence variants associated with urticaria, we performed a meta-analysis of genome-wide association studies for urticaria with a total of 40,694 cases and 1,230,001 controls from Iceland, the UK, Finland, and Japan. We also performed transcriptome- and proteome-wide analyses in Iceland and the UK. We found nine sequence variants at nine loci associating with urticaria. The variants are at genes participating in type 2 immune responses and/or mast cell biology (CBLB, FCER1A, GCSAML, STAT6, TPSD1, ZFPM1), innate immunity (C4), and NF-κB signaling. The most significant association was observed for the splice-donor variant rs56043070[A] (hg38: chr1:247556467) in GCSAML (MAF = 6.6%, OR = 1.24 (95% CI: 1.20-1.28), P-value = 3.6 × 10⁻⁴⁴). We assessed the effects of the variants on transcripts and on levels of proteins relevant to urticaria pathophysiology. Our results emphasize the role of type 2 immune response and mast cell activation in the pathogenesis of urticaria. Our findings may point to an IgE-independent urticaria pathway that could help address unmet clinical need.


Subject(s)
Genome-Wide Association Study , Urticaria , Humans , Mast Cells , Urticaria/genetics , RNA Splicing , Proteome
3.
Nat Commun ; 14(1): 3855, 2023 06 29.
Article in English | MEDLINE | ID: mdl-37386006

ABSTRACT

Microsatellites are polymorphic tracts of short tandem repeats with one to six base-pair (bp) motifs and are some of the most polymorphic variants in the genome. Using 6084 Icelandic parent-offspring trios, we estimate 63.7 (95% CI: 61.9-65.4) microsatellite de novo mutations (mDNMs) per offspring per generation; excluding one-bp repeat motifs (homopolymers), the estimate is 48.2 mDNMs (95% CI: 46.7-49.6). Paternal mDNMs occur at longer repeats than maternal ones, which are in turn larger, with a mean size of 3.4 bp vs 3.1 bp for paternal ones. mDNMs increase by 0.97 (95% CI: 0.90-1.04) and 0.31 (95% CI: 0.25-0.37) per year of father's and mother's age at conception, respectively. Here, we find two independent coding variants that associate with the number of mDNMs transmitted to offspring. The minor allele of a missense variant (allele frequency (AF) = 1.9%) in MSH2, a mismatch repair gene, increases transmitted mDNMs from both parents (effect: 13.1 paternal and 7.8 maternal mDNMs). A synonymous variant (AF = 20.3%) in NEIL2, a DNA damage repair gene, increases paternally transmitted mDNMs (effect: 4.4 mDNMs). Thus, the microsatellite mutation rate in humans is in part under genetic control.


Subject(s)
DNA Mismatch Repair , Germ-Line Mutation , Humans , Alleles , Germ-Line Mutation/genetics , Microsatellite Repeats/genetics , Germ Cells
4.
Nat Commun ; 14(1): 3453, 2023 06 10.
Article in English | MEDLINE | ID: mdl-37301908

ABSTRACT

Genotypes causing pregnancy loss and perinatal mortality are depleted among living individuals and are therefore difficult to find. To explore genetic causes of recessive lethality, we searched for sequence variants with deficit of homozygosity among 1.52 million individuals from six European populations. In this study, we identified 25 genes harboring protein-altering sequence variants with a strong deficit of homozygosity (10% or less of predicted homozygotes). Sequence variants in 12 of the genes cause Mendelian disease under a recessive mode of inheritance, two under a dominant mode, but variants in the remaining 11 have not been reported to cause disease. Sequence variants with a strong deficit of homozygosity are over-represented among genes essential for growth of human cell lines and genes orthologous to mouse genes known to affect viability. The function of these genes gives insight into the genetics of intrauterine lethality. We also identified 1077 genes with homozygous predicted loss-of-function genotypes not previously described, bringing the total set of genes completely knocked out in humans to 4785.
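
The notion of a "deficit of homozygosity" in the abstract above can be illustrated with a minimal sketch. It assumes that the predicted homozygote count follows Hardy-Weinberg proportions (N × q²); the abstract does not state how the prediction was made, so this is an assumption, and the allele frequency and observed counts below are hypothetical. Only the sample size of 1.52 million and the 10% threshold come from the abstract.

```python
def expected_homozygotes(n_individuals: int, allele_freq: float) -> float:
    """Expected homozygote count under Hardy-Weinberg proportions (assumed model)."""
    return n_individuals * allele_freq ** 2


def homozygote_ratio(observed: int, n_individuals: int, allele_freq: float) -> float:
    """Observed homozygotes as a fraction of the Hardy-Weinberg expectation."""
    return observed / expected_homozygotes(n_individuals, allele_freq)


def has_strong_deficit(observed: int, n_individuals: int, allele_freq: float,
                       threshold: float = 0.10) -> bool:
    """True when observed homozygotes are at most `threshold` of those predicted."""
    return homozygote_ratio(observed, n_individuals, allele_freq) <= threshold


# Hypothetical variant: allele frequency 0.5% in a sample of 1.52 million.
N = 1_520_000
Q = 0.005  # hypothetical allele frequency, not taken from the paper

expected = expected_homozygotes(N, Q)
print(f"expected homozygotes: {expected:.1f}")
print("strong deficit with 2 observed:", has_strong_deficit(2, N, Q))
print("strong deficit with 20 observed:", has_strong_deficit(20, N, Q))
```

Under this assumed model, a variant counts as strongly depleted when very few of the roughly 38 predicted homozygotes are actually seen among living individuals.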


Subject(s)
Proteins , Humans , Animals , Mice , Homozygote , Genotype , Proteins/genetics , Genes, Recessive
5.
Nature ; 607(7920): 732-740, 2022 07.
Article in English | MEDLINE | ID: mdl-35859178

ABSTRACT

Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data [1,2]. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank [3]. This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.


Subject(s)
Biological Specimen Banks , Databases, Genetic , Genetic Variation , Genome, Human , Genomics , Whole Genome Sequencing , Africa/ethnology , Asia/ethnology , Cohort Studies , Conserved Sequence , Exons/genetics , Genome, Human/genetics , Haplotypes/genetics , Humans , INDEL Mutation , Ireland/ethnology , Microsatellite Repeats , Polymorphism, Single Nucleotide/genetics , United Kingdom
6.
Arterioscler Thromb Vasc Biol ; 41(10): 2616-2628, 2021 10.
Article in English | MEDLINE | ID: mdl-34407635

ABSTRACT

Objective: Familial hypercholesterolemia (FH) is traditionally defined as a monogenic disease characterized by severely elevated LDL-C (low-density lipoprotein cholesterol) levels. In practice, FH is commonly a clinical diagnosis without confirmation of a causative mutation. In this study, we sought to characterize and compare monogenic and clinically defined FH in a large sample of Icelanders. Approach and Results: We whole-genome sequenced 49 962 Icelanders and imputed the identified variants into an overall sample of 166 281 chip-genotyped Icelanders. We identified 20 FH mutations in LDLR, APOB, and PCSK9 with combined prevalence of 1 in 836. Monogenic FH was associated with severely elevated LDL-C levels and increased risk of premature coronary disease, aortic valve stenosis, and high burden of coronary atherosclerosis. We used a modified version of the Dutch Lipid Clinic Network criteria to screen for the clinical FH phenotype among living adult participants (N=79 058). Clinical FH was found in 2.2% of participants, of whom only 5.2% had monogenic FH. Mutation-negative clinical FH has a strong polygenic basis. Both individuals with monogenic FH and individuals with mutation-negative clinical FH were markedly undertreated with cholesterol-lowering medications and only a minority attained an LDL-C target of <2.6 mmol/L (<100 mg/dL; 11.0% and 24.9%, respectively) or <1.8 mmol/L (<70 mg/dL; 0.0% and 5.2%, respectively), as recommended for primary prevention by European Society of Cardiology/European Atherosclerosis Society cholesterol guidelines. Conclusions: Clinically defined FH is a relatively common phenotype that is explained by monogenic FH in only a minority of cases. Both monogenic and clinical FH confer high cardiovascular risk but are markedly undertreated.


Subject(s)
Apolipoprotein B-100/genetics , Cardiovascular Diseases/genetics , Hyperlipoproteinemia Type II/genetics , Lipids/blood , Mutation , Proprotein Convertase 9/genetics , Receptors, LDL/genetics , Adult , Aged , Aged, 80 and over , Biomarkers/blood , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/ethnology , Cardiovascular Diseases/therapy , Female , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Hyperlipoproteinemia Type II/diagnosis , Hyperlipoproteinemia Type II/drug therapy , Hyperlipoproteinemia Type II/ethnology , Iceland/epidemiology , Male , Middle Aged , Phenotype , Prevalence , Prognosis , Risk Assessment , Risk Factors , Young Adult
7.
Nat Genet ; 53(6): 779-786, 2021 06.
Article in English | MEDLINE | ID: mdl-33972781

ABSTRACT

Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.


Subject(s)
Disease/genetics , Genomic Structural Variation , High-Throughput Nucleotide Sequencing , Quantitative Trait, Heritable , Alleles , Cholesterol, LDL/metabolism , Chromosomes, Human/genetics , Female , Gene Frequency/genetics , Humans , Iceland , Linear Models , Male , Proprotein Convertase 9/genetics , Recombination, Genetic/genetics , Sequence Deletion/genetics
8.
Commun Biol ; 4(1): 156, 2021 02 03.
Article in English | MEDLINE | ID: mdl-33536631

ABSTRACT

Iron is essential for many biological functions and iron deficiency and overload have major health implications. We performed a meta-analysis of three genome-wide association studies from Iceland, the UK and Denmark of blood levels of ferritin (N = 246,139), total iron binding capacity (N = 135,430), iron (N = 163,511) and transferrin saturation (N = 131,471). We found 62 independent sequence variants associating with iron homeostasis parameters at 56 loci, including 46 novel loci. Variants at DUOX2, F5, SLC11A2 and TMPRSS6 associate with iron deficiency anemia, while variants at TF, HFE, TFR2 and TMPRSS6 associate with iron overload. A HBS1L-MYB intergenic region variant associates both with increased risk of iron overload and reduced risk of iron deficiency anemia. The DUOX2 missense variant is present in 14% of the population, associates with all iron homeostasis biomarkers, and increases the risk of iron deficiency anemia by 29%. The associations implicate proteins contributing to the main physiological processes involved in iron homeostasis: iron sensing and storage, inflammation, absorption of iron from the gut, iron recycling, erythropoiesis and bleeding/menstruation.


Subject(s)
Anemia, Iron-Deficiency/genetics , Genetic Loci , Genetic Variation , Iron Overload/genetics , Iron/blood , Anemia, Iron-Deficiency/blood , Anemia, Iron-Deficiency/diagnosis , Biomarkers/blood , Denmark , Ferritins/blood , Genome-Wide Association Study , Genotype , Homeostasis , Humans , Iceland , Iron Overload/blood , Iron Overload/diagnosis , Phenotype , Risk Assessment , Risk Factors , Transferrin/metabolism , United Kingdom
9.
Cancer Res ; 81(8): 1954-1964, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33602785

ABSTRACT

The success of genome-wide association studies (GWAS) in identifying common, low-penetrance variant-cancer associations for the past decade is undisputed. However, discovering additional high-penetrance cancer mutations in unknown cancer predisposing genes requires detection of variant-cancer association of ultra-rare coding variants. Consequently, large-scale next-generation sequence data with associated phenotype information are needed. Here, we used genotype data on 166,281 Icelanders, of whom 49,708 were whole-genome sequenced, and 408,595 individuals from the UK Biobank, of whom 41,147 were whole-exome sequenced, to test for association between loss-of-function burden in autosomal genes and basal cell carcinoma (BCC), the most common cancer in Caucasians. A total of 25,205 BCC cases and 683,058 controls were tested. Rare germline loss-of-function variants in PTPN14 conferred substantial risks of BCC (OR, 8.0; P = 1.9 × 10⁻¹²), with a quarter of carriers getting BCC before age 70 and over half in their lifetime. Furthermore, common variants at the PTPN14 locus were associated with BCC, suggesting PTPN14 as a new, high-impact BCC predisposition gene. A follow-up investigation of 24 cancers and three benign tumor types showed that PTPN14 loss-of-function variants are associated with high risk of cervical cancer (OR, 12.7; P = 1.6 × 10⁻⁴) and low age at diagnosis. Our findings, using power-increasing methods with high-quality rare variant genotypes, highlight future prospects for new discoveries on carcinogenesis. SIGNIFICANCE: This study identifies the tumor-suppressor gene PTPN14 as a high-impact BCC predisposition gene and indicates that inactivation of PTPN14 by germline sequence variants may also lead to increased risk of cervical cancer.


Subject(s)
Carcinoma, Basal Cell/genetics , Loss of Function Mutation , Penetrance , Protein Tyrosine Phosphatases, Non-Receptor/genetics , Skin Neoplasms/genetics , Uterine Cervical Neoplasms/genetics , Age Factors , Carcinoma, Basal Cell/epidemiology , Case-Control Studies , Female , Gene Frequency , Genes, Tumor Suppressor , Genetic Predisposition to Disease , Genetic Testing , Genome-Wide Association Study , Genotype , Germ-Line Mutation , Humans , Iceland/epidemiology , Male , Odds Ratio , Skin Neoplasms/epidemiology , Tissue Banks/statistics & numerical data , United Kingdom/epidemiology , Uterine Cervical Neoplasms/epidemiology , Exome Sequencing/statistics & numerical data , Whole Genome Sequencing/statistics & numerical data
10.
Genome Biol ; 22(1): 28, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33419473

ABSTRACT

A major challenge of long-read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel calling accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi reads assembly.
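
The contig N50 quoted in the abstract above is a standard assembly contiguity statistic: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that definition follows; the contig lengths are hypothetical and are not taken from the Ratatosk assembly.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig")


# Hypothetical contig lengths (arbitrary units): total 290, half is 145.
example = [80, 70, 50, 40, 30, 20]
print(n50(example))  # 80 + 70 = 150 reaches half the total, so N50 is 70
```

A larger N50 means that more of the assembly sits in long contigs, which is why the abstract reports it alongside the misassembly count.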


Subject(s)
Chimera , Genome, Human , Female , Genomics , Humans , Male , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
11.
Bioinformatics ; 36(7): 2269-2271, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31804671

ABSTRACT

SUMMARY: popSTR2 is an update and augmentation of our previous work 'popSTR: a population-based microsatellite genotyper'. To make genotyping sensitive to inter-sample differences, we supply a kernel to estimate sample-specific slippage rates. For clinical sequencing purposes, a panel of known pathogenic repeat expansions is provided along with a script that scans and flags for manual inspection markers indicative of a pathogenic expansion. Like its predecessor, popSTR2 allows for joint genotyping of samples at a population scale. We now provide a binning method that makes the microsatellite genotypes more amenable to analysis within standard association pipelines and can increase association power. AVAILABILITY AND IMPLEMENTATION: https://github.com/DecodeGenetics/popSTR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Microsatellite Repeats , Software , Genotype
12.
J Am Coll Cardiol ; 74(24): 2982-2994, 2019 12 17.
Article in English | MEDLINE | ID: mdl-31865966

ABSTRACT

BACKGROUND: Lipoprotein(a) [Lp(a)] is a causal risk factor for cardiovascular diseases that has no established therapy. The attribute of Lp(a) that affects cardiovascular risk is not established. Low levels of Lp(a) have been associated with type 2 diabetes (T2D). OBJECTIVES: This study investigated whether cardiovascular risk is conferred by Lp(a) molar concentration or apolipoprotein(a) [apo(a)] size, and whether the relationship between Lp(a) and T2D risk is causal. METHODS: This was a case-control study of 143,087 Icelanders with genetic information, including 17,715 with coronary artery disease (CAD) and 8,734 with T2D. This study used measured and genetically imputed Lp(a) molar concentration, kringle IV type 2 (KIV-2) repeats (which determine apo(a) size), and a splice variant in LPA associated with small apo(a) but low Lp(a) molar concentration to disentangle the relationship between Lp(a) and cardiovascular risk. Loss-of-function homozygotes and other subjects genetically predicted to have low Lp(a) levels were evaluated to assess the relationship between Lp(a) and T2D. RESULTS: Lp(a) molar concentration was associated dose-dependently with CAD risk, peripheral artery disease, aortic valve stenosis, heart failure, and lifespan. Lp(a) molar concentration fully explained the Lp(a) association with CAD, and there was no residual association with apo(a) size. Homozygous carriers of loss-of-function mutations had little or no Lp(a) and an increased risk of T2D. CONCLUSIONS: Molar concentration is the attribute of Lp(a) that affects risk of cardiovascular diseases. Low Lp(a) concentration (bottom 10%) increases T2D risk. Pharmacologic reduction of Lp(a) concentration in the 20% of individuals with the greatest concentration down to the population median is predicted to decrease CAD risk without increasing T2D risk.


Subject(s)
Coronary Artery Disease/blood , DNA Copy Number Variations , Diabetes Mellitus, Type 2/blood , Lipoprotein(a)/blood , Case-Control Studies , Coronary Artery Disease/genetics , Diabetes Mellitus, Type 2/genetics , Humans , Iceland , Kringles , Lipoprotein(a)/genetics , Mendelian Randomization Analysis , Molecular Weight , Protein Isoforms/blood , Risk Factors
13.
Nat Commun ; 10(1): 5402, 2019 11 27.
Article in English | MEDLINE | ID: mdl-31776332

ABSTRACT

Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each has a relatively large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short-reads. Comparison to the syndip benchmark dataset shows that our SV genotyping is sensitive and variant segregation in families demonstrates the accuracy of our approach. We demonstrate that incorporating public assembly data into our pipeline greatly improves sensitivity, particularly for large insertions. We validate 6,812 SVs on average per genome using long-read data of 41 Icelanders. We show that GraphTyper2 can simultaneously genotype tens of thousands of whole-genomes by characterizing 60 million small variants and half a million SVs in 49,962 Icelanders, including 80 thousand SVs with high-confidence.


Subject(s)
Genome, Human , Genomic Structural Variation , Genotyping Techniques/methods , Software , Computer Graphics , Databases, Genetic , Genetics, Population , Genotyping Techniques/statistics & numerical data , Humans , Iceland , Pedigree , Polymorphism, Single Nucleotide , Reproducibility of Results , Workflow
14.
Nat Commun ; 10(1): 1284, 2019 03 20.
Article in English | MEDLINE | ID: mdl-30894546

ABSTRACT

The corneal endothelium is vital for transparency and proper hydration of the cornea. Here, we conduct a genome-wide association study of corneal endothelial cell density (cells/mm²), coefficient of cell size variation (CV), percentage of hexagonal cells (HEX) and central corneal thickness (CCT) in 6,125 Icelanders and find associations at 10 loci, including 7 novel. We assess the effects of these variants on various ocular biomechanics such as corneal hysteresis (CH), as well as eye diseases such as glaucoma and corneal dystrophies. Most notably, an intergenic variant close to ANAPC1 (rs78658973[A], frequency = 28.3%) strongly associates with decreased cell density and accounts for 24% of the population variance in cell density (β = -0.77 SD, P = 1.8 × 10⁻³¹⁴) and associates with increased CH (β = 0.19 SD, P = 2.6 × 10⁻¹⁹) without affecting risk of corneal diseases and glaucoma. Our findings indicate that despite correlations between cell density and eye diseases, low cell density does not increase the risk of disease.


Subject(s)
Apc1 Subunit, Anaphase-Promoting Complex-Cyclosome/genetics , Corneal Dystrophies, Hereditary/genetics , Endothelium, Corneal/metabolism , Glaucoma/genetics , Polymorphism, Genetic , Adult , Aged , Aged, 80 and over , Apc1 Subunit, Anaphase-Promoting Complex-Cyclosome/metabolism , Case-Control Studies , Cell Count , Cell Size , Corneal Dystrophies, Hereditary/diagnosis , Corneal Dystrophies, Hereditary/pathology , Endothelial Cells/metabolism , Endothelial Cells/pathology , Endothelium, Corneal/pathology , Female , Gene Expression , Gene Expression Profiling , Genetic Loci , Genome-Wide Association Study , Glaucoma/diagnosis , Glaucoma/pathology , Humans , Intraocular Pressure , Male , Middle Aged , Whole Genome Sequencing
15.
Hum Mol Genet ; 28(7): 1199-1211, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30476138

ABSTRACT

Urine dipstick tests are widely used in routine medical care to diagnose kidney and urinary tract and metabolic diseases. Several environmental factors are known to affect the test results, whereas the effects of genetic diversity are largely unknown. We tested 32.5 million sequence variants for association with urinary biomarkers in a set of 150 274 Icelanders with urine dipstick measurements. We detected 20 association signals, of which 14 are novel, associating with at least one of five clinical entities defined by the urine dipstick: glucosuria, ketonuria, proteinuria, hematuria and urine pH. These include three independent glucosuria variants at SLC5A2, the gene encoding the sodium-dependent glucose transporter (SGLT2), a protein targeted pharmacologically to increase urinary glucose excretion in the treatment of diabetes. Two variants associating with proteinuria are in LRP2 and CUBN, encoding the co-transporters megalin and cubilin, respectively, that mediate proximal tubule protein uptake. One of the hematuria-associated variants is a rare, previously unreported 2.5 kb exonic deletion in COL4A3. Of the four signals associated with urine pH, we note that the pH-increasing alleles of two variants (POU2AF1, WDR72) associate significantly with increased risk of kidney stones. Our results reveal that genetic factors affect variability in urinary biomarkers, in both a disease dependent and independent context.


Subject(s)
Biomarkers/analysis , Biomarkers/urine , Genetic Variation/genetics , Adult , Aged , Alleles , Female , Hematuria/genetics , Hematuria/urine , Humans , Hydrogen-Ion Concentration , Iceland , Ketosis/genetics , Ketosis/urine , Kidney/metabolism , Male , Middle Aged , Proteinuria/genetics , Proteinuria/urine , Sodium-Glucose Transporter 2/genetics , Whole Genome Sequencing/methods
16.
Nat Genet ; 50(12): 1674-1680, 2018 12.
Article in English | MEDLINE | ID: mdl-30397338

ABSTRACT

De novo mutations (DNMs) cause a large proportion of severe rare diseases of childhood. DNMs that occur early may result in mosaicism of both somatic and germ cells. Such early mutations can cause recurrence of disease. We scanned 1,007 sibling pairs from 251 families and identified 878 DNMs shared by siblings (ssDNMs) at 448 genomic sites. We estimated DNM recurrence probability based on parental mosaicism, sharing of DNMs among siblings, parent-of-origin, mutation type and genomic position. We detected 57.2% of ssDNMs in the parental blood. The recurrence probability of a DNM decreases by 2.27% per year for paternal DNMs and 1.78% per year for maternal DNMs. Maternal ssDNMs are more likely to be T>C mutations than paternal ssDNMs, and less likely to be C>T mutations. Depending on the properties of the DNM, the recurrence probability ranges from 0.011% to 28.5%. We have launched an online calculator to allow estimation of DNM recurrence probability for research purposes.


Subject(s)
Family , Inheritance Patterns , Mutation , Parent-Child Relations , Adult , Child , Embryonic Germ Cells/metabolism , Family Characteristics , Female , Germ-Line Mutation , Humans , Inheritance Patterns/genetics , Male , Mosaicism , Pedigree
17.
Nat Genet ; 50(11): 1542-1552, 2018 11.
Article in English | MEDLINE | ID: mdl-30349119

ABSTRACT

Imprinting is the preferential expression of one parental allele over the other. It is controlled primarily through differential methylation of cytosine at CpG dinucleotides. Here we combine 285 methylomes and 11,617 transcriptomes from peripheral blood samples with parent-of-origin phased haplotypes, to produce a new map of imprinted methylation and gene expression patterns across the human genome. We demonstrate how imprinted methylation is a continuous rather than a binary characteristic. We describe at high resolution the parent-of-origin methylation pattern at the 15q11.2 Prader-Willi/Angelman syndrome locus, with nearly confluent stochastic paternal methylation punctuated by 'spikes' of maternal methylation. We find examples of polymorphic imprinted methylation unrelated (at VTRNA2-1 and PARD6G) or related (at CHRNE) to nearby SNP genotypes. We observe RNA isoform-specific imprinted expression patterns suggestive of a methylation-sensitive transcriptional elongation block. Finally, we gain new insights into parent-of-origin-specific effects on phenotypes at the DLK1/MEG3 and GNAS loci.


Subject(s)
DNA Methylation/genetics , Genome, Human , Genomic Imprinting/physiology , Inheritance Patterns/genetics , Parents , Transcriptome/genetics , Angelman Syndrome/genetics , Case-Control Studies , Chromosomes, Human, Pair 15 , Cohort Studies , CpG Islands/genetics , Female , Genetic Loci , Humans , Iceland , Male , Polymorphism, Single Nucleotide , Prader-Willi Syndrome/genetics , Quantitative Trait Loci/genetics
18.
Sci Data ; 4: 170115, 2017 09 21.
Article in English | MEDLINE | ID: mdl-28933420

ABSTRACT

Understanding of sequence diversity is the cornerstone of analysis of genetic disorders, population genetics, and evolutionary biology. Here, we present an update of our sequencing set to 15,220 Icelanders who we sequenced to an average genome-wide coverage of 34X. We identified 39,020,168 autosomal variants passing GATK filters: 31,079,378 SNPs and 7,940,790 indels. Calling de novo mutations (DNMs) is a formidable challenge given the high false positive rate in sequencing datasets relative to the mutation rate. Here we addressed this issue by using segregation of alleles in three-generation families. Using this transmission assay, we controlled the false positive rate and identified 108,778 high quality DNMs. Furthermore, we used our extended family structure and read pair tracing of DNMs to a panel of phased SNPs, to determine the parent of origin of 42,961 DNMs.


Subject(s)
Genome, Human , Humans , INDEL Mutation , Iceland , Polymorphism, Single Nucleotide
19.
Nat Genet ; 49(11): 1654-1660, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28945251

ABSTRACT

A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in both small and population-scale sequencing studies.
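
The idea of a pangenome that represents possible haplotypes as graph paths can be illustrated with a toy example. The sketch below is not Graphtyper's implementation or data model; the node layout, sequences and function name are hypothetical, chosen only to show how a small variation graph with one SNP site and one insertion site spells out four haplotypes.

```python
# Toy variation graph: each node carries a sequence fragment, and each edge
# links a node to the fragments that may follow it. All values are hypothetical.
NODES = {
    0: "ACGT",  # shared flank
    1: "A",     # SNP, reference allele
    2: "G",     # SNP, alternative allele
    3: "TTC",   # shared segment
    4: "",      # insertion site, no insertion
    5: "CA",    # insertion site, inserted bases
    6: "GG",    # shared flank
}
EDGES = {0: [1, 2], 1: [3], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}


def haplotypes(nodes: dict[int, str], edges: dict[int, list[int]],
               start: int = 0) -> list[str]:
    """Enumerate the sequences spelled by every path from `start` to a sink."""
    sequence = nodes[start]
    if not edges[start]:
        return [sequence]
    return [sequence + rest
            for successor in edges[start]
            for rest in haplotypes(nodes, edges, successor)]


for hap in haplotypes(NODES, EDGES):
    print(hap)
```

Enumerating every path is only feasible for a toy graph like this one; the number of paths grows multiplicatively with the number of variant sites, which is one reason graph-based genotypers work on the graph structure itself.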


Subject(s)
Algorithms , Genome, Human , Genotyping Techniques/instrumentation , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/statistics & numerical data , Alleles , Base Sequence , Computer Graphics , HLA Antigens/genetics , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Sequence Alignment , Sequence Analysis, DNA/methods , Software
20.
Nature ; 549(7673): 519-522, 2017 09 28.
Article in English | MEDLINE | ID: mdl-28959963

ABSTRACT

The characterization of mutational processes that generate sequence diversity in the human genome is of paramount importance both to medical genetics and to evolutionary studies. To understand how the age and sex of transmitting parents affect de novo mutations, here we sequence 1,548 Icelanders, their parents, and, for a subset of 225, at least one child, to 35× genome-wide coverage. We find 108,778 de novo mutations, both single nucleotide polymorphisms and indels, and determine the parent of origin of 42,961. The number of de novo mutations from mothers increases by 0.37 per year of age (95% CI 0.32-0.43), a quarter of the 1.51 per year from fathers (95% CI 1.45-1.57). The number of clustered mutations increases faster with the mother's age than with the father's, and the genomic span of maternal de novo mutation clusters is greater than that of paternal ones. The types of de novo mutation from mothers change substantially with age, with a 0.26% (95% CI 0.19-0.33%) decrease in cytosine-phosphate-guanine to thymine-phosphate-guanine (CpG>TpG) de novo mutations and a 0.33% (95% CI 0.28-0.38%) increase in C>G de novo mutations per year, respectively. Remarkably, these age-related changes are not distributed uniformly across the genome. A striking example is a 20 megabase region on chromosome 8p, with a maternal C>G mutation rate that is up to 50-fold greater than the rest of the genome. The age-related accumulation of maternal non-crossover gene conversions also mostly occurs within these regions. Increased sequence diversity and linkage disequilibrium of C>G variants within regions affected by excess maternal mutations indicate that the underlying mutational process has persisted in humans for thousands of years. Moreover, the regional excess of C>G variation in humans is largely shared by chimpanzees, less by gorillas, and is almost absent from orangutans. This demonstrates that sequence diversity in humans results from evolving interactions between age, sex, mutation type, and genomic location.
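
The per-year parental age effects reported in the abstract above lend themselves to a short arithmetic check. The sketch below uses only the two point estimates from the abstract (0.37 and 1.51 de novo mutations per year of maternal and paternal age). It assumes the effects are linear in age and simply add across the two parents, which is a simplification made for illustration; the function name is hypothetical.

```python
MATERNAL_DNMS_PER_YEAR = 0.37  # point estimate from the abstract
PATERNAL_DNMS_PER_YEAR = 1.51  # point estimate from the abstract


def extra_dnms(extra_mother_years: float, extra_father_years: float) -> float:
    """Additional de novo mutations expected for older parents.

    Assumes a linear, additive age effect (an illustrative simplification).
    """
    return (MATERNAL_DNMS_PER_YEAR * extra_mother_years
            + PATERNAL_DNMS_PER_YEAR * extra_father_years)


ratio = MATERNAL_DNMS_PER_YEAR / PATERNAL_DNMS_PER_YEAR
print(f"maternal / paternal rate: {ratio:.3f}")
print(f"extra DNMs when both parents are 10 years older: {extra_dnms(10, 10):.1f}")
```

The ratio comes out just under 0.25, consistent with the abstract's description of the maternal effect as a quarter of the paternal one.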


Subject(s)
Aging/genetics , Germ-Line Mutation/genetics , Maternal Age , Mutagenesis , Parents , Paternal Age , Adolescent , Adult , Aged , Animals , Child , Chromosomes, Human, Pair 8/genetics , Evolution, Molecular , Female , GC Rich Sequence , Genome, Human/genetics , Gorilla gorilla/genetics , Humans , INDEL Mutation , Iceland , Linkage Disequilibrium/genetics , Male , Middle Aged , Mutation Rate , Pan troglodytes/genetics , Polymorphism, Single Nucleotide , Pongo/genetics , Young Adult