ABSTRACT
Short tandem repeats (STRs) contribute significantly to genetic diversity in humans, including disease-causing variation. Although the effect of STR variation on gene expression has been extensively assessed, their impact on epigenetics has been poorly studied and limited to specific genomic regions. Here, we investigated the hypothesis that some STRs act as independent regulators of local DNA methylation in the human genome and modify risk of common human traits. To address these questions, we first analyzed two independent data sets comprising PCR-free whole-genome sequencing (WGS) and genome-wide DNA methylation levels derived from whole-blood samples in 245 (discovery cohort) and 484 individuals (replication cohort). Using genotypes for 131,635 polymorphic STRs derived from WGS using HipSTR, we identified 11,870 STRs that associated with DNA methylation levels (mSTRs) of 11,774 CpGs (Bonferroni P < 0.001) in our discovery cohort, with 90% successfully replicating in our second cohort. Subsequently, through fine-mapping using CAVIAR we defined 585 of these mSTRs as the likely causal variants underlying the observed associations (fm-mSTRs) and linked a fraction of these to previously reported genome-wide association study signals, providing insights into the mechanisms underlying complex human traits. Furthermore, by integrating gene expression data, we observed that 12.5% of the tested fm-mSTRs also modulate expression levels of nearby genes, reinforcing their regulatory potential. Overall, our findings expand the catalog of functional sequence variants that affect genome regulation, highlighting the importance of incorporating STRs in future genetic association analysis and epigenetics data for the interpretation of trait-associated variants.
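The association step described above can be sketched as a per-pair test followed by a Bonferroni adjustment. This is a minimal illustration only: the permutation test, the toy values, and the names `pearson_r`, `permutation_p` and `bonferroni` are assumptions for demonstration, not the authors' pipeline, and `n_tests=5` is a hypothetical count of STR-CpG pairs.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)

# Toy data (hypothetical): summed STR allele length and methylation beta
# value at one CpG for ten individuals.
str_length = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
methylation = [0.10, 0.15, 0.21, 0.24, 0.33, 0.35, 0.41, 0.44, 0.52, 0.55]

p_value = permutation_p(str_length, methylation)
p_adjusted = bonferroni(p_value, n_tests=5)  # hypothetical number of tested pairs
print(f"r = {pearson_r(str_length, methylation):.3f}, adjusted p = {p_adjusted:.4f}")
```

In a genome-wide setting the number of tests is the number of STR-CpG pairs examined, so the per-pair threshold implied by an adjusted cutoff of 0.001 becomes very small.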
Subject(s)
DNA Methylation , Genome-Wide Association Study , Humans , Microsatellite Repeats , Genome, Human , Genotype
ABSTRACT
There is growing recognition that epivariations, most often recognized as promoter hypermethylation events that lead to gene silencing, are associated with a number of human diseases. However, little information exists on the prevalence and distribution of rare epigenetic variation in the human population. To address this, we performed a survey of methylation profiles from 23,116 individuals using the Illumina 450k array. Using a robust outlier approach, we identified 4,452 unique autosomal epivariations, including potentially inactivating promoter methylation events at 384 genes linked to human disease. For example, we observed promoter hypermethylation of BRCA1 and LDLR at population frequencies of ~1 in 3,000 and ~1 in 6,000, respectively, suggesting that epivariations may underlie a fraction of human disease which would be missed by purely sequence-based approaches. Using expression data, we confirmed that many epivariations are associated with outlier gene expression. Analysis of variation data and monozygous twin pairs suggests that approximately two-thirds of epivariations segregate in the population secondary to underlying sequence mutations, while one-third are likely sporadic events that occur post-zygotically. We identified 25 loci where rare hypermethylation coincided with the presence of an unstable CGG tandem repeat, validated the presence of CGG expansions at several loci, and identified the putative molecular defect underlying most of the known folate-sensitive fragile sites in the genome. Our study provides a catalog of rare epigenetic changes in the human genome, gives insight into the underlying origins and consequences of epivariations, and identifies many hypermethylated CGG repeat expansions.
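The abstract does not specify the "robust outlier approach". One common way to flag rare methylation outliers is a median/MAD-based robust z-score, sketched below. The function name `robust_outliers`, the cutoff of 3.5 and the toy beta values are assumptions for illustration and are not taken from the study.

```python
import statistics

def robust_outliers(values, z_cut=3.5):
    """Return indices whose robust z-score (median/MAD based) exceeds z_cut."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = statistics.median(abs_dev)
    if mad == 0:
        return []  # no spread to scale by; nothing can be called an outlier
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # under normality.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > z_cut]

# Toy data (hypothetical): beta values at one promoter CpG across ten samples;
# the last sample is strongly hypermethylated.
betas = [0.05, 0.06, 0.04, 0.05, 0.07, 0.05, 0.06, 0.04, 0.05, 0.85]
print(robust_outliers(betas))  # -> [9]
```

Median-based statistics are used here because a single extreme sample barely shifts them, whereas a mean and standard deviation would be inflated by the very outlier being sought.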
Subject(s)
BRCA1 Protein/genetics , Epigenesis, Genetic , Genetic Diseases, Inborn/genetics , Genome, Human , Receptors, LDL/genetics , Trinucleotide Repeat Expansion , BRCA1 Protein/metabolism , DNA Methylation , Female , Folic Acid/metabolism , Gene Silencing , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/pathology , Genetic Loci , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Male , Promoter Regions, Genetic , Receptors, LDL/metabolism , Twins, Monozygotic
ABSTRACT
Although DNA methylation is the best characterized epigenetic mark, the mechanism by which it is targeted to specific regions in the genome remains unclear. Recent studies have revealed that local DNA methylation profiles might be dictated by cis-regulatory DNA sequences that mainly operate via DNA-binding factors. Consistent with this finding, we have recently shown that disruption of CTCF-binding sites by rare single nucleotide variants (SNVs) can underlie cis-linked DNA methylation changes in patients with congenital anomalies. These data raise the hypothesis that rare genetic variation at transcription factor binding sites (TFBSs) might contribute to local DNA methylation patterning. In this work, by combining blood genome-wide DNA methylation profiles, whole genome sequencing-derived SNVs from 247 unrelated individuals along with 133 predicted TFBS motifs derived from ENCODE ChIP-Seq data, we observed an association between the disruption of binding sites for multiple TFs by rare SNVs and extreme DNA methylation values at both local and, to a lesser extent, distant CpGs. While the majority of these changes affected only single CpGs, 24% were associated with multiple outlier CpGs within ±1 kb of the disrupted TFBS. Interestingly, disruption of functionally constrained sites within TF motifs led to larger DNA methylation changes at nearby CpG sites. Altogether, these findings suggest that rare SNVs at TFBSs negatively influence TF-DNA binding, which can lead to an altered local DNA methylation profile. Furthermore, subsequent integration of DNA methylation and RNA-Seq profiles from cardiac tissues enabled us to observe an association between rare SNV-directed DNA methylation and outlier expression of nearby genes.
In conclusion, our findings not only provide insights into the effect of rare genetic variation at TFBS on shaping local DNA methylation and its consequences on genome regulation, but also provide a rationale to incorporate DNA methylation data to interpret the functional role of rare variants.
Subject(s)
CpG Islands/genetics , DNA Methylation , Epigenesis, Genetic , Genome, Human/genetics , Transcription Factors/metabolism , Adolescent , Adult , Binding Sites/genetics , Child , Child, Preschool , Chromatin Immunoprecipitation Sequencing , Cohort Studies , Female , Heart Defects, Congenital/blood , Heart Defects, Congenital/genetics , Humans , Infant , Infant, Newborn , Male , Middle Aged , Polymorphism, Single Nucleotide , Whole Genome Sequencing , Young Adult
ABSTRACT
PURPOSE: The purpose of this study is to use a genotype-first approach to explore highly penetrant, autosomal dominant cardiovascular diseases with external features, the RASopathies and Marfan syndrome (MFS), using biobank data. METHODS: This study uses exome sequencing and corresponding phenotypic data from Mount Sinai's BioMe (n = 32,344) and the United Kingdom Biobank (UKBB; n = 49,960). Variant curation identified pathogenic/likely pathogenic (P/LP) variants in RASopathy genes and FBN1. RESULTS: Twenty-one subjects harbored P/LP RASopathy variants; three (14%) were diagnosed, and another 46% had ≥1 classic Noonan syndrome (NS) feature. Major NS features (short stature [9.5%, p = 7e-5] and heart anomalies [19%, p < 1e-5]) were less frequent than expected. Prevalence of hypothyroidism/autoimmune disorders was enriched compared with biobank populations (p = 0.007). For subjects with FBN1 P/LP variants, 14/41 (34%) had an MFS diagnosis or highly suggestive features. Five of 15 participants (33%) with echocardiographic data had aortic dilation, fewer than expected (p = 8e-6). Ectopia lentis affected only 15% (p < 1e-5). CONCLUSIONS: Substantial fractions of individuals harboring P/LP variants with partial or full phenotypic matches to a RASopathy or MFS remain undiagnosed, some not meeting diagnostic criteria. Routine population genotyping would enable multidisciplinary care and avoid life-threatening events.
Subject(s)
Marfan Syndrome , Fibrillin-1/genetics , Genotype , Humans , Marfan Syndrome/diagnosis , Marfan Syndrome/genetics , Mutation , Phenotype , United Kingdom/epidemiology
ABSTRACT
The mechanisms underlying de novo insertion/deletion (indel) genesis, such as polymerase slippage, have been hypothesized but not well characterized in the human genome. We implemented two methodological improvements, which were leveraged to dissect indel mutagenesis. We assigned de novo variants to parent-of-origin (i.e., phasing) with low-coverage long-read whole-genome sequencing, achieving better phasing compared to short-read sequencing (medians of 84% and 23%, respectively). We then wrote an application programming interface to classify indels into three subtypes according to sequence context. Across three cohorts with different phasing methods (N_trios = 540, all cohorts), we observed that one de novo indel subtype, change in copy count (CCC), was significantly correlated with father's (p = 7.1 × 10⁻⁴) but not mother's (p = .45) age at conception. We replicated this effect in three cohorts without de novo phasing (p_paternal = 1.9 × 10⁻⁹, p_maternal = .61; N_trios = 3,391, all cohorts). Although this is consistent with polymerase slippage during spermatogenesis, the percentage of variance explained by paternal age was low, and we did not observe an association with replication timing. These results suggest that spermatogenesis-specific events have a minor role in CCC indel mutagenesis, one not observed for other indel subtypes nor for maternal age in general. These results have implications for indel modeling in evolution and disease.
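The abstract names the change in copy count (CCC) subtype but not the classification rules. As a rough sketch, an indel can be treated as CCC when the inserted or deleted sequence is a copy of a unit that sits directly next to it in the reference. The functions `count_adjacent_copies` and `is_copy_count_change`, the 0-based coordinates and the toy sequence are assumptions for illustration, not the authors' interface.

```python
def count_adjacent_copies(ref, pos, unit):
    """Count tandem copies of `unit` in `ref` that touch 0-based position `pos`."""
    if not unit:
        raise ValueError("unit must be a non-empty sequence")
    k = len(unit)
    copies = 0
    i = pos
    while ref[i:i + k] == unit:  # extend to the right of pos
        copies += 1
        i += k
    i = pos - k
    while i >= 0 and ref[i:i + k] == unit:  # extend to the left of pos
        copies += 1
        i -= k
    return copies

def is_copy_count_change(ref, pos, unit, kind):
    """True if inserting ("ins") or deleting ("del") `unit` at `pos`
    changes the copy number of an existing tandem unit."""
    copies = count_adjacent_copies(ref, pos, unit)
    if kind == "del":
        # The deleted copy is itself present in ref, so a neighbour is required.
        return copies >= 2
    if kind == "ins":
        return copies >= 1
    raise ValueError("kind must be 'ins' or 'del'")

ref = "GGCACACACATT"  # toy reference containing a (CA)x4 tract
print(is_copy_count_change(ref, 2, "CA", "del"))  # deletes one CA copy
print(is_copy_count_change(ref, 5, "GT", "ins"))  # no adjacent GT copy
```

A production classifier would also normalize indel placement (for example by left-aligning variants) so that equivalent representations of the same event receive the same label.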
Subject(s)
Computational Biology/methods , Genome, Human , Genomics/methods , INDEL Mutation , Software , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide
ABSTRACT
Multiple tools have been developed to identify copy number variants (CNVs) from whole exome (WES) and whole genome sequencing (WGS) data. Current tools such as XHMM for WES and CNVnator for WGS identify CNVs based on changes in read depth. For WGS, other methods to identify CNVs include utilizing discordant read pairs and split reads and genome-wide local assembly with tools such as Lumpy and SvABA, respectively. Here, we introduce a new method to identify deletion CNVs from WES and WGS trio data based on the clustering of Mendelian errors (MEs). Using our Mendelian Error Method (MEM), we identified 127 deletions (inherited and de novo) in 2,601 WES trios from the Pediatric Cardiac Genomics Consortium, with a validation rate of 88% by digital droplet PCR. MEM identified additional de novo deletions compared with XHMM, and a significant enrichment of 15q11.2 deletions compared with controls. In addition, MEM identified eight cases of uniparental disomy, sample switches, and DNA contamination. We applied MEM to WGS data from the Genome In A Bottle Ashkenazi trio and identified deletions with 97% specificity. MEM provides a robust, computationally inexpensive method for identifying deletions, and an orthogonal approach for verifying deletions called by other tools.
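The idea of clustering Mendelian errors can be sketched as two steps: flag variants where the child's genotype cannot be formed from one allele of each parent, then group flagged positions that lie close together. A deletion transmitted by a parent leaves the child hemizygous, so the child appears homozygous for the other parent's allele and produces a run of such errors. The function names, the 50 kb gap and the minimum of three errors are hypothetical choices for illustration, not MEM's actual parameters.

```python
from itertools import product

def is_mendelian_error(child, father, mother):
    """Genotypes are 2-tuples of alleles, e.g. ("A", "G")."""
    possible = {tuple(sorted((f, m))) for f, m in product(father, mother)}
    return tuple(sorted(child)) not in possible

def cluster_errors(positions, max_gap=50_000, min_errors=3):
    """Group error positions into (start, end, count) runs whose consecutive
    gaps are <= max_gap; keep only runs with at least min_errors members."""
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_errors:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_errors:
        clusters.append((current[0], current[-1], len(current)))
    return clusters

# Toy trio (hypothetical): (position, child, father, mother)
trio = [
    (1_000,   ("A", "A"), ("A", "A"), ("G", "G")),  # error: expected A/G
    (12_000,  ("C", "C"), ("C", "C"), ("T", "T")),  # error
    (30_000,  ("G", "G"), ("G", "G"), ("A", "A")),  # error
    (500_000, ("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (900_000, ("T", "T"), ("C", "C"), ("C", "T")),  # isolated error
]
error_positions = [pos for pos, c, f, m in trio if is_mendelian_error(c, f, m)]
print(cluster_errors(error_positions))  # -> [(1000, 30000, 3)]
```

Isolated errors, such as the one at position 900,000 in the toy data, are more likely genotyping artifacts and are discarded by the minimum-count filter.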
Subject(s)
DNA Copy Number Variations/genetics , DNA Mutational Analysis/methods , Genome, Human/genetics , Sequence Deletion/genetics , Chromosome Mapping , Exome/genetics , Female , Heart Defects, Congenital/genetics , Humans , Male , Exome Sequencing , Whole Genome Sequencing
ABSTRACT
BACKGROUND: Germline HRAS gain-of-function pathogenic variants cause Costello syndrome (CS). During early childhood, 50% of patients develop multifocal atrial tachycardia, a treatment-resistant tachyarrhythmia of unknown pathogenesis. This study investigated how HRAS overactivity triggers arrhythmogenesis in atrial-like cardiomyocytes (ACMs) derived from human-induced pluripotent stem cells bearing CS-associated HRAS variants. METHODS: HRAS Gly12 mutations were introduced into a human-induced pluripotent stem cell-ACM reporter line. Human-induced pluripotent stem cells were generated from patients with CS exhibiting tachyarrhythmia. Calcium transients and action potentials were assessed in induced pluripotent stem cell-derived ACMs. Automated patch clamping assessed funny currents. HCN inhibitors targeted pacemaker-like activity in mutant ACMs. Transcriptomic data were analyzed via differential gene expression and gene ontology. Immunoblotting evaluated protein expression associated with calcium handling and pacemaker-nodal expression. RESULTS: ACMs harboring HRAS variants displayed higher beating rates compared with healthy controls. The hyperpolarization-activated cyclic nucleotide-gated potassium channel inhibitor ivabradine and the Nav1.5 blocker flecainide significantly decreased beating rates in mutant ACMs, whereas the voltage-gated calcium channel 1.2 blocker verapamil attenuated their irregularity. Electrophysiological assessment revealed an increased number of pacemaker-like cells with elevated funny current densities among mutant ACMs. Mutant ACMs demonstrated elevated gene expression (ie, ISL1, TBX3, TBX18) related to intracellular calcium homeostasis, heart rate, RAS signaling, and induction of pacemaker-nodal-like transcriptional programming. Immunoblotting confirmed increased protein levels for genes of interest and suppressed MAPK (mitogen-activated protein kinase) activity in mutant ACMs.
CONCLUSIONS: CS-associated gain-of-function HRAS G12 mutations in induced pluripotent stem cell-derived ACMs trigger transcriptional changes associated with enhanced automaticity and arrhythmic activity consistent with multifocal atrial tachycardia. This is the first human-induced pluripotent stem cell model establishing the mechanistic basis for multifocal atrial tachycardia in CS.
Subject(s)
Induced Pluripotent Stem Cells , Myocytes, Cardiac , Humans , Child, Preschool , Myocytes, Cardiac/metabolism , Calcium/metabolism , Heart Atria/metabolism , Tachycardia , Calcium Channels/metabolism , Induced Pluripotent Stem Cells/metabolism , Action Potentials/physiology , Cell Differentiation , Proto-Oncogene Proteins p21(ras)/genetics , Proto-Oncogene Proteins p21(ras)/metabolism
ABSTRACT
BACKGROUND: Congenital heart disease (CHD) is the most common major congenital anomaly and causes significant morbidity and mortality. Epidemiologic evidence supports a role of genetics in the development of CHD. Genetic diagnoses can inform prognosis and clinical management. However, genetic testing is not standardized among individuals with CHD. We sought to develop a list of validated CHD genes using established methods and to evaluate the process of returning genetic results to research participants in a large genomic study. METHODS: Two hundred ninety-five candidate CHD genes were evaluated using a ClinGen framework. Sequence and copy number variants involving genes in the CHD gene list were analyzed in Pediatric Cardiac Genomics Consortium participants. Pathogenic/likely pathogenic results were confirmed on a new sample in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory and disclosed to eligible participants. Adult probands and parents of probands who received results were asked to complete a postdisclosure survey. RESULTS: A total of 99 genes had a strong or definitive clinical validity classification. Diagnostic yields for copy number variants and exome sequencing were 1.8% and 3.8%, respectively. Thirty-one probands completed CLIA confirmation and received results. Participants who completed postdisclosure surveys reported high personal utility and no decision regret after receiving genetic results. CONCLUSIONS: The application of ClinGen criteria to CHD candidate genes yielded a list that can be used to interpret clinical genetic testing for CHD. Applying this gene list to one of the largest research cohorts of CHD participants provides a lower bound for the yield of genetic testing in CHD.
Subject(s)
Heart Defects, Congenital , Adult , Child , Humans , Heart Defects, Congenital/diagnosis , Heart Defects, Congenital/genetics , Genetic Testing , Heart , Genomics , DNA Copy Number Variations
ABSTRACT
BACKGROUND: Acute myocarditis (AM) is a well-known cause of sudden death and heart failure, often caused by prevalent viruses. We previously showed that some pediatric AM correlates with putatively damaging variants in genes related to cardiomyocyte structure and function. We sought to evaluate whether deleterious cardiomyopathic variants were enriched among fatal pediatric AM cases in New York City compared with ancestry-matched controls. METHODS: Twenty-four children (aged 3 weeks to 20 years) with death due to AM were identified through autopsy records; histologies were reviewed to confirm that all cases met Dallas criteria for AM and targeted panel sequencing of 57 cardiomyopathic genes was performed. Controls without cardiovascular disease were identified from a pediatric database and matched by genetic ancestry to cases using principal components from exome sequencing. Rates of putative deleterious variations (DV) were compared between cases and controls. Where available, AM tissues underwent viral analysis by polymerase chain reaction. RESULTS: DV were identified in 4 of 24 AM cases (16.7%), compared with 2 of 96 age and ancestry-matched controls (2.1%, P=0.014). Viral causes were proven for 6 of 8 AM cases (75%), including the one DV+ case where tissue was available for testing. DV+ cases were more likely to be female, have no evidence of chronic inflammation, and associate with sudden cardiac death than DV- cases. CONCLUSIONS: Deleterious variants in genes related to cardiomyocyte integrity are more common in children with fatal AM than controls, likely conferring susceptibility. Additionally, genetically mediated AM may progress more rapidly and be more severe.
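The case-control comparison reported above (4 of 24 cases versus 2 of 96 controls) can be reproduced with a two-sided Fisher's exact test. The abstract does not name the test used, so Fisher's exact test is an assumption here, and `fisher_exact_two_sided` is an illustrative helper written with the standard library only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_obs * (1 + 1e-7))

# Counts from the abstract: DV+ / DV- among cases and among controls.
p = fisher_exact_two_sided(4, 20, 2, 94)
print(f"p = {p:.4f}")  # close to the reported P = 0.014
```

With counts this small, an exact test is generally preferred over a chi-square approximation because several expected cell counts fall below five.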
Subject(s)
Databases, Nucleic Acid , Genetic Variation , Myocarditis/genetics , Adult , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Myocarditis/mortality , New York City/epidemiology
ABSTRACT
Impairments in certain cardiac genes confer risk for myocarditis in children. To determine the extent of this association, we performed genomic sequencing in predominantly adult patients with acute myocarditis and matched control subjects. Putatively deleterious variants in a broad set of cardiac genes were found in 19 of 117 acute myocarditis cases vs 34 of 468 control subjects (P = 0.003). Thirteen genes classically associated with cardiomyopathy or neuromuscular disorders with cardiac involvement were implicated, including >1 associated damaging variant in DYSF, DSP, and TTN. Phenotypes of subjects who have acute myocarditis with or without deleterious variants were similar, indicating that genetic testing is necessary to differentiate them.
ABSTRACT
A genetic etiology is identified for one-third of patients with congenital heart disease (CHD), with 8% of cases attributable to coding de novo variants (DNVs). To assess the contribution of noncoding DNVs to CHD, we compared genome sequences from 749 CHD probands and their parents with those from 1,611 unaffected trios. Neural network prediction of noncoding DNV transcriptional impact identified a burden of DNVs in individuals with CHD (n = 2,238 DNVs) compared to controls (n = 4,177; P = 8.7 × 10⁻⁴). Independent analyses of enhancers showed an excess of DNVs in associated genes (27 genes versus 3.7 expected, P = 1 × 10⁻⁵). We observed significant overlap between these transcription-based approaches (odds ratio (OR) = 2.5, 95% confidence interval (CI) 1.1-5.0, P = 5.4 × 10⁻³). CHD DNVs altered transcription levels in 5 of 31 enhancers assayed. Finally, we observed a DNV burden in RNA-binding-protein regulatory sites (OR = 1.13, 95% CI 1.1-1.2, P = 8.8 × 10⁻⁵). Our findings demonstrate an enrichment of potentially disruptive regulatory noncoding DNVs in a fraction of CHD at least as high as that observed for damaging coding DNVs.
Subject(s)
Genetic Variation/genetics , Heart Defects, Congenital/genetics , RNA, Untranslated/genetics , Adolescent , Adult , Animals , Female , Genetic Predisposition to Disease/genetics , Genomics , Heart/physiology , Humans , Male , Mice , Middle Aged , Open Reading Frames/genetics , RNA-Binding Proteins/genetics , Transcription, Genetic/genetics , Young Adult
ABSTRACT
Certain human traits such as neurodevelopmental disorders (NDs) and congenital anomalies (CAs) are believed to be primarily genetic in origin. However, even after whole-genome sequencing (WGS), a substantial fraction of such disorders remain unexplained. We hypothesize that some cases of ND-CA are caused by aberrant DNA methylation leading to dysregulated genome function. Comparing DNA methylation profiles from 489 individuals with ND-CAs against 1534 controls, we identify epivariations as a frequent occurrence in the human genome. De novo epivariations are significantly enriched in cases, while RNAseq analysis shows that epivariations often have an impact on gene expression comparable to loss-of-function mutations. Additionally, we detect and replicate an enrichment of rare sequence mutations overlapping CTCF binding sites close to epivariations, providing a rationale for interpreting non-coding variation. We propose that epivariations contribute to the pathogenesis of some patients with unexplained ND-CAs, and as such likely have diagnostic relevance.
Subject(s)
Congenital Abnormalities/genetics , Epigenesis, Genetic , Genome, Human/genetics , Neurodevelopmental Disorders/genetics , Adolescent , Adult , Case-Control Studies , Child , Child, Preschool , Cohort Studies , DNA Methylation/genetics , Datasets as Topic , Epigenomics/methods , Humans , Infant , Infant, Newborn , Loss of Function Mutation/genetics , Male , Middle Aged , Sequence Analysis, DNA , Sequence Analysis, RNA , Young Adult
ABSTRACT
Congenital heart disease (CHD), a prevalent birth defect occurring in 1% of newborns, likely results from aberrant expression of cardiac developmental genes. Mutations in a variety of cardiac transcription factors, developmental signalling molecules and molecules that modify chromatin cause at least 20% of disease, but most CHD remains unexplained. We employ RNAseq analyses to assess allele-specific expression (ASE) and biallelic loss-of-expression (LOE) in 172 tissue samples from 144 surgically repaired CHD subjects. Here we show that only 5% of known imprinted genes with paternal allele silencing are monoallelic versus 56% with paternal allele expression; this cardiac-specific phenomenon seems unrelated to CHD. Further, compared with control subjects, CHD subjects have a significant burden of both LOE genes and ASE events associated with altered gene expression. These studies identify FGFBP2, LBH, RBFOX2, SGSM1 and ZBTB16 as candidate CHD genes because of significantly altered transcriptional expression.
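A common way to call allele-specific expression at a heterozygous site is to test whether the RNAseq reads carrying each allele depart from the 50:50 split expected under balanced expression. The sketch below uses an exact two-sided binomial test. The abstract does not describe the study's ASE statistics, so the test choice, the function name `binomial_two_sided` and the read counts are assumptions for illustration.

```python
from math import comb

def binomial_two_sided(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for allelic read counts.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference-read count under success probability p.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    p_obs = pmf[ref_reads]
    total = sum(x for x in pmf if x <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Toy counts (hypothetical): reads supporting each allele at two sites.
print(binomial_two_sided(18, 2))   # strongly skewed toward the reference allele
print(binomial_two_sided(10, 10))  # balanced expression
```

In practice such tests are usually combined with filters for read depth and reference-mapping bias, which can mimic allelic imbalance.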
Subject(s)
Heart Defects, Congenital/metabolism , RNA/metabolism , Adolescent , Adult , Aged , Alleles , Aorta/metabolism , Case-Control Studies , Child , Child, Preschool , Fetus , Gene Expression , Genetic Association Studies , Genomic Imprinting , Heart Defects, Congenital/genetics , Humans , Infant , Infant, Newborn , Middle Aged , Myocardium/metabolism , Pulmonary Artery/metabolism , Young Adult
ABSTRACT
Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
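The iterative procedure for obtaining reliable negatives from unlabelled data can be sketched as follows: treat every unlabelled example as negative, fit a classifier, keep only the unlabelled examples it still assigns to the negative class, and repeat until the set stops changing. To stay dependency-free, a nearest-centroid rule stands in for the SVM and random forest classifiers named in the abstract. The function names, the two-dimensional features and the toy points are assumptions for illustration.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_reliable_negatives(positives, unlabelled, rounds=5):
    """Iteratively shrink the unlabelled set to examples that a
    nearest-centroid classifier keeps assigning to the negative class."""
    negatives = list(unlabelled)  # start by treating all unlabelled as negative
    pos_c = centroid(positives)
    for _ in range(rounds):
        neg_c = centroid(negatives)
        kept = [u for u in unlabelled
                if sq_dist(u, neg_c) < sq_dist(u, pos_c)]
        if not kept or kept == negatives:
            break  # nothing left to keep, or the set has converged
        negatives = kept
    return negatives

# Toy data (hypothetical): feature vectors for regulator-target gene pairs.
positives = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0)]
unlabelled = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (0.85, 0.9), (0.95, 0.85)]
print(select_reliable_negatives(positives, unlabelled))
```

The two unlabelled points near the positive cluster are dropped in the first round, which is the intended behaviour: unlabelled pairs that resemble known interactions are poor choices for negative training examples.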