Results 1 - 20 of 39
1.
Mol Cell ; 84(10): 1842-1854.e7, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38759624

ABSTRACT

Genomic context critically modulates regulatory function but is difficult to manipulate systematically. The murine insulin-like growth factor 2 (Igf2)/H19 locus is a paradigmatic model of enhancer selectivity, whereby CTCF occupancy at an imprinting control region directs downstream enhancers to activate either H19 or Igf2. We used synthetic regulatory genomics to repeatedly replace the native locus with 157-kb payloads, and we systematically dissected its architecture. Enhancer deletion and ectopic delivery revealed previously uncharacterized long-range regulatory dependencies at the native locus. Exchanging the H19 enhancer cluster with the Sox2 locus control region (LCR) showed that the H19 enhancers relied on their native surroundings while the Sox2 LCR functioned autonomously. Analysis of regulatory DNA actuation across cell types revealed that these enhancer clusters typify broader classes of context sensitivity genome wide. These results show that unexpected dependencies influence even well-studied loci, and our approach permits large-scale manipulation of complete loci to investigate the relationship between regulatory architecture and function.


Subject(s)
CCCTC-Binding Factor , Enhancer Elements, Genetic , Insulin-Like Growth Factor II , RNA, Long Noncoding , SOXB1 Transcription Factors , Animals , Mice , CCCTC-Binding Factor/metabolism , CCCTC-Binding Factor/genetics , Insulin-Like Growth Factor II/genetics , Insulin-Like Growth Factor II/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , SOXB1 Transcription Factors/genetics , SOXB1 Transcription Factors/metabolism , Locus Control Region/genetics , Genomic Imprinting , Genomics/methods
2.
Cell ; 167(5): 1398-1414.e24, 2016 Nov 17.
Article in English | MEDLINE | ID: mdl-27863251

ABSTRACT

Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.


Subject(s)
Epigenomics , Immune System Diseases/genetics , Monocytes/metabolism , Neutrophils/metabolism , T-Lymphocytes/metabolism , Transcription, Genetic , Adult , Aged , Alternative Splicing , Female , Genetic Predisposition to Disease , Hematopoietic Stem Cells/metabolism , Histone Code , Humans , Male , Middle Aged , Quantitative Trait Loci , Young Adult
3.
Mol Cell ; 83(7): 1140-1152.e7, 2023 Apr 06.
Article in English | MEDLINE | ID: mdl-36931273

ABSTRACT

Sox2 expression in mouse embryonic stem cells (mESCs) depends on a distal cluster of DNase I hypersensitive sites (DHSs), but their individual contributions and degree of interdependence remain a mystery. We analyzed the endogenous Sox2 locus using Big-IN to scarlessly integrate large DNA payloads incorporating deletions, rearrangements, and inversions affecting single or multiple DHSs, as well as surgical alterations to transcription factor (TF) recognition sequences. Multiple mESC clones were derived for each payload, sequence-verified, and analyzed for Sox2 expression. We found that two DHSs comprising a handful of key TF recognition sequences were each sufficient for long-range activation of Sox2 expression. By contrast, three nearby DHSs were entirely context dependent, showing no activity alone but dramatically augmenting the activity of the autonomous DHSs. Our results highlight the role of context in modulating genomic regulatory element function, and our synthetic regulatory genomics approach provides a roadmap for the dissection of other genomic loci.


Subject(s)
Gene Expression Regulation , Regulatory Sequences, Nucleic Acid , Animals , Mice , Enhancer Elements, Genetic , Genomics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , SOXB1 Transcription Factors/metabolism
4.
Nature ; 628(8007): 373-380, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38448583

ABSTRACT

Pervasive transcriptional activity is observed across diverse species. The genomes of extant organisms have undergone billions of years of evolution, making it unclear whether these genomic activities represent effects of selection or 'noise'1-4. Characterizing default genome states could help understand whether pervasive transcriptional activity has biological meaning. Here we addressed this question by introducing a synthetic 101-kb locus into the genomes of Saccharomyces cerevisiae and Mus musculus and characterizing genomic activity. The locus was designed by reversing but not complementing human HPRT1, including its flanking regions, thus retaining basic features of the natural sequence but ablating evolved coding or regulatory information. We observed widespread activity of both reversed and native HPRT1 loci in yeast, despite the lack of evolved yeast promoters. By contrast, the reversed locus displayed no activity at all in mouse embryonic stem cells, and instead exhibited repressive chromatin signatures. The repressive signature was alleviated in a locus variant lacking CpG dinucleotides; nevertheless, this variant was also transcriptionally inactive. These results show that synthetic genomic sequences that lack coding information are active in yeast, but inactive in mouse embryonic stem cells, consistent with a major difference in 'default genomic states' between these two divergent eukaryotic cell types, with implications for understanding pervasive transcription, horizontal transfer of genetic information and the birth of new genes.
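
As an illustration of the design step described in this abstract ("reversing but not complementing"), the following is a minimal sketch on a toy sequence. The function names and the toy sequence are hypothetical and are not taken from the paper; the sketch only shows that plain reversal differs from the reverse complement while preserving one basic feature, base composition.

```python
# Toy illustration: reversing a DNA sequence WITHOUT complementing it.
# All names and the example sequence are hypothetical, for illustration only.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_only(seq: str) -> str:
    """Return the sequence read backwards, with no base complementation."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Return the conventional reverse complement, shown for contrast."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


toy = "ATGCGTTACG"
print(reverse_only(toy))        # GCATTGCGTA
print(reverse_complement(toy))  # CGTAACGCAT
# Plain reversal keeps the base composition of the original sequence.
print(gc_fraction(toy) == gc_fraction(reverse_only(toy)))  # True
```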


Subject(s)
Genes, Synthetic , Genome , Saccharomyces cerevisiae , Transcription, Genetic , Animals , Humans , Mice , Chromatin/genetics , CpG Islands , Genes, Synthetic/genetics , Genome/genetics , Mouse Embryonic Stem Cells/metabolism , Promoter Regions, Genetic/genetics , Saccharomyces cerevisiae/genetics , Hypoxanthine Phosphoribosyltransferase/genetics , Evolution, Molecular
5.
Nature ; 626(8001): 1042-1048, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38418917

ABSTRACT

The loss of the tail is among the most notable anatomical changes to have occurred along the evolutionary lineage leading to humans and to the 'anthropomorphous apes'1-3, with a proposed role in contributing to human bipedalism4-6. Yet, the genetic mechanism that facilitated tail-loss evolution in hominoids remains unknown. Here we present evidence that an individual insertion of an Alu element in the genome of the hominoid ancestor may have contributed to tail-loss evolution. We demonstrate that this Alu element (inserted into an intron of the TBXT gene7-9) pairs with a neighbouring ancestral Alu element encoded in the reverse genomic orientation and leads to a hominoid-specific alternative splicing event. To study the effect of this splicing event, we generated multiple mouse models that express both full-length and exon-skipped isoforms of Tbxt, mimicking the expression pattern of its hominoid orthologue TBXT. Mice expressing both Tbxt isoforms exhibit a complete absence of the tail or a shortened tail depending on the relative abundance of Tbxt isoforms expressed at the embryonic tail bud. These results support the notion that the exon-skipped transcript is sufficient to induce a tail-loss phenotype. Moreover, mice expressing the exon-skipped Tbxt isoform develop neural tube defects, a condition that affects approximately 1 in 1,000 neonates in humans10. Thus, tail-loss evolution may have been associated with an adaptive cost of the potential for neural tube defects, which continue to affect human health today.


Subject(s)
Alternative Splicing , Evolution, Molecular , Hominidae , T-Box Domain Proteins , Tail , Animals , Humans , Mice , Alternative Splicing/genetics , Alu Elements/genetics , Disease Models, Animal , Genome/genetics , Hominidae/anatomy & histology , Hominidae/genetics , Introns/genetics , Neural Tube Defects/genetics , Neural Tube Defects/metabolism , Phenotype , Protein Isoforms/deficiency , Protein Isoforms/genetics , Protein Isoforms/metabolism , T-Box Domain Proteins/deficiency , T-Box Domain Proteins/genetics , T-Box Domain Proteins/metabolism , Tail/anatomy & histology , Tail/embryology , Exons/genetics
6.
Nature ; 623(7986): 423-431, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37914927

ABSTRACT

Genetically engineered mouse models (GEMMs) help us to understand human pathologies and develop new therapies, yet faithfully recapitulating human diseases in mice is challenging. Advances in genomics have highlighted the importance of non-coding regulatory genome sequences, which control spatiotemporal gene expression patterns and splicing in many human diseases1,2. Including extensive regulatory genomic regions, which requires large-scale genome engineering, should enhance the quality of disease modelling. Existing methods set limits on the size and efficiency of DNA delivery, hampering the routine creation of highly informative models that we call genomically rewritten and tailored GEMMs (GREAT-GEMMs). Here we describe 'mammalian switching antibiotic resistance markers progressively for integration' (mSwAP-In), a method for efficient genome rewriting in mouse embryonic stem cells. We demonstrate the use of mSwAP-In for iterative genome rewriting of up to 115 kb of a tailored Trp53 locus, as well as for humanization of mice using 116 kb and 180 kb human ACE2 loci. The ACE2 model recapitulated human ACE2 expression patterns and splicing, and notably, presented milder symptoms when challenged with SARS-CoV-2 compared with the existing K18-hACE2 model, thus representing a more human-like model of infection. Finally, we demonstrated serial genome writing by humanizing mouse Tmprss2 biallelically in the ACE2 GREAT-GEMM, highlighting the versatility of mSwAP-In in genome writing.


Subject(s)
Angiotensin-Converting Enzyme 2 , COVID-19 , Disease Models, Animal , Genetic Engineering , Genome , Tumor Suppressor Protein p53 , Animals , Humans , Mice , Alleles , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/metabolism , COVID-19/genetics , COVID-19/virology , DNA/genetics , Drug Resistance, Microbial/genetics , Genetic Engineering/methods , Genome/genetics , Mouse Embryonic Stem Cells/metabolism , SARS-CoV-2/metabolism , Serine Endopeptidases/genetics , Tumor Suppressor Protein p53/genetics
7.
Genome Res ; 32(3): 425-436, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35082140

ABSTRACT

The specificity of interactions between genomic regulatory elements and potential target genes is influenced by the binding of insulator proteins such as CTCF, which can act as potent enhancer blockers when interposed between an enhancer and a promoter in a reporter assay. But not all CTCF sites genome-wide function as insulator elements, depending on cellular and genomic context. To dissect the influence of genomic context on enhancer blocker activity, we integrated reporter constructs with promoter-only, promoter and enhancer, and enhancer blocker configurations at hundreds of thousands of genomic sites using the Sleeping Beauty transposase. Deconvolution of reporter activity by genomic position reveals distinct expression patterns subject to genomic context, including a compartment of enhancer blocker reporter integrations with robust expression. The high density of integration sites permits quantitative delineation of characteristic genomic context sensitivity profiles and their decomposition into sensitivity to both local and distant DNase I hypersensitive sites. Furthermore, using a single-cell expression approach to test the effect of integrated reporters for differential expression of nearby endogenous genes reveals that CTCF insulator elements do not completely abrogate reporter effects on endogenous gene expression. Collectively, our results lend new insight into genomic regulatory compartmentalization and its influence on the determinants of promoter-enhancer specificity.


Subject(s)
Enhancer Elements, Genetic , Insulator Elements , CCCTC-Binding Factor/genetics , CCCTC-Binding Factor/metabolism , Genomics , Promoter Regions, Genetic
8.
Proc Natl Acad Sci U S A ; 118(52), 2021 Dec 28.
Article in English | MEDLINE | ID: mdl-34930847

ABSTRACT

Sudden unexplained death in childhood (SUDC) is an understudied problem. Whole-exome sequence data from 124 "trios" (decedent child, living parents) were used to test for excessive de novo mutations (DNMs) in genes involved in cardiac arrhythmias, epilepsy, and other disorders. Among decedents, nonsynonymous DNMs were enriched in genes associated with cardiac and seizure disorders relative to controls (odds ratio = 9.76, P = 2.15 × 10⁻⁴). We also found evidence for overtransmission of loss-of-function (LoF) or previously reported pathogenic variants in these same genes from heterozygous carrier parents (11 of 14 transmitted, P = 0.03). We identified a total of 11 SUDC proband genotypes (7 de novo, 1 transmitted parental mosaic, 2 transmitted parental heterozygous, and 1 compound heterozygous) as pathogenic and likely contributory to death, a genetic finding in 8.9% of our cohort. Two genes had recurrent missense DNMs, RYR2 and CACNA1C. Both RYR2 mutations are pathogenic (P = 1.7 × 10⁻⁷) and were previously studied in mouse models. Both CACNA1C mutations lie within a 104-nt exon (P = 1.0 × 10⁻⁷) and result in slowed L-type calcium channel inactivation and lower current density. In total, six pathogenic DNMs can alter calcium-related regulation of cardiomyocyte and neuronal excitability at a submembrane junction, suggesting a pathway conferring susceptibility to sudden death. There was a trend for excess LoF mutations in LoF intolerant genes, where ≥1 nonhealthy sample in denovo-db has a similar variant (odds ratio = 6.73, P = 0.02); additional uncharacterized genetic causes of sudden death in children might be discovered with larger cohorts.
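
The counts quoted in this abstract can be checked with a short calculation. The sketch below sums the four genotype categories and expresses them as a share of the 124 trios. It also computes a one-sided binomial tail for 11 of 14 transmissions against an expected transmission rate of 50%. That choice of test is an assumption made here for illustration; the abstract does not state which test the authors used.

```python
from math import comb

# Genotype categories as listed in the abstract.
findings = {
    "de novo": 7,
    "transmitted parental mosaic": 1,
    "transmitted parental heterozygous": 2,
    "compound heterozygous": 1,
}
trios = 124

total = sum(findings.values())
share = 100 * total / trios
print(total, round(share, 1))  # 11 8.9

# ASSUMPTION: one-sided binomial test against a 50% transmission rate.
# The abstract reports P = 0.03 but does not name the test used.
n, k = 14, 11
p_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(round(p_tail, 2))  # 0.03
```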


Subject(s)
Arrhythmias, Cardiac/genetics , Calcium Signaling/genetics , Death, Sudden , Epilepsy/genetics , Child, Preschool , Female , Humans , Infant , Male , Mutation/genetics , Exome Sequencing
9.
Proc Natl Acad Sci U S A ; 118(10), 2021 Mar 09.
Article in English | MEDLINE | ID: mdl-33649239

ABSTRACT

Routine rewriting of loci associated with human traits and diseases would facilitate their functional analysis. However, existing DNA integration approaches are limited in terms of scalability and portability across genomic loci and cellular contexts. We describe Big-IN, a versatile platform for targeted integration of large DNAs into mammalian cells. CRISPR/Cas9-mediated targeting of a landing pad enables subsequent recombinase-mediated delivery of variant payloads and efficient positive/negative selection for correct clones in mammalian stem cells. We demonstrate integration of constructs up to 143 kb, and an approach for one-step scarless delivery. We developed a staged pipeline combining PCR genotyping and targeted capture sequencing for economical and comprehensive verification of engineered stem cells. Our approach should enable combinatorial interrogation of genomic functional elements and systematic locus-scale analysis of genome function.


Subject(s)
CRISPR-Cas Systems , Gene Editing , Genetic Loci , Genome, Human , Human Embryonic Stem Cells , Mouse Embryonic Stem Cells , Animals , Cell Line , Humans , Mice
10.
Genome Res ; 30(12): 1781-1788, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33093069

ABSTRACT

Effective public response to a pandemic relies upon accurate measurement of the extent and dynamics of an outbreak. Viral genome sequencing has emerged as a powerful approach to link seemingly unrelated cases, and large-scale sequencing surveillance can inform on critical epidemiological parameters. Here, we report the analysis of 864 SARS-CoV-2 sequences from cases in the New York City metropolitan area during the COVID-19 outbreak in spring 2020. The majority of cases had no recent travel history or known exposure, and genetically linked cases were spread throughout the region. Comparison to global viral sequences showed that early transmission was most linked to cases from Europe. Our data are consistent with numerous seeds from multiple sources and a prolonged period of unrecognized community spreading. This work highlights the complementary role of genomic surveillance in addition to traditional epidemiological indicators.


Subject(s)
COVID-19 , Genome, Viral , Pandemics , Phylogeny , SARS-CoV-2/genetics , Whole Genome Sequencing , COVID-19/epidemiology , COVID-19/genetics , COVID-19/transmission , Female , Humans , Male , New York City
11.
PLoS Pathog ; 17(5): e1009571, 2021 May.
Article in English | MEDLINE | ID: mdl-34015049

ABSTRACT

During the first phase of the COVID-19 epidemic, New York City rapidly became the epicenter of the pandemic in the United States. While molecular phylogenetic analyses have previously highlighted multiple introductions and a period of cryptic community transmission within New York City, little is known about the circulation of SARS-CoV-2 within and among its boroughs. We here perform phylogeographic investigations to gain insights into the circulation of viral lineages during the first months of the New York City outbreak. Our analyses describe the dispersal dynamics of viral lineages at the state and city levels, illustrating that peripheral samples likely correspond to distinct dispersal events originating from the main metropolitan city areas. In line with the high prevalence recorded in this area, our results highlight the relatively important role of the borough of Queens as a transmission hub associated with higher local circulation and dispersal of viral lineages toward the surrounding boroughs.


Subject(s)
COVID-19/epidemiology , COVID-19/transmission , SARS-CoV-2/genetics , Genome, Viral/genetics , Humans , New York City/epidemiology , Phylogeny , Phylogeography , Prevalence , SARS-CoV-2/classification , SARS-CoV-2/isolation & purification
12.
Hum Genet ; 141(8): 1431-1447, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35147782

ABSTRACT

Drug development and biological discovery require effective strategies to map existing genetic associations to causal genes. To approach this problem, we selected 12 common diseases and quantitative traits for which highly powered genome-wide association studies (GWAS) were available. For each disease or trait, we systematically curated positive control gene sets from Mendelian forms of the disease and from targets of medicines used for disease treatment. We found that these positive control genes were highly enriched in proximity of GWAS-associated single-nucleotide variants (SNVs). We then performed quantitative assessment of the contribution of commonly used genomic features, including open chromatin maps, expression quantitative trait loci (eQTL), and chromatin conformation data. Using these features, we trained and validated an Effector Index (Ei), to map target genes for these 12 common diseases and traits. Ei demonstrated high predictive performance, both with cross-validation on the training set, and an independently derived set for type 2 diabetes. Key predictive features included coding or transcript-altering SNVs, distance to gene, and open chromatin-based metrics. This work outlines a simple, understandable approach to prioritize genes at GWAS loci for functional follow-up and drug development, and provides a systematic strategy for prioritization of GWAS target genes.


Subject(s)
Diabetes Mellitus, Type 2 , Genome-Wide Association Study , Chromatin/genetics , Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci
13.
Mol Biol Evol ; 35(8): 1958-1967, 2018 Aug 01.
Article in English | MEDLINE | ID: mdl-29850830

ABSTRACT

Noncoding DNA sequences, which play various roles in gene expression and regulation, are under evolutionary pressure. Gene regulation requires specific protein-DNA binding events, and our previous studies showed that both DNA sequence and shape readout are employed by transcription factors (TFs) to achieve DNA binding specificity. By investigating the shape-disrupting properties of single nucleotide polymorphisms (SNPs) in human regulatory regions, we established a link between disruptive local DNA shape changes and loss of specific TF binding. Furthermore, we described cases where disease-associated SNPs may alter TF binding through DNA shape changes. This link led us to hypothesize that local DNA shape within and around TF binding sites is under selection pressure. To verify this hypothesis, we analyzed SNP data derived from 216 natural strains of Drosophila melanogaster. Comparing SNPs located in functional and nonfunctional regions within experimentally validated cis-regulatory modules (CRMs) from D. melanogaster that are active in the blastoderm stage of development, we found that SNPs within functional regions tended to cause smaller DNA shape variations. Furthermore, SNPs with higher minor allele frequency were more likely to result in smaller DNA shape variations. The same analysis based on a large number of SNPs in putative CRMs of the D. melanogaster genome derived from DNase I accessibility data confirmed these observations. Taken together, our results indicate that common SNPs in functional regions tend to maintain DNA shape, whereas shape-disrupting SNPs are more likely to be eliminated through purifying selection.


Subject(s)
DNA , Nucleic Acid Conformation , Polymorphism, Single Nucleotide , Selection, Genetic , Transcription Factors/metabolism , Animals , Binding Sites , Drosophila melanogaster , Gene Frequency , Genome, Insect , Humans
14.
Nature ; 489(7414): 75-82, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955617

ABSTRACT

DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.


Subject(s)
Chromatin/genetics , Chromatin/metabolism , DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , DNA Footprinting , DNA Methylation , DNA-Binding Proteins/metabolism , Deoxyribonuclease I/metabolism , Evolution, Molecular , Genomics , Humans , Mutation Rate , Promoter Regions, Genetic/genetics , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic
15.
Nature ; 489(7414): 83-90, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955618

ABSTRACT

Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.


Subject(s)
DNA Footprinting , DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , DNA Methylation , DNA-Binding Proteins/metabolism , Deoxyribonuclease I/metabolism , Genomic Imprinting , Genomics , Humans , Polymorphism, Single Nucleotide/genetics , Transcription Initiation Site
16.
Genome Res ; 22(9): 1689-97, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955981

ABSTRACT

The characteristics and evolutionary forces acting on regulatory variation in humans remain elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA compared with protein-coding regions, although they are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.


Subject(s)
Genetic Variation , Genomics , Regulatory Elements, Transcriptional , Regulatory Sequences, Nucleic Acid , Cell Line , Cell Line, Tumor , Chromosome Mapping , Deoxyribonuclease I/metabolism , Evolution, Molecular , Genetic Heterogeneity , Genome, Human , Genome-Wide Association Study , Humans , Neoplasms/genetics , Nucleotide Motifs , Polymorphism, Genetic , Population Groups/genetics , Selection, Genetic , Transcriptional Activation
17.
Genome Res ; 22(9): 1680-8, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955980

ABSTRACT

CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environments. Here we analyze genome-wide occupancy patterns of CTCF by ChIP-seq in 19 diverse human cell types, including normal primary cells and immortal lines. We observed highly reproducible yet surprisingly plastic genomic binding landscapes, indicative of strong cell-selective regulation of CTCF occupancy. Comparison with massively parallel bisulfite sequencing data indicates that 41% of variable CTCF binding is linked to differential DNA methylation, concentrated at two critical positions within the CTCF recognition sequence. Unexpectedly, CTCF binding patterns were markedly different in normal versus immortal cells, with the latter showing widespread disruption of CTCF binding associated with increased methylation. Strikingly, this disruption is accompanied by up-regulation of CTCF expression, with the result that both normal and immortal cells maintain the same average number of CTCF occupancy sites genome-wide. These results reveal a tight linkage between DNA methylation and the global occupancy patterns of a major sequence-specific regulatory factor.


Subject(s)
DNA Methylation , Repressor Proteins/metabolism , Binding Sites/genetics , CCCTC-Binding Factor , Cell Line , Chromatin Immunoprecipitation , Cluster Analysis , CpG Islands , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Humans
18.
PLoS Genet ; 8(3): e1002599, 2012.
Article in English | MEDLINE | ID: mdl-22457641

ABSTRACT

The average individual is expected to harbor thousands of variants within non-coding genomic regions involved in gene regulation. However, it is currently not possible to interpret reliably the functional consequences of genetic variation within any given transcription factor recognition sequence. To address this, we comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a multi-generational pedigree. We localized and quantified CTCF occupancy by ChIP-seq in 12 related and unrelated individuals spanning three generations, followed by comprehensive targeted resequencing of the entire CTCF-binding landscape across all individuals. We identified hundreds of variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein-DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. In the significant majority of cases buffering was complete, resulting in silent variants spanning every position within the DNA recognition interface irrespective of level of binding energy or evolutionary constraint. The prevalence of complex partial or complete buffering effects severely constrained the ability to predict reliably the impact of variation within any given binding site instance. Surprisingly, 40% of variants that increased CTCF occupancy occurred at positions of human-chimp divergence, challenging the expectation that the vast majority of functional regulatory variants should be deleterious. Our results suggest that, even in the presence of "perfect" genetic information afforded by resequencing and parallel studies in multiple related individuals, genomic site-specific prediction of the consequences of individual variation in regulatory DNA will require systematic coupling with empirical functional genomic measurements.


Subject(s)
Binding Sites , Polymorphism, Genetic , Repressor Proteins/genetics , Transcription Factors , Alleles , Animals , CCCTC-Binding Factor , Cell Line , DNA-Binding Proteins/genetics , Genetic Linkage , Genotype , Humans , Polymorphism, Single Nucleotide , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics
19.
medRxiv ; 2024 May 16.
Article in English | MEDLINE | ID: mdl-38798557

ABSTRACT

Genetic variation within intron 3 of the CACNA1C calcium channel gene is associated with schizophrenia and bipolar disorder, but analysis of the causal variants and their effect is complicated by a nearby variable-number tandem repeat (VNTR). Here, we used 155 long-read genome assemblies from 78 diverse individuals to delineate the structure and population variability of the CACNA1C intron 3 VNTR. We categorized VNTR sequences into 7 Types of structural alleles using sequence differences among repeat units. Only 12 repeat units at the 5' end of the VNTR were shared across most Types, but several Types were related through a series of large and small duplications. The most diverged Types were rare and present only in individuals with African ancestry, but the multiallelic structural polymorphism Variable Region 2 (VR2) was present across populations at different frequencies, consistent with expansion of the VNTR preceding the emergence of early hominins. VR2 was in complete linkage disequilibrium with fine-mapped schizophrenia variants (SNPs) from genome-wide association studies (GWAS). This risk haplotype was associated with decreased CACNA1C gene expression in brain tissues profiled by the GTEx project. Our work suggests that sequence variation within a human-specific VNTR affects gene expression, and provides a detailed characterization of new alleles at a flagship neuropsychiatric locus.

20.
bioRxiv ; 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-37781588

ABSTRACT

Enhancer function is frequently investigated piecemeal using truncated reporter assays or single deletion analysis. Thus it remains unclear to what extent enhancer function at native loci relies on surrounding genomic context. Using the Big-IN technology for targeted integration of large DNAs, we analyzed the regulatory architecture of the murine Igf2/H19 locus, a paradigmatic model of enhancer selectivity. We assembled payloads containing a 157-kb functional Igf2/H19 locus and engineered mutations to genetically direct CTCF occupancy at the imprinting control region (ICR) that switches the target gene of the H19 enhancer cluster. Contrasting activity of payloads delivered at the endogenous Igf2/H19 locus or ectopically at Hprt revealed that the Igf2/H19 locus includes additional, previously unknown long-range regulatory elements. Exchanging components of the Igf2/H19 locus with the well-studied Sox2 locus showed that the H19 enhancer cluster functioned poorly out of context, and required its native surroundings to activate Sox2 expression. Conversely, the Sox2 locus control region (LCR) could activate both Igf2 and H19 outside its native context, but its activity was only partially modulated by CTCF occupancy at the ICR. Analysis of regulatory DNA actuation across different cell types revealed that, while the H19 enhancers are tightly coordinated within their native locus, the Sox2 LCR acts more independently. We show that these enhancer clusters typify broader classes of loci genome-wide. Our results show that unexpected dependencies may influence even the most studied functional elements, and our synthetic regulatory genomics approach permits large-scale manipulation of complete loci to investigate the relationship between locus architecture and function.
