Results 1 - 16 of 16
1.
Science ; 346(6212): 1007-12, 2014 Nov 21.
Article in English | MEDLINE | ID: mdl-25411453

ABSTRACT

To study the evolutionary dynamics of regulatory DNA, we mapped >1.3 million deoxyribonuclease I-hypersensitive sites (DHSs) in 45 mouse cell and tissue types, and systematically compared these with human DHS maps from orthologous compartments. We found that the mouse and human genomes have undergone extensive cis-regulatory rewiring that combines branch-specific evolutionary innovation and loss with widespread repurposing of conserved DHSs to alternative cell fates, and that this process is mediated by turnover of transcription factor (TF) recognition elements. Despite pervasive evolutionary remodeling of the location and content of individual cis-regulatory regions, within orthologous mouse and human cell types the global fraction of regulatory DNA bases encoding recognition sites for each TF has been strictly conserved. Our findings provide new insights into the evolutionary forces shaping mammalian regulatory DNA landscapes.


Subject(s)
Conserved Sequence , DNA/genetics , Evolution, Molecular , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Animals , Base Sequence , Deoxyribonuclease I , Genome, Human , Humans , Mice , Restriction Mapping
2.
Cell ; 154(4): 888-903, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23953118

ABSTRACT

Cellular-state information between generations of developing cells may be propagated via regulatory regions. We report consistent patterns of gain and loss of DNase I-hypersensitive sites (DHSs) as cells progress from embryonic stem cells (ESCs) to terminal fates. DHS patterns alone convey rich information about cell fate and lineage relationships distinct from information conveyed by gene expression. Developing cells share a proportion of their DHS landscapes with ESCs; that proportion decreases continuously in each cell type as differentiation progresses, providing a quantitative benchmark of developmental maturity. Developmentally stable DHSs densely encode binding sites for transcription factors involved in autoregulatory feedback circuits. In contrast to normal cells, cancer cells extensively reactivate silenced ESC DHSs and those from developmental programs external to the cell lineage from which the malignancy derives. Our results point to changes in regulatory DNA landscapes as quantitative indicators of cell-fate transitions, lineage relationships, and dysfunction.


Subject(s)
Cell Lineage , Gene Expression Regulation, Developmental , Animals , Cell Differentiation , Cell Transformation, Neoplastic , Chromatin/metabolism , Embryonic Stem Cells/metabolism , Enhancer Elements, Genetic , Feedback , Humans , Mice , Stem Cells/metabolism
3.
Nature ; 489(7414): 75-82, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955617

ABSTRACT

DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.


Subject(s)
Chromatin/genetics , Chromatin/metabolism , DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , DNA Footprinting , DNA Methylation , DNA-Binding Proteins/metabolism , Deoxyribonuclease I/metabolism , Evolution, Molecular , Genomics , Humans , Mutation Rate , Promoter Regions, Genetic/genetics , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic
4.
Nature ; 489(7414): 83-90, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955618

ABSTRACT

Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.


Subject(s)
DNA Footprinting , DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , DNA Methylation , DNA-Binding Proteins/metabolism , Deoxyribonuclease I/metabolism , Genomic Imprinting , Genomics , Humans , Polymorphism, Single Nucleotide/genetics , Transcription Initiation Site
5.
Science ; 337(6099): 1190-5, 2012 Sep 07.
Article in English | MEDLINE | ID: mdl-22955828

ABSTRACT

Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure-related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn's disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.


Subject(s)
DNA/genetics , Disease/genetics , Genetic Variation , Polymorphism, Single Nucleotide , Regulatory Elements, Transcriptional , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Alleles , Chromatin/metabolism , Chromatin/ultrastructure , Crohn Disease/genetics , Deoxyribonuclease I/metabolism , Electrocardiography , Fetal Development , Fetus/metabolism , Gene Regulatory Networks , Genome, Human , Genome-Wide Association Study , Humans , Multiple Sclerosis/genetics , Phenotype , Promoter Regions, Genetic , Transcription Factors/chemistry , Transcription Factors/genetics
6.
Bioinformatics ; 28(14): 1919-20, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22576172

ABSTRACT

The large and growing number of genome-wide datasets highlights the need for high-performance feature analysis and data comparison methods, in addition to efficient data storage and retrieval techniques. We introduce BEDOPS, a software suite for common genomic analysis tasks which offers improved flexibility, scalability and execution time characteristics over previously published packages. The suite includes a utility to compress large inputs into a lossless format that can provide greater space savings and faster data extractions than alternatives. AVAILABILITY: http://code.google.com/p/bedops/ includes binaries, source and documentation.


Subject(s)
Data Compression/methods , Genomics/methods , Software
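
The abstract above does not spell out any algorithm, so the following is only an illustrative sketch of the kind of set operation such a suite performs: intersecting two sorted lists of genomic intervals in a single pass. It is not BEDOPS code. The function name `intersect_sorted`, the tuple layout, and the assumption that each input is sorted and internally non-overlapping are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates, as in BED files.
Interval = Tuple[str, int, int]


def intersect_sorted(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping segments of two interval lists.

    Assumption: each list is sorted by (chromosome, start) and contains
    no overlapping intervals of its own, so one linear sweep is enough.
    """
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance the list that is still on the earlier chromosome.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            out.append((chrom_a, start, end))
        # Drop whichever interval ends first; it cannot overlap anything later.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out
```

Because both inputs are consumed in order, the sweep runs in time proportional to the combined input length and never needs the whole dataset in memory at once, which is the general reason sorted inputs matter for large genome-wide files.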
7.
Hum Gene Ther ; 23(2): 231-7, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21981728

ABSTRACT

Concerns surrounding the oncogenic potential of recombinant gammaretroviral vectors have spurred a great deal of interest in vector integration site (VIS) preferences. Although gammaretroviral vectors exhibit a modest preference for integration near transcription start sites (TSS) of active genes, such associations only account for about a third of all VIS. Previous studies suggested a correlation between gammaretroviral VIS and DNase hypersensitive sites (DHS), which mark chromatin regions associated with cis-regulatory elements. In order to study this issue directly, we assessed the correlation between 167 validated gammaretroviral VIS and a deep genome-wide map of DHS, both determined in the same cell line (the human fibrosarcoma HT1080). The DHS map was developed by sequencing individual DNase I cleavage sites using massively parallel sequencing technologies. These studies revealed an overwhelming preference for integrations associated with DHS, with a median distance of only 238 bp between individual VIS and the nearest DHS for the experimental dataset, compared to 3 kb for a random dataset and 577 to 1457 bp for two unrelated cell lines (p < 0.001). Indeed, nearly 84% of all VIS were found to be located within 1 kb of a DHS (p = 10^-43). Further, this correlation was statistically independent from the association with TSS. The preference for DHS far exceeds that seen for other hallmarks of gammaretroviral VIS, including TSS, and may help explain several aspects of gammaretroviral vector biology, including the mechanism of VIS selection, as well as the relative frequency and underlying biology of gammaretroviral vector-mediated genotoxicity.


Subject(s)
Chromatin/genetics , Deoxyribonuclease I/genetics , Fibrosarcoma/virology , Gammaretrovirus/genetics , Virus Integration , Cell Line, Tumor , Chromosome Mapping , Fibrosarcoma/genetics , Fibrosarcoma/pathology , Genetic Vectors , Humans , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA , Transcription Initiation Site , Transcription, Genetic
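
The two summary statistics quoted in the abstract above (median distance from each VIS to the nearest DHS, and the fraction of VIS within 1 kb of a DHS) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `nearest_distances`, the single-chromosome simplification, the half-open interval convention, and the toy coordinates are all assumptions introduced here.

```python
from bisect import bisect_right
from statistics import median
from typing import List, Tuple


def nearest_distances(sites: List[int],
                      intervals: List[Tuple[int, int]]) -> List[int]:
    """Distance in bp from each site to the nearest interval (0 if inside).

    Assumptions: one chromosome only; `intervals` is non-empty, sorted by
    start, non-overlapping, and half-open (start inclusive, end exclusive).
    """
    starts = [start for start, _ in intervals]
    distances = []
    for pos in sites:
        k = bisect_right(starts, pos)  # intervals[:k] start at or before pos
        candidates = []
        if k > 0:
            _, end = intervals[k - 1]
            # Zero when pos lies inside the interval, else gap to its last base.
            candidates.append(max(0, pos - (end - 1)))
        if k < len(intervals):
            candidates.append(intervals[k][0] - pos)
        distances.append(min(candidates))
    return distances


# Toy data, purely illustrative.
dhs = [(100, 200), (1000, 1100)]
vis = [150, 250, 900, 50]
dists = nearest_distances(vis, dhs)
median_distance = median(dists)
fraction_within_1kb = sum(d <= 1000 for d in dists) / len(dists)
```

A comparison against a random dataset, as the abstract describes, would repeat the same calculation on randomly placed sites; how those sites were drawn is not stated in the abstract, so it is left out here.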
8.
Nucleic Acids Res ; 37(13): e95, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19528077

ABSTRACT

We developed a primer design method, Pythia, in which state-of-the-art DNA binding affinity computations are directly integrated into the primer design process. We use chemical reaction equilibrium analysis to integrate multiple binding energy calculations into a conservative measure of polymerase chain reaction (PCR) efficiency, and a precomputed index on genomic sequences to evaluate primer specificity. We show that Pythia can design primers with success rates comparable with those of current methods, but yields much higher coverage in difficult genomic regions. For example, in RepeatMasked sequences in the human genome, Pythia achieved a median coverage of 89% as compared with a median coverage of 51% for Primer3. For parameter settings yielding sensitivities of 81%, our method has a recall of 97%, compared with the Primer3 recall of 48%. Because our primer design approach is based on the chemistry of DNA interactions, it has fewer and more physically meaningful parameters than current methods, and is therefore easier to adjust to specific experimental requirements. Our software is freely available at http://pythia.sourceforge.net.


Subject(s)
Algorithms , DNA Primers/chemistry , Polymerase Chain Reaction , Thermodynamics , DNA/chemistry , DNA Primers/standards , Genome, Human , Genomics , Humans , Interspersed Repetitive Sequences , Nucleic Acid Denaturation
9.
Genome Biol ; 9(12): R168, 2008.
Article in English | MEDLINE | ID: mdl-19055709

ABSTRACT

BACKGROUND: Conserved non-coding sequences in the human genome are approximately tenfold more abundant than known genes, and have been hypothesized to mark the locations of cis-regulatory elements. However, the global contribution of conserved non-coding sequences to the transcriptional regulation of human genes is currently unknown. Deeply conserved elements shared between humans and teleost fish predominantly flank genes active during morphogenesis and are enriched for positive transcriptional regulatory elements. However, such deeply conserved elements account for <1% of the conserved non-coding sequences in the human genome, which are predominantly mammalian. RESULTS: We explored the regulatory potential of a large sample of these 'common' conserved non-coding sequences using a variety of classic assays, including chromatin remodeling, and enhancer/repressor and promoter activity. When tested across diverse human model cell types, we find that the fraction of experimentally active conserved non-coding sequences within any given cell type is low (approximately 5%), and that this proportion increases only modestly when considered collectively across cell types. CONCLUSIONS: The results suggest that classic assays of cis-regulatory potential are unlikely to expose the functional potential of the substantial majority of mammalian conserved non-coding sequences in the human genome.


Subject(s)
Conserved Sequence/genetics , Genome, Human , Regulatory Sequences, Nucleic Acid , Animals , Cell Line , Evolution, Molecular , Genome , Humans , Mice
10.
Nature ; 447(7146): 799-816, 2007 Jun 14.
Article in English | MEDLINE | ID: mdl-17571346

ABSTRACT

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.


Subject(s)
Genome, Human/genetics , Genomics , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation , Conserved Sequence/genetics , DNA Replication , Evolution, Molecular , Exons/genetics , Genetic Variation/genetics , Heterozygote , Histones/metabolism , Humans , Pilot Projects , Protein Binding , RNA, Messenger/genetics , RNA, Untranslated/genetics , Transcription Factors/metabolism , Transcription Initiation Site
11.
J Bioinform Comput Biol ; 4(2): 299-315, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16819785

ABSTRACT

The polymerase chain reaction (PCR) is a fundamental tool of molecular biology. Quantitative PCR is the gold-standard methodology for determination of DNA copy numbers, quantitating transcription, and numerous other applications. A major barrier to large-scale application of PCR for quantitative genomic analyses is the current requirement for manual validation of individual PCRs to ensure generation of a single product. This typically requires visual inspection either of gel electrophoreses or temperature dissociation ("melting") curves of individual PCRs--a time-consuming and costly process. Here we describe a robust computational solution to this fundamental problem. Using a training set of 10 080 reactions comprising multiple quantitative PCRs from each of 1728 unique human genomic amplicons, we developed a support vector machine classifier capable of discriminating single-product PCRs with better than 99% accuracy. This approach has broad utility, and eliminates a major bottleneck to widespread application of PCR for high-throughput genomic applications.


Subject(s)
Algorithms , Artificial Intelligence , DNA/analysis , DNA/chemistry , Pattern Recognition, Automated/methods , Polymerase Chain Reaction/methods , DNA/genetics , Nucleic Acid Denaturation , Transition Temperature
12.
Nat Methods ; 3(7): 511-8, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16791208

ABSTRACT

Localized accessibility of critical DNA sequences to the regulatory machinery is a key requirement for regulation of human genes. Here we describe a high-resolution, genome-scale approach for quantifying chromatin accessibility by measuring DNase I sensitivity as a continuous function of genome position using tiling DNA microarrays (DNase-array). We demonstrate this approach across 1% (approximately 30 Mb) of the human genome, wherein we localized 2,690 classical DNase I hypersensitive sites with high sensitivity and specificity, and also mapped larger-scale patterns of chromatin architecture. DNase I hypersensitive sites exhibit marked aggregation around transcriptional start sites (TSSs), though the majority mark nonpromoter functional elements. We also developed a computational approach for visualizing higher-order features of chromatin structure. This revealed that human chromatin organization is dominated by large (100-500 kb) 'superclusters' of DNase I hypersensitive sites, which encompass both gene-rich and gene-poor regions. DNase-array is a powerful and straightforward approach for systematic exposition of the cis-regulatory architecture of complex genomes.


Subject(s)
Deoxyribonuclease I/chemistry , Genome , Oligonucleotide Array Sequence Analysis/methods , Chromatin/chemistry , Deoxyribonuclease I/genetics , Humans , Regulatory Sequences, Nucleic Acid
13.
Article in English | MEDLINE | ID: mdl-16447995

ABSTRACT

PCR, the polymerase chain reaction, is a fundamental tool of molecular biology. Quantitative PCR is the gold-standard methodology for determination of DNA copy numbers, quantitating transcription, and numerous other applications. A major barrier to large-scale application of PCR for quantitative genomic analyses is the current requirement for manual validation of individual PCR reactions to ensure generation of a single product. This typically requires visual inspection either of gel electrophoreses or temperature dissociation ("melting") curves of individual PCR reactions - a time-consuming and costly process. Here we describe a robust computational solution to this fundamental problem. Using a training set of 10,080 reactions comprising multiple quantitative PCR reactions from each of 1,728 unique human genomic amplicons, we developed a support vector machine classifier capable of discriminating single-product PCR reactions with better than 99% accuracy. This approach has broad utility, and eliminates a major bottleneck to widespread application of PCR for high-throughput genomic applications.


Subject(s)
Algorithms , Artificial Intelligence , DNA/analysis , DNA/chemistry , Pattern Recognition, Automated/methods , Polymerase Chain Reaction/methods , DNA/genetics , Nucleic Acid Denaturation , Reproducibility of Results , Sensitivity and Specificity , Transition Temperature
14.
Proc Natl Acad Sci U S A ; 101(48): 16837-42, 2004 Nov 30.
Article in English | MEDLINE | ID: mdl-15550541

ABSTRACT

We developed a quantitative methodology, digital analysis of chromatin structure (DACS), for high-throughput, automated mapping of DNase I-hypersensitive sites and associated cis-regulatory sequences in the human and other complex genomes. We used 19/20-bp genomic DNA tags to localize individual DNase I cutting events in nuclear chromatin and produced approximately 257,000 tags from erythroid cells. Tags were mapped to the human genome, and a quantitative algorithm was applied to discriminate statistically significant clusters of independent DNase I cutting events. We show that such clusters identify both known regulatory sequences and previously unrecognized functional elements across the genome. We used in silico simulation to demonstrate that DACS is capable of efficient and accurate localization of the majority of DNase I-hypersensitive sites in the human genome without requiring an independent validation step. A unique feature of DACS is that it permits unbiased evaluation of the chromatin state of regulatory sequences from widely separated genomic loci. We found surprisingly large differences in the accessibility of distant regulatory sequences, suggesting the existence of a hierarchy of nuclear organization that escapes detection by conventional chromatin assays.


Subject(s)
Chromatin/chemistry , Chromatin/genetics , Humans , K562 Cells , Multigene Family , Protein Conformation , Regulatory Sequences, Nucleic Acid
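
The abstract above mentions discriminating clusters of mapped DNase I cutting events but does not give the algorithm, so the sketch below substitutes a much simpler technique: greedy grouping of tag positions by a maximum gap, followed by a minimum-count filter. It performs no statistical significance test and should not be read as the DACS method. The function name `cluster_tags` and the parameters `max_gap` and `min_tags`, including their default values, are hypothetical.

```python
from typing import Iterable, List, Tuple


def cluster_tags(positions: Iterable[int],
                 max_gap: int = 250,
                 min_tags: int = 2) -> List[Tuple[int, int, int]]:
    """Group tag positions on one chromosome into clusters.

    Consecutive tags (after sorting) join the same cluster when they are at
    most `max_gap` bp apart. Clusters with fewer than `min_tags` tags are
    discarded. Returns (first_position, last_position, tag_count) tuples.
    """
    clusters: List[Tuple[int, int, int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            # Gap too large: close the running cluster and start a new one.
            if len(current) >= min_tags:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_tags:
        clusters.append((current[0], current[-1], len(current)))
    return clusters
```

A real analysis would replace the fixed `min_tags` cutoff with a test of each cluster against the tag density expected by chance, which is the step the abstract refers to as discriminating statistically significant clusters.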
15.
Proc Natl Acad Sci U S A ; 101(13): 4537-42, 2004 Mar 30.
Article in English | MEDLINE | ID: mdl-15070753

ABSTRACT

Comprehensive identification of sequences that regulate transcription is one of the major goals of genome biology. Focal alteration in chromatin structure in vivo, detectable through hypersensitivity to DNaseI and other nucleases, is the sine qua non of a diverse cast of transcriptional regulatory elements including enhancers, promoters, insulators, and locus control regions. We developed an approach for genome-scale identification of DNaseI hypersensitive sites (HSs) via isolation and cloning of in vivo DNaseI cleavage sites to create libraries of active chromatin sequences (ACSs). Here, we describe analysis of >61,000 ACSs derived from erythroid cells. We observed peaks in the density of ACSs at the transcriptional start sites of known genes, at non-gene-associated CpG islands, and, to a lesser degree, at evolutionarily conserved noncoding sequences. Peaks in ACS density paralleled the distribution of DNaseI HSs. ACSs and DNaseI HSs were distributed between both expressed and nonexpressed genes, suggesting that a large proportion of genes reside within open chromatin domains. The results permit a quantitative approximation of the distribution of HSs and classical cis-regulatory sequences in the human genome.


Subject(s)
Chromatin/genetics , DNA/metabolism , Deoxyribonuclease I/metabolism , Genome, Human , DNA/chemistry , Gene Expression Regulation, Neoplastic , Humans , Introns/genetics , K562 Cells , Oligonucleotide Array Sequence Analysis , Substrate Specificity , Transcription, Genetic
16.
Nat Methods ; 1(3): 219-25, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15782197

ABSTRACT

Identification of functional, noncoding elements that regulate transcription in the context of complex genomes is a major goal of modern biology. Localization of functionality to specific sequences is a requirement for genetic and computational studies. Here, we describe a generic approach, quantitative chromatin profiling, that uses quantitative analysis of in vivo chromatin structure over entire gene loci to rapidly and precisely localize cis-regulatory sequences and other functional modalities encoded by DNase I hypersensitive sites. To demonstrate the accuracy of this approach, we analyzed approximately 300 kilobases of human genome sequence from diverse gene loci and cleanly delineated functional elements corresponding to a spectrum of classical cis-regulatory activities including enhancers, promoters, locus control regions and insulators as well as novel elements. Systematic, high-throughput identification of functional elements coinciding with DNase I hypersensitive sites will substantially expand our knowledge of transcriptional regulation and should simplify the search for noncoding genetic variation with phenotypic consequences.


Subject(s)
Algorithms , Chromatin/genetics , Chromosome Mapping/methods , Deoxyribonuclease I/genetics , Polymerase Chain Reaction/methods , Quantitative Trait Loci/genetics , Sequence Analysis, DNA/methods , Cell Line , Erythroid Cells/enzymology , Genes, Regulator/genetics , Genome, Human , Humans , Reproducibility of Results , Sensitivity and Specificity , Sequence Alignment/methods