ABSTRACT
DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ~2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ~580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.
Subject(s)
Chromatin/genetics , Chromatin/metabolism , DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , DNA Footprinting , DNA Methylation , DNA-Binding Proteins/metabolism , Deoxyribonuclease I/metabolism , Evolution, Molecular , Genomics , Humans , Mutation Rate , Promoter Regions, Genetic/genetics , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic
ABSTRACT
Understanding the molecular basis for phenotypic differences between humans and other primates remains an outstanding challenge. Mutations in non-coding regulatory DNA that alter gene expression have been hypothesized as a key driver of these phenotypic differences. This has been supported by differential gene expression analyses in general, but not by the identification of specific regulatory elements responsible for changes in transcription and phenotype. To identify the genetic source of regulatory differences, we mapped DNaseI hypersensitive (DHS) sites, which mark all types of active gene regulatory elements, genome-wide in the same cell type isolated from human, chimpanzee, and macaque. Most DHS sites were conserved among all three species, as expected based on their central role in regulating transcription. However, we found evidence that several hundred DHS sites were gained or lost on the lineages leading to modern human and chimpanzee. Species-specific DHS site gains are enriched near differentially expressed genes, are positively correlated with increased transcription, show evidence of branch-specific positive selection, and overlap with active chromatin marks. Species-specific sequence differences in transcription factor motifs found within these DHS sites are linked with species-specific changes in chromatin accessibility. Together, these indicate that the regulatory elements identified here are genetic contributors to transcriptional and phenotypic differences among primate species.
Subject(s)
Deoxyribonuclease I/genetics , Evolution, Molecular , Primates/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic , Animals , Binding Sites/genetics , Cell Line , Chromatin/genetics , Gene Expression Regulation , Genome, Human , Humans , Mutation , Nucleotide Motifs , Phenotype , Selection, Genetic , Species Specificity , Transcription Factors/genetics
ABSTRACT
Regulation of gene transcription in diverse cell types is determined largely by varied sets of cis-elements where transcription factors bind. Here we demonstrate that data from a single high-throughput DNase I hypersensitivity assay can delineate, at base-pair resolution, hundreds of thousands of in vivo footprints in human cells that precisely mark individual transcription factor-DNA interactions. These annotations provide a unique resource for the investigation of cis-regulatory elements. We find that footprints for specific transcription factors correlate with ChIP-seq enrichment and can accurately identify functional versus nonfunctional transcription factor motifs. We also find that footprints reveal a unique evolutionary conservation pattern that differentiates functional footprinted bases from surrounding DNA. Finally, detailed analysis of CTCF footprints suggests multiple modes of binding and a novel DNA binding motif upstream of the primary binding site.
Subject(s)
DNA-Binding Proteins/metabolism , Protein Footprinting/methods , Transcription Factors/metabolism , Base Sequence , Binding Sites , Cell Line , DNA/metabolism , DNA-Binding Proteins/genetics , Deoxyribonuclease I/metabolism , Genome , Genomics , Humans , Molecular Sequence Data , Promoter Regions, Genetic , Protein Binding/genetics , Transcription Factors/genetics , Transcription, Genetic
ABSTRACT
The human body contains thousands of unique cell types, each with specialized functions. Cell identity is governed in large part by gene transcription programs, which are determined by regulatory elements encoded in DNA. To identify regulatory elements active in seven cell lines representative of diverse human cell types, we used DNase-seq and FAIRE-seq (Formaldehyde Assisted Isolation of Regulatory Elements) to map "open chromatin." Over 870,000 DNaseI or FAIRE sites, which correspond tightly to nucleosome-depleted regions, were identified across the seven cell lines, covering nearly 9% of the genome. The combination of DNaseI and FAIRE is more effective than either assay alone in identifying likely regulatory elements, as judged by coincidence with transcription factor binding locations determined in the same cells. Open chromatin common to all seven cell types tended to be at or near transcription start sites and to be coincident with CTCF binding sites, while open chromatin sites found in only one cell type were typically located away from transcription start sites and contained DNA motifs recognized by regulators of cell-type identity. We show that open chromatin regions bound by CTCF are potent insulators. We identified clusters of open regulatory elements (COREs) that were physically near each other and whose appearance was coordinated among one or more cell types. Gene expression and RNA Pol II binding data support the hypothesis that COREs control gene activity required for the maintenance of cell-type identity. This publicly available atlas of regulatory elements may prove valuable in identifying noncoding DNA sequence variants that are causally linked to human disease.
Subject(s)
Chromatin/metabolism , Chromosome Mapping , Regulatory Elements, Transcriptional , Sequence Analysis, DNA/methods , Base Sequence , Binding Sites , CCCTC-Binding Factor , Cell Differentiation/genetics , Cell Line , Gene Expression Regulation , Humans , Protein Binding , Repressor Proteins/metabolism , Transcription, Genetic , Transcriptional Activation
ABSTRACT
The epithelium lining the epididymis has a pivotal role in ensuring a luminal environment that can support normal sperm maturation. Many of the individual genes that encode proteins involved in establishing the epididymal luminal fluid are well characterized. They include ion channels, ion exchangers, transporters, and solute carriers. However, the molecular mechanisms that coordinate expression of these genes and modulate their activities in response to biological stimuli are less well understood. To identify cis-regulatory elements for genes expressed in human epididymis epithelial cells, we generated genome-wide maps of open chromatin by DNase-seq. This analysis identified 33,542 epididymis-selective DNase I hypersensitive sites (DHS), which were not evident in five cell types of different lineages. Identification of genes with epididymis-selective DHS at their promoters revealed gene pathways that are active in immature epididymis epithelial cells. These include processes correlating with epithelial function and also others with specific roles in the epididymis, including retinol metabolism and ascorbate and aldarate metabolism. Peaks of epididymis-selective chromatin were seen in the androgen receptor gene and the cystic fibrosis transmembrane conductance regulator (CFTR) gene, which has a critical role in regulating ion transport across the epididymis epithelium. In silico prediction of transcription factor binding sites that were overrepresented in epididymis-selective DHS identified epithelial transcription factors, including ELF5 and ELF3, the androgen receptor, Pax2, and Sox9, as components of epididymis transcriptional networks. Active genes, which are targets of each transcription factor, reveal important biological processes in the epididymis epithelium.
Subject(s)
Chromatin Assembly and Disassembly , Epididymis/metabolism , Epithelial Cells/metabolism , Gene Expression Regulation, Developmental , Infertility, Male/genetics , Spermatogenesis , Cells, Cultured , Chromosome Mapping , Computational Biology , DNA, Intergenic , Epididymis/cytology , Epididymis/growth & development , Epididymis/physiopathology , Epithelial Cells/cytology , Expert Systems , Fetus/cytology , Gene Expression Profiling , Genome-Wide Association Study , Humans , Infertility, Male/metabolism , Infertility, Male/physiopathology , Male , Nucleosomes/metabolism , Oligonucleotide Array Sequence Analysis , Organ Specificity , Promoter Regions, Genetic
ABSTRACT
BACKGROUND: Distal cell-type-specific regulatory elements may be located at very large distances from the genes that they control and are often hidden within intergenic regions or in introns of other genes. The development of methods that enable mapping of regions of open chromatin genome wide has greatly advanced the identification and characterisation of these elements. METHODS: Here we use DNase I hypersensitivity mapping followed by deep sequencing (DNase-seq) to generate a map of open chromatin in primary human tracheal epithelial (HTE) cells and use bioinformatic approaches to characterise the distribution of these sites within the genome and with respect to gene promoters, intronic and intergenic regions. RESULTS: Genes with HTE-selective open chromatin at their promoters were associated with multiple pathways of epithelial function and differentiation. The data predict novel cell-type-specific regulatory elements for genes involved in HTE cell function, such as structural proteins and ion channels, and the transcription factors that may interact with them to control gene expression. Moreover, the map of open chromatin can identify the location of potentially critical regulatory elements in genome-wide association studies (GWAS) in which the strongest association is with single nucleotide polymorphisms in non-coding regions of the genome. We demonstrate its relevance to a recent GWAS that identifies modifiers of cystic fibrosis lung disease severity. CONCLUSION: Since HTE cells have many functional similarities with bronchial epithelial cells and other differentiated cells in the respiratory epithelium, these data are of direct relevance to elucidating the molecular basis of normal lung function and lung disease.
Subject(s)
Chromatin/genetics , Epithelial Cells/metabolism , Gene Expression Regulation , Lung/physiology , Respiratory Mucosa/metabolism , Trachea/cytology , Chromosome Mapping/methods , Computational Biology , Deoxyribonuclease I , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Lung/metabolism
ABSTRACT
BACKGROUND: Biologists need to perform complex queries, often across a variety of databases. Typically, each data resource provides its own advanced query interface, each of which must be learnt by the biologist before they can begin to query it. Frequently, more than one data source is required, and for high-throughput analysis, cutting and pasting results between websites is very time-consuming. Therefore, many groups rely on local bioinformatics support to process queries by accessing each resource's programmatic interfaces, if they exist. This is not an efficient solution in terms of cost and time. Instead, it would be better if the biologist only had to learn one generic interface. BioMart provides such a solution. RESULTS: BioMart enables scientists to perform advanced querying of biological data sources through a single web interface. The power of the system comes from integrated querying of data sources regardless of their geographical locations. Once these queries have been defined, they may be automated with its "scripting at the click of a button" functionality. BioMart's capabilities are extended by integration with several widely used software packages such as BioConductor, DAS, Galaxy, Cytoscape and Taverna. In this paper, we describe all aspects of BioMart from a user's perspective and demonstrate how it can be used to solve real biological use cases such as SNP selection for candidate gene screening or annotation of microarray results. CONCLUSION: BioMart is an easy-to-use, generic and scalable system, and has therefore become an integral part of large data resources including Ensembl, UniProt, HapMap, Wormbase, Gramene, Dictybase, PRIDE, MSD and Reactome. BioMart is freely accessible to use at http://www.biomart.org.
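The "scripting at the click of a button" idea, where a query defined once in the web interface is exported and re-run programmatically, can be sketched as below. This is a minimal sketch, not BioMart's own client: it assumes the XML query format that BioMart's web interface exports and an Ensembl-hosted `martservice` endpoint (the biomart.org address above may no longer serve queries). The dataset, attribute and filter names (`hsapiens_gene_ensembl`, `ensembl_gene_id`, `external_gene_name`, `chromosome_name`) are illustrative assumptions. The code only builds the request URL and does not contact any server.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: Ensembl-hosted BioMart REST endpoint, used here for illustration.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"

# Assumption: header emitted by BioMart's "XML" export button.
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_biomart_query(dataset, attributes, filters=None, formatter="TSV"):
    """Return a BioMart XML query string for a single dataset.

    `dataset`, `attributes` and `filters` are BioMart internal names;
    the names used in the example below are assumptions.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,  # output format requested from the server
        "header": "0",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict the rows returned; attributes choose the columns.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return XML_HEADER + ET.tostring(query, encoding="unicode")


def build_request_url(xml_query):
    """URL-encode the XML query as the `query` GET parameter of martservice."""
    return MARTSERVICE_URL + "?" + urllib.parse.urlencode({"query": xml_query})


# Example: gene IDs and names on chromosome 7 (names are assumptions).
xml_query = build_biomart_query(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": "7"},
)
url = build_request_url(xml_query)
```

Because the whole query lives in one XML string, the same request can be saved, versioned and re-issued unchanged, which is the kind of automation the abstract describes; fetching `url` with any HTTP client would be the step that actually runs it.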
Subject(s)
Computational Biology/methods , Database Management Systems , Databases, Genetic , Internet , User-Computer Interface
ABSTRACT
The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. A wide variety of complex queries, on various types of annotations, for numerous species are supported. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site, or through a Java application suite. Both implementations and the database are freely available for local installation, and can be extended or adapted to 'non-Ensembl' data sets.