Abstract
Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated.
Subjects
Chromatin Assembly and Disassembly, Human Genome, Transcription Factors/metabolism, Base Composition, Binding Sites/genetics, Cell Line, Chromatin Immunoprecipitation, Cluster Analysis, Computational Biology/methods, Deoxyribonuclease I/metabolism, High-Throughput Nucleotide Sequencing, Humans, Internet, Molecular Sequence Annotation, Nucleosomes/genetics, Nucleosomes/metabolism, Nucleotide Motifs, Organ Specificity/genetics, Protein Binding/genetics
Abstract
The Encyclopedia of DNA Elements (ENCODE) consortium aims to identify all functional elements in the human genome, including transcripts and transcriptional regulatory regions, along with their chromatin states and DNA methylation patterns. The ENCODE project generates data using a variety of techniques that can enrich for regulatory regions, such as chromatin immunoprecipitation (ChIP), micrococcal nuclease (MNase) digestion and DNase I digestion, followed by deep sequencing of the resulting DNA. As part of the ENCODE project, we have developed a Web-accessible repository at http://factorbook.org. In Wiki format, factorbook is a transcription factor (TF)-centric repository of all ENCODE ChIP-seq datasets on TF-binding regions, as well as the rich analysis results of these data. In the first release, factorbook contains 457 ChIP-seq datasets on 119 TFs in a number of human cell lines, the average profiles of histone modifications and nucleosome positioning around the TF-binding regions, sequence motifs enriched in the regions and the distance and orientation preferences between motif sites.
Subjects
Genetic Databases, Transcriptional Regulatory Elements, Transcription Factors/metabolism, Binding Sites, Cell Line, Chromatin Immunoprecipitation, High-Throughput Nucleotide Sequencing, Histones, Humans, Internet, Nucleosomes/metabolism, Nucleotide Motifs, DNA Sequence Analysis
Abstract
BACKGROUND: Previous work has demonstrated that chromatin feature levels correlate with gene expression. The ENCODE project enables us to further explore this relationship using an unprecedented volume of data. Expression levels from more than 100,000 promoters were measured using a variety of high-throughput techniques applied to RNA extracted by different protocols from different cellular compartments of several human cell lines. ENCODE also generated the genome-wide mapping of eleven histone marks, one histone variant, and DNase I hypersensitivity sites in seven cell lines. RESULTS: We built a novel quantitative model to study the relationship between chromatin features and expression levels. Our study not only confirms that the general relationships found in previous studies hold across various cell lines, but also makes new suggestions about the relationship between chromatin features and gene expression levels. We found that expression status and expression levels can be predicted by different groups of chromatin features, both with high accuracy. We also found that expression levels measured by CAGE are better predicted than those measured by RNA-PET or RNA-Seq, and different categories of chromatin features are the most predictive of expression for different RNA measurement methods. Additionally, PolyA+ RNA is overall more predictable than PolyA- RNA among different cell compartments, and PolyA+ cytosolic RNA measured with RNA-Seq is more predictable than PolyA+ nuclear RNA, while the opposite is true for PolyA- RNA. CONCLUSIONS: Our study provides new insights into transcriptional regulation by analyzing chromatin features in different cellular contexts.
Subjects
Chromatin/chemistry, Human Genome, Genetic Models, Statistical Models, Genetic Transcription, Chromatin/metabolism, Histones/genetics, Histones/metabolism, Humans, Organ Specificity, Poly A/metabolism, Messenger RNA/chemistry, Messenger RNA/metabolism
Abstract
Promoter methylation analysis of genes frequently silenced in breast cancer is a promising indicator of breast cancer risk, as these methylation events are thought to occur long before presentation of disease. The numerous exfoliated epithelial cells present in breast milk may provide the breast epithelial DNA needed for detailed methylation analysis and assessment of breast cancer risk. Fresh breast milk samples and health, lifestyle, and reproductive history questionnaires were collected from 111 women. Pyrosequencing analysis was conducted on DNA isolated from the exfoliated epithelial cells immunomagnetically separated from the total cell population in the breast milk of 102 women. A total of 65 CpG sites were examined in six tumor suppressor genes: PYCARD (also known as ASC or TMS1), CDH1, GSTP1, RBP1 (also known as CRBP1), SFRP1, and RASSF1. A sufficient quantity of DNA was obtained for meaningful analysis of promoter methylation; women donated an average of 86 ml of milk with a mean yield of 32,700 epithelial cells per ml. Methylation scores were generally low, as expected of benign tissue, but analysis of outlier methylation scores revealed a significant relationship between breast cancer risk, as indicated by previous biopsy, and methylation score for several CpG sites in CDH1, GSTP1, SFRP1, and RBP1. Methylation of RASSF1 was positively correlated with a woman's age irrespective of her reproductive history. Promoter methylation patterns in DNA from breast milk epithelial cells can likely be used to assess breast cancer risk. Additional studies of women at high breast cancer risk are warranted.