ABSTRACT
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
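The feed-forward loop mentioned above is a three-node motif in which a regulator X targets both an intermediate Y and a gene Z, while Y also targets Z. The following is a minimal sketch of how such motifs could be enumerated in a directed regulatory network. It is not the method used in the study; the function name, the edge-list representation, and the factor names in the toy network are all hypothetical.

```python
def feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z.

    `edges` is an iterable of (regulator, target) pairs. This is an
    illustrative enumeration only; it does not test for enrichment.
    """
    # Build an adjacency map: regulator -> set of direct targets.
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # ignore self-regulation
            for z in targets.get(y, ()):
                # z must be a third, distinct node that x also regulates.
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Toy network with hypothetical names (not data from the study).
toy_edges = [
    ("TF_A", "TF_B"),
    ("TF_B", "gene_1"),
    ("TF_A", "gene_1"),
    ("TF_B", "gene_2"),
]
ffls = feed_forward_loops(toy_edges)
print(ffls)
```

Deciding whether a motif is enriched would additionally require comparing such counts against randomized networks, which this sketch does not attempt.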
Subjects
DNA/genetics; Encyclopedias as Topic; Gene Regulatory Networks/genetics; Genome, Human/genetics; Molecular Sequence Annotation; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/metabolism; Alleles; Cell Line; GATA1 Transcription Factor/metabolism; Gene Expression Profiling; Genomics; Humans; K562 Cells; Organ Specificity; Phosphorylation/genetics; Polymorphism, Single Nucleotide/genetics; Protein Interaction Maps; RNA, Untranslated/genetics; RNA, Untranslated/metabolism; Selection, Genetic/genetics; Transcription Initiation Site
ABSTRACT
There has been a recent surge in the use of genome-wide methodologies to identify and annotate the transcriptional regulatory elements in the human genome. Here we review some of these methodologies and the conceptual insights about transcription regulation that have been gained from the use of genome-wide studies. It has become clear that the binding of transcription factors is itself a highly regulated process, and binding does not always appear to have functional consequences. Numerous properties have now been associated with regulatory elements that may be useful in their identification. Several aspects of enhancer function have been shown to be more widespread than was previously appreciated, including the highly combinatorial nature of transcription factor binding, the postinitiation regulation of many target genes, and the binding of enhancers at early stages to maintain their competence during development. Going forward, the integration of multiple genome-wide data sets should become a standard approach to elucidate higher-order regulatory interactions.
Subjects
Enhancer Elements, Genetic; Genome, Human; Transcription Factors/physiology; Animals; Base Sequence; Chromatin Immunoprecipitation; Conserved Sequence; Gene Expression Regulation; High-Throughput Nucleotide Sequencing; Humans; Protein Binding; Sequence Analysis, DNA; Transcription Factors/metabolism
ABSTRACT
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
Subjects
Chromatin Immunoprecipitation/methods; Databases, Genetic; High-Throughput Nucleotide Sequencing/methods; Animals; Genome/genetics; Genomics/methods; Guidelines as Topic; Histones/metabolism; Humans; Internet; Transcription Factors/metabolism
ABSTRACT
Regulation of gene expression at the transcriptional level is achieved by complex interactions of transcription factors operating at their target genes. Dissecting the specific combination of factors that bind each target is a significant challenge. Here, we describe in detail the Allele Binding Cooperativity test, which uses variation in transcription factor binding among individuals to discover combinations of factors and their targets. We developed the ALPHABIT (a large-scale process to hunt for allele binding interacting transcription factors) pipeline, which includes statistical analysis of binding sites followed by experimental validation, and demonstrate that this method predicts transcription factors that associate with NFκB. Our method successfully identifies factors that have been known to work with NFκB (E2A, STAT1, IRF2), but whose global coassociation and sites of cooperative action were not known. In addition, we identify a unique coassociation (EBF1) that had not been reported previously. We present a general approach for discovering combinatorial models of regulation and advance our understanding of the genetic basis of variation in transcription factor binding.
Subjects
Gene Expression Regulation; Transcription Factors/metabolism; Alleles; Binding Sites; Chromatin Immunoprecipitation; Humans; NF-kappa B/metabolism; Protein Binding/genetics; Regulatory Sequences, Nucleic Acid/genetics; Software
ABSTRACT
Small noncoding regulatory RNAs (sRNAs) play a key role in the posttranscriptional regulation of many bacterial genes. The genome of Caulobacter crescentus encodes at least 31 sRNAs, and 27 of these sRNAs are of unknown function. An overexpression screen for sRNA-induced growth inhibition along with sequence conservation in a related Caulobacter species led to the identification of a novel sRNA, CrfA, that is specifically induced upon carbon starvation. Twenty-seven genes were found to be strongly activated by CrfA accumulation. One-third of these target genes encode putative TonB-dependent receptors, suggesting CrfA plays a role in the surface modification of C. crescentus, facilitating the uptake of nutrients during periods of carbon starvation. The mechanism of CrfA-mediated gene activation was investigated for one of the genes predicted to encode a TonB-dependent receptor, CC3461. CrfA functions to stabilize the CC3461 transcript. Two lines of evidence suggest that CrfA binds to a stem-loop structure upstream of the CC3461 Shine-Dalgarno sequence and stabilizes the transcript: the complementarity between a region of CrfA and the terminal region of the CC3461 5'-untranslated region (5'-UTR), and the behavior of a deletion of this region, a site-specific base substitution, and a 3-base deletion in the CrfA complementary sequence.
Subjects
Caulobacter crescentus/metabolism; RNA, Bacterial/metabolism; RNA, Untranslated/metabolism; 5' Untranslated Regions/genetics; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Base Sequence; Blotting, Northern; Carbon/metabolism; Caulobacter crescentus/genetics; Gene Expression Regulation, Bacterial/genetics; Gene Expression Regulation, Bacterial/physiology; Molecular Sequence Data; Nucleic Acid Amplification Techniques; Oligonucleotide Array Sequence Analysis; Polymerase Chain Reaction; Promoter Regions, Genetic/genetics; RNA, Bacterial/genetics; RNA, Untranslated/genetics; Sequence Homology, Amino Acid
ABSTRACT
Small non-coding RNAs (sRNAs) are active in many bacterial cell functions, including regulation of the cell's response to environmental challenges. We describe the identification of 27 novel Caulobacter crescentus sRNAs by analysis of RNA expression levels assayed using a tiled Caulobacter microarray and a protocol optimized for detection of sRNAs. The principal analysis method involved identification of sets of adjacent intergenic probes with unusually high correlation between the individual probes within the set, suggesting the presence of an sRNA. Among the validated sRNAs, two are candidate transposase gene antisense RNAs. The expression of 10 of the sRNAs is regulated by entry into stationary phase, carbon starvation, or rich versus minimal media. The expression of four of the novel sRNAs changes as the cell cycle progresses. One of these shares a promoter motif with several genes expressed at the swarmer-to-stalked cell transition, while another appears to be controlled by the CtrA global transcriptional regulator. The probe correlation analysis approach reported here is of general use for large-scale sRNA identification in any sequenced microbial genome.
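The probe correlation idea can be illustrated with a small sketch: probes are ordered by genomic position, each has an expression profile across conditions, and a run of neighbouring probes whose profiles track each other closely is flagged as a candidate transcript. The abstract gives no implementation details, so everything below is an assumption made for illustration: the use of Pearson correlation between adjacent probes, the threshold of 0.9, the minimum run length of three probes, and the toy values.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def correlated_runs(profiles, threshold=0.9, min_probes=3):
    """Find runs of adjacent probes whose neighbouring profiles correlate.

    `profiles` lists per-probe expression vectors in genomic order.
    Returns inclusive (start, end) index pairs for every run of at least
    `min_probes` probes in which each adjacent pair has r >= threshold.
    Both parameters are assumed values, not taken from the paper.
    """
    runs = []
    start = 0
    for i in range(1, len(profiles) + 1):
        linked = (
            i < len(profiles)
            and pearson(profiles[i - 1], profiles[i]) >= threshold
        )
        if not linked:
            # The current run ends at probe i - 1.
            if i - start >= min_probes:
                runs.append((start, i - 1))
            start = i
    return runs


# Toy data: 6 probes x 4 conditions (made-up values).
toy_profiles = [
    [1, 2, 1, 2],
    [5, 1, 4, 2],
    [1, 3, 5, 9],
    [2, 6, 10, 18],
    [1.5, 3.2, 5.1, 9.3],
    [4, 4, 1, 0],
]
candidate_runs = correlated_runs(toy_profiles)
print(candidate_runs)
```

In this toy input only probes 2 to 4 rise together across conditions, so they form the single reported run; real data would also need a background model to decide what counts as "unusually high" correlation.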
Subjects
Caulobacter crescentus/genetics; Gene Expression Regulation, Bacterial; RNA, Bacterial/isolation & purification; RNA, Untranslated/isolation & purification; Base Sequence; Genome, Bacterial; Molecular Sequence Data; Oligonucleotide Array Sequence Analysis; RNA, Antisense/isolation & purification; RNA, Antisense/metabolism; RNA, Bacterial/analysis; RNA, Bacterial/metabolism; RNA, Untranslated/analysis; RNA, Untranslated/metabolism; Transcription, Genetic
ABSTRACT
DNA in a single-stranded form (ssDNA) exists transiently within the cell and comprises the telomeres of linear chromosomes and the genomes of some DNA viruses. As with RNA, in the single-stranded state, some DNA sequences are able to fold into complex secondary and tertiary structures that may be recognized by proteins and participate in gene regulation. To better understand how such DNA elements might fold and interact with proteins, and to compare recognition features to those of a structured RNA, we used in vitro selection to identify ssDNAs that bind an RNA-binding peptide from the HIV Rev protein with high affinity and specificity. The large majority of selected binders contain a non-Watson-Crick G.T base-pair and an adjacent C:G base-pair and both are essential for binding. This GT motif can be presented in different DNA contexts, including a nearly perfect duplex and a branched three-helix structure, and appears to be recognized in large part by arginine residues separated by one turn of an alpha-helix. Interestingly, a very similar GT motif is necessary also for protein binding and function of a well-characterized model ssDNA regulatory element from the proenkephalin promoter.
Subjects
DNA, Single-Stranded/chemistry; DNA/chemistry; Nucleic Acid Conformation; Amino Acid Motifs; Arginine/chemistry; Base Sequence; Circular Dichroism; Dose-Response Relationship, Drug; Enkephalins/genetics; Gene Products, rev/chemistry; Macromolecular Substances; Molecular Sequence Data; Peptides/chemistry; Promoter Regions, Genetic; Protein Binding; Protein Precursors/genetics; Protein Structure, Secondary; RNA/chemistry; Sequence Homology, Nucleic Acid
ABSTRACT
Whole-genome tiling arrays are powerful tools for detecting and characterizing novel RNA transcripts. Here, we describe a complete method combining elements of molecular and computational biology to identify small noncoding RNA (sRNA) transcripts. We focus on the key features of this approach, which include size-fractionation of input RNA, direct detection of array hybridization with antibodies that recognize RNA:DNA hybrids, and correlation-based computational methods for automated sRNA identification and boundary determination.
Subjects
Genomics/methods; Oligonucleotide Array Sequence Analysis/methods; RNA, Untranslated/analysis; RNA, Untranslated/genetics; RNA, Messenger/analysis; RNA, Messenger/genetics; RNA, Messenger/isolation & purification; RNA, Untranslated/isolation & purification
ABSTRACT
We have developed a mammalian cell-based screening platform to identify proteins that assemble into RNA-protein complexes. Based on Tat-mediated activation of the HIV LTR, proteins that interact with an RNA target elicit expression of a GFP reporter and are captured by fluorescence activated cell sorting. This "Tat-hybrid" screening platform was used to identify proteins that interact with the Mason Pfizer monkey virus (MPMV) constitutive transport element (CTE), a structured RNA hairpin that mediates the transport of unspliced viral mRNAs from the nucleus to the cytoplasm. Several hnRNP-like proteins, including hnRNP A1, were identified and shown to interact with the CTE with selectivity in the reporter system comparable to Tap, a known CTE-binding protein. In vitro gel shift and pull-down assays showed that hnRNP A1 is able to form a complex with the CTE and Tap and that the RGG domain of hnRNP A1 mediates binding to Tap. These results suggest that hnRNP-like proteins may be part of larger export-competent RNA-protein complexes and that the RGG domains of these proteins play an important role in directing these binding events. The results also demonstrate the utility of the screening platform for identifying and characterizing new components of RNA-protein complexes.
Subjects
Biochemistry/methods; RNA/metabolism; Cell Separation; Chromosome Mapping/methods; Codon; Cytoplasm/metabolism; DNA Methylation; DNA, Complementary/metabolism; Flow Cytometry; Gene Library; Glutathione Transferase/metabolism; Green Fluorescent Proteins/metabolism; HIV Long Terminal Repeat; HeLa Cells; Heterogeneous Nuclear Ribonucleoprotein A1; Heterogeneous-Nuclear Ribonucleoprotein Group A-B/chemistry; Humans; Plasmids/metabolism; Protein Binding; RNA, Messenger/metabolism
ABSTRACT
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
Subjects
Databases, Nucleic Acid; Genomics; Mice/genetics; Molecular Sequence Annotation; Animals; Genome; Genome, Human; Humans; Internet
ABSTRACT
The local geometry of a DNA helix can influence protein recognition, but the sequence-specific features that contribute to helix structure are not fully understood, and even less is known about how RNA helix geometry may affect protein recognition. To begin to understand how local or global helix structure may influence binding in an RNA model system, we generated a series of DNA analogues of HIV and BIV TAR RNAs in which ribose sugars were systematically substituted in and around the known binding sites for argininamide and a BIV Tat arginine-rich peptide, respectively, and measured their corresponding binding affinities. For each TAR interaction, binding occurs in the RNA major groove with high specificity, whereas binding to the all-DNA analogue is weak and nonspecific. Relatively few substitutions are needed to convert either DNA analogue of TAR into a high-affinity binder, with the ribose requirements being restricted largely to regions that directly contact the ligand. Substitutions at individual positions show up to 70-fold differences in binding affinity, even at adjacent base pairs, while two base pairs at the core of the BIV Tat peptide-RNA interface are largely unaffected by deoxyribose substitution. These results suggest that the helix geometries and unique conformational features required for binding are established locally and are relatively insulated from effects more than one base pair away. It seems plausible that arginine-rich peptides are able to adapt to a mosaic helical architecture in which segments as small as single base steps may be considered as modular recognition units.
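To put the reported fold differences in energetic terms, a ratio of binding affinities can be converted to a difference in binding free energy with the standard relation ΔΔG = RT ln(ratio). The sketch below applies it to the 70-fold figure quoted above. The abstract does not state the measurement temperature, so 298.15 K is an assumption, and the function name is hypothetical.

```python
from math import log

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def fold_change_to_ddg(fold, temperature_k=298.15):
    """Convert an affinity ratio to a free-energy difference in kcal/mol.

    Uses ddG = R * T * ln(fold). The default temperature is an assumed
    value, not one reported in the abstract.
    """
    return R_KCAL * temperature_k * log(fold)


ddg_70 = fold_change_to_ddg(70)
print(f"{ddg_70:.2f} kcal/mol")
```

Under this assumption a 70-fold difference corresponds to roughly 2.5 kcal/mol, which is on the order of a few hydrogen bonds; the number shifts only slightly across typical experimental temperatures.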