ABSTRACT
In experiments with significant perturbations to transcription, nascent RNA sequencing protocols depend on external spike-ins for reliable normalization. Unlike in RNA-seq, these spike-ins are not standardized and, in many cases, depend on a run-on reaction that is assumed to have constant efficiency across samples. To assess the validity of this assumption, we analyze a large number of published nascent RNA spike-ins to quantify their variability across existing normalization methods. Furthermore, we develop a new biologically informed Bayesian model to estimate the error in spike-in-based normalization estimates, which we term Virtual Spike-In (VSI). We apply this method both to published external spike-ins and to reads at the [Formula: see text] end of long genes, building on prior work from Mahat et al. (Mol Cell 62(1):63-78, 2016. https://doi.org/10.1016/j.molcel.2016.02.025) and Vihervaara et al. (Nat Commun 8(1):255, 2017. https://doi.org/10.1038/s41467-017-00151-0). We find that spike-ins in existing nascent RNA experiments are typically under-sequenced, with high variability between samples. Furthermore, we show that these highly variable estimates can have significant downstream effects on analysis, complicating biological interpretation of results.
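To make the concern about under-sequenced spike-ins concrete, the following is a minimal sketch of conventional spike-in scaling, together with the relative counting error expected from Poisson sampling alone. It is not the VSI Bayesian model described in the abstract; the function names and the example counts are hypothetical and chosen only for illustration.

```python
from math import sqrt


def spike_in_factors(spike_counts):
    """Return one scale factor per sample from raw spike-in read counts.

    Assumption (not from the abstract): each sample is scaled so that its
    spike-in count matches the mean spike-in count across samples.
    """
    if not spike_counts:
        raise ValueError("need at least one sample")
    if any(count <= 0 for count in spike_counts.values()):
        raise ValueError("spike-in counts must be positive")
    reference = sum(spike_counts.values()) / len(spike_counts)
    return {sample: reference / count for sample, count in spike_counts.items()}


def poisson_relative_error(count):
    """Relative standard deviation of a Poisson count: sqrt(n) / n = 1 / sqrt(n)."""
    if count <= 0:
        raise ValueError("count must be positive")
    return 1.0 / sqrt(count)


# Hypothetical spike-in read counts for two samples.
counts = {"control": 1000, "treated": 4000}
factors = spike_in_factors(counts)
errors = {sample: poisson_relative_error(n) for sample, n in counts.items()}
```

Under this simplification, a spike-in with only 100 reads carries a 10% relative counting error before any variation in run-on efficiency is considered, which is one reason shallow spike-in coverage can make normalization factors unstable.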
Subject(s)
RNA , RNA/genetics , Bayes Theorem , RNA Sequence Analysis , RNA-Seq
ABSTRACT
Gene transcription is controlled and modulated by regulatory regions, including enhancers and promoters. These regions are abundant in unstable, non-coding bidirectional transcription. Using nascent RNA transcription data across hundreds of human samples, we identified over 800,000 regions containing bidirectional transcription. We then identified highly correlated transcription between bidirectional and gene regions. The correlated pairs, each consisting of a bidirectional region and a gene, are enriched for disease-associated SNPs and are often supported by independent 3D data. We present these results as an SQL database that serves as a resource for future studies of gene regulation, enhancer-associated RNAs, and transcription factors.
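The pairing step summarized above can be sketched as a correlation of transcription levels across samples. The abstract does not state which correlation measure or cutoff was used, so the Pearson coefficient, the 0.9 threshold, and all identifiers below are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def correlated_pairs(bidirectionals, genes, threshold=0.9):
    """List (bidirectional_id, gene_id, r) for pairs with r >= threshold.

    Both inputs map an identifier to per-sample transcription levels,
    with samples in the same order. The threshold is hypothetical.
    """
    pairs = []
    for bid_id, bid_levels in bidirectionals.items():
        for gene_id, gene_levels in genes.items():
            r = pearson(bid_levels, gene_levels)
            if r >= threshold:
                pairs.append((bid_id, gene_id, r))
    return pairs


# Hypothetical transcription levels across four samples.
bidirectionals = {"bidir_1": [1, 2, 3, 4]}
genes = {"GENE_A": [2, 4, 6, 8], "GENE_B": [4, 1, 3, 2]}
pairs = correlated_pairs(bidirectionals, genes)
```

An all-against-all comparison like this would not scale to hundreds of thousands of regions; a real analysis would likely restrict candidate pairs, for example by genomic distance, which is also an assumption here.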
ABSTRACT
Detecting changes in the activity of a transcription factor (TF) in response to a perturbation provides insights into the underlying cellular process. Transcription Factor Enrichment Analysis (TFEA) is a robust and reliable computational method that detects positional motif enrichment associated with changes in transcription observed in response to a perturbation. TFEA detects positional motif enrichment within a list of ranked regions of interest (ROIs), typically sites of RNA polymerase initiation inferred from regulatory data such as nascent transcription. Therefore, we also introduce muMerge, a statistically principled method of generating a consensus list of ROIs from multiple replicates and conditions. TFEA is broadly applicable to data that informs on transcriptional regulation including nascent transcription (e.g., PRO-Seq), CAGE, histone ChIP-Seq, and accessibility data (e.g., ATAC-Seq). TFEA not only identifies the key regulators responding to a perturbation, but also temporally unravels regulatory networks with time series data. Consequently, TFEA serves as a hypothesis-generating tool that provides an easy, rigorous, and cost-effective means to broadly assess TF activity yielding new biological insights.
Subject(s)
Transcription Factors/metabolism , Breast/cytology , Breast/metabolism , Cell Line , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Computational Biology/methods , Computer Simulation , Dexamethasone/pharmacology , Epithelial Cells/metabolism , Female , Gene Expression Regulation , Genetic Techniques/statistics & numerical data , HCT116 Cells , Humans , Imidazoles/pharmacology , Piperazines/pharmacology , Glucocorticoid Receptors/drug effects , Glucocorticoid Receptors/metabolism , Transcription Factors/genetics , Genetic Transcription , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism
ABSTRACT
CDK7 associates with the 10-subunit TFIIH complex and regulates transcription by phosphorylating the C-terminal domain (CTD) of RNA polymerase II (RNAPII). Few additional CDK7 substrates are known. Here, using the covalent inhibitor SY-351 and quantitative phosphoproteomics, we identified CDK7 kinase substrates in human cells. Among hundreds of high-confidence targets, the vast majority are unique to CDK7 (i.e., distinct from other transcription-associated kinases), with a subset that suggest novel cellular functions. Transcription-associated factors were predominant CDK7 substrates, including SF3B1, U2AF2, and other splicing components. Accordingly, widespread and diverse splicing defects, such as alternative exon inclusion and intron retention, were characterized in CDK7-inhibited cells. Combined with biochemical assays, we establish that CDK7 directly activates other transcription-associated kinases CDK9, CDK12, and CDK13, invoking a "master regulator" role in transcription. We further demonstrate that TFIIH restricts CDK7 kinase function to the RNAPII CTD, whereas other substrates (e.g., SPT5 and SF3B1) are phosphorylated by the three-subunit CDK-activating kinase (CAK; CCNH, MAT1, and CDK7). These results suggest new models for CDK7 function in transcription and implicate CAK dissociation from TFIIH as essential for kinase activation. This straightforward regulatory strategy ensures CDK7 activation is spatially and temporally linked to transcription, and may apply toward other transcription-associated kinases.
Subject(s)
Cyclin-Dependent Kinases/metabolism , Biological Models , Transcription Factor TFIIH/metabolism , Genetic Transcription/genetics , Alternative Splicing/genetics , Cell Survival/drug effects , Cyclin-Dependent Kinases/antagonists & inhibitors , Cyclin-Dependent Kinases/genetics , Enzyme Activation/genetics , HL-60 Cells , Humans , Cyclin-Dependent Kinase-Activating Kinase
ABSTRACT
RNA polymerase II (RNAPII) transcription is governed by the pre-initiation complex (PIC), which contains TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNAPII, and Mediator. After initiation, RNAPII enzymes pause after transcribing less than 100 bases; precisely how RNAPII pausing is enforced and regulated remains unclear. To address specific mechanistic questions, we reconstituted human RNAPII promoter-proximal pausing in vitro, entirely with purified factors (no extracts). As expected, NELF and DSIF increased pausing, and P-TEFb promoted pause release. Unexpectedly, the PIC alone was sufficient to reconstitute pausing, suggesting RNAPII pausing is an inherent PIC function. In agreement, pausing was lost upon replacement of the TFIID complex with TATA-binding protein (TBP), and PRO-seq experiments revealed widespread disruption of RNAPII pausing upon acute depletion (t = 60 min) of TFIID subunits in human or Drosophila cells. These results establish a TFIID requirement for RNAPII pausing and suggest pause regulatory factors may function directly or indirectly through TFIID.