ABSTRACT
Transcription factors bind to sequence motifs and act as activators or repressors. They interface with a constellation of accessory cofactors to regulate distinct mechanistic steps of transcription. We rapidly degraded the essential and ubiquitously expressed transcription factor ZNF143 to determine its function in the transcription cycle. ZNF143 facilitates RNA Polymerase initiation and activates gene expression. ZNF143 binds the promoter of nearly all its activated target genes. ZNF143 also binds near the site of genic transcription initiation to directly repress a subset of genes. Although ZNF143 stimulates initiation at ZNF143-repressed genes (i.e., those that increase expression upon ZNF143 depletion), the molecular context of binding leads to cis repression. ZNF143 competes with other, more efficient activators for promoter access, physically occludes transcription initiation sites and promoter-proximal sequence elements, and acts as a molecular roadblock to RNA Polymerases during early elongation. The term "context-specific" is often invoked to describe transcription factors that have both activation and repression functions. We define the context and molecular mechanisms of ZNF143-mediated cis activation and repression.
ABSTRACT
Common genetic variants in the repressive GATA-family transcription factor (TF) TRPS1 locus are associated with breast cancer risk, and luminal breast cancer cell lines are particularly sensitive to TRPS1 knockout. We introduced an inducible degron tag into the native TRPS1 locus within a luminal breast cancer cell line to identify the direct targets of TRPS1 and determine how TRPS1 mechanistically regulates gene expression. We acutely deplete over 80 percent of TRPS1 from chromatin within 30 minutes of inducing degradation. We find that TRPS1 regulates transcription of hundreds of genes, including those related to estrogen signaling. TRPS1 directly regulates chromatin structure, which causes estrogen receptor alpha (ER) to redistribute in the genome. ER redistribution leads to both repression and activation of dozens of ER target genes. Downstream from these primary effects, TRPS1 depletion represses cell cycle-related gene sets and reduces cell doubling rate. Finally, we show that high TRPS1 activity, calculated using a gene expression signature defined by primary TRPS1-regulated genes, is associated with worse breast cancer patient prognosis. Taken together, these data suggest a model in which TRPS1 modulates the genomic distribution of ER, both activating and repressing transcription of genes related to cancer cell fitness.
Subjects
Breast Neoplasms , Chromatin , Fingers , Hair Diseases , Langer-Giedion Syndrome , Nose , Female , Humans , Breast Neoplasms/genetics , Chromatin/genetics , Estrogen Receptor alpha/genetics , Fingers/abnormalities , GATA Transcription Factors , Gene Expression , cdc Genes , Nose/abnormalities , Repressor Proteins/genetics
ABSTRACT
The clinical success of combined androgen deprivation therapy (ADT) and radiotherapy (RT) in prostate cancer created interest in understanding the mechanistic links between androgen receptor (AR) signaling and the DNA damage response (DDR). Convergent data have led to a model where AR both regulates, and is regulated by, the DDR. Integral to this model is that the AR regulates the transcription of DDR genes both at a steady state and in response to ionizing radiation (IR). In this study, we sought to determine which immediate transcriptional changes are induced by IR in an AR-dependent manner. Using PRO-seq to quantify changes in nascent RNA transcription in response to IR, the AR antagonist enzalutamide, or the combination of the two, we find that enzalutamide treatment significantly decreased expression of canonical AR target genes but had no effect on DDR gene sets in prostate cancer cells. Surprisingly, we also found that the AR is not a primary regulator of DDR genes either in response to IR or at a steady state in asynchronously growing prostate cancer cells. IMPLICATIONS: Our data indicate that the clinical benefit of combining ADT with RT is not due to direct AR regulation of DDR gene transcription, and that the field needs to consider alternative mechanisms for this clinical benefit.
Subjects
Castration-Resistant Prostatic Neoplasms , Prostatic Neoplasms , Male , Humans , Androgen Receptors/genetics , Androgen Receptors/metabolism , Prostatic Neoplasms/drug therapy , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Androgen Antagonists/pharmacology , Tumor Cell Line , DNA Damage , Castration-Resistant Prostatic Neoplasms/genetics
ABSTRACT
Breast cancer is the most frequently diagnosed cancer in women. The most common subtype is luminal breast cancer, which is typically driven by the estrogen receptor α (ER), a transcription factor (TF) that activates many genes required for proliferation. Multiple effective therapies target this pathway, but individuals often develop resistance. Thus, there is a need to identify additional targets that regulate ER activity and contribute to breast tumor progression. TRPS1 is a repressive GATA-family TF that is overexpressed in breast tumors. Common genetic variants in the TRPS1 locus are associated with breast cancer risk, and luminal breast cancer cell lines are particularly sensitive to TRPS1 knockout. However, we do not know how TRPS1 regulates target genes to mediate these breast cancer patient and cellular outcomes. We introduced an inducible degron tag into the native TRPS1 locus within a luminal breast cancer cell line to identify the direct targets of TRPS1 and determine how TRPS1 mechanistically regulates gene expression. We acutely deplete over eighty percent of TRPS1 from chromatin within 30 minutes of inducing degradation. We find that TRPS1 regulates transcription of hundreds of genes, including those related to estrogen signaling. TRPS1 directly regulates chromatin structure, which causes ER to redistribute in the genome. ER redistribution leads to both repression and activation of dozens of ER target genes. Downstream from these primary effects, TRPS1 depletion represses cell cycle-related gene sets and reduces cell doubling rate. Finally, we show that high TRPS1 activity, calculated using a gene expression signature defined by primary TRPS1-regulated genes, is associated with worse breast cancer patient prognosis. Taken together, these data suggest a model in which TRPS1 modulates the activity of other TFs, both activating and repressing transcription of genes related to cancer cell fitness.
ABSTRACT
Chromatin accessibility assays have revolutionized the field of transcription regulation by providing single-nucleotide resolution measurements of regulatory features such as promoters and transcription factor binding sites. ATAC-seq directly measures how well the Tn5 transposase accesses chromatinized DNA. Tn5 has a complex sequence bias that is not effectively scaled with traditional bias-correction methods. We model this complex bias using a rule ensemble machine learning approach that integrates information from many input k-mers proximal to the ATAC sequence reads. We effectively characterize and correct single-nucleotide sequence biases and regional sequence biases of the Tn5 enzyme. Correction of enzymatic sequence bias is an important step in interpreting chromatin accessibility assays that aim to infer transcription factor binding and regulatory activity of elements in the genome.
ABSTRACT
Alleles within the chr19p13.1 locus are associated with increased risk of both ovarian and breast cancer and increased expression of the ANKLE1 gene. ANKLE1 is molecularly characterized as an endonuclease that efficiently cuts branched DNA and shuttles between the nucleus and cytoplasm. However, the role of ANKLE1 in mammalian development and homeostasis remains unknown. In normal development ANKLE1 expression is limited to the erythroblast lineage and we found that ANKLE1's role is to cleave the mitochondrial genome during erythropoiesis. We show that ectopic expression of ANKLE1 in breast epithelial-derived cells leads to genome instability and mitochondrial DNA (mtDNA) cleavage. mtDNA degradation then leads to mitophagy and causes a shift from oxidative phosphorylation to glycolysis (Warburg effect). Moreover, mtDNA degradation activates STAT1 and expression of epithelial-mesenchymal transition (EMT) genes. Reduction in mitochondrial content contributes to apoptosis resistance, which may allow precancerous cells to avoid apoptotic checkpoints and proliferate. These findings provide evidence that ANKLE1 is the causal cancer susceptibility gene in the chr19p13.1 locus and describe mechanisms by which higher ANKLE1 expression promotes cancer risk.
Subjects
Mitochondrial DNA , Neoplasms , Animals , Mitochondria , Cell Nucleus , Apoptosis , Mammals
ABSTRACT
Adipocytes contribute to metabolic disorders such as obesity, diabetes, and atherosclerosis. Prior characterizations of the transcriptional network driving adipogenesis have overlooked transiently acting transcription factors (TFs), genes, and regulatory elements that are essential for proper differentiation. Moreover, traditional gene regulatory networks provide neither mechanistic details about individual regulatory element-gene relationships nor temporal information needed to define a regulatory hierarchy that prioritizes key regulatory factors. To address these shortcomings, we integrate kinetic chromatin accessibility (ATAC-seq) and nascent transcription (PRO-seq) data to generate temporally resolved networks that describe TF binding events and resultant effects on target gene expression. Our data indicate which TF families cooperate with and antagonize each other to regulate adipogenesis. Compartment modeling of RNA polymerase density quantifies how individual TFs mechanistically contribute to distinct steps in transcription. The glucocorticoid receptor activates transcription by inducing RNA polymerase pause release, whereas SP and AP-1 factors affect RNA polymerase initiation. We identify Twist2 as a previously unappreciated effector of adipocyte differentiation. We find that TWIST2 acts as a negative regulator of 3T3-L1 and primary preadipocyte differentiation. We confirm that Twist2 knockout mice have compromised lipid storage within subcutaneous and brown adipose tissue. Previous phenotyping of Twist2 knockout mice and Setleis syndrome Twist2 -/- patients noted deficiencies in subcutaneous adipose tissue. This network inference framework is a powerful and general approach for interpreting complex biological phenomena and can be applied to a wide range of cellular processes.
Subjects
Adipocytes , Gene Regulatory Networks , Twist-Related Protein 1 , Animals , Mice , Cell Line , Adipocytes/cytology , Adipocytes/metabolism , Transcription Factors/metabolism , Adipogenesis , Genetic Transcription , Transcriptional Regulatory Elements , Twist-Related Protein 1/metabolism
ABSTRACT
Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. However, enzymatic DNA cleavage exhibits intrinsic sequence biases that confound chromatin accessibility profiling data analysis. Existing computational tools are limited in their ability to account for such intrinsic biases and not designed for analyzing single-cell data. Here, we present Simplex Encoded Linear Model for Accessible Chromatin (SELMA), a computational method for systematic estimation of intrinsic cleavage biases from genomic chromatin accessibility profiling data. We demonstrate that SELMA yields accurate and robust bias estimation from both bulk and single-cell DNase-seq and ATAC-seq data. SELMA can utilize internal mitochondrial DNA data to improve bias estimation. We show that transcription factor binding inference from DNase footprints can be improved by incorporating estimated biases using SELMA. Furthermore, we show strong effects of intrinsic biases in single-cell ATAC-seq data, and develop the first single-cell ATAC-seq intrinsic bias correction model to improve cell clustering. SELMA can enhance the performance of existing bioinformatics tools and improve the analysis of both bulk and single-cell chromatin accessibility sequencing data.
Subjects
Chromatin , High-Throughput Nucleotide Sequencing , Chromatin/genetics , Mitochondrial DNA , Deoxyribonucleases/genetics , High-Throughput Nucleotide Sequencing/methods , Linear Models , DNA Sequence Analysis/methods , Single-Cell Analysis , Transcription Factors/metabolism
ABSTRACT
Many lncRNAs have been discovered using transcriptomic data; however, it is unclear what fraction of lncRNAs is functional and what structural properties affect their phenotype. MUNC lncRNA (also known as DRReRNA) acts as an enhancer RNA for the Myod1 gene in cis and stimulates the expression of other promyogenic genes in trans by recruiting the cohesin complex. Here, experimental probing of the RNA structure revealed that MUNC contains multiple structural domains not detected by prediction algorithms in the absence of experimental information. We show that these specific and structurally distinct domains are required for induction of promyogenic genes, for binding genomic sites and gene expression regulation, and for binding the cohesin complex. Myod1 induction and cohesin interaction comprise only a subset of MUNC phenotype. Our study reveals unexpectedly complex, structure-driven functions for the MUNC lncRNA and emphasizes the importance of experimentally determined structures for understanding structure-function relationships in lncRNAs.
Subjects
Muscle Development/genetics , Long Noncoding RNA/metabolism , Genetic Transcription , Animals , Base Sequence , Cell Differentiation/genetics , Cell Line , Female , Genome , Mice , Skeletal Muscle Fibers/metabolism , Nucleic Acid Conformation , Phenotype , Protein Isoforms/genetics , Protein Isoforms/metabolism , Long Noncoding RNA/chemistry , Messenger RNA/genetics , Messenger RNA/metabolism , Sequence Deletion
ABSTRACT
Nascent RNA profiling is growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniformly processed output files for downstream analysis and assesses adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report. PEPPRO can be run locally or using a cluster, providing a portable first step for genomic nascent RNA analysis.
Subjects
RNA/genetics , RNA/standards , Software , Exons/genetics , Gene Expression Profiling , Human Genome , Humans , Introns/genetics , K562 Cells , Quality Control
ABSTRACT
Heat shock transcription factor 1 (HSF1) orchestrates cellular stress protection by activating or repressing gene transcription in response to protein misfolding, oncogenic cell proliferation, and other environmental stresses. HSF1 is tightly regulated via intramolecular repressive interactions, post-translational modifications, and protein-protein interactions. How these HSF1 regulatory protein interactions are altered in response to acute and chronic stress is largely unknown. To elucidate the profile of HSF1 protein interactions under normal growth and chronic and acutely stressful conditions, quantitative proteomics studies identified interacting proteins in the response to heat shock or in the presence of a poly-glutamine aggregation protein cell-based model of Huntington's disease. These studies identified distinct protein interaction partners of HSF1 as well as changes in the magnitude of shared interactions as a function of each stressful condition. Several novel HSF1-interacting proteins were identified that encompass a wide variety of cellular functions, including roles in DNA repair, mRNA processing, and regulation of RNA polymerase II. One HSF1 partner, CTCF, interacted with HSF1 in a stress-inducible manner and functions in repression of specific HSF1 target genes. Understanding how HSF1 regulates gene repression is a crucial question, given the dysregulation of HSF1 target genes in both cancer and neurodegeneration. These studies expand our understanding of HSF1-mediated gene repression and provide key insights into HSF1 regulation via protein-protein interactions.
Subjects
CCCTC-Binding Factor/metabolism , Neoplastic Gene Expression Regulation , Heat Shock Transcription Factors/metabolism , Heat-Shock Response , Huntington Disease/metabolism , Neoplasm Proteins/metabolism , Neoplasms/metabolism , Animals , CCCTC-Binding Factor/genetics , HEK293 Cells , Heat Shock Transcription Factors/genetics , Humans , Huntington Disease/genetics , Huntington Disease/pathology , Mice , Knockout Mice , Neoplasm Proteins/genetics , Neoplasms/genetics , Neoplasms/pathology , Protein Interaction Maps
ABSTRACT
Inducible degron systems are widely used to specifically and rapidly deplete proteins of interest in cell lines and organisms. An advantage of inducible degradation is that the biological system under study remains intact and functional until perturbation, a feature that necessitates that the endogenous levels of the protein are maintained. However, endogenous tagging of genes with auxin-inducible degrons (AID) can result in chronic, auxin-independent proteasome-mediated degradation. The ARF-AID (auxin-response factor-auxin-inducible degron) system is a re-engineered auxin-inducible protein degradation system. The additional expression of the ARF-PB1 domain prevents chronic, auxin-independent degradation of AID-tagged proteins while preserving rapid auxin-induced degradation of tagged proteins. Here, we describe the protocol for engineering human cell lines to implement the ARF-AID system for specific and inducible protein degradation. These methods are adaptable and can be extended from cell lines to organisms. © 2020 The Authors.
Basic Protocol 1: Generation of ARF-P2A-TIR1 progenitor cells
Basic Protocol 2: Designing, cloning, and testing of a gene-specific sgRNA
Basic Protocol 3: Design and amplification of a homology-directed repair construct (C-terminal tagging)
Alternate Protocol 1: Design and amplification of a homology-directed repair construct (N-terminal tagging)
Basic Protocol 4: Tagging of a gene of interest with AID
Alternate Protocol 2: Establishment of an ARF-AID clamp system
Basic Protocol 5: Testing of auxin-mediated degradation of the AID-tagged protein
Subjects
Cytoplasm/metabolism , Proteins/analysis , Proteolysis , HEK293 Cells , Humans
ABSTRACT
SUMMARY: Nascent transcript measurements derived from run-on sequencing experiments are critical for the investigation of transcriptional mechanisms and regulatory networks. However, conventional mRNA gene annotations significantly differ from the boundaries of primary transcripts. New primary transcript annotations are needed to accurately interpret run-on data. We developed the primaryTranscriptAnnotation R package to infer the transcriptional start and termination sites of primary transcripts from genomic run-on data. We then used these inferred coordinates to annotate transcriptional units identified de novo. This package provides the novel utility to integrate data-driven primary transcript annotations with transcriptional unit coordinates identified in an unbiased manner. Highlighting the importance of using accurate primary transcript coordinates, we demonstrate that this new methodology increases the detection of differentially expressed transcripts and provides more accurate quantification of RNA polymerase pause indices. AVAILABILITY AND IMPLEMENTATION: https://github.com/WarrenDavidAnderson/genomicsRpackage/tree/master/primaryTranscriptAnnotation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
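As an illustration of why primary transcript coordinates matter for pause indices, the following minimal Python sketch computes a pause index as the ratio of promoter-proximal read density to gene-body read density. It is not code from the primaryTranscriptAnnotation package: the function names, the window offsets (20 to 80 bases downstream of the start site, gene body starting at 100 bases), and the toy coverage vector are all assumptions chosen for the example.

```python
def read_density(coverage, start, end):
    """Mean per-base read count in the half-open interval [start, end)."""
    window = coverage[start:end]
    if not window:
        raise ValueError("empty window")
    return sum(window) / len(window)


def pause_index(coverage, tss, tts, pause_window=(20, 80), body_offset=100):
    """Pause-region density divided by gene-body density.

    The window offsets are hypothetical defaults, not values from the source.
    """
    pause = read_density(coverage, tss + pause_window[0], tss + pause_window[1])
    body = read_density(coverage, tss + body_offset, tts)
    if body == 0:
        return float("inf")
    return pause / body


# Toy run-on coverage: transcription actually starts at position 180,
# with a paused-polymerase peak at positions 200-249.
coverage = [0] * 180 + [1] * 20 + [10] * 50 + [1] * 750

# An mRNA-style annotation that places the start site at position 0 misses the peak.
annotated_pi = pause_index(coverage, tss=0, tts=1000)
# A data-driven start site at position 180 captures it.
inferred_pi = pause_index(coverage, tss=180, tts=1000)

print(annotated_pi, inferred_pi)
```

In this toy case the misplaced start site puts the pause window over an untranscribed region, so the pause peak is attributed to the gene body instead.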
Subjects
Genome , Genomics , Molecular Sequence Annotation , Messenger RNA/genetics
ABSTRACT
Rapid perturbation of protein function permits the ability to define primary molecular responses while avoiding downstream cumulative effects of protein dysregulation. The auxin-inducible degron (AID) system was developed as a tool to achieve rapid and inducible protein degradation in nonplant systems. However, tagging proteins at their endogenous loci results in chronic auxin-independent degradation by the proteasome. To correct this deficiency, we expressed the auxin response transcription factor (ARF) in an improved inducible degron system. ARF is absent from previously engineered AID systems but is a critical component of native auxin signaling. In plants, ARF directly interacts with AID in the absence of auxin, and we found that expression of the ARF PB1 (Phox and Bem1) domain suppresses constitutive degradation of AID-tagged proteins. Moreover, the rate of auxin-induced AID degradation is substantially faster in the ARF-AID system. To test the ARF-AID system in a quantitative and sensitive manner, we measured genome-wide changes in nascent transcription after rapidly depleting the ZNF143 transcription factor. Transcriptional profiling indicates that ZNF143 activates transcription in cis and regulates promoter-proximal paused RNA polymerase density. Rapidly inducible degradation systems that preserve the target protein's native expression levels and patterns will revolutionize the study of biological systems by enabling specific and temporally defined protein dysregulation.
Subjects
Genetic Techniques , Proteins/metabolism , Proteolysis , Cell Line , Cysteine Proteinase Inhibitors/pharmacology , Gene Expression Regulation/drug effects , HEK293 Cells , Humans , Indoleacetic Acids/pharmacology , Leupeptins/pharmacology , MCF-7 Cells , Proteasome Endopeptidase Complex/metabolism , Proteolysis/drug effects , Trans-Activators/genetics , Trans-Activators/metabolism
ABSTRACT
Heat shock factor 1 is the master transcriptional regulator of molecular chaperones and binds to the same cis-acting heat shock element (HSE) across the eukaryotic lineage. In budding yeast, Hsf1 drives the transcription of ~20 genes essential to maintain proteostasis under basal conditions, yet its specific targets and extent of inducible binding during heat shock remain unclear. Here we combine Hsf1 chromatin immunoprecipitation sequencing (seq), nascent RNA-seq, and Hsf1 nuclear depletion to quantify Hsf1 binding and transcription across the yeast genome. We find that Hsf1 binds 74 loci during acute heat shock, and these are linked to 46 genes with strong Hsf1-dependent expression. Notably, Hsf1's induced DNA binding leads to a disproportionate (~7.5-fold) increase in nascent transcription. Promoters with high basal Hsf1 occupancy have nucleosome-depleted regions due to the presence of "pioneer factors." These accessible sites are likely critical for Hsf1 occupancy as the activator is incapable of binding HSEs within a stably positioned, reconstituted nucleosome. In response to heat shock, however, Hsf1 accesses nucleosomal sites and promotes chromatin disassembly in concert with the Remodels Structure of Chromatin (RSC) complex. Our data suggest that the interplay between nucleosome positioning, HSE strength, and active Hsf1 levels allows cells to precisely tune expression of the proteostasis network.
Subjects
DNA-Binding Proteins/genetics , DNA-Binding Proteins/physiology , Heat Shock Transcription Factors/metabolism , Heat-Shock Proteins/genetics , Heat-Shock Proteins/physiology , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/physiology , Transcription Factors/genetics , Transcription Factors/physiology , Chromatin/genetics , Chromatin Assembly and Disassembly/genetics , Chromatin Assembly and Disassembly/physiology , DNA-Binding Proteins/metabolism , Fungal Gene Expression Regulation/genetics , Heat Shock Transcription Factors/genetics , Heat-Shock Proteins/metabolism , Heat-Shock Response/genetics , Molecular Chaperones/metabolism , Nucleosomes/metabolism , Genetic Promoter Regions/genetics , Nucleic Acid Regulatory Sequences , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism
ABSTRACT
Although aneuploidy is found in the majority of tumors, the degree of aneuploidy varies widely. It is unclear how cancer cells become aneuploid or how highly aneuploid tumors are different from those of more normal ploidy. We developed a simple computational method that measures the degree of aneuploidy or structural rearrangements of large chromosome regions of 522 human breast tumors from The Cancer Genome Atlas (TCGA). Highly aneuploid tumors overexpress activators of mitotic transcription and the genes encoding proteins that segregate chromosomes. Overexpression of three mitotic transcriptional regulators, E2F1, MYBL2, and FOXM1, is sufficient to increase the rate of lagging anaphase chromosomes in a non-transformed vertebrate tissue, demonstrating that this event can initiate aneuploidy. Highly aneuploid human breast tumors are also enriched in TP53 mutations. TP53 mutations co-associate with the overexpression of mitotic transcriptional activators, suggesting that these events work together to provide fitness to breast tumors.
Subjects
Aneuploidy , Breast Neoplasms/genetics , Anaphase/genetics , Animals , Breast Neoplasms/pathology , Chromosomal Instability , Human Chromosomes/genetics , Nonmammalian Embryo/metabolism , Female , Gene Frequency/genetics , Humans , Mitosis/genetics , Genetic Models , Mutation/genetics , Phenotype , Transcription Factors/metabolism , Genetic Transcription , Xenopus/embryology
ABSTRACT
Summary: Identification of functional transcription factors that regulate a given gene set is an important problem in gene regulation studies. Conventional approaches for identifying transcription factors, such as DNA sequence motif analysis, are unable to predict functional binding of specific factors and not sensitive enough to detect factors binding at distal enhancers. Here, we present binding analysis for regulation of transcription (BART), a novel computational method and software package for predicting functional transcription factors that regulate a query gene set or associate with a query genomic profile, based on more than 6000 existing ChIP-seq datasets for over 400 factors in human or mouse. This method demonstrates the advantage of utilizing publicly available data for functional genomics research. Availability and implementation: BART is implemented in Python and available at http://faculty.virginia.edu/zanglab/bart. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Epigenomics , Software , Transcription Factors/analysis , Animals , Genetic Databases , Gene Expression Regulation , Humans , Mice , DNA Sequence Analysis/methods , Transcription Factors/genetics
ABSTRACT
A key challenge in quantitative ChIP combined with high-throughput sequencing (ChIP-seq) is the normalization of data in the presence of genome-wide changes in occupancy. Analysis-based normalization methods were developed for transcriptomic data and these are dependent on the underlying assumption that total transcription does not change between conditions. For genome-wide changes in transcription factor (TF) binding, these assumptions do not hold true. The challenges in normalization are confounded by experimental variability during sample preparation, processing and recovery. We present a novel normalization strategy utilizing an internal standard of unchanged peaks for reference. Our method can be readily applied to monitor genome-wide changes by ChIP-seq that are otherwise lost or misrepresented through analytical normalization. We compare our approach to normalization by total read depth and two alternative methods that utilize external experimental controls to study TF binding. We successfully resolve the key challenges in quantitative ChIP-seq analysis and demonstrate its application by monitoring the loss of Estrogen Receptor-alpha (ER) binding upon fulvestrant treatment, ER binding in response to estradiol, ER-mediated change in H4K12 acetylation and profiling ER binding in patient-derived xenografts. This is supported by an adaptable pipeline to normalize and quantify differential TF binding genome-wide and generate metrics for differential binding at individual sites.
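The idea of normalizing against an internal standard of unchanged peaks can be sketched in a few lines of Python. This is a simplified illustration, not the authors' pipeline: the peak names, the read counts, and the choice of a ratio-of-sums scale factor are assumptions made for the example.

```python
def reference_scale_factor(control, treated, reference_peaks):
    """Factor that equalizes total signal at peaks assumed to be unchanged."""
    control_ref = sum(control[p] for p in reference_peaks)
    treated_ref = sum(treated[p] for p in reference_peaks)
    if treated_ref == 0:
        raise ValueError("no treated reads at reference peaks")
    return control_ref / treated_ref


def normalize(counts, factor):
    """Scale every peak count by the same factor."""
    return {peak: n * factor for peak, n in counts.items()}


# Hypothetical read counts per peak. The treated sample was recovered at half
# the efficiency, and binding at the two ER sites is truly reduced 5-fold.
control = {"unchanged_1": 100, "unchanged_2": 200, "er_site_1": 400, "er_site_2": 300}
treated = {"unchanged_1": 50, "unchanged_2": 100, "er_site_1": 40, "er_site_2": 30}
reference_peaks = ["unchanged_1", "unchanged_2"]

# Internal-standard normalization: the unchanged peaks define the scale.
factor = reference_scale_factor(control, treated, reference_peaks)
normalized = normalize(treated, factor)

# Total-read-depth normalization, shown for comparison.
depth_factor = sum(control.values()) / sum(treated.values())
depth_normalized = normalize(treated, depth_factor)

print(normalized["er_site_1"] / control["er_site_1"])        # fold change, internal standard
print(depth_normalized["er_site_1"] / control["er_site_1"])  # fold change, read depth
```

With these toy numbers, scaling by total read depth understates the loss at the ER sites and makes the unchanged peaks appear to gain signal, which is the kind of misrepresentation the abstract attributes to analytical normalization.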
Subjects
Chromatin Immunoprecipitation/standards , High-Throughput Nucleotide Sequencing/standards , DNA Sequence Analysis/standards , Animals , Antibodies , CCCTC-Binding Factor/immunology , Drosophila melanogaster/genetics , Estrogen Receptor alpha/immunology , Estrogen Receptor alpha/metabolism , Histones/immunology , Histones/metabolism , Humans , MCF-7 Cells , Mice , Reference Standards
ABSTRACT
Coupling molecular biology to high-throughput sequencing has revolutionized the study of biology. Molecular genomics techniques are continually refined to provide higher resolution mapping of nucleic acid interactions and structure. Sequence preferences of enzymes can interfere with the accurate interpretation of these data. We developed seqOutBias to characterize enzymatic sequence bias from experimental data and scale individual sequence reads to correct intrinsic enzymatic sequence biases. SeqOutBias efficiently corrects DNase-seq, TACh-seq, ATAC-seq, MNase-seq and PRO-seq data. We show that seqOutBias correction facilitates identification of true molecular signatures resulting from transcription factors and RNA polymerase interacting with DNA.
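Scaling individual reads to correct sequence bias can be illustrated with a small Python sketch. It is not the seqOutBias implementation: it assumes a simple scheme in which each read is weighted by the ratio of the expected (background) frequency of the k-mer at its cut site to the observed frequency, and the 2-mer counts are invented for the example.

```python
from collections import Counter


def scale_factors(background, observed):
    """Per-k-mer weight = expected frequency / observed frequency."""
    bg_total = sum(background.values())
    obs_total = sum(observed.values())
    factors = {}
    for kmer, count in observed.items():
        expected_frac = background.get(kmer, 0) / bg_total
        observed_frac = count / obs_total
        factors[kmer] = expected_frac / observed_frac
    return factors


# Hypothetical data: the background contains each 2-mer equally often,
# but the enzyme cuts preferentially at "AA".
background = Counter({"AA": 25, "AC": 25, "CA": 25, "CC": 25})
cut_site_kmers = ["AA"] * 50 + ["AC"] * 25 + ["CA"] * 15 + ["CC"] * 10
observed = Counter(cut_site_kmers)

factors = scale_factors(background, observed)

# Each read contributes its k-mer's weight instead of a count of 1.
corrected = Counter()
for kmer in cut_site_kmers:
    corrected[kmer] += factors[kmer]

print({kmer: round(total, 2) for kmer, total in corrected.items()})
```

After weighting, the over-represented "AA" reads are scaled down and the under-represented k-mers are scaled up, so the corrected totals follow the background composition rather than the enzyme's preference.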
Subjects
Algorithms , Computational Biology/methods , DNA/metabolism , Deoxyribonucleases/metabolism , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Bias , DNA/chemistry , DNA/genetics , DNA-Directed RNA Polymerases/genetics , DNA-Directed RNA Polymerases/metabolism , Deoxyribonucleases/genetics , Protein Binding , Reproducibility of Results , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Genome-wide association studies (GWAS) have discovered thousands of loci associated with disease risk and quantitative traits, yet most of the variants responsible for risk remain uncharacterized. The majority of GWAS-identified loci are enriched for non-coding single-nucleotide polymorphisms (SNPs), and defining the molecular mechanism of risk is challenging. Many non-coding causal SNPs are hypothesized to alter transcription factor (TF) binding sites as the mechanism by which they affect organismal phenotypes. We employed an integrative genomics approach to identify candidate TF binding motifs that confer breast cancer-specific phenotypes identified by GWAS. We performed de novo motif analysis of regulatory elements, analyzed evolutionary conservation of identified motifs, and assayed TF footprinting data to identify sequence elements that recruit TFs and maintain chromatin landscape in breast cancer-relevant tissue and cell lines. We identified candidate causal SNPs that are predicted to alter TF binding within breast cancer-relevant regulatory regions that are in strong linkage disequilibrium with significantly associated GWAS SNPs. We confirm that the TFs bind with predicted allele-specific preferences using CTCF ChIP-seq data. We used The Cancer Genome Atlas breast cancer patient data to identify ANKLE1 and ZNF404 as the target genes of candidate TF binding site SNPs in the 19p13.11 and 19q13.31 GWAS-identified loci. These SNPs are associated with the expression of ZNF404 and ANKLE1 in breast tissue. This integrative analysis pipeline is a general framework to identify candidate causal variants within regulatory regions and TF binding sites that confer phenotypic variation and disease risk.