1.
Proc Natl Acad Sci U S A ; 121(4): e2310854121, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38241433

ABSTRACT

Noncoding mutation hotspots have been identified in melanoma and many of them occur at the binding sites of E26 transformation-specific (ETS) proteins; however, their formation mechanism and functional impacts are not fully understood. Here, we used ultraviolet (UV) damage sequencing data and analyzed cyclobutane pyrimidine dimer (CPD) formation, DNA repair, and CPD deamination in human cells at single-nucleotide resolution. Our data show prominent CPD hotspots immediately after UV irradiation at ETS binding sites, particularly at sites with a conserved TTCCGG motif, which correlate with mutation hotspots identified in cutaneous melanoma. Additionally, CPDs are repaired more slowly at ETS binding sites than in flanking DNA. Cytosine deamination in CPDs to uracil is suggested as an important step for UV mutagenesis. However, we found that CPD deamination is significantly suppressed at ETS binding sites, particularly for the CPD hotspot on the 5' side of the ETS motif, arguing against a role for CPD deamination in promoting ETS-associated UV mutations. Finally, we analyzed a subset of frequently mutated promoters, including the ribosomal protein genes RPL13A and RPS20, and found that mutations in the ETS motif can significantly reduce promoter activity. Thus, our data identify high UV damage and low repair, but not CPD deamination, as the main mechanism for ETS-associated mutations in melanoma and uncover important roles of often-overlooked mutation hotspots in perturbing gene transcription.


Subject(s)
Melanoma, Skin Neoplasms, Humans, Melanoma/genetics, Cytosine, Deamination, Skin Neoplasms/genetics, Mutation, Pyrimidine Dimers, Binding Sites, Ultraviolet Rays, DNA Damage, DNA Repair/genetics
2.
Nature ; 587(7833): 291-296, 2020 11.
Article in English | MEDLINE | ID: mdl-33087930

ABSTRACT

Transcription factors recognize specific genomic sequences to regulate complex gene-expression programs. Although it is well established that transcription factors bind to specific DNA sequences using a combination of base readout and shape recognition, some fundamental aspects of protein-DNA binding remain poorly understood [1,2]. Many DNA-binding proteins induce changes in the structure of the DNA outside the intrinsic B-DNA envelope. However, how the energetic cost that is associated with distorting the DNA contributes to recognition has proven difficult to study, because the distorted DNA exists in low abundance in the unbound ensemble [3-9]. Here we use a high-throughput assay that we term SaMBA (saturation mismatch-binding assay) to investigate the role of DNA conformational penalties in transcription factor-DNA recognition. In SaMBA, mismatched base pairs are introduced to pre-induce structural distortions in the DNA that are much larger than those induced by changes in the Watson-Crick sequence. Notably, approximately 10% of mismatches increased transcription factor binding, and for each of the 22 transcription factors that were examined, at least one mismatch was found that increased the binding affinity. Mismatches also converted non-specific sites into high-affinity sites, and high-affinity sites into 'super sites' that exhibit stronger affinity than any known canonical binding site. Determination of high-resolution X-ray structures, combined with nuclear magnetic resonance measurements and structural analyses, showed that many of the DNA mismatches that increase binding induce distortions that are similar to those induced by protein binding, thus prepaying some of the energetic cost incurred from deforming the DNA. Our work indicates that conformational penalties are a major determinant of protein-DNA recognition, and reveals mechanisms by which mismatches can recruit transcription factors and thus modulate replication and repair activities in the cell [10,11].


Subject(s)
DNA-Binding Proteins/chemistry, Molecular Conformation, Nucleic Acid Heteroduplexes/chemistry, Arabidopsis Proteins/chemistry, Base Pairing, Binding Sites, X-Ray Crystallography, Humans, Molecular Models, Mutation, Biomolecular Nuclear Magnetic Resonance, Protein Binding, Saccharomyces cerevisiae Proteins/chemistry, Thermodynamics, Transcription Factors/chemistry
3.
Nucleic Acids Res ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38966997

ABSTRACT

Development of the malaria parasite, Plasmodium falciparum, is regulated by a limited number of sequence-specific transcription factors (TFs). However, the mechanisms by which these TFs recognize genome-wide binding sites are largely unknown. To address TF specificity, we investigated the binding of two TF subsets that either bind CACACA or GTGCAC DNA sequence motifs and further characterized two additional ApiAP2 TFs, PfAP2-G and PfAP2-EXP, which bind unique DNA motifs (GTAC and TGCATGCA). We also interrogated the impact of DNA sequence and chromatin context on P. falciparum TF binding by integrating high-throughput in vitro and in vivo binding assays, DNA shape predictions, epigenetic post-translational modifications, and chromatin accessibility. We found that DNA sequence context minimally impacts binding site selection for paralogous CACACA-binding TFs, while chromatin accessibility, epigenetic patterns, co-factor recruitment, and dimerization correlate with differential binding. In contrast, GTGCAC-binding TFs prefer different DNA sequence contexts in addition to chromatin dynamics. Finally, we determined that TFs that preferentially bind divergent DNA motifs may bind overlapping genomic regions due to low-affinity binding to other sequence motifs. Our results demonstrate that TF binding site selection relies on a combination of DNA sequence and chromatin features, thereby contributing to the complexity of P. falciparum gene regulatory mechanisms.

4.
Proc Natl Acad Sci U S A ; 120(11): e2217422120, 2023 03 14.
Article in English | MEDLINE | ID: mdl-36888663

ABSTRACT

Somatic mutations are highly enriched at transcription factor (TF) binding sites, with the strongest trend being observed for ultraviolet light (UV)-induced mutations in melanomas. One of the main mechanisms proposed for this hypermutation pattern is the inefficient repair of UV lesions within TF-binding sites, caused by competition between TFs bound to these lesions and the DNA repair proteins that must recognize the lesions to initiate repair. However, TF binding to UV-irradiated DNA is poorly characterized, and it is unclear whether TFs maintain specificity for their DNA sites after UV exposure. We developed UV-Bind, a high-throughput approach to investigate the impact of UV irradiation on protein-DNA binding specificity. We applied UV-Bind to ten TFs from eight structural families, and found that UV lesions significantly altered the DNA-binding preferences of all the TFs tested. The main effect was a decrease in binding specificity, but the precise effects and their magnitude differ across factors. Importantly, we found that despite the overall reduction in DNA-binding specificity in the presence of UV lesions, TFs can still compete with repair proteins for lesion recognition, in a manner consistent with their specificity for UV-irradiated DNA. In addition, for a subset of TFs, we identified a surprising but reproducible effect at certain nonconsensus DNA sequences, where UV irradiation leads to a high increase in the level of TF binding. These changes in DNA-binding specificity after UV irradiation, at both consensus and nonconsensus sites, have important implications for the regulatory and mutagenic roles of TFs in the cell.


Subject(s)
Transcription Factors, Ultraviolet Rays, Humans, Transcription Factors/metabolism, Binding Sites/genetics, Protein Binding/genetics, DNA/metabolism
5.
Nucleic Acids Res ; 51(21): 11600-11612, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37889068

ABSTRACT

Cooperative DNA-binding by transcription factor (TF) proteins is critical for eukaryotic gene regulation. In the human genome, many regulatory regions contain TF-binding sites in close proximity to each other, which can facilitate cooperative interactions. However, binding site proximity does not necessarily imply cooperative binding, as TFs can also bind independently to each of their neighboring target sites. Currently, the rules that drive cooperative TF binding are not well understood. In addition, it is oftentimes difficult to infer direct TF-TF cooperativity from existing DNA-binding data. Here, we show that in vitro binding assays using DNA libraries of a few thousand genomic sequences with putative cooperative TF-binding events can be used to develop accurate models of cooperativity and to gain insights into cooperative binding mechanisms. Using factors ETS1 and RUNX1 as our case study, we show that the distance and orientation between ETS1 sites are critical determinants of cooperative ETS1-ETS1 binding, while cooperative ETS1-RUNX1 interactions show more flexibility in distance and orientation and can be accurately predicted based on the affinity and sequence/shape features of the binding sites. The approach described here, combining custom experimental design with machine-learning modeling, can be easily applied to study the cooperative DNA-binding patterns of any TFs.


Subject(s)
Core Binding Factor Alpha 2 Subunit, Gene Expression Regulation, Humans, Core Binding Factor Alpha 2 Subunit/genetics, Binding Sites/genetics, Protein Binding, DNA/chemistry
6.
Genome Res ; 31(7): 1216-1229, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33975875

ABSTRACT

Most eukaryotic transcription factors (TFs) are part of large protein families, with members of the same family (i.e., paralogous TFs) recognizing similar DNA-binding motifs but performing different regulatory functions. Many TF paralogs are coexpressed in the cell and thus can compete for target sites across the genome. However, this competition is rarely taken into account when studying the in vivo binding patterns of eukaryotic TFs. Here, we show that direct competition for DNA binding between TF paralogs is a major determinant of their genomic binding patterns. Using yeast proteins Cbf1 and Pho4 as our model system, we designed a high-throughput quantitative assay to capture the genomic binding profiles of competing TFs in a cell-free system. Our data show that Cbf1 and Pho4 greatly influence each other's occupancy by competing for their common putative genomic binding sites. The competition is different at different genomic sites, as dictated by the TFs' expression levels and their divergence in DNA-binding specificity and affinity. Analyses of ChIP-seq data show that the biophysical rules that dictate the competitive TF binding patterns in vitro are also followed in vivo, in the complex cellular environment. Furthermore, the Cbf1-Pho4 competition for genomic sites, as characterized in vitro using our new assay, plays a critical role in the specific activation of their target genes in the cell. Overall, our study highlights the importance of direct TF-TF competition for genomic binding and gene regulation by TF paralogs, and proposes an approach for studying this competition in a quantitative and high-throughput manner.
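
The idea of two paralogs competing for one site can be illustrated with a textbook equilibrium model. The sketch below is not the assay or the model used in the study; it assumes a single site that is either free, bound by protein A, or bound by protein B, and the function name, concentrations, and dissociation constants are hypothetical values chosen only for illustration.

```python
def competitive_occupancy(conc_a, kd_a, conc_b, kd_b):
    """Return (occupancy_a, occupancy_b) for two proteins competing for one site.

    Textbook equilibrium model: the site is free, bound by A, or bound by B.
    Each bound state is weighted by concentration / dissociation constant.
    """
    weight_a = conc_a / kd_a
    weight_b = conc_b / kd_b
    partition = 1.0 + weight_a + weight_b  # free state contributes weight 1
    return weight_a / partition, weight_b / partition


# Hypothetical numbers in arbitrary concentration units (illustration only).
alone_a, _ = competitive_occupancy(conc_a=100.0, kd_a=50.0, conc_b=0.0, kd_b=10.0)
with_b, occ_b = competitive_occupancy(conc_a=100.0, kd_a=50.0, conc_b=100.0, kd_b=10.0)
print(f"A alone: {alone_a:.3f}; A with competitor: {with_b:.3f}; B: {occ_b:.3f}")
```

In this toy setting, adding a higher-affinity competitor lowers the occupancy of A at the shared site, and the size of the drop depends on both concentrations and both affinities, which is the qualitative behavior the abstract describes.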

7.
Nucleic Acids Res ; 47(W1): W127-W135, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31114870

ABSTRACT

Non-coding genetic variants/mutations can play functional roles in the cell by disrupting regulatory interactions between transcription factors (TFs) and their genomic target sites. For most human TFs, a myriad of DNA-binding models are available and could be used to predict the effects of DNA mutations on TF binding. However, information on the quality of these models is scarce, making it hard to evaluate the statistical significance of predicted binding changes. Here, we present QBiC-Pred, a web server for predicting quantitative TF binding changes due to nucleotide variants. QBiC-Pred uses regression models of TF binding specificity trained on high-throughput in vitro data. The training is done using ordinary least squares (OLS), and we leverage distributional results associated with OLS estimation to compute, for each predicted change in TF binding, a P-value reflecting our confidence in the predicted effect. We show that OLS models are accurate in predicting the effects of mutations on TF binding in vitro and in vivo, outperforming widely-used PWM models as well as recently developed deep learning models of specificity. QBiC-Pred takes as input mutation datasets in several formats, and it allows post-processing of the results through a user-friendly web interface. QBiC-Pred is freely available at http://qbic.genome.duke.edu.


Subject(s)
Computational Biology/methods, Genomics/methods, Software, Transcription Factors/genetics, Algorithms, Binding Sites/genetics, DNA/genetics, Humans, Protein Binding/genetics
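
As a rough illustration of how OLS estimation yields both an effect size and a significance measure, the sketch below fits a one-predictor regression and derives a test statistic for the slope. It is not QBiC-Pred's code or feature set: the function names and toy data are hypothetical, and the p-value uses a standard normal approximation (reasonable only for large samples) rather than the exact t distribution.

```python
import math


def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (intercept, slope, standard error of slope, t statistic of slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # unbiased residual variance
    se_slope = math.sqrt(sigma2 / sxx)
    return intercept, slope, se_slope, slope / se_slope


def two_sided_p_normal(stat):
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(stat) / math.sqrt(2.0))


# Made-up toy data: x marks whether a sequence feature is present (0 or 1),
# y is an invented binding signal. Not real measurements.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2]
intercept, slope, se_slope, t_stat = simple_ols(x, y)
p_value = two_sided_p_normal(t_stat)
print(f"slope={slope:.3f} se={se_slope:.3f} t={t_stat:.2f} p~{p_value:.2e}")
```

With only eight points the normal approximation understates the p-value; an exact calculation would use a t distribution with n - 2 degrees of freedom.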
8.
Proc Natl Acad Sci U S A ; 114(34): E7054-E7062, 2017 08 22.
Article in English | MEDLINE | ID: mdl-28784765

ABSTRACT

The ELISA is the mainstay for sensitive and quantitative detection of protein analytes. Despite its utility, ELISA is time-consuming, resource-intensive, and infrastructure-dependent, limiting its availability in resource-limited regions. Here, we describe a self-contained immunoassay platform (the "D4 assay") that converts the sandwich immunoassay into a point-of-care test (POCT). The D4 assay is fabricated by inkjet printing assay reagents as microarrays on nanoscale polymer brushes on glass chips, so that all reagents are "on-chip," and these chips show durable storage stability without cold storage. The D4 assay can interrogate multiple analytes from a drop of blood, is compatible with a smartphone detector, and displays analytical figures of merit that are comparable to standard laboratory-based ELISA in whole blood. These attributes of the D4 POCT have the potential to democratize access to high-performance immunoassays in resource-limited settings without sacrificing their performance.


Subject(s)
Blood Chemical Analysis/methods, Immunoassay/methods, Polymers/chemistry, Biomarkers/blood, Blood Chemical Analysis/instrumentation, Equipment Design, Humans, Immunoassay/instrumentation, Immunoglobulin G/blood, Immunoglobulin M/blood, Leptin/blood, Point-of-Care Systems, Printing
9.
Nucleic Acids Res ; 45(20): 11684-11699, 2017 Nov 16.
Article in English | MEDLINE | ID: mdl-28977539

ABSTRACT

Our current understanding of cellular transdifferentiation systems is limited. It is oftentimes unknown, at a genome-wide scale, how much transdifferentiated cells differ quantitatively from both the starting cells and the target cells. Focusing on transdifferentiation of primary human skin fibroblasts by forced expression of myogenic transcription factor MyoD, we performed quantitative analyses of gene expression and chromatin accessibility profiles of transdifferentiated cells compared to fibroblasts and myoblasts. In this system, we find that while many of the early muscle marker genes are reprogrammed, global gene expression and accessibility changes are still incomplete when compared to myoblasts. In addition, we find evidence of epigenetic memory in the transdifferentiated cells, with reminiscent features of fibroblasts being visible both in chromatin accessibility and gene expression. Quantitative analyses revealed a continuum of changes in chromatin accessibility induced by MyoD, and a strong correlation between chromatin-remodeling deficiencies and incomplete gene expression reprogramming. Classification analyses identified genetic and epigenetic features that distinguish reprogrammed from non-reprogrammed sites, and suggested ways to potentially improve transdifferentiation efficiency. Our approach for combining gene expression, DNA accessibility, and protein-DNA binding data to quantify and characterize the efficiency of cellular transdifferentiation on a genome-wide scale can be applied to any transdifferentiation system.


Subject(s)
Cell Transdifferentiation/genetics, Cellular Reprogramming/genetics, Chromatin Assembly and Disassembly/genetics, MyoD Protein/genetics, Western Blotting, Cultured Cells, Chromatin/genetics, Chromatin/metabolism, Fibroblasts/cytology, Fibroblasts/metabolism, Gene Expression Profiling/methods, Gene Ontology, HEK293 Cells, Humans, Confocal Microscopy, MyoD Protein/metabolism, Reverse Transcriptase Polymerase Chain Reaction, Skin/cytology
10.
Trends Biochem Sci ; 39(9): 381-99, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25129887

ABSTRACT

Transcription factors (TFs) influence cell fate by interpreting the regulatory DNA within a genome. TFs recognize DNA in a specific manner; the mechanisms underlying this specificity have been identified for many TFs based on 3D structures of protein-DNA complexes. More recently, structural views have been complemented with data from high-throughput in vitro and in vivo explorations of the DNA-binding preferences of many TFs. Together, these approaches have greatly expanded our understanding of TF-DNA interactions. However, the mechanisms by which TFs select in vivo binding sites and alter gene expression remain unclear. Recent work has highlighted the many variables that influence TF-DNA binding, while demonstrating that a biophysical understanding of these many factors will be central to understanding TF function.


Subject(s)
Biophysical Phenomena/genetics, DNA/genetics, Genome/genetics, Transcription Factors/metabolism, Animals, Binding Sites, Computational Biology, DNA/metabolism, Humans, Protein Binding
11.
Proc Natl Acad Sci U S A ; 112(15): 4654-9, 2015 Apr 14.
Article in English | MEDLINE | ID: mdl-25775564

ABSTRACT

DNA binding specificities of transcription factors (TFs) are a key component of gene regulatory processes. Underlying mechanisms that explain the highly specific binding of TFs to their genomic target sites are poorly understood. A better understanding of TF-DNA binding requires the ability to quantitatively model TF binding to accessible DNA as its basic step, before additional in vivo components can be considered. Traditionally, these models were built based on nucleotide sequence. Here, we integrated 3D DNA shape information derived with a high-throughput approach into the modeling of TF binding specificities. Using support vector regression, we trained quantitative models of TF binding specificity based on protein binding microarray (PBM) data for 68 mammalian TFs. The evaluation of our models included cross-validation on specific PBM array designs, testing across different PBM array designs, and using PBM-trained models to predict relative binding affinities derived from in vitro selection combined with deep sequencing (SELEX-seq). Our results showed that shape-augmented models compared favorably to sequence-based models. Although both k-mer and DNA shape features can encode interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space. In addition, analyzing the feature weights of DNA shape-augmented models uncovered TF family-specific structural readout mechanisms that were not revealed by the DNA sequence. As such, this work combines knowledge from structural biology and genomics, and suggests a new path toward understanding TF binding and genome function.


Subject(s)
DNA/chemistry, DNA/metabolism, Nucleic Acid Conformation, Transcription Factors/metabolism, Algorithms, Animals, Base Sequence, Binding Sites/genetics, Computational Biology/methods, DNA/genetics, High-Throughput Nucleotide Sequencing, Humans, Kinetics, Mice, Genetic Models, Protein Array Analysis, Protein Binding, Transcription Factors/genetics
12.
Proc Natl Acad Sci U S A ; 111(48): 17140-5, 2014 Dec 02.
Article in English | MEDLINE | ID: mdl-25313048

ABSTRACT

Until now, it has been reasonably assumed that specific base-pair recognition is the only mechanism controlling the specificity of transcription factor (TF)-DNA binding. Contrary to this assumption, here we show that nonspecific DNA sequences possessing certain repeat symmetries, when present outside of specific TF binding sites (TFBSs), statistically control TF-DNA binding preferences. We used high-throughput protein-DNA binding assays to measure the binding levels and free energies of binding for several human TFs to tens of thousands of short DNA sequences with varying repeat symmetries. Based on statistical mechanics modeling, we identify a new protein-DNA binding mechanism induced by DNA sequence symmetry in the absence of specific base-pair recognition, and experimentally demonstrate that this mechanism indeed governs protein-DNA binding preferences.


Subject(s)
DNA-Binding Proteins/metabolism, DNA/metabolism, High-Throughput Screening Assays/methods, Transcription Factors/metabolism, Algorithms, Base Pairing, Base Sequence, Binding Sites/genetics, DNA/chemistry, DNA/genetics, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/genetics, Humans, Molecular Models, Nucleic Acid Conformation, Nucleotide Motifs/genetics, Protein Binding, Tertiary Protein Structure, Nucleic Acid Repetitive Sequences/genetics, Nucleic Acid Sequence Homology, Thermodynamics, Transcription Factors/chemistry, Transcription Factors/genetics
13.
PLoS Comput Biol ; 11(8): e1004429, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26285121

ABSTRACT

Recent genome-wide experiments in different eukaryotic genomes provide an unprecedented view of transcription factor (TF) binding locations and of nucleosome occupancy. These experiments revealed that a large fraction of TF binding events occur in regions where only a small number of specific TF binding sites (TFBSs) have been detected. Furthermore, in vitro protein-DNA binding measurements performed for hundreds of TFs indicate that TFs are bound with a wide range of affinities to different DNA sequences that lack known consensus motifs. These observations have thus challenged the classical picture of specific protein-DNA binding and strongly suggest the existence of additional recognition mechanisms that affect protein-DNA binding preferences. We have previously demonstrated that repetitive DNA sequence elements characterized by certain symmetries statistically affect protein-DNA binding preferences. We call this binding mechanism nonconsensus protein-DNA binding in order to emphasize the point that specific consensus TFBSs do not contribute to this effect. In this paper, using the simple statistical mechanics model developed previously, we calculate the nonconsensus protein-DNA binding free energy for the entire C. elegans and D. melanogaster genomes. Using the available chromatin immunoprecipitation followed by sequencing (ChIP-seq) results on TF-DNA binding preferences for ~100 TFs, we show that DNA sequences characterized by low predicted free energy of nonconsensus binding have statistically higher experimental TF occupancy and lower nucleosome occupancy than sequences characterized by high free energy of nonconsensus binding. This is in agreement with our previous analysis performed for the yeast genome. We suggest therefore that nonconsensus protein-DNA binding assists the formation of nucleosome-free regions, as TFs outcompete nucleosomes at genomic locations with enhanced nonconsensus binding. In addition, here we perform a new, large-scale analysis using in vitro TF-DNA preferences obtained from the universal protein binding microarrays (PBM) for ~90 eukaryotic TFs belonging to 22 different DNA-binding domain types. As a result of this new analysis, we conclude that nonconsensus protein-DNA binding is a widespread phenomenon that significantly affects protein-DNA binding preferences and need not require the presence of consensus (specific) TFBSs in order to achieve genome-wide TF-DNA binding specificity.


Subject(s)
DNA-Binding Proteins/metabolism, DNA/metabolism, Genome/genetics, Protein Binding/genetics, Nucleic Acid Repetitive Sequences/genetics, Animals, Base Sequence, Binding Sites, Caenorhabditis elegans/genetics, Computational Biology, DNA/chemistry, DNA/genetics, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/genetics, Drosophila melanogaster/genetics, Genetic Models, Molecular Sequence Data, Thermodynamics, Transcription Factors/chemistry, Transcription Factors/genetics, Transcription Factors/metabolism
14.
Nucleic Acids Res ; 42(4): 2099-111, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24243859

ABSTRACT

Binding of proteins to particular DNA sites across the genome is a primary determinant of specificity in genome maintenance and gene regulation. DNA-binding specificity is encoded at multiple levels, from the detailed biophysical interactions between proteins and DNA, to the assembly of multi-protein complexes. At each level, variation in the mechanisms used to achieve specificity has led to difficulties in constructing and applying simple models of DNA binding. We review the complexities in protein-DNA binding found at multiple levels and discuss how they confound the idea of simple recognition codes. We discuss the impact of new high-throughput technologies for the characterization of protein-DNA binding, and how these technologies are uncovering new complexities in protein-DNA recognition. Finally, we review the concept of multi-protein recognition codes in which new DNA-binding specificities are achieved by the assembly of multi-protein complexes.


Subject(s)
DNA-Binding Proteins/metabolism, DNA/metabolism, DNA/chemistry, DNA-Binding Proteins/chemistry, Protein Binding
15.
Nucleic Acids Res ; 42(Web Server issue): W461-7, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24861628

ABSTRACT

Most transcription factors (TFs) belong to protein families that share a common DNA binding domain and have very similar DNA binding preferences. However, many paralogous TFs (i.e. members of the same TF family) perform different regulatory functions and interact with different genomic regions in the cell. A potential mechanism for achieving this differential in vivo specificity is through interactions with protein co-factors. Computational tools for studying the genomic binding profiles of paralogous TFs and identifying their putative co-factors are currently lacking. Here, we present an interactive web implementation of COUGER, a classification-based framework for identifying protein co-factors that might provide specificity to paralogous TFs. COUGER takes as input two sets of genomic regions bound by paralogous TFs, and it identifies a small set of putative co-factors that best distinguish the two sets of sequences. To achieve this task, COUGER uses a classification approach, with features that reflect the DNA-binding specificities of the putative co-factors. The identified co-factors are presented in a user-friendly output page, together with information that allows the user to understand and to explore the contributions of individual co-factor features. COUGER can be run as a stand-alone tool or through a web interface: http://couger.oit.duke.edu.


Subject(s)
DNA-Binding Proteins/metabolism, Software, Transcription Factors/metabolism, Binding Sites, Genomics, Internet
16.
Nucleic Acids Res ; 42(Database issue): D148-55, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24214955

ABSTRACT

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.


Subject(s)
DNA/chemistry, Genetic Databases, Transcriptional Regulatory Elements, Transcription Factors/metabolism, Animals, Basic Helix-Loop-Helix Transcription Factors/metabolism, Binding Sites, Homeodomain Proteins/metabolism, Humans, Internet, Mice, Nucleic Acid Conformation, Nucleotide Motifs
17.
Bioinformatics ; 29(13): i117-25, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812975

ABSTRACT

MOTIVATION: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. RESULTS: We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity. AVAILABILITY: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.


Subject(s)
DNA/metabolism, Transcription Factors/metabolism, Algorithms, Binding Sites, DNA/chemistry, Genome, Humans, Linear Models, Protein Array Analysis, Protein Binding, Saccharomyces cerevisiae Proteins/metabolism, Support Vector Machine
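
The contrast drawn in this abstract between a position weight matrix (independent per-base contributions) and features that also capture neighboring bases can be sketched in a few lines. This is only an illustration, not the authors' released code: the toy matrix, the uniform background of 0.25, and both function names are assumptions introduced here.

```python
import math


def pwm_score(site, pwm, background=0.25):
    """Sum per-position log-odds scores; each base contributes independently."""
    return sum(math.log2(pwm[i][base] / background) for i, base in enumerate(site))


def sequence_features(site, max_k=2):
    """Position-specific k-mer indicator features for k = 1..max_k.

    With max_k=1 this matches the information a PWM can use; larger k adds
    di- or trinucleotide features that can capture dependencies between
    adjacent positions.
    """
    features = set()
    for k in range(1, max_k + 1):
        for i in range(len(site) - k + 1):
            features.add((i, site[i:i + k]))
    return features


# Hypothetical 3-position PWM (one dict of base probabilities per position).
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
score = pwm_score("ACG", toy_pwm)
feats = sequence_features("ACG")
print(f"PWM score: {score:.3f}; features: {sorted(feats)}")
```

A regression model would assign one weight to each such feature; feature selection, as the abstract notes, is what keeps the number of retained features small enough to interpret.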
18.
Genome Res ; 20(2): 201-11, 2010 Feb.
Article in English | MEDLINE | ID: mdl-19996087

ABSTRACT

The origin recognition complex (ORC) is an essential DNA replication initiation factor conserved in all eukaryotes. In Saccharomyces cerevisiae, ORC binds to specific DNA elements; however, in higher eukaryotes, ORC exhibits little sequence specificity in vitro or in vivo. We investigated the genome-wide distribution of ORC in Drosophila and found that ORC localizes to specific chromosomal locations in the absence of any discernible simple motif. Although no clear sequence motif emerged, we were able to use machine learning approaches to accurately discriminate between ORC-associated sequences and ORC-free sequences based solely on primary sequence. The complex sequence features that define ORC binding sites are highly correlated with nucleosome positioning signals and likely represent a preferred nucleosomal landscape for ORC association. Open chromatin appears to be the underlying feature that is deterministic for ORC binding. ORC-associated sequences are enriched for the histone variant, H3.3, often at transcription start sites, and depleted for bulk nucleosomes. The density of ORC binding along the chromosome is reflected in the time at which a sequence replicates, with early replicating sequences having a high density of ORC binding. Finally, we found a high concordance between sites of ORC binding and cohesin loading, suggesting that, in addition to DNA replication, ORC may be required for the loading of cohesin on DNA in Drosophila.


Subject(s)
Cell Cycle Proteins/metabolism , Chromatin/metabolism , Chromosomal Proteins, Non-Histone/metabolism , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Origin Recognition Complex/metabolism , Animals , Cell Line , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Promoter Regions, Genetic , Sequence Analysis, DNA , Cohesins
19.
RNA ; 17(4): 665-74, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21282347

ABSTRACT

Tat specific factor 1 (Tat-SF1) interacts with components of both the transcription and splicing machineries and has been classified as a transcription-splicing factor. Although its function as an HIV-1 dependency factor has been investigated, relatively little is known about the cellular functions of Tat-SF1. To identify target genes of Tat-SF1, we utilized a combination of RNAi and exon-specific microarrays. These arrays, which survey genome-wide changes in transcript and individual exon levels, revealed 450 genes with transcript level changes upon Tat-SF1 depletion. Strikingly, 98% of these target genes were down-regulated upon depletion, indicating that Tat-SF1 generally activates gene expression. We also identified 89 genes that showed differential exon level changes after Tat-SF1 depletion. The 89 genes showed evidence of many different types of alternative exon use consistent with the regulation of transcription initiation sites and RNA processing. Minimal overlap between genes with transcript-level and exon-level changes suggests that Tat-SF1 does not functionally couple transcription and splicing. Biological processes significantly enriched with transcript- and exon-level targets include the cell cycle and nucleic acid metabolism; the insulin signaling pathway was enriched with Tat-SF1 transcript-level targets but not exon-level targets. Additionally, a hexamer, ATGCCG, was over-represented in the promoter region of genes showing changes in transcription initiation upon Tat-SF1 depletion. This may represent a novel motif that Tat-SF1 recognizes during transcription. Together, these findings suggest that Tat-SF1 functions independently in transcription and splicing of cellular genes.
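
The abstract reports that a hexamer was over-represented in a set of promoter regions. A minimal sketch of how over-representation of a k-mer could be checked against a background set follows. The abstract does not describe the method used, so the pseudocount-smoothed frequency ratio, the function names, and the toy sequences are assumptions made for illustration only.

```python
from collections import Counter


def kmer_counts(seqs, k=6):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment(foreground, background, kmer, pseudo=1.0):
    """Ratio of smoothed k-mer frequencies, foreground over background.

    The pseudocount (an assumption) avoids division by zero when the
    k-mer is absent from the background.
    """
    k = len(kmer)
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_freq = (fg[kmer] + pseudo) / (sum(fg.values()) + pseudo)
    bg_freq = (bg[kmer] + pseudo) / (sum(bg.values()) + pseudo)
    return fg_freq / bg_freq


# Toy sequences invented for this example; they are not data from the study.
fg_seqs = ["TTATGCCGAA", "GGATGCCGTT"]
bg_seqs = ["TTTTTTTTTT", "ACGTACGTAC"]
ratio = enrichment(fg_seqs, bg_seqs, "ATGCCG")
print(ratio)
```

A ratio well above 1 would flag the hexamer as a candidate; a real analysis would also need a significance test and correction for the number of k-mers examined.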


Subject(s)
Exons , HIV-1/metabolism , Trans-Activators/metabolism , Transcription Factors , Alternative Splicing , Cell Cycle/genetics , Cell Line , DNA/metabolism , Humans , Oligonucleotide Array Sequence Analysis , RNA/metabolism , Trans-Activators/genetics
20.
Science ; 381(6664): eadd1250, 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37733848

ABSTRACT

Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.


Subject(s)
Gene Expression Regulation , Microsatellite Repeats , Transcription Factors , Eukaryotic Cells , Transcription Factors/chemistry , Transcription Factors/genetics , Protein Binding , Humans , Animals , Saccharomyces cerevisiae , Protein Domains , Protein Conformation