Abstract
Bromodomain and extraterminal (BET) proteins bind acetylated lysine residues in histones and nonhistone proteins via tandem bromodomains and regulate chromatin dynamics, cellular processes, and disease progression. Thus, targeting BET proteins is a promising strategy for treating various diseases, especially malignant tumors and chronic inflammation. Many pan-BET small-molecule inhibitors have been described, and some of them are in clinical evaluation. Nevertheless, the limited clinical efficacy of the current BET inhibitors is also evident and has inspired the development of new technologies to improve their clinical outcomes and minimize unwanted side effects. In this Review, we summarize the latest protein characteristics and biological functions of BRD4 as an example of BET proteins, analyze the clinical development status and preclinical resistance mechanisms, and discuss recent advances in BRD4-selective inhibitors, dual-target BET inhibitors, proteolysis targeting chimera degraders, and protein-protein interaction inhibitors.
Subjects
Organic Chemicals/therapeutic use; Transcription Factors/antagonists & inhibitors; Amino Acid Sequence; Animals; Cell Line, Tumor; Clinical Trials as Topic; Drug Discovery; Humans; Organic Chemicals/metabolism; Organic Chemicals/pharmacology; Protein Binding/drug effects; Protein Domains; Protein Multimerization/drug effects; Proteolysis/drug effects; Transcription Factors/chemistry; Transcription Factors/metabolism
Abstract
Bromodomains (BDs) are small protein modules that interact with acetylated marks in histones. These posttranslational modifications are pivotal to regulate gene expression, making BDs promising targets to treat several diseases. While the general structure of BDs is well known, their dynamical features and their interplay with other macromolecules are poorly understood, hampering the rational design of potent and selective inhibitors. Here, we combine extensive molecular dynamics simulations, Markov state modeling, and available structural data to reveal a transiently formed state that is conserved across all BD families. It involves the breaking of two backbone hydrogen bonds that anchor the ZA-loop with the αA helix, opening a cryptic pocket that partially occludes the one associated to histone binding. By analyzing more than 1,900 experimental structures, we unveil just two adopting the hidden state, explaining why it has been previously unnoticed and providing direct structural evidence for its existence. Our results suggest that this state is an allosteric regulatory switch for BDs, potentially related to a recently unveiled BD-DNA-binding mode.
Subjects
Cell Cycle Proteins/chemistry; Co-Repressor Proteins/chemistry; DNA-Binding Proteins/chemistry; Histone Acetyltransferases/chemistry; Intracellular Signaling Peptides and Proteins/chemistry; Transcription Factors, General/chemistry; Transcription Factors/chemistry; Tripartite Motif-Containing Protein 28/chemistry; Amino Acid Sequence; Binding Sites; Cell Cycle Proteins/genetics; Cell Cycle Proteins/metabolism; Co-Repressor Proteins/genetics; Co-Repressor Proteins/metabolism; Crystallography, X-Ray; DNA/chemistry; DNA/genetics; DNA/metabolism; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Gene Expression Regulation; Histone Acetyltransferases/genetics; Histone Acetyltransferases/metabolism; Humans; Intracellular Signaling Peptides and Proteins/genetics; Intracellular Signaling Peptides and Proteins/metabolism; Markov Chains; Molecular Dynamics Simulation; Protein Binding; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Protein Interaction Domains and Motifs; Sequence Alignment; Sequence Homology, Amino Acid; Thermodynamics; Transcription Factors/genetics; Transcription Factors/metabolism; Transcription Factors, General/genetics; Transcription Factors, General/metabolism; Tripartite Motif-Containing Protein 28/genetics; Tripartite Motif-Containing Protein 28/metabolism
Abstract
Absolute binding free energy calculations with explicit solvent molecular simulations can provide estimates of protein-ligand affinities, and thus reduce the time and costs needed to find new drug candidates. However, these calculations can be complex to implement and perform. Here, we introduce the software BAT.py, a Python tool that invokes the AMBER simulation package to automate the calculation of binding free energies for a protein with a series of ligands. The software supports the attach-pull-release (APR) and double decoupling (DD) binding free energy methods, as well as the simultaneous decoupling-recoupling (SDR) method, a variant of double decoupling that avoids numerical artifacts associated with charged ligands. We report encouraging initial test applications of this software both to re-rank docked poses and to estimate overall binding free energies. We also show that it is practical to carry out these calculations cheaply by using graphical processing units in common machines that can be built for this purpose. The combination of automation and low cost positions this procedure to be applied in a relatively high-throughput mode and thus stands to enable new applications in early-stage drug discovery.
Subjects
Drug Discovery; Molecular Docking Simulation; Proteins/chemistry; Proteins/metabolism; Software; Automation; Binding Sites; Cell Cycle Proteins/chemistry; Cell Cycle Proteins/metabolism; Costs and Cost Analysis; Drug Discovery/economics; Ligands; Molecular Docking Simulation/economics; Molecular Dynamics Simulation; Molecular Structure; Myeloid Cell Leukemia Sequence 1 Protein/metabolism; Protein Binding; Protein Conformation; Software/economics; Solvents/chemistry; Thermodynamics; Transcription Factors/chemistry; Transcription Factors/metabolism
Abstract
D089-0563 is a highly promising anti-cancer compound that selectively binds the transcription-silencing G-quadruplex element (Pu27) at the promoter region of the human c-MYC oncogene; however, its binding mechanism remains elusive. The structure of Pu27 is not available due to its polymorphism, but the G-quadruplex structures of its two shorter derivatives in complex with a ligand (Pu24/Phen-DC3 and Pu22/DC-34) are available and show significant structural variance as well as different ligand binding patterns in the 3' region. Because D089-0563 shares the same scaffold as DC34 while having a significantly different scaffold from Phen-DC3, we picked Pu24 instead of Pu22 for this study in order to gain additional ligand binding insight. Using free ligand molecular dynamics binding simulations (33 µs), we probed the binding of D089-0563 to Pu24. Our clustering analysis identified three binding modes (top, side, and bottom) and subsequent MMPBSA binding energy analysis identified the top mode as the most thermodynamically stable. Our Markov State Model (MSM) analysis revealed that there are three parallel pathways for D089-0563 to the top mode from unbound state and that the ligand binding follows the conformational selection mechanism. Combining our predicted complex structures with the two experimental structures, it is evident that structural differences in the 3' region between Pu24 and Pu22 lead to different binding behaviors despite having similar ligands; this also explains the different promoter activity caused by the two G-quadruplex sequences observed in a recent synthetic biology study. Based on interaction insights, 625 D089-0563 derivatives were designed and docked; 59 of these showed slightly improved docking scores.
Subjects
DNA-Binding Proteins/chemistry; G-Quadruplexes; Ligands; Molecular Dynamics Simulation; Transcription Factors/chemistry; DNA-Binding Proteins/metabolism; Humans; Protein Binding; Transcription Factors/metabolism
Abstract
Mitochondrial RNA polymerases depend on initiation factors, such as TFB2M in humans and Mtf1 in yeast Saccharomyces cerevisiae, for promoter-specific transcription. These factors drive the melting of promoter DNA, but how they support RNA priming and growth was not understood. We show that the flexible C-terminal tails of Mtf1 and TFB2M play a crucial role in RNA priming by aiding template strand alignment in the active site for high-affinity binding of the initiating nucleotides. Using single-molecule fluorescence approaches, we show that the Mtf1 C-tail promotes RNA growth during initiation by stabilizing the scrunched DNA conformation. Additionally, due to its location in the path of the nascent RNA, the C-tail of Mtf1 serves as a sensor of the RNA-DNA hybrid length. Initially, steric clashes of the Mtf1 C-tail with short RNA-DNA hybrids cause abortive synthesis but clashes with longer RNA-DNA trigger conformational changes for the timely release of the promoter DNA to commence the transition into elongation. The remarkable similarities in the functions of the C-tail and σ3.2 finger of the bacterial factor suggest mechanistic convergence of a flexible element in the transcription initiation factor that engages the DNA template for RNA priming and growth and disengages when needed to generate the elongation complex.
Subjects
DNA, Fungal/genetics; Mitochondrial Proteins/chemistry; Mitochondrial Proteins/metabolism; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/genetics; Templates, Genetic; Transcription Elongation, Genetic; Transcription Factors/chemistry; Transcription Factors/metabolism; Amino Acid Sequence; Base Sequence; Biocatalysis; DNA, Fungal/chemistry; Markov Chains; Methyltransferases/chemistry; Methyltransferases/metabolism; Nucleic Acid Conformation; Nucleotides/metabolism; Promoter Regions, Genetic; Protein Binding; Protein Conformation; RNA, Fungal/biosynthesis; Sequence Deletion; Structure-Activity Relationship; Transcription Initiation, Genetic
Abstract
Bound by transcription factors, DNA motifs (i.e. transcription factor binding sites) are prevalent and important for gene regulation in different tissues at different developmental stages of eukaryotes. Although considerable efforts have been made to elucidate monomeric DNA motif patterns, our knowledge of heterodimeric DNA motifs is still far from complete. Therefore, we propose to develop a computational approach to synthesize a heterodimeric DNA motif from two monomeric DNA motifs. The approach is sequentially divided into two components (Phases A and B). In Phase A, we propose to develop inference models of how two monomeric DNA motifs can be oriented and overlapped with each other at the nucleotide level. In Phase B, given the two oriented monomeric DNA motifs, we further propose to develop DNA-binding family-specific input-output hidden Markov models (IOHMMs) to synthesize a heterodimeric DNA motif. To validate the approach, we execute and cross-validate it with 618 experimentally verified heterodimeric DNA motifs across 49 DNA-binding family combinations. We observe that our approach can even "rescue" an existing heterodimeric DNA motif pattern (i.e. HOXB2_EOMES) previously published in Nature. Lastly, we apply the proposed approach to infer previously uncharacterized heterodimeric motifs. Their motif instances are supported by DNase accessibility, gene ontology, protein-protein interactions, in vivo ChIP-seq peaks, and even structural data from PDB. A public web server has been built for open accessibility and scientific impact. Its address is: http://motif.cs.cityu.edu.hk/custom/MotifKirin.
Subjects
Computational Biology; Genomics/methods; Nucleotide Motifs/genetics; Transcription Factors/genetics; Algorithms; Binding Sites/genetics; DNA Replication/genetics; Gene Expression Regulation, Developmental/genetics; Humans; Markov Chains; Regulatory Elements, Transcriptional/genetics; Sequence Analysis, DNA/methods; Software; Transcription Factors/chemistry
Abstract
Single-pair Förster resonance energy transfer (spFRET) has become an important tool for investigating conformational dynamics in biological systems. To extract dynamic information from the spFRET traces measured with total internal reflection fluorescence microscopy, we extended the hidden Markov model (HMM) approach. In our extended HMM analysis, we incorporated the photon-shot noise from camera-based systems into the HMM. Thus, the variance in Förster resonance energy transfer (FRET) efficiency of the various states, which is typically a fitted parameter, is explicitly included in the analysis estimated from the number of detected photons. It is also possible to include an additional broadening of the FRET state, which would then only reflect the inherent flexibility of the dynamic biological systems. This approach is useful when comparing the dynamics of individual molecules for which the total intensities vary significantly. We used spFRET with the extended HMM analysis to investigate the dynamics of TATA-box-binding protein (TBP) on promoter DNA in the presence of negative cofactor 2 (NC2). We compared the dynamics of two promoters as well as DNAs of different length and labeling location. For the adenovirus major late promoter, four FRET states were observed; three states correspond to different conformations of the DNA in the TBP-DNA-NC2 complex and a four-state model in which the complex has shifted along the DNA. The HMM analysis revealed that the states are connected via a linear, four-well model. For the H2B promoter, more complex dynamics were observed. By clustering the FRET states detected with the HMM analysis, we could compare the general dynamics observed for the two promoter sequences. 
We observed that the dynamics from a stretched DNA conformation to a bent conformation for the two promoters were similar, whereas the bent conformation of the TBP-DNA-NC2 complex for the H2B promoter is approximately three times more stable than for the adenovirus major late promoter.
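The idea of fixing each state's FRET variance from the detected photon count, rather than fitting it, can be illustrated with a small Viterbi decoder. This is only a minimal sketch and not the authors' implementation: it assumes Gaussian emissions whose variance is the binomial shot-noise approximation E(1 - E)/N plus an optional broadening term, and the function name `viterbi_shot_noise` and all of its parameters are hypothetical.

```python
import numpy as np

def viterbi_shot_noise(fret, photons, state_E, trans, start, extra_var=0.0):
    """Most likely state path for an spFRET trace.

    fret      : observed FRET efficiency per frame
    photons   : detected photons per frame (sets the shot-noise width)
    state_E   : FRET efficiency of each hidden state (strictly between 0 and 1)
    trans     : state-to-state transition probabilities (rows sum to 1)
    start     : initial state probabilities
    extra_var : optional additional broadening of every state (assumption)
    """
    fret = np.asarray(fret, dtype=float)
    photons = np.asarray(photons, dtype=float)
    state_E = np.asarray(state_E, dtype=float)

    # Per-frame, per-state variance: binomial shot noise E(1-E)/N plus broadening.
    var = state_E[None, :] * (1.0 - state_E[None, :]) / photons[:, None] + extra_var
    log_emit = -0.5 * (np.log(2.0 * np.pi * var)
                       + (fret[:, None] - state_E[None, :]) ** 2 / var)
    log_trans = np.log(trans)

    n_frames, n_states = log_emit.shape
    score = np.log(start) + log_emit[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + log_trans      # cand[i, j]: arrive in j from i
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]

    # Trace the best path backwards from the final frame.
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

trace = [0.20, 0.22, 0.18, 0.80, 0.79, 0.81]
states = viterbi_shot_noise(trace, [200] * 6, [0.2, 0.8],
                            [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5])
print(states)  # [0, 0, 0, 1, 1, 1]
```

Because the emission width scales with the photon count of each frame, dim and bright molecules can share one set of state efficiencies, which is the situation the abstract describes as the main use case.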
Subjects
DNA/metabolism; Fluorescence Resonance Energy Transfer; Markov Chains; TATA-Box Binding Protein/metabolism; Transcription Factors/metabolism; DNA/chemistry; Models, Molecular; Nucleic Acid Conformation; Protein Conformation; TATA-Box Binding Protein/chemistry; Transcription Factors/chemistry
Abstract
Transcriptional activation domains are essential for gene regulation, but their intrinsic disorder and low primary sequence conservation have made it difficult to identify the amino acid composition features that underlie their activity. Here, we describe a rational mutagenesis scheme that deconvolves the function of four activation domain sequence features (acidity, hydrophobicity, intrinsic disorder, and short linear motifs) by quantifying the activity of thousands of variants in vivo and simulating their conformational ensembles using an all-atom Monte Carlo approach. Our results with a canonical activation domain from the Saccharomyces cerevisiae transcription factor Gcn4 reconcile existing observations into a unified model of its function: the intrinsic disorder and acidic residues keep two hydrophobic motifs from driving collapse. Instead, the most-active variants keep their aromatic residues exposed to the solvent. Our results illustrate how the function of intrinsically disordered proteins can be revealed by high-throughput rational mutagenesis.
Subjects
Basic-Leucine Zipper Transcription Factors/chemistry; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae/genetics; Transcription Factors/chemistry; Basic-Leucine Zipper Transcription Factors/physiology; Catalytic Domain; Gene Expression Regulation; Hydrogen-Ion Concentration; Models, Molecular; Monte Carlo Method; Mutagenesis, Site-Directed; Protein Domains; Saccharomyces cerevisiae Proteins/physiology; Sequence Analysis, Protein; Transcription Factors/physiology
Abstract
Methyllysine analogues (MLAs), furnished by aminoethylation of engineered cysteine residues, are widely used surrogates of histone methyllysine and are considered to be effective proxies for studying these epigenetic marks in vitro. Here we report the first structure of a trimethyllysine MLA histone in complex with a protein binding partner, quantify the thermodynamic distinctions between MLAs and their native methyllysine counterparts, and demonstrate that these differences can compromise qualitative interpretations of binding at the nucleosome level. Quantitative measurements with two methyllysine binding protein modules reveal substantial affinity losses for the MLA peptides versus the corresponding native methyllysine species in both cases, although the thermodynamic underpinnings are distinct. MLA and methyllysine adopt distinct conformational geometries when in complex with the BPTF PHD finger, a well-established H3K4me3 binding partner. In this case, an ∼13-fold Kd difference at the peptide level translates to nucleosomal affinities for MLA analogues that fall outside of the detectable range in a pull-down format, whereas the methyllysine species installed by native chemical ligation demonstrates robust binding. Thus, despite their facile production and commercial availability, there is a significant caveat of potentially altered binding affinity when MLAs are used in place of native methyllysine residues.
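To put the reported Kd ratio on an energy scale, the standard relation ΔΔG = RT ln(Kd,MLA / Kd,native) can be evaluated directly. The sketch below is illustrative only: the temperature of 298.15 K is an assumption (the abstract does not state the measurement temperature), and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_kd_ratio(fold, temperature_k=298.15):
    """Binding free energy penalty (kcal/mol) for a `fold`-times weaker Kd.

    temperature_k is an assumed value, not one taken from the abstract.
    """
    return R_KCAL * temperature_k * math.log(fold)

print(round(ddg_from_kd_ratio(13.0), 2))  # 1.52
```

Under that assumption, a 13-fold loss in affinity corresponds to roughly 1.5 kcal/mol, a modest energetic difference that the abstract reports is nonetheless enough to push nucleosomal binding below the detection limit of a pull-down assay.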
Subjects
Antigens, Nuclear/chemistry; Histones/chemistry; Lysine/analogs & derivatives; Nerve Tissue Proteins/chemistry; PHD Zinc Fingers; Transcription Factors/chemistry; Amino Acid Sequence; Humans; Lysine/chemistry; Protein Binding; Protein Processing, Post-Translational; Thermodynamics
Abstract
Uncovering the mechanisms that affect the binding specificity of transcription factors (TFs) is critical for understanding the principles of gene regulation. Although sequence-based models have been used successfully to predict TF binding specificities, we found that including DNA shape information in these models improved their accuracy and interpretability. Previously, we developed a method for modeling DNA binding specificities based on DNA shape features extracted from Monte Carlo (MC) simulations. Prediction accuracies of our models, however, have not yet been compared to accuracies of models incorporating DNA shape information extracted from X-ray crystallography (XRC) data or Molecular Dynamics (MD) simulations. Here, we integrated DNA shape information extracted from MC or MD simulations and XRC data into predictive models of TF binding and compared their performance. Models that incorporated structural information consistently showed improved performance over sequence-based models regardless of data source. Furthermore, we derived and validated nine additional DNA shape features beyond our original set of four features. The expanded repertoire of 13 distinct DNA shape features, including six intra-base pair and six inter-base pair parameters and minor groove width, is available in our R/Bioconductor package DNAshapeR and enables a comprehensive structural description of the double helix on a genome-wide scale.
Subjects
Algorithms; Computational Biology/methods; DNA/chemistry; Genome-Wide Association Study/methods; Transcription Factors/chemistry; Base Sequence; Crystallography, X-Ray; DNA/genetics; DNA/metabolism; Molecular Dynamics Simulation; Monte Carlo Method; Nucleic Acid Conformation; Protein Binding; Reproducibility of Results; Transcription Factors/metabolism
Abstract
BACKGROUND: Crustacea, the second largest subphylum of Arthropoda, includes species of major ecological and economic importance, such as crabs, lobsters, crayfishes, shrimps, and barnacles. With the rapid development of crustacean aquaculture and biodiversity loss, understanding the gene regulatory mechanisms of growth, reproduction, and development of crustaceans is crucial to both aquaculture development and biodiversity conservation of this group of organisms. In these biological processes, transcription factors (TFs) play a vital role in regulating gene expression. However, crustacean transcription factors are still largely unknown, because the lack of complete genome sequences of most crustacean species hampers the studies on their transcriptional regulation on a system-wide scale. Thus, the current TF databases derived from genome sequences contain TF information for only a few crustacean species and are insufficient to elucidate the transcriptional diversity of such a large animal group. RESULTS: Our database CrusTF ( http://qinlab.sls.cuhk.edu.hk/CrusTF ) provides comprehensive information for evolutionary and functional studies on the crustacean transcriptional regulatory system. CrusTF fills the knowledge gap of transcriptional regulation in crustaceans by exploring publicly available and newly sequenced transcriptomes of 170 crustacean species and identifying 131,941 TFs within 63 TF families. CrusTF features three categories of information: sequence, function, and evolution of crustacean TFs. The database enables searching, browsing and downloading of crustacean TF sequences. CrusTF infers DNA binding motifs of crustacean TFs, thus facilitating the users to predict potential downstream TF targets. The database also presents evolutionary analyses of crustacean TFs, which improve our understanding of the evolution of transcriptional regulatory systems in crustaceans. 
CONCLUSIONS: Given the importance of TF information in evolutionary and functional studies on transcriptional regulatory systems of crustaceans, this database will constitute a key resource for the research community of crustacean biology and evolutionary biology. Moreover, CrusTF serves as a model for the construction of TF database derived from transcriptome data. A similar approach could be applied to other groups of organisms, for which transcriptomes are more readily available than genomes.
Subjects
Crustacea/genetics; Databases, Genetic; Transcription Factors/physiology; Transcriptome; Animals; Phylogeny; Transcription Factors/chemistry; Transcription Factors/classification; Transcription Factors/genetics
Abstract
Protein-DNA binding is a fundamental component of gene regulatory processes, but it is still not completely understood how proteins recognize their target sites in the genome. Besides hydrogen bonding in the major groove (base readout), proteins recognize minor-groove geometry using positively charged amino acids (shape readout). The underlying mechanism of DNA shape readout involves the correlation between minor-groove width and electrostatic potential (EP). To probe this biophysical effect directly, rather than using minor-groove width as an indirect measure for shape readout, we developed a methodology, DNAphi, for predicting EP in the minor groove and confirmed the direct role of EP in protein-DNA binding using massive sequencing data. The DNAphi method uses a sliding-window approach to mine results from non-linear Poisson-Boltzmann (NLPB) calculations on DNA structures derived from all-atom Monte Carlo simulations. We validated this approach, which only requires nucleotide sequence as input, based on direct comparison with NLPB calculations for available crystal structures. Using statistical machine-learning approaches, we showed that adding EP as a biophysical feature can improve the predictive power of quantitative binding specificity models across 27 transcription factor families. High-throughput prediction of EP offers a novel way to integrate biophysical and genomic studies of protein-DNA binding.
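A sliding-window prediction of this kind can be sketched as a lookup of each k-mer in a precomputed table, with the stored value assigned to the window's central nucleotide. The code below is a hypothetical illustration, not the DNAphi implementation: the window length of 5, the function name, and the table contents are all assumptions, and the numbers in the example table are made up.

```python
def sliding_window_predict(sequence, table, k=5):
    """Assign each k-mer's table value to the nucleotide at its center.

    Positions too close to either end, or k-mers missing from the table,
    are reported as None.
    """
    half = k // 2
    values = [None] * len(sequence)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        values[i + half] = table.get(kmer)
    return values

# Hypothetical lookup table; a real one would be mined from the
# Poisson-Boltzmann calculations described in the abstract.
toy_table = {"ACGTA": -5.0, "CGTAC": -6.5}
print(sliding_window_predict("ACGTAC", toy_table))
# [None, None, -5.0, -6.5, None, None]
```

A table lookup like this needs only the nucleotide sequence as input, which is the property the abstract highlights for high-throughput use.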
Subjects
DNA-Binding Proteins/metabolism; DNA/chemistry; Transcription Factors/metabolism; Binding Sites; DNA/metabolism; DNA-Binding Proteins/chemistry; Escherichia coli Proteins/metabolism; Factor For Inversion Stimulation Protein/metabolism; Genome; Genomics; Homeodomain Proteins/metabolism; Machine Learning; Models, Molecular; Monte Carlo Method; Nucleic Acid Conformation; Phosphates/chemistry; Protein Binding; Static Electricity; Transcription Factors/chemistry
Abstract
Zinc finger proteins are the most common among families of DNA-binding transcription factors. Designer transcription factors generated by the fusion of engineered zinc finger DNA-binding domains (ZF-DBDs) to effector domains have been valuable tools for the modulation of gene expression and for targeted genome editing. However, ZF-DBDs without effector domains have also been shown to effectively modulate gene expression by competing with sequence-specific DNA-binding transcription factors. Here, we describe the methodology and provide a detailed workflow for the cloning, expression, purification, and direct cell delivery of engineered ZF-DBDs. Using this protocol, ZF-DBDs can be generated with high efficiency in less than 2 weeks. We also describe a nonradioactive method for measuring DNA binding affinity of the purified ZF-DBD proteins as well as a method for direct delivery of the purified ZF-DBDs to mammalian cells.
Subjects
DNA-Binding Proteins/metabolism; Transcription Factors/metabolism; Animals; Binding Sites; DNA-Binding Proteins/chemistry; Electrophoretic Mobility Shift Assay; Humans; Protein Binding; Recombinant Proteins/genetics; Recombinant Proteins/metabolism; Transcription Factors/chemistry; Zinc Fingers
Abstract
Acute subcellular protein targeting is a powerful tool to study biological networks. However, signaling at the plasma membrane is highly dynamic, making it difficult to study in space and time. In particular, sustained local control of molecular function is challenging owing to the lateral diffusion of plasma membrane targeted molecules. Herein we present "molecular activity painting" (MAP), a novel technology which combines photoactivatable chemically induced dimerization (pCID) with immobilized artificial receptors. The immobilization of artificial receptors by surface-immobilized antibodies blocks lateral diffusion, enabling rapid and stable "painting" of signaling molecules and their activity at the plasma membrane with micrometer precision. Using this method, we show that painting of the RhoA-myosin activator GEF-H1 induces patterned acto-myosin contraction inside living cells.
Subjects
Cell Membrane/chemistry; DNA-Binding Proteins; Inventions; Light; Transcription Factors; Cells, Cultured; DNA-Binding Proteins/chemistry; Dimerization; Inventions/trends; Transcription Factors/chemistry
Abstract
Because standard molecular dynamics (MD) simulations are unable to access time scales of interest in complex biomolecular systems, it is common to "stitch together" information from multiple shorter trajectories using approximate Markov state model (MSM) analysis. However, MSMs may require significant tuning and can yield biased results. Here, by analyzing some of the longest protein MD data sets available (>100 µs per protein), we show that estimators constructed based on exact non-Markovian (NM) principles can yield significantly improved mean first-passage times (MFPTs) for protein folding and unfolding. In some cases, MSM bias of more than an order of magnitude can be corrected when identical trajectory data are reanalyzed by non-Markovian approaches. The NM analysis includes "history" information, higher order time correlations compared to MSMs, that is available in every MD trajectory. The NM strategy is insensitive to fine details of the states used and works well when a fine time-discretization (i.e., small "lag time") is used.
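For reference, the conventional Markovian estimate that the abstract describes as potentially biased can be sketched in a few lines: count transitions at a chosen lag time, row-normalize, and solve a linear system for the mean first-passage times. This is a generic textbook construction under simplifying assumptions (every state is visited and has outgoing counts, no reversibility constraint), not the non-Markovian estimator of the paper; the function names are hypothetical.

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at the given lag (in frames).

    Assumes every state has at least one outgoing transition.
    """
    counts = np.zeros((n_states, n_states))
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def mfpt(P, target, lag=1):
    """Markovian MFPT (in frames) from every state to `target`.

    Solves (I - P_sub) m = lag, where P_sub is P with the target removed.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    others = [i for i in range(n) if i != target]
    A = np.eye(n - 1) - P[np.ix_(others, others)]
    m = np.linalg.solve(A, np.full(n - 1, float(lag)))
    out = np.zeros(n)
    out[others] = m
    return out

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(mfpt(P, target=1))  # first entry is 1 / 0.1 = 10 frames
```

The estimate depends on both the state definitions and the lag time through `P`, which is where the tuning burden mentioned in the abstract comes from.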
Subjects
Models, Molecular; Proteins/chemistry; Arabidopsis Proteins/chemistry; Arabidopsis Proteins/metabolism; Kinetics; Markov Chains; Microfilament Proteins/chemistry; Microfilament Proteins/metabolism; Oligopeptides/chemistry; Oligopeptides/metabolism; Peptides/chemistry; Peptides/metabolism; Protein Folding; Protein Unfolding; Proteins/metabolism; Time Factors; Transcription Factors/chemistry; Transcription Factors/metabolism
Abstract
The dynamics of proteins in the unfolded state can be quantified in computer simulations by calculating a spectrum of relaxation times which describes the time scales over which the population fluctuations decay to equilibrium. If the unfolded state space is discretized, we can evaluate the relaxation time of each state. We derive a simple relation that shows the mean first passage time to any state is equal to the relaxation time of that state divided by the equilibrium population. This explains why mean first passage times from state to state within the unfolded ensemble can be very long but the energy landscape can still be smooth (minimally frustrated). In fact, when the folding kinetics is two-state, all of the unfolded state relaxation times within the unfolded free energy basin are faster than the folding time. This result supports the well-established funnel energy landscape picture and resolves an apparent contradiction between this model and the recently proposed kinetic hub model of protein folding. We validate these concepts by analyzing a Markov state model of the kinetics in the unfolded state and folding of the miniprotein NTL9 (where NTL9 is the N-terminal domain of the ribosomal protein L9), constructed from a 2.9 ms simulation provided by D. E. Shaw Research.
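Written out, the relation stated in the abstract takes the following form. The symbols are chosen here for illustration and are not taken from the paper: the left-hand side is the mean first passage time to state i, \tau_i is the relaxation time of state i, and p_i^{eq} is its equilibrium population.

```latex
\langle t_{\mathrm{MFP}} \rangle_i \;=\; \frac{\tau_i}{p_i^{\mathrm{eq}}}
```

Read this way, a state with a small equilibrium population can have a long mean first passage time even when its relaxation time is short, which is the point the abstract makes about long passage times coexisting with a smooth landscape.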
Subjects
Models, Chemical; Proteins/chemistry; Arabidopsis Proteins/chemistry; Kinetics; Markov Chains; Protein Folding; Thermodynamics; Transcription Factors/chemistry
Abstract
Genetic information, which is stored in the long strand of genomic DNA as chromatin, must be scanned and read out by various transcription factors. First, gene-specific transcription factors, which are relatively small (∼50 kDa), scan the genome and bind regulatory elements. Such factors then recruit general transcription factors, Mediators, RNA polymerases, nucleosome remodellers, and histone modifiers, most of which are large protein complexes of 1-3 MDa in size. Here, we propose a new model for the functional significance of the size of transcription factors (or complexes) for gene regulation of chromatin domains. Recent findings suggest that chromatin consists of irregularly folded nucleosome fibres (10 nm fibres) and forms numerous condensed domains (e.g., topologically associating domains). Although the flexibility and dynamics of chromatin allow repositioning of genes within the condensed domains, the size exclusion effect of the domain may limit accessibility of DNA sequences by transcription factors. We used Monte Carlo computer simulations to determine the physical size limit of transcription factors that can enter condensed chromatin domains. Small gene-specific transcription factors can penetrate into the chromatin domains and search their target sequences, whereas large transcription complexes cannot enter the domain. Due to this property, once a large complex binds its target site via gene-specific factors it can act as a 'buoy' to keep the target region on the surface of the condensed domain and maintain transcriptional competency. This size-dependent specialization of target-scanning and surface-tethering functions could provide novel insight into the mechanisms of various DNA transactions, such as DNA replication and repair/recombination.
Subjects
Chromatin/chemistry; Chromatin/metabolism; Gene Expression Regulation; Monte Carlo Method; Transcription Factors/chemistry; Transcription Factors/metabolism; Transcription, Genetic; Chromatin/genetics; Models, Molecular; Molecular Weight; Protein Structure, Tertiary; Transcriptional Activation
Abstract
Motif detection has emerged as an important task in bioinformatics. Recently, discovering motifs that are localized relative to a particular biological region has become important in many applications. For example, it is used to discover regulatory sequences near the transcription start site and in the neighborhood of known transcription factor binding sites [1]. A context-aware motif detection approach is therefore needed. Moreover, there is interest in using both labeled and unlabeled sets to improve motif detection. In this paper, three novel context-aware semi-supervised motif detection approaches are proposed: self-learning, context-aware, and co-training context-aware systems. In self-learning, the motif hidden Markov model (HMM) is enhanced independently using unlabeled sets. In co-training, three different models are trained on three different views: pre-motif sequences, motif sequences, and post-motif sequences. Moreover, our co-training context-aware system is suitable for parallelization, which can improve its execution time. The approaches were evaluated using human motif sequences, and the results show that the co-training context-aware system achieved the best results. The results also show that our approach outperforms the related works in [1], [2] and [3].
Subjects
Computational Biology/methods , Algorithms , Base Sequence , Binding Sites , Chromatin Immunoprecipitation , DNA/chemistry , Humans , Markov Chains , Protein Binding , Transcription Factors/chemistry , Transcription Factors/metabolism , Genetic Transcription
Abstract
MOTIVATION: Recent experimental advancements allow determining positions of nucleosomes for complete genomes. However, the resulting nucleosome occupancy maps are averages of heterogeneous cell populations. Accordingly, they represent a snapshot of a dynamic ensemble at a single time point with an overlay of many configurations from different cells. To study the organization of nucleosomes along the genome and to understand the mechanisms of nucleosome translocation, it is necessary to retrieve features of specific conformations from the population average. RESULTS: Here, we present a method for identifying non-overlapping nucleosome configurations that combines binary-variable analysis and a Monte Carlo approach with a simulated annealing scheme. In this manner, we obtain specific nucleosome configurations and optimized solutions for the complex positioning patterns from experimental data. We apply the method to compare nucleosome positioning at transcription factor binding sites in different mouse cell types. Our method can model nucleosome translocations at regulatory genomic elements and generate configurations for simulations of the spatial folding of the nucleosome chain. AVAILABILITY: Source code, precompiled binaries, test data and a web-based test installation are freely available at http://bioinformatics.fh-stralsund.de/nucpos/
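The idea of selecting a non-overlapping nucleosome configuration with binary variables and simulated annealing can be sketched as follows. This is a minimal illustration, not the published implementation: the input format (start position plus occupancy score), the 147 bp footprint, the energy function (negative sum of selected scores), and the geometric cooling schedule are all assumptions made for the example.

```python
import math
import random

NUCLEOSOME_BP = 147  # assumed footprint of one nucleosome in base pairs

def overlaps(start_a, start_b, width=NUCLEOSOME_BP):
    """Two nucleosomes overlap if their starts are closer than one footprint."""
    return abs(start_a - start_b) < width

def anneal_configuration(candidates, steps=20000, t_start=1.0, t_end=0.01, seed=0):
    """Pick a non-overlapping subset of candidate positions.

    candidates: list of (start_position, score) tuples (hypothetical format).
    Each candidate has a binary state (selected or not); the energy is the
    negative sum of the selected scores.
    """
    rng = random.Random(seed)
    selected = [False] * len(candidates)
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(len(candidates))
        pos, score = candidates[i]
        if selected[i]:
            delta = score  # removing a nucleosome raises the energy
        else:
            # Reject any move that would create an overlap.
            if any(selected[j] and overlaps(pos, candidates[j][0])
                   for j in range(len(candidates)) if j != i):
                continue
            delta = -score  # adding a nucleosome lowers the energy
        # Metropolis acceptance criterion.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            selected[i] = not selected[i]
    return sorted(pos for (pos, _), keep in zip(candidates, selected) if keep)

# Hypothetical occupancy peaks: (start position, score).
candidates = [(0, 0.9), (100, 0.4), (160, 0.8), (250, 0.3), (320, 0.7)]
config = anneal_configuration(candidates)
print(config)
```

Overlapping moves are rejected outright rather than penalized, so every intermediate state is a valid configuration. A penalty term in the energy would be an alternative design.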
Subjects
Monte Carlo Method , Nucleosomes/chemistry , Animals , Binding Sites , Cell Differentiation , Mice , Nucleosomes/metabolism , Protein Binding/genetics , Transcription Factors/chemistry , Transcription Factors/metabolism
Abstract
BACKGROUND: In previous work, we designed a modified aptamer-free SELEX-seq protocol (afSELEX-seq) for the discovery of transcription factor binding sites. Here, we present original software, TFAST, designed to analyze afSELEX-seq data, validated against our previously generated afSELEX-seq dataset and a model dataset. TFAST is designed with a simple graphical interface (Java) so that it can be installed and executed without extensive expertise in bioinformatics. TFAST completes analysis within minutes on most personal computers. METHODOLOGY: Once afSELEX-seq data are aligned to a target genome, TFAST identifies peaks and, uniquely, compares peak characteristics between cycles. TFAST generates a hierarchical report of graded peaks, their associated genomic sequences, binding site length predictions, and dummy sequences. PRINCIPAL FINDINGS: Including additional cycles of afSELEX-seq improved TFAST's ability to selectively identify peaks, leading to 7,274, 4,255, and 2,628 peaks identified in two-, three-, and four-cycle afSELEX-seq. Inter-round analysis by TFAST identified 457 peaks as the strongest candidates for true binding sites. Separating peaks by TFAST into classes of worst, second-best and best candidate peaks revealed a trend of increasing significance (e-values 4.5 × 10^12, 2.9 × 10^-46, and 1.2 × 10^-73) and informational content (11.0, 11.9, and 12.5 bits over 15 bp) of discovered motifs within each respective class. TFAST also predicted a binding site length (28 bp) consistent with non-computational experimentally derived results for the transcription factor PapX (22 to 29 bp). CONCLUSIONS/SIGNIFICANCE: TFAST offers a novel and intuitive approach for determining DNA binding sites of proteins subjected to afSELEX-seq. Here, we demonstrate that TFAST, using afSELEX-seq data, rapidly and accurately predicted sequence length and motif for a putative transcription factor's binding site.
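The motif information content quoted above in bits can be understood through the standard Shannon formula for a position probability matrix. The abstract does not say how the values were computed, so the following is only a sketch: it assumes a uniform background over the four bases and applies no small-sample correction, and the example matrix is hypothetical.

```python
import math

def information_content(pwm):
    """Total information content (bits) of a DNA motif.

    pwm: list of positions, each a dict mapping 'A', 'C', 'G', 'T' to a
    probability. With a uniform background each position contributes
    2 bits minus its Shannon entropy, so at most 2 bits.
    """
    total = 0.0
    for column in pwm:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += 2.0 - entropy
    return total

# Hypothetical three-position motif.
example_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved: 2 bits
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},      # two equally likely bases: 1 bit
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative: 0 bits
]
print(information_content(example_pwm))  # 3.0
```

Under this definition a 15 bp motif can carry at most 30 bits, so values near 11 to 12.5 bits correspond to a moderately conserved motif.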