1.

Targeting circadian transcriptional programs in triple negative breast cancer through a cis-regulatory mechanism.

Pan, Yuanzhong; Chiu, Tsu-Pei; Zhou, Lili; Chan, Priscilla; Kuo, Tia Tyrsett; Battaglin, Francesca; Soni, Shivani; Jayachandran, Priya; Li, Jingyi Jessica; Lenz, Heinz-Josef; Mumenthaler, Shannon M; Rohs, Remo; Torres, Evanthia Roussos; Kay, Steve A.

bioRxiv ; 2024 May 15.

Article En | MEDLINE | ID: mdl-38746115

Circadian clock genes are emerging targets in many types of cancer, but their mechanistic contributions to tumor progression are still largely unknown. This makes it challenging to stratify patient populations and develop corresponding treatments. In this work, we show that in breast cancer, the disrupted expression of circadian genes has the potential to serve as a biomarker. We also show that the master circadian transcription factors (TFs) BMAL1 and CLOCK are required for the proliferation of metastatic mesenchymal stem-like (mMSL) triple-negative breast cancer (TNBC) cells. Using currently available small molecule modulators, we found that a stabilizer of cryptochrome 2 (CRY2), the direct repressor of BMAL1 and CLOCK transcriptional activity, synergizes with inhibitors of the proteasome, which is required for BMAL1 and CLOCK function, to repress a transcriptional program comprising circadian cycling genes in mMSL TNBC cells. Omics analyses on drug-treated cells implied that this repression of transcription is mediated by the transcription factor binding site (TFBS) features in the cis-regulatory elements (CREs) of clock-controlled genes. Through a massively parallel reporter assay, we defined a set of CRE features that are potentially repressed by the specific drug combination. The identification of cis-element enrichment may serve as a new way of defining and targeting tumor types through the modulation of cis-regulatory programs, and ultimately provide a new paradigm of therapy design for cancer types with unclear drivers like TNBC.

2.

Predicting DNA structure using a deep learning method.

Li, Jinsen; Chiu, Tsu-Pei; Rohs, Remo.

Nat Commun ; 15(1): 1243, 2024 Feb 09.

Article En | MEDLINE | ID: mdl-38336958

Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA structure, also described as DNA shape, plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, DNA structural features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing an understanding of the effects of flanking regions on DNA structure in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.

Deep Learning , Proteins/metabolism , Protein Binding , Machine Learning , DNA/metabolism

3.

Probing the role of the protonation state of a minor groove-linker histidine in Exd-Hox-DNA binding.

Jiang, Yibei; Chiu, Tsu-Pei; Mitra, Raktim; Rohs, Remo.

Biophys J ; 123(2): 248-259, 2024 Jan 16.

Article En | MEDLINE | ID: mdl-38130056

DNA recognition and targeting by transcription factors (TFs) through specific binding are fundamental in biological processes. Furthermore, the histidine protonation state at the TF-DNA binding interface can significantly influence the binding mechanism of TF-DNA complexes. Nevertheless, the role of histidine in TF-DNA complexes remains underexplored. Here, we employed all-atom molecular dynamics simulations using AlphaFold2-modeled complexes based on previously solved co-crystal structures to probe the role of the His-12 residue in the Extradenticle (Exd)-Sex combs reduced (Scr)-DNA complex when binding to Scr and Ultrabithorax (Ubx) target sites. Our results demonstrate that the protonation state of histidine notably affected the DNA minor-groove width profile and binding free energy. Examining flanking sequences of various binding affinities derived from SELEX-seq experiments, we analyzed the relationship between binding affinity and specificity. We uncovered how histidine protonation leads to increased binding affinity but can lower specificity. Our findings provide new mechanistic insights into the role of histidine in modulating TF-DNA binding.

Drosophila Proteins , Homeodomain Proteins , Animals , Homeodomain Proteins/genetics , Histidine , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , DNA/chemistry , Binding Sites , Transcription Factors/metabolism

4.

Deep DNAshape: Predicting DNA shape considering extended flanking regions using a deep learning method.

Li, Jinsen; Chiu, Tsu-Pei; Rohs, Remo.

bioRxiv ; 2023 Oct 24.

Article En | MEDLINE | ID: mdl-37961633

Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA shape plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, refined DNA shape features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing a deeper understanding of the effects of flanking regions on DNA shape in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.

5.

Physicochemical models of protein-DNA binding with standard and modified base pairs.

Chiu, Tsu-Pei; Rao, Satyanarayan; Rohs, Remo.

Proc Natl Acad Sci U S A ; 120(4): e2205796120, 2023 01 24.

Article En | MEDLINE | ID: mdl-36656856

DNA-binding proteins play important roles in various cellular processes, but the mechanisms by which proteins recognize genomic target sites remain incompletely understood. Functional groups at the edges of the base pairs (bp) exposed in the DNA grooves represent physicochemical signatures. As these signatures enable proteins to form specific contacts between protein residues and bp, their study can provide mechanistic insights into protein-DNA binding. Existing experimental methods, such as X-ray crystallography, can reveal such mechanisms based on physicochemical interactions between proteins and their DNA target sites. However, the low throughput of structural biology methods limits mechanistic insights for selection of many genomic sites. High-throughput binding assays enable prediction of potential target sites by determining relative binding affinities of a protein to massive numbers of DNA sequences. Many currently available computational methods are based on the sequence of standard Watson-Crick bp. They assume that the contribution to overall binding affinity is independent for each base pair, or alternatively include dinucleotides or short k-mers. These methods cannot directly expand to physicochemical contacts, and they are not suitable to apply to DNA modifications or non-Watson-Crick bp. These variations include DNA methylation, and synthetic or mismatched bp. The proposed method, DeepRec, can predict relative binding affinities as a function of physicochemical signatures and the effect of DNA methylation or other chemical modifications on binding. Sequence-based modeling methods are in comparison a coarse-grain description and cannot achieve such insights. Our chemistry-based modeling framework provides a path towards understanding genome function at a mechanistic level.

DNA-Binding Proteins , DNA , Base Pairing , DNA/metabolism , Protein Binding , DNA-Binding Proteins/metabolism , Binding Sites

6.

DeepPBS: Geometric deep learning for interpretable prediction of protein-DNA binding specificity.

Mitra, Raktim; Li, Jinsen; Sagendorf, Jared M; Jiang, Yibei; Chiu, Tsu-Pei; Rohs, Remo.

bioRxiv ; 2023 Dec 16.

Article En | MEDLINE | ID: mdl-38293168

Predicting specificity in protein-DNA interactions is a challenging yet essential task for understanding gene regulation. Here, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity across protein families based on protein-DNA structures. The DeepPBS architecture allows investigation of different family-specific recognition patterns. DeepPBS can be applied to predicted structures, and can aid in the modeling of protein-DNA complexes. DeepPBS is interpretable and can be used to calculate protein heavy atom-level importance scores, demonstrated as a case study on the p53-DNA interface. When aggregated at the protein residue level, these scores conform well with alanine scanning mutagenesis experimental data. The inference time for DeepPBS is sufficiently fast for analyzing simulation trajectories, as demonstrated on a molecular-dynamics simulation of a Drosophila Hox-DNA tertiary complex with its cofactor. DeepPBS and its corresponding data resources offer a foundation for machine-aided protein-DNA interaction studies, guiding experimental choices and complex design, as well as advancing our understanding of molecular interactions.

7.

It is in the flanks: Conformational flexibility of transcription factor binding sites.

Chiu, Tsu-Pei; Li, Jinsen; Jiang, Yibei; Rohs, Remo.

Biophys J ; 121(20): 3765-3767, 2022 10 18.

Article En | MEDLINE | ID: mdl-36182667

Transcription Factors , Protein Binding , Protein Conformation , Binding Sites

8.

Top-Down Crawl: a method for the ultra-rapid and motif-free alignment of sequences with associated binding metrics.

Cooper, Brendon H; Chiu, Tsu-Pei; Rohs, Remo.

Bioinformatics ; 38(22): 5121-5123, 2022 11 15.

Article En | MEDLINE | ID: mdl-36179084

SUMMARY: Several high-throughput protein-DNA binding methods currently available produce highly reproducible measurements of binding affinity at the level of the k-mer. However, understanding where a k-mer is positioned along a binding site sequence depends on alignment. Here, we present Top-Down Crawl (TDC), an ultra-rapid tool designed for the alignment of k-mer level data in a rank-dependent and position weight matrix (PWM)-independent manner. As the framework only depends on the rank of the input, the method can accept input from many types of experiments (protein binding microarray, SELEX-seq, SMiLE-seq, etc.) without the need for specialized parameterization. Measuring the performance of the alignment using multiple linear regression with 5-fold cross-validation, we find TDC to perform as well as or better than computationally expensive PWM-based methods. AVAILABILITY AND IMPLEMENTATION: TDC can be run online at https://topdowncrawl.usc.edu or locally as a python package available through pip at https://pypi.org/project/TopDownCrawl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Software , Position-Specific Scoring Matrices , Binding Sites , Sequence Analysis, DNA/methods , Protein Binding

9.

Macrophages activated by hepatitis B virus have distinct metabolic profiles and suppress the virus via IL-1β to downregulate PPARα and FOXO3.

Li, Yumei; Zhu, Yanwen; Feng, Shu; Ishida, Yuji; Chiu, Tsu-Pei; Saito, Takeshi; Wang, Sean; Ann, David K; Ou, Jing-Hsiung James.

Cell Rep ; 40(1): 111068, 2022 Jul 05.

Article En | MEDLINE | ID: mdl-35793631

10.

Macrophages activated by hepatitis B virus have distinct metabolic profiles and suppress the virus via IL-1β to downregulate PPARα and FOXO3.

Li, Yumei; Zhu, Yanwen; Feng, Shu; Ishida, Yuji; Chiu, Tsu-Pei; Saito, Takeshi; Wang, Sean; Ann, David K; Ou, Jing-Hsiung James.

Cell Rep ; 38(4): 110284, 2022 01 25.

Article En | MEDLINE | ID: mdl-35081341

Macrophages display phenotypic plasticity and can be induced by hepatitis B virus (HBV) to undergo either M1-like pro-inflammatory or M2-like anti-inflammatory polarization. Here, we report that M1-like macrophages stimulated by HBV exhibit a strong HBV-suppressive effect, which is diminished in M2-like macrophages. Transcriptomic analysis reveals that HBV induces the expression of interleukin-1β (IL-1β) in M1-like macrophages, which display a high oxidative phosphorylation (OXPHOS) activity distinct from that of conventional M1-like macrophages. Further analysis indicates that OXPHOS attenuates the expression of IL-1β, which suppresses the expression of peroxisome proliferator-activated receptor α (PPARα) and forkhead box O3 (FOXO3) in hepatocytes to suppress HBV gene expression and replication. Moreover, multiple HBV proteins can induce the expression of IL-1β in macrophages. Our results thus indicate that macrophages can respond to HBV by producing IL-1β to suppress HBV replication. However, HBV can also metabolically reprogram macrophages to enhance OXPHOS to minimize this host antiviral response.

Forkhead Box Protein O3/immunology , Hepatitis B/immunology , Interleukin-1beta/immunology , Macrophages/immunology , Macrophages/virology , PPAR gamma/immunology , Animals , Down-Regulation , Forkhead Box Protein O3/metabolism , Hepatitis B virus , Host-Pathogen Interactions/immunology , Humans , Interleukin-1beta/metabolism , Macrophages/metabolism , Male , Mice , Mice, Inbred C57BL , PPAR gamma/metabolism , Virus Replication/immunology

11.

Epigenetic competition reveals density-dependent regulation and target site plasticity of phosphorothioate epigenetics in bacteria.

Wu, Xiaolin; Cao, Bo; Aquino, Patricia; Chiu, Tsu-Pei; Chen, Chao; Jiang, Susu; Deng, Zixin; Chen, Shi; Rohs, Remo; Wang, Lianrong; Galagan, James E; Dedon, Peter C.

Proc Natl Acad Sci U S A ; 117(25): 14322-14330, 2020 06 23.

Article En | MEDLINE | ID: mdl-32518115

Phosphorothioate (PT) DNA modifications, in which a nonbonding phosphate oxygen is replaced with sulfur, represent a widespread, horizontally transferred epigenetic system in prokaryotes and have a highly unusual property of occupying only a small fraction of available consensus sequences in a genome. Using Salmonella enterica as a model, we asked a question of fundamental importance: How do the PT-modifying DndA-E proteins select their GPSAAC/GPSTTC targets? Here, we applied innovative analytical, sequencing, and computational tools to discover a novel behavior for DNA-binding proteins: The Dnd proteins are "parked" at the G6mATC Dam methyltransferase consensus sequence instead of the expected GAAC/GTTC motif, with removal of the 6mA permitting extensive PT modification of GATC sites. This shift in modification sites further revealed a surprising constancy in the density of PT modifications across the genome. Computational analysis showed that GAAC, GTTC, and GATC share common features of DNA shape, which suggests that PT epigenetics are regulated in a density-dependent manner partly by DNA shape-driven target selection in the genome.

Bacteria/genetics , Bacteria/metabolism , DNA, Bacterial/metabolism , Epigenesis, Genetic/physiology , Epigenomics , Phosphates/metabolism , 2-Aminopurine , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Base Sequence , Binding Sites , Consensus Sequence , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , DNA-Binding Proteins/metabolism , Escherichia coli/metabolism , Genome, Bacterial , Salmonella enterica/genetics

12.

TFBSshape: an expanded motif database for DNA shape features of transcription factor binding sites.

Chiu, Tsu-Pei; Xin, Beibei; Markarian, Nicholas; Wang, Yingfei; Rohs, Remo.

Nucleic Acids Res ; 48(D1): D246-D255, 2020 01 08.

Article En | MEDLINE | ID: mdl-31665425

TFBSshape (https://tfbsshape.usc.edu) is a motif database for analyzing structural profiles of transcription factor binding sites (TFBSs). The main rationale for this database is to be able to derive mechanistic insights in protein-DNA readout modes from sequencing data without available structures. We extended the quantity and dimensionality of TFBSshape, from mostly in vitro to in vivo binding and from unmethylated to methylated DNA. This new release of TFBSshape improves its functionality and launches a responsive and user-friendly web interface for easy access to the data. The current expansion includes new entries from the most recent collections of transcription factors (TFs) from the JASPAR and UniPROBE databases, methylated TFBSs derived from in vitro high-throughput EpiSELEX-seq binding assays and in vivo methylated TFBSs from the MeDReaders database. TFBSshape content has increased to 2428 structural profiles for 1900 TFs from 39 different species. The structural profiles for each TFBS entry now include 13 shape features and minor groove electrostatic potential for standard DNA and four shape features for methylated DNA. We improved the flexibility and accuracy for the shape-based alignment of TFBSs and designed new tools to compare methylated and unmethylated structural profiles of TFs and methods to derive DNA shape-preserving nucleotide mutations in TFBSs.

DNA/chemistry , Databases, Genetic , Transcription Factors/metabolism , Binding Sites , DNA/metabolism , DNA Methylation , Mutation , Nucleotide Motifs , Protein Binding , Sequence Analysis, DNA

13.

Systematic prediction of DNA shape changes due to CpG methylation explains epigenetic effects on protein-DNA binding.

Rao, Satyanarayan; Chiu, Tsu-Pei; Kribelbauer, Judith F; Mann, Richard S; Bussemaker, Harmen J; Rohs, Remo.

Epigenetics Chromatin ; 11(1): 6, 2018 02 06.

Article En | MEDLINE | ID: mdl-29409522

BACKGROUND: DNA shape analysis has demonstrated the potential to reveal structure-based mechanisms of protein-DNA binding. However, information about the influence of chemical modification of DNA is limited. Cytosine methylation, the most frequent modification, represents the addition of a methyl group at the major groove edge of the cytosine base. In mammalian genomes, cytosine methylation most frequently occurs at CpG dinucleotides. In addition to changing the chemical signature of C/G base pairs, cytosine methylation can affect DNA structure. Since the original discovery of DNA methylation, major efforts have been made to understand its effect from a sequence perspective. Compared to unmethylated DNA, however, little structural information is available for methylated DNA, due to the limited number of experimentally determined structures. To achieve a better mechanistic understanding of the effect of CpG methylation on local DNA structure, we developed a high-throughput method, methyl-DNAshape, for predicting the effect of cytosine methylation on DNA shape. RESULTS: Using our new method, we found that CpG methylation significantly altered local DNA shape. Four DNA shape features (helix twist, minor groove width, propeller twist, and roll) were considered in this analysis. Distinct distributions of effect size were observed for different features. Roll and propeller twist were the DNA shape features most strongly affected by CpG methylation with an effect size depending on the local sequence context. Methylation-induced changes in DNA shape were predictive of the measured rate of cleavage by DNase I and suggest a possible mechanism for some of the methylation sensitivities that were recently observed for human Pbx-Hox complexes. CONCLUSIONS: CpG methylation is an important epigenetic mark in the mammalian genome. Understanding its role in protein-DNA recognition can further our knowledge of gene regulation. Our high-throughput methyl-DNAshape method can be used to predict the effect of cytosine methylation on DNA shape and its subsequent influence on protein-DNA interactions. This approach overcomes the limited availability of experimental DNA structures that contain 5-methylcytosine.

DNA Methylation , DNA-Binding Proteins/metabolism , DNA/chemistry , Mammals/genetics , Animals , Base Sequence , CpG Islands , Cytosine/chemistry , DNA/metabolism , DNA-Binding Proteins/chemistry , Epigenesis, Genetic , Humans , Mammals/metabolism , Models, Molecular , Nucleic Acid Conformation

14.

Experimental maps of DNA structure at nucleotide resolution distinguish intrinsic from protein-induced DNA deformations.

Azad, Robert N; Zafiropoulos, Dana; Ober, Douglas; Jiang, Yining; Chiu, Tsu-Pei; Sagendorf, Jared M; Rohs, Remo; Tullius, Thomas D.

Nucleic Acids Res ; 46(5): 2636-2647, 2018 03 16.

Article En | MEDLINE | ID: mdl-29390080

Recognition of DNA by proteins depends on DNA sequence and structure. Often unanswered is whether the structure of naked DNA persists in a protein-DNA complex, or whether protein binding changes DNA shape. While X-ray structures of protein-DNA complexes are numerous, the structure of naked cognate DNA is seldom available experimentally. We present here an experimental and computational analysis pipeline that uses hydroxyl radical cleavage to map, at single-nucleotide resolution, DNA minor groove width, a recognition feature widely exploited by proteins. For 11 protein-DNA complexes, we compared experimental maps of naked DNA minor groove width with minor groove width measured from X-ray co-crystal structures. Seven sites had similar minor groove widths as naked DNA and when bound to protein. For four sites, part of the DNA in the complex had the same structure as naked DNA, and part changed structure upon protein binding. We compared the experimental map with minor groove patterns of DNA predicted by two computational approaches, DNAshape and ORChID2, and found good but not perfect concordance with both. This experimental approach will be useful in mapping structures of DNA sequences for which high-resolution structural data are unavailable. This approach allows probing of protein family-dependent readout mechanisms.

DNA-Binding Proteins/metabolism , DNA/chemistry , Binding Sites , DNA/metabolism , Models, Molecular , Nucleic Acid Conformation , Nucleotides/chemistry , Protein Binding

15.

Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding.

Li, Jinsen; Sagendorf, Jared M; Chiu, Tsu-Pei; Pasi, Marco; Perez, Alberto; Rohs, Remo.

Nucleic Acids Res ; 45(22): 12877-12887, 2017 Dec 15.

Article En | MEDLINE | ID: mdl-29165643

Uncovering the mechanisms that affect the binding specificity of transcription factors (TFs) is critical for understanding the principles of gene regulation. Although sequence-based models have been used successfully to predict TF binding specificities, we found that including DNA shape information in these models improved their accuracy and interpretability. Previously, we developed a method for modeling DNA binding specificities based on DNA shape features extracted from Monte Carlo (MC) simulations. Prediction accuracies of our models, however, have not yet been compared to accuracies of models incorporating DNA shape information extracted from X-ray crystallography (XRC) data or Molecular Dynamics (MD) simulations. Here, we integrated DNA shape information extracted from MC or MD simulations and XRC data into predictive models of TF binding and compared their performance. Models that incorporated structural information consistently showed improved performance over sequence-based models regardless of data source. Furthermore, we derived and validated nine additional DNA shape features beyond our original set of four features. The expanded repertoire of 13 distinct DNA shape features, including six intra-base pair and six inter-base pair parameters and minor groove width, is available in our R/Bioconductor package DNAshapeR and enables a comprehensive structural description of the double helix on a genome-wide scale.

Algorithms , Computational Biology/methods , DNA/chemistry , Genome-Wide Association Study/methods , Transcription Factors/chemistry , Base Sequence , Crystallography, X-Ray , DNA/genetics , DNA/metabolism , Molecular Dynamics Simulation , Monte Carlo Method , Nucleic Acid Conformation , Protein Binding , Reproducibility of Results , Transcription Factors/metabolism

16.

Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein-DNA binding.

Chiu, Tsu-Pei; Rao, Satyanarayan; Mann, Richard S; Honig, Barry; Rohs, Remo.

Nucleic Acids Res ; 45(21): 12565-12576, 2017 Dec 01.

Article En | MEDLINE | ID: mdl-29040720

Protein-DNA binding is a fundamental component of gene regulatory processes, but it is still not completely understood how proteins recognize their target sites in the genome. Besides hydrogen bonding in the major groove (base readout), proteins recognize minor-groove geometry using positively charged amino acids (shape readout). The underlying mechanism of DNA shape readout involves the correlation between minor-groove width and electrostatic potential (EP). To probe this biophysical effect directly, rather than using minor-groove width as an indirect measure for shape readout, we developed a methodology, DNAphi, for predicting EP in the minor groove and confirmed the direct role of EP in protein-DNA binding using massive sequencing data. The DNAphi method uses a sliding-window approach to mine results from non-linear Poisson-Boltzmann (NLPB) calculations on DNA structures derived from all-atom Monte Carlo simulations. We validated this approach, which only requires nucleotide sequence as input, based on direct comparison with NLPB calculations for available crystal structures. Using statistical machine-learning approaches, we showed that adding EP as a biophysical feature can improve the predictive power of quantitative binding specificity models across 27 transcription factor families. High-throughput prediction of EP offers a novel way to integrate biophysical and genomic studies of protein-DNA binding.

DNA-Binding Proteins/metabolism , DNA/chemistry , Transcription Factors/metabolism , Binding Sites , DNA/metabolism , DNA-Binding Proteins/chemistry , Escherichia coli Proteins/metabolism , Factor For Inversion Stimulation Protein/metabolism , Genome , Genomics , Homeodomain Proteins/metabolism , Machine Learning , Models, Molecular , Monte Carlo Method , Nucleic Acid Conformation , Phosphates/chemistry , Protein Binding , Static Electricity , Transcription Factors/chemistry
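
The sliding-window idea mentioned in the abstract of entry 16 can be illustrated with a minimal sketch: each position in a sequence receives the value stored for the k-mer centered on it. This is not the DNAphi implementation; the function name `sliding_window_predict`, the window size of 5, and the two table values are hypothetical placeholders chosen only to show the mechanics.

```python
from typing import Dict, List, Optional


def sliding_window_predict(
    seq: str, table: Dict[str, float], k: int = 5
) -> List[Optional[float]]:
    """Assign each position the table value of the k-mer centered on it.

    Positions too close to either end for a full window, and k-mers
    missing from the table, are reported as None.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that the window has a center")
    half = k // 2
    seq = seq.upper()
    out: List[Optional[float]] = [None] * len(seq)
    for i in range(half, len(seq) - half):
        window = seq[i - half : i + half + 1]
        out[i] = table.get(window)
    return out


# Toy lookup table with made-up numbers (NOT real electrostatic potentials).
toy_table = {"AATTC": -1.0, "ATTCG": -0.5}
profile = sliding_window_predict("AATTCG", toy_table)
```

In this sketch only the two central positions of the six-base toy sequence have a full pentamer window, so the remaining positions stay undefined; a real query table would need an entry for every possible k-mer.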

17.

DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo.

Mathelier, Anthony; Xin, Beibei; Chiu, Tsu-Pei; Yang, Lin; Rohs, Remo; Wasserman, Wyeth W.

Cell Syst ; 3(3): 278-286.e4, 2016 09 28.

Article En | MEDLINE | ID: mdl-27546793

Interactions of transcription factors (TFs) with DNA comprise a complex interplay between base-specific amino acid contacts and readout of DNA structure. Recent studies have highlighted the complementarity of DNA sequence and shape in modeling TF binding in vitro. Here, we have provided a comprehensive evaluation of in vivo datasets to assess the predictive power obtained by augmenting various DNA sequence-based models of TF binding sites (TFBSs) with DNA shape features (helix twist, minor groove width, propeller twist, and roll). Results from 400 human ChIP-seq datasets for 76 TFs show that combining DNA shape features with position-specific scoring matrix (PSSM) scores improves TFBS predictions. Improvement has also been observed using TF flexible models and a machine-learning approach using a binary encoding of nucleotides in lieu of PSSMs. Incorporating DNA shape information is most beneficial for E2F and MADS-domain TF families. Our findings indicate that incorporating DNA sequence and shape information benefits the modeling of TF binding under complex in vivo conditions.

Transcription Factors/chemistry , Base Sequence , Binding Sites , DNA , Humans , Protein Binding , Transcription Factors/metabolism

18.

DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding.

Chiu, Tsu-Pei; Comoglio, Federico; Zhou, Tianyin; Yang, Lin; Paro, Renato; Rohs, Remo.

Bioinformatics ; 32(8): 1211-3, 2016 04 15.

Article En | MEDLINE | ID: mdl-26668005

UNLABELLED: DNAshapeR predicts DNA shape features in an ultra-fast, high-throughput manner from genomic sequencing data. The package takes either nucleotide sequence or genomic coordinates as input and generates various graphical representations for visualization and further analysis. DNAshapeR further encodes DNA sequence and shape features as user-defined combinations of k-mer and DNA shape features. The resulting feature matrices can be readily used as input of various machine learning software packages for further modeling studies. AVAILABILITY AND IMPLEMENTATION: The DNAshapeR software package was implemented in the statistical programming language R and is freely available through the Bioconductor project at https://www.bioconductor.org/packages/devel/bioc/html/DNAshapeR.html and at the GitHub developer site, http://tsupeichiu.github.io/DNAshapeR/. CONTACT: rohs@usc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

DNA , Genomics , Software , Genome , Programming Languages
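
As a rough illustration of what combining k-mer and shape features into one feature vector (entry 18) can look like, the sketch below one-hot encodes each nucleotide (a 1-mer encoding) and appends per-position shape values. It is written in Python for illustration only and does not reproduce the DNAshapeR interface, which is an R package; the names `one_hot_1mer` and `encode_sequence_and_shape` and the example shape numbers are assumptions.

```python
from typing import List, Sequence

BASES = "ACGT"


def one_hot_1mer(seq: str) -> List[int]:
    """Encode each nucleotide as four binary features in A, C, G, T order."""
    vec: List[int] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def encode_sequence_and_shape(seq: str, shape: Sequence[float]) -> List[float]:
    """Concatenate the 1-mer encoding with per-position shape values."""
    if len(shape) != len(seq):
        raise ValueError("expected one shape value per nucleotide")
    return [float(x) for x in one_hot_1mer(seq)] + [float(v) for v in shape]


# Made-up shape values, used only to show the layout of the feature vector.
features = encode_sequence_and_shape("ACG", [5.1, 4.8, 5.3])
```

Each row of a feature matrix built this way has 4 sequence columns per nucleotide followed by the shape columns, which is the general layout a regression or classification model would consume.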

19.

Mechanistic insights into metal ion activation and operator recognition by the ferric uptake regulator.

Deng, Zengqin; Wang, Qing; Liu, Zhao; Zhang, Manfeng; Machado, Ana Carolina Dantas; Chiu, Tsu-Pei; Feng, Chong; Zhang, Qi; Yu, Lin; Qi, Lei; Zheng, Jiangge; Wang, Xu; Huo, XinMei; Qi, Xiaoxuan; Li, Xiaorong; Wu, Wei; Rohs, Remo; Li, Ying; Chen, Zhongzhou.

Nat Commun ; 6: 7642, 2015 Jul 02.

Article En | MEDLINE | ID: mdl-26134419

Ferric uptake regulator (Fur) plays a key role in the iron homeostasis of prokaryotes, such as bacterial pathogens, but the molecular mechanisms and structural basis of Fur-DNA binding remain incompletely understood. Here, we report high-resolution structures of Magnetospirillum gryphiswaldense MSR-1 Fur in four different states: apo-Fur, holo-Fur, the Fur-feoAB1 operator complex and the Fur-Pseudomonas aeruginosa Fur box complex. Apo-Fur is a transition metal ion-independent dimer whose binding induces profound conformational changes and confers DNA-binding ability. Structural characterization, mutagenesis, biochemistry and in vivo data reveal that Fur recognizes DNA by using a combination of base readout through direct contacts in the major groove and shape readout through recognition of the minor-groove electrostatic potential by lysine. The resulting conformational plasticity enables Fur binding to diverse substrates. Our results provide insights into metal ion activation and substrate recognition by Fur that suggest pathways to engineer magnetotactic bacteria and antipathogenic drugs.

Bacterial Proteins/metabolism , Cation Transport Proteins/genetics , DNA-Binding Proteins/metabolism , Iron/metabolism , Operator Regions, Genetic , Repressor Proteins/metabolism , Bacterial Proteins/genetics , Circular Dichroism , Crystallization , Magnetospirillum , Microscopy, Electron, Transmission , Protein Conformation , Pseudomonas aeruginosa , Real-Time Polymerase Chain Reaction , Repressor Proteins/genetics , Spectrum Analysis

20.

GBshape: a genome browser database for DNA shape annotations.

Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin; Main, Bradley J; Parker, Stephen C J; Nuzhdin, Sergey V; Tullius, Thomas D; Rohs, Remo.

Nucleic Acids Res ; 43(Database issue): D103-9, 2015 Jan.

Article En | MEDLINE | ID: mdl-25326329

Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species.

DNA/chemistry , Databases, Nucleic Acid , Genome , Molecular Sequence Annotation , Web Browser , Animals , Binding Sites , Humans , Nucleic Acid Conformation , Nucleosomes/metabolism , Transcription Initiation Site