ABSTRACT
A protein superfold is a type of protein fold that is observed in at least three distinct, non-homologous protein families. Structural classification studies have revealed a limited number of prevalent superfolds alongside several infrequently occurring folds. In α/β-type superfolds, the C-terminal β-strand tends to favor the edge of the β-sheet, while the N-terminal β-strand is often found in the middle. Whether these observations arise from evolutionary sampling bias or from physical interactions remains unclear. This article offers a physics-based explanation for these observations, specifically for pure parallel β-sheet topologies. Our investigation is grounded in several established structural rules that are based on physical interactions. We identified "frustration-free topologies," which can satisfy all the rules simultaneously; topologies that cannot are termed "frustrated topologies." Our findings reveal three points: frustration-free topologies represent only a fraction of all theoretically possible patterns; these topologies strongly favor positioning the C-terminal β-strand at the edge of the β-sheet and the N-terminal β-strand in the middle; and there is significant overlap between frustration-free topologies and superfolds. We also used a lattice protein model to thoroughly investigate sequence-structure relationships. Our results show that frustration-free structures are highly designable, while frustrated structures are poorly designable. These findings suggest that superfolds are highly designable due to their lack of frustration, and that the preference for positioning C-terminal β-strands at the edge of the β-sheet is a direct result of frustration-free topologies. These insights not only enhance our understanding of sequence-structure relationships but also have significant implications for de novo protein design.
Subject(s)
Models, Molecular , Protein Folding , Proteins , Proteins/chemistry , Protein Conformation, beta-Strand , Computational Biology/methods
ABSTRACT
Superfolds are folds commonly observed among multiple evolutionarily unrelated superfamilies of proteins. Since the discovery of superfolds almost two decades ago, structural rules distinguishing superfolds from other, ordinary folds have been sought but have remained elusive. Here, as a case study to find such a rule, we analyzed a typical superfold, the ferredoxin fold, and the fold obtained by reversing the N- to C-terminal direction of the ferredoxin fold. Although all the known structural characteristics of superfolds apply to both the ferredoxin fold and the reverse ferredoxin fold, the reverse fold has been found in only a single superfamily. The database analyses in the present study revealed the structural preferences of αβ- and βα-units; these preferences separate the two α-helices in the ferredoxin fold, preventing their collision and stabilizing the fold. In contrast, in the reverse ferredoxin fold, the preferences bring the two helices near each other, inducing structural conflict. Rosetta folding simulations suggested that the ferredoxin fold is physically much more realizable than the reverse ferredoxin fold. Therefore, we propose that minimal structural conflict, or minimal frustration, among secondary structures is the rule that distinguishes a superfold from ordinary folds. Intriguingly, the database analyses revealed that one of the most stringent structural rules in proteins, the right-handedness of the βαβ-unit, is broken in a set of structures to prevent frustration, suggesting that the proposed rule of minimum frustration among secondary structural units is comparable in strength to the right-handedness rule of the βαβ-unit.
Subject(s)
Ferredoxins , Protein Folding , Ferredoxins/metabolism , Protein Conformation , Protein Structure, Secondary , Proteins/chemistry
ABSTRACT
Motivation: Protein structure alignment is an important tool for understanding the evolutionary processes and physicochemical properties of proteins. Important targets of structure alignment include not only monomeric but also oligomeric proteins, which sometimes involve domain swapping or fusions. Although various protein structural alignment programs have been developed, no method is applicable to any protein pair, regardless of the number of chain components and oligomeric states, while retaining the sequential restriction that structurally equivalent regions must be aligned in the same order along the protein sequences. Results: In this paper, we introduce a new sequential protein structural alignment algorithm, MICAN-SQ, which is applicable to protein structures in all oligomeric states. In particular, MICAN-SQ allows complicated structural alignments of proteins with domain-swapped or fused regions. To validate MICAN-SQ, alignment accuracies were evaluated using curated alignments of monomers and examples of domain swapping, and compared with those of pre-existing protein structural alignment programs. The results show that MICAN-SQ has superior accuracy and robustness in comparison with previous programs and requires only limited computation time. We also demonstrate that MICAN-SQ correctly aligns very large complexes and fused proteins. The present computations warrant the consideration of MICAN-SQ for studies of the evolutionary and physicochemical properties of monomeric structures and all oligomer types. Availability and implementation: The MICAN program was implemented in C. The source code and executable file can be freely downloaded from http://www.tbp.cse.nagoya-u.ac.jp/MICAN/. Supplementary information: Supplementary data are available at Bioinformatics online.
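The sequential restriction described above can be illustrated with a minimal sketch. It assumes a single-chain alignment represented as pairs of residue indices; the function name `is_sequential` and this representation are hypothetical and are not taken from the MICAN-SQ source code, which handles multi-chain cases that this sketch does not.

```python
def is_sequential(aligned_pairs):
    """Return True if aligned residues appear in the same order in both chains.

    aligned_pairs: list of (index_in_protein_A, index_in_protein_B) tuples.
    Hypothetical single-chain representation, used for illustration only.
    """
    ordered = sorted(aligned_pairs)  # sort by position in protein A
    # Both indices must strictly increase from one aligned pair to the next.
    return all(a[0] < b[0] and a[1] < b[1] for a, b in zip(ordered, ordered[1:]))


# An order-preserving alignment satisfies the restriction.
print(is_sequential([(1, 5), (2, 6), (4, 9)]))
# A circular-permutation-like alignment violates it.
print(is_sequential([(1, 5), (2, 6), (7, 1)]))
```

A non-sequential aligner would accept the second alignment, whereas a sequential aligner must reject it or align only an order-preserving subset.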
Subject(s)
Proteins/chemistry , Algorithms , Amino Acid Sequence , Protein Multimerization , Software
ABSTRACT
As the number of structurally resolved protein-ligand complexes increases, the ligand-binding pockets of many proteins have been found to accommodate multiple different compounds. Effective use of these structural data is important for developing virtual screening (VS) methods that identify bioactive compounds. Here, we introduce a VS method, VS-APPLE (Virtual Screening Algorithm using Promiscuous Protein-Ligand complExes), based on promiscuous protein-ligand binding structures. In VS-APPLE, multiple ligands bound to a pocket are combined into a query template for screening. Both the structural match between a test compound and the multiple-ligand template and the possible collisions between the test compound and the target protein are evaluated by an efficient geometric hashing method. The performance of VS-APPLE was examined on a filtered, clustered version of the Directory of Useful Decoys data set. In Area Under the Curve analyses of this data set, VS-APPLE outperformed several popular screening programs. Judging from the performance of VS-APPLE, the structural data of promiscuous protein-ligand bindings could be further analyzed and exploited for developing VS methods.
Subject(s)
Algorithms , Drug Evaluation, Preclinical/methods , Models, Molecular , Proteins/chemistry , Proteins/metabolism , Benchmarking , Ligands , Protein Conformation , Substrate Specificity , User-Computer Interface
ABSTRACT
BACKGROUND: Protein pairs that have the same secondary structure packing arrangement but different topologies have attracted much attention in terms of both the evolution and the physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as an insight into the physicochemical properties of secondary structure packing. For this purpose, highly accurate sequence-order-independent structure comparison methods are needed. RESULTS: We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSEs). One of the key features of the algorithm is the use of a multiple-vector representation for each SSE, which enables us to correctly treat the bent or twisted nature of long SSEs. We compared MICAN with 9 other publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets that include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods in reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed top-level performance on the sequential test sets. We also show that MICAN is the fastest non-sequential structure alignment program among all the programs we examined here. CONCLUSIONS: MICAN is the fastest and the most accurate program among the non-sequential alignment programs we examined here. These results suggest that MICAN is a highly effective tool for automatically detecting non-trivial structural relationships of proteins, such as circular permutations and segment-swapping, many of which have so far been identified manually by human experts. The source code of MICAN is freely downloadable at http://www.tbp.cse.nagoya-u.ac.jp/MICAN.
Subject(s)
Algorithms , Models, Molecular , Protein Structure, Secondary , Structural Homology, Protein , Proteins/chemistry
ABSTRACT
A fundamental question in protein evolution is whether nature has exhaustively sampled nearly all possible protein folds throughout evolution, or whether a large fraction of the possible folds remains unexplored. To address this question, we defined a set of rules for β-sheet topology to predict novel αβ-folds and carried out a systematic de novo protein design exploration of the novel αβ-folds predicted by the rules. The designs for all eight of the predicted novel αβ-folds with a four-stranded β-sheet, including a knot-forming one, folded into structures close to the design models. Further, the rules predicted more than 10,000 novel αβ-folds with five- to eight-stranded β-sheets; this number far exceeds the number of αβ-folds observed in nature so far. This result suggests that a vast number of αβ-folds are possible, but have not emerged or have become extinct due to evolutionary bias.
Subject(s)
Protein Folding , Proteins , Protein Structure, Secondary , Proteins/chemistry , Protein Conformation, beta-Strand
ABSTRACT
The folding energy landscape of proteins has been suggested to be funnel-like with some degree of ruggedness on the slope. How complex the landscape is, however, remains rather unclear. Many experiments on globular proteins have suggested relative simplicity, whereas molecular simulations of shorter peptides have implied more complexity. Here, by using complete conformational sampling of 2 globular proteins, protein G and the src SH3 domain, and 2 related random peptides, we investigated their energy landscapes, the topological properties of their folding networks, and their folding dynamics. The projected energy surfaces of the globular proteins were funneled in the vicinity of the native state but also had other quite deep, accessible minima, whereas the randomized peptides had many local basins, including some leading to seriously misfolded forms. Dynamics in the denatured part of the network exhibited basin-hopping itinerancy among many conformations, whereas the proteins reached relatively well-defined final stages that led to their native states. We also found that the folding network has a hierarchical nature characterized by scale-free and small-world properties.
Subject(s)
Computer Simulation , Nerve Tissue Proteins/chemistry , Protein Folding , src Homology Domains , Kinetics , Protein Conformation , Thermodynamics
ABSTRACT
Enhancer-promoter interactions in eukaryotic genomes are often controlled by sequence elements that block the actions of enhancers. Although the experimental evidence suggests that those sequence elements contribute to forming loops of chromatin, the molecular mechanism of how such looping affects the enhancer-blocking activity is still largely unknown. In this article, the roles of DNA looping in enhancer blocking are investigated by numerically simulating the DNA conformation of a prototypical model system of gene regulation. The simulated results show that the enhancer function is indeed blocked when the enhancer is looped out so that it is separated from the promoter, which explains experimental observations of gene expression in the model system. The local structural distortion of DNA caused by looping is important for blocking, so the ability of looping to block enhancers can be lost when the loop length is much larger than the persistence length of the chain.
Subject(s)
DNA/chemistry , Enhancer Elements, Genetic/genetics , Nucleic Acid Conformation , Base Pairing/genetics , Base Sequence , Computer Simulation , DNA/genetics , Entropy , HeLa Cells , Humans , Monte Carlo Method , Promoter Regions, Genetic
ABSTRACT
Protein design is the inverse of three-dimensional (3D) structure prediction as an approach for elucidating the relationship between 3D structures and amino acid sequences. In general, the computation for protein design involves a double loop: a loop over amino acid sequence changes and a loop for an exhaustive conformational search for each amino acid sequence. Herein, we propose a novel statistical mechanical design method using Bayesian learning, which can design lattice proteins without the exhaustive conformational search. We consider a thermodynamic hypothesis of the evolution of proteins and apply it to the prior distribution of amino acid sequences. Furthermore, we take the effect of water into account in view of the grand canonical picture. As a result, when applied to the 2D lattice hydrophobic-polar (HP) model, our design method successfully finds an amino acid sequence for which the target conformation is the unique ground state. However, the performance was not as good for the 3D lattice HP model as for the 2D model. The performance for the 3D model improves when 20-letter lattice proteins are used. Furthermore, we find a strong linear relationship between the chemical potential of water and the number of surface residues, thereby revealing the relationship between protein structure and the effect of water molecules. The advantage of our method is that it greatly reduces computation time, because it does not require long calculations of the partition function corresponding to an exhaustive conformational search. As our method uses a general form of Bayesian learning and statistical mechanics and is not limited to lattice proteins, the results presented here elucidate some heuristics used successfully in previous protein design methods.
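To make the design criterion concrete, the following is a minimal sketch of the exhaustive conformational search that the proposed method is said to avoid: it enumerates every conformation of a short chain on the 2D square lattice and checks whether an HP sequence has a unique ground state. It assumes the common HP convention of an energy of -1 per non-bonded H-H contact; the function names are illustrative and do not come from the paper.

```python
def enumerate_walks(n):
    """Yield self-avoiding walks of n >= 2 beads on the square lattice.

    Rotations and reflections are removed by fixing the first step to (1, 0)
    and requiring the first step off the x-axis to point upward.
    """
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            if dy < 0 and not turned:
                continue  # skip the mirror image of an upward first turn
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # site already occupied: walk must be self-avoiding
            path.append(nxt)
            yield from extend(path, turned or dy != 0)
            path.pop()

    yield from extend([(0, 0), (1, 0)], False)


def hp_energy(seq, walk):
    """Energy = -(number of H-H pairs adjacent on the lattice but not bonded)."""
    contacts = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                (xi, yi), (xj, yj) = walk[i], walk[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    contacts += 1
    return -contacts


def ground_states(seq):
    """Return (lowest energy, list of all conformations with that energy)."""
    best, states = None, []
    for walk in enumerate_walks(len(seq)):
        e = hp_energy(seq, walk)
        if best is None or e < best:
            best, states = e, [walk]
        elif e == best:
            states.append(walk)
    return best, states


def has_unique_ground_state(seq):
    """A sequence 'designs' a conformation if its ground state is unique."""
    return len(ground_states(seq)[1]) == 1


print(ground_states("HPPH"))            # a single U-shaped conformation
print(has_unique_ground_state("PPPP"))  # all conformations are degenerate
```

The cost of this enumeration grows exponentially with chain length, which is why a design method that avoids the inner conformational loop reduces computation time.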
ABSTRACT
A wide range of de novo designs of αβ-proteins has been achieved based on design rules that describe the secondary structure lengths and loop torsion patterns favorable for the target topologies. This paper proposes design rules for register shifts in βαβ-motifs, which have not been reported previously but are necessary for determining a target structure in the de novo design of αβ-proteins. By analyzing naturally occurring protein structures in a database, we found preferences for register shifts in βαβ-motifs, and derived the following empirical rules: (1) register shifts must not be negative regardless of the torsion types of a constituent loop in βαβ-motifs; (2) preferred register shifts strongly depend on the loop torsion types. To explain these empirical rules by physical interactions, we conducted physics-based simulations for systems mimicking a βαβ-motif that contains the most frequently observed loop type in the database. We performed an exhaustive conformational sampling of the loop region, imposing excluded-volume and hydrogen-bond satisfaction conditions. The distributions of register shifts obtained from the simulations agreed well with those of the database analysis, indicating that the empirical rules are a consequence of physical interactions rather than of an evolutionary sampling bias. Our proposed design rules will serve as a guide to making appropriate target structures for the de novo design of αβ-proteins.
Subject(s)
Amino Acid Motifs , Proteins/chemistry , Computer Simulation , Databases, Protein , Models, Molecular , Nuclear Magnetic Resonance, Biomolecular , Peptides/chemistry , Protein Binding , Protein Structure, Secondary , Statistics as Topic
ABSTRACT
Potential inhibitors of a target biomolecule, NAD-dependent deacetylase Sirtuin 1, were identified by a contest-based approach, in which participants were asked to propose a prioritized list of 400 compounds from a designated compound library containing 2.5 million compounds using in silico methods and scoring. Our aim was to identify target enzyme inhibitors and to benchmark computer-aided drug discovery methods under the same experimental conditions. Compared with the use of a single method, collecting compound lists derived from various methods is advantageous for aggregating structurally diverse compounds. The inhibitory action on Sirtuin 1 of approximately half of the proposed compounds was experimentally assessed. Ultimately, seven structurally diverse compounds were identified.
ABSTRACT
Currently, one of the most serious problems in protein-folding simulations for de novo structure prediction is conformational sampling of medium-to-large proteins. In vivo, folding of these proteins is mediated by molecular chaperones. Inspired by the functions of chaperonins, we designed a simple chaperonin-like simulation protocol within the framework of the standard fragment assembly method: in our protocol, the strength of the hydrophobic interaction is periodically modulated to help the protein escape from misfolded structures. We tested this protocol for 38 proteins and found that, using a certain defined criterion of success, our method could successfully predict the native structures of 14 targets, whereas only those of 10 targets were successfully predicted using the standard protocol. In particular, for non-alpha-helical proteins, our method yielded significantly better predictions than the standard approach. This chaperonin-inspired protocol that enhanced de novo structure prediction using folding simulations may, in turn, provide new insights into the working principles underlying the chaperonin system.
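The idea of periodically modulating the hydrophobic interaction can be sketched as a weight schedule applied to one term of the energy function. The square-wave form, the period, and the two weight values below are assumptions chosen for illustration; the abstract does not specify the schedule actually used.

```python
def hydrophobic_weight(step, period, w_high=1.0, w_low=0.2):
    """Square-wave schedule (assumed form): full strength for the first half
    of each period, weakened strength for the second half."""
    phase = (step % period) / period
    return w_high if phase < 0.5 else w_low


def total_energy(e_hydrophobic, e_other, step, period):
    """Scale only the hydrophobic term; all other terms are left unchanged."""
    return hydrophobic_weight(step, period) * e_hydrophobic + e_other


# During the weak phase a compact misfolded state loses part of its
# hydrophobic stabilization, which makes escape moves easier to accept.
for step in (0, 49, 50, 99, 100):
    print(step, hydrophobic_weight(step, period=100))
```

In a fragment assembly run, `total_energy` would replace the fixed-weight energy in the Monte Carlo acceptance test, so the chain alternates between compaction and partial release.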
Subject(s)
Chaperonins/chemistry , Chaperonins/ultrastructure , Models, Chemical , Models, Molecular , Protein Folding , Proteins/chemistry , Proteins/ultrastructure , Computer Simulation , Protein Conformation
ABSTRACT
In protein structures, the fold is described according to the spatial arrangement of secondary structure elements (SSEs: α-helices and β-strands) and their connectivity. The connectivity, or the pattern of links among SSEs, is one of the most important factors for understanding the variety of protein folds. In this study, we introduced connectivity strings that encode the connectivities by using the types, positions, and connections of SSEs, and computationally enumerated all the connectivities of two-layer αβ sandwiches. The calculated connectivities were compared with those in natural proteins determined using MICAN, a nonsequential structure comparison method. For 2α-4β, of all 23,000 connectivities, only 48 were free from irregular connectivities such as loop crossing. Of these, only 20 were found in natural proteins, and the superfamilies were biased toward certain types of connectivities. A similarly disproportionate distribution was confirmed for most other spatial arrangements of SSEs in the two-layer αβ sandwiches. We found two connectivity rules that explain the bias well: the abundance of interlayer connecting loops, which bridge SSEs in distinct layers; and the abundance of nonlocal β-strand pairs, that is, two spatially adjacent β-strands located at discontinuous positions in the amino acid sequence. A two-dimensional plot of these two properties indicated that the two connectivity rules are not independent, which may be interpreted as a rule for the cooperativity of proteins.
Subject(s)
Ferredoxins/chemistry , Binding Sites , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Folding , Protein Interaction Domains and Motifs
ABSTRACT
We propose a new iterative screening contest method to identify target protein inhibitors. After conducting a compound screening contest in 2014, we report in this study the results of a contest held in 2015. Our aims were to identify target enzyme inhibitors and to benchmark a variety of computer-aided drug discovery methods under identical experimental conditions. In both contests, we employed the tyrosine-protein kinase Yes as an example target protein. Participating groups virtually screened possible inhibitors from a library containing 2.4 million compounds. Compounds were ranked based on functional scores obtained using each group's own method, and the top 181 compounds from each group were selected. Our results from the 2015 contest show an improved hit rate compared with the 2014 contest. In addition, we have successfully identified a statistically warranted method for identifying target inhibitors. Quantitative analysis of the most successful method gave additional insights into important characteristics of the method used.
Subject(s)
Drug Discovery/methods , Enzyme Inhibitors/pharmacology , High-Throughput Screening Assays/methods , Protein Kinase Inhibitors/pharmacology , Proto-Oncogene Proteins c-yes/antagonists & inhibitors , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/metabolism , Humans , Machine Learning , Molecular Structure , Protein Binding , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/metabolism , Proto-Oncogene Proteins c-yes/metabolism , Reproducibility of Results , Structure-Activity Relationship
ABSTRACT
Predicting protein tertiary structures by in silico folding is still very difficult for proteins that have new folds. Here, we developed a coarse-grained energy function, SimFold, for de novo structure prediction, performed a benchmark test of prediction with fragment assembly simulations for 38 test proteins, and proposed consensus prediction with Rosetta. The SimFold energy consists of many terms that take into account solvent-induced effects on the basis of physicochemical considerations. In the benchmark test, SimFold succeeded in predicting native structures within 6.5 Å for 12 of 38 proteins; this success rate was the same as that of the publicly available version of Rosetta (ab initio version 1.2) run with default parameters. We investigated which energy terms in SimFold contribute to structure prediction performance, finding that the hydrophobic interaction is the most crucial for the prediction, whereas other sequence-specific terms have weak but positive roles. In the benchmark, the proteins well predicted by SimFold and by Rosetta were not the same for 5 of 12 proteins, which led us to introduce consensus prediction. With combined decoys, we succeeded in prediction for 16 proteins, four more than with SimFold or Rosetta separately. For each of the 38 proteins, the structural ensembles generated by SimFold and by Rosetta were qualitatively compared by mapping the sampled structural space onto two dimensions. For proteins for which one of the two methods succeeded and the other failed in prediction, the successful method had a less scattered ensemble located around the native structure. For proteins for which both methods succeeded in prediction, the two ensembles were often intermixed.
Subject(s)
Proteins/chemistry , Alanine , Amino Acid Sequence , Computer Simulation , Consensus Sequence , Hydrogen Bonding , Models, Theoretical , Potentiometry , Protein Folding , Protein Structure, Tertiary , Thermodynamics
ABSTRACT
A simple statistical mechanical model proposed by Wako and Saitô has explained aspects of protein folding surprisingly well. This model was systematically applied to multiple proteins by Muñoz and Eaton and has since been referred to as the Wako-Saitô-Muñoz-Eaton (WSME) model. The success of the WSME model in explaining the folding of many proteins has supported the hypothesis that folding is dominated by native interactions, which makes the energy landscape globally biased toward the native conformation. Using the WSME and other related models, Saitô emphasized the importance of the hierarchical pathway in protein folding; folding starts with the creation of contiguous segments having a native-like configuration and proceeds through the growth and coalescence of these segments. The Φ-values calculated for barnase with the WSME model suggested that the segments contributing to the folding nucleus are similar to the structural modules defined by the pattern of native atomic contacts. The WSME model was extended to explain the folding of multi-domain proteins having a complex topology, which opened the way to a comprehensive understanding of the folding process of multi-domain proteins. The WSME model was also extended to describe allosteric transitions, indicating that the allosteric structural movement does not occur as a deterministic sequential change between two conformations but as a stochastic diffusive motion over a dynamically changing energy landscape. The statistical mechanical viewpoint on folding, as highlighted by the WSME model, has been renewed in the context of modern methods and ideas, and will continue to provide insights into the equilibrium and dynamical features of proteins.
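For reference, the WSME model is commonly written in the following form; the notation here is an assumption that follows common usage rather than any particular paper in this list. Each residue k carries a binary variable m_k (1 if native-like, 0 otherwise), and a native contact between residues i and j contributes only when the whole segment between them is native-like:

```latex
H(\{m\}) = \sum_{i<j} \varepsilon_{ij}\,\Delta_{ij} \prod_{k=i}^{j} m_k ,
\qquad
Z = \sum_{\{m\}} \exp\!\left[ -\frac{H(\{m\})}{k_B T}
      + \frac{1}{k_B} \sum_{i} m_i\,\Delta S_i \right]
```

Here \Delta_{ij} = 1 if residues i and j are in contact in the native structure and 0 otherwise, \varepsilon_{ij} < 0 is the contact energy, and \Delta S_i < 0 is the entropic cost of fixing residue i in its native-like configuration. The product over k encodes the contiguous native-like segments whose growth and coalescence the abstract describes.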
ABSTRACT
We discuss methods and ideas of virtual screening (VS) for drug discovery by examining the performance of VS-APPLE, a recently developed VS method that extensively utilizes the tendency of single binding pockets to bind diverse ligands, i.e., the promiscuity of binding pockets. In VS-APPLE, multiple ligands bound to a pocket are spatially arranged by maximizing the structural overlap of the protein while keeping their relative position and orientation with respect to the pocket surface, and are then combined into a multiple-ligand template for screening test compounds. To greatly reduce the computational cost, test compound structures are compared only with limited regions of the multiple-ligand template. Even when we use only the narrow regions with the most densely populated atoms for the comparison, VS-APPLE outperforms other conventional VS methods in terms of the Area Under the Curve (AUC) measure. This region of densely populated atoms corresponds to the consensus region among the multiple ligands. It is typically observed that expanding the sampled region to include more atoms improves screening efficiency. However, for some target proteins, considering only a small consensus region is enough for the effective screening of test compounds. These results suggest that performance tests of VS methods shed light on the mechanisms of protein-ligand interactions, and that elucidation of protein-ligand interactions should in turn help improve VS methods.
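The AUC measure used to compare VS methods can be computed directly from the scores assigned to known actives and decoys. The sketch below uses the rank-based (Mann-Whitney) form of the ROC AUC and assumes that a higher score means a better predicted binder; the function name and the example scores are illustrative and are not VS-APPLE output.

```python
def roc_auc(active_scores, decoy_scores):
    """ROC AUC as the probability that a randomly chosen active is scored
    above a randomly chosen decoy, with ties counted as one half."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Made-up scores: perfect separation, then one active ranked below a decoy.
print(roc_auc([0.9, 0.8], [0.1, 0.2]))
print(roc_auc([0.9, 0.3], [0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking, so screening methods are compared by how far above 0.5 they score on the same benchmark set.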
ABSTRACT
Searching a broader range of chemical space is important for drug discovery. Different methods of computer-aided drug discovery (CADD) are known to propose compounds in different chemical spaces as hit molecules for the same target protein. This study aimed to use multiple CADD methods through open innovation to achieve a level of hit molecule diversity that is not achievable with any single method. We held a compound proposal contest, in which multiple research groups participated and predicted inhibitors of tyrosine-protein kinase Yes. The contest showed whether collective knowledge based on individual approaches helped to obtain hit compounds from a broad range of chemical space and whether the contest-based approach was effective.
Subject(s)
Drug Evaluation, Preclinical , Protein Kinase Inhibitors/analysis , Protein Kinase Inhibitors/pharmacology , Proto-Oncogene Proteins c-yes/antagonists & inhibitors , Humans , Principal Component Analysis , Proto-Oncogene Proteins c-yes/chemistry , Reproducibility of Results , src-Family Kinases/metabolism
ABSTRACT
It is known that topologically different proteins of the same class sometimes share the same spatial arrangement of secondary structure elements (SSEs). However, the frequency with which topologically different structures share the same spatial arrangement of SSEs is unclear. It is important to estimate this frequency because it provides both a deeper understanding of the geometry of protein folds and a valuable suggestion for predicting protein structures with novel folds. Here, using MICAN, a protein structure alignment program that we have been developing, we clarified the frequency with which protein folds share the same SSE packing arrangement with other folds, the types of spatial arrangement of SSEs that are frequently observed across different folds, and the diversity of protein folds that share the same spatial arrangement of SSEs with a given fold. By performing a comprehensive structural comparison of SCOP fold representatives, we found that approximately 80% of protein folds share the same spatial arrangement of SSEs with other folds. We also observed that many protein pairs that share the same spatial arrangement of SSEs belong to different classes, often with an opposing N- to C-terminal direction of the polypeptide chain. The most frequently observed spatial arrangement of SSEs was the 2-layer α/β packing arrangement, which was dispersed among as many as 27% of SCOP fold representatives. These results suggest that the same spatial arrangements of SSEs are adopted by a wide variety of different folds and that the spatial arrangement of SSEs is highly robust against the N- to C-terminal direction of the polypeptide chain.
Subject(s)
Protein Folding , Protein Structure, Secondary , Proteins/chemistry
ABSTRACT
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how well we understand the principles of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and what lessons it holds for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability, indicating the dominant role of local interactions. We further explore the role of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest the following principles of protein folding: for small proteins, compact structures that are fully compatible with the local structural preference are few, and one of them is the native fold. These local biases shape the funnel-like energy landscape.