ABSTRACT
Dynamic Monte Carlo simulations of the folding of a globular protein, apoplastocyanin, have been undertaken in the context of a new lattice model of proteins that includes both side chains and alpha-carbon backbone atoms and that can approximate native conformations at the level of 2 angstroms (root mean square) or better. Starting from random-coil unfolded states, the model apoplastocyanin was folded to a native conformation that is topologically similar to the real protein. The present simulations used a marginal propensity for local secondary structure consistent with but by no means enforcing the native conformation and a full hydrophobicity scale in which any nonbonded pair of side chains could interact. These molecules folded through a punctuated on-site mechanism of assembly where folding initiated at or near one of the turns ultimately found in the native conformation. Thus these simulations represent a partial solution to the globular-protein folding problem.
ABSTRACT
The FK506-binding proteins (FKBPs) are a unique group of chaperones found in a wide variety of organisms. They perform a number of cellular functions including protein folding, regulation of cytokines, transport of steroid receptor complexes, nucleic acid binding, histone assembly, and modulation of apoptosis. These functions are mediated by specific domains that adopt distinct tertiary conformations. Using the Threading/ASSEmbly/Refinement (TASSER) approach, tertiary structures were predicted for a total of 45 FKBPs in 23 species. These models were compared with previously characterized FKBP solution structures and the predicted structures were employed to identify groups of homologous proteins. The resulting classification may be utilized to infer functional roles of newly discovered FKBPs. The three-dimensional conformations revealed that this family may have undergone several modifications throughout evolution, including loss of N- and C-terminal regions, duplication of FKBP domains as well as insertions of entire functional motifs. Docking simulations suggest that additional sequence segments outside FKBP domains may modulate the binding affinity of FKBPs to immunosuppressive drugs. The docking models also indicate the presence of a helix-loop-helix (HLH) region within a subset of FKBPs, which may be responsible for the interaction between this group of proteins and nucleic acids.
Subject(s)
Tacrolimus Binding Proteins/chemistry , Tacrolimus Binding Proteins/classification , Amino Acid Sequence , Animals , Binding Sites , Bombyx , Computer Simulation , X-Ray Crystallography , Humans , Molecular Sequence Data , Tertiary Protein Structure , RNA/metabolism , Amino Acid Sequence Homology , Tacrolimus Binding Proteins/metabolism
ABSTRACT
BACKGROUND: The ability to predict the native conformation of a globular protein from its amino-acid sequence is an important unsolved problem of molecular biology. We have previously reported a method in which reduced representations of proteins are folded on a lattice by Monte Carlo simulation, using statistically derived potentials. When applied to sequences designed to fold into four-helix bundles, this method generated predicted conformations closely resembling the real ones. RESULTS: We now report a hierarchical approach to protein-structure prediction, in which two cycles of the above-mentioned lattice method (the second on a finer lattice) are followed by a full-atom molecular dynamics simulation. The end product of the simulations is thus a full-atom representation of the predicted structure. The application of this procedure to the 60-residue B domain of staphylococcal protein A predicts a three-helix bundle with a backbone root mean square (rms) deviation of 2.25-3 Å from the experimentally determined structure. Further application to a designed, 120-residue monomeric protein, mROP, based on the dimeric ROP protein of Escherichia coli, predicts a left-turning, four-helix bundle native state. Although the ultimate assessment of the quality of this prediction awaits the experimental determination of the mROP structure, a comparison of this structure with the set of equivalent residues in the ROP dimer crystal structure indicates that they have an rms deviation of approximately 3.6-4.2 Å. CONCLUSION: Thus, for a set of helical proteins that have simple native topologies, the native folds of the proteins can be predicted with reasonable accuracy from their sequences alone. Our approach suggests a direction for future work addressing the protein-folding problem.
ABSTRACT
Structural genomics projects aim to solve the experimental structures of all possible protein folds. Such projects entail a conceptual shift from traditional structural biology in which structural information is obtained on known proteins to one in which the structure of a protein is determined first and the function assigned only later. Whereas the goal of converting protein structure into function can be accomplished by traditional sequence motif-based approaches, recent studies have shown that assignment of a protein's biochemical function can also be achieved by scanning its structure for a match to the geometry and chemical identity of a known active site. Importantly, this approach can use low-resolution structures provided by contemporary structure prediction methods. When applied to genomes, structural information (either experimental or predicted) is likely to play an important role in high-throughput function assignment.
Subject(s)
Genome , Molecular Biology/methods , Protein Folding , Animals , Computer Simulation , Factual Databases , Molecular Evolution , Humans , Internet , Structure-Activity Relationship
ABSTRACT
Computational methods were used to predict the sequences of peptides that bind to the MHC class I molecule, K(b). The rules for predicting binding sequences, which are limited, are based on preferences for certain amino acids in certain positions of the peptide. It is apparent, though, that binding can be influenced by the amino acids in all of the positions of the peptide. An artificial neural network (ANN) has the ability to simultaneously analyze the influence of all of the amino acids of the peptide and thus may improve binding predictions. ANNs were compared to statistically analyzed peptides for their abilities to predict the sequences of K(b) binding peptides. ANN systems were trained on a library of binding and nonbinding peptide sequences from a phage display library. Statistical and ANN methods identified strong binding peptides with preferred amino acids. ANNs detected more subtle binding preferences, enabling them to predict medium binding peptides. The ability to predict class I MHC molecule binding peptides is useful for immunological therapies involving cytotoxic T cells.
Subject(s)
H-2 Antigens/metabolism , Neural Networks (Computer) , Oligopeptides/chemistry , Oligopeptides/metabolism , Amino Acids/chemistry , Animals , Bacteriophages/genetics , Binding Sites , Competitive Binding , Chickens , Immunologic Cytotoxicity , Epitopes/metabolism , Mathematical Computing , Mice , Oligopeptides/isolation & purification , Ovalbumin , Peptide Library , Cytotoxic T-Lymphocytes/immunology
ABSTRACT
As part of an ongoing series of dynamic Monte Carlo simulations of globular protein folding, the nature of the folding pathway of model four-member beta-barrels and four-helix bundles under highly idealized in vivo conditions has been examined. The ribosome is crudely modeled as an inert hard wall onto which the model protein chain is attached. Three cases are considered in detail. The first corresponds to post-translational assembly in which the fully synthesized chain is tethered to the wall and starts out under strongly denaturing conditions. The system is cooled down, and the chain is allowed to fold. Interestingly, the helical motif prefers to assemble parallel to the wall, whereas the beta-barrel predominantly assembles with its principal axis perpendicular to the wall. In the former case, the dominant intermediate, the helical hairpin, is different from that in free solution, a three-helix bundle. The wall acts to reduce the expanse of configuration space that must be searched and aids in folding. Two situations that might lead to co-translational folding are also simulated. In the first case, to eliminate wall effects, the chain is slowly synthesized in free solution, and in the second case, it is slowly synthesized from the wall. In all cases, the chains are observed to fold post-translationally. While partially folded intermediates are observed during synthesis, they lack the stability to survive until chain synthesis is complete. The implications of these results for the folding in vivo of real protein chains are discussed, and a model of multiple domain protein folding is proposed.
Subject(s)
Theoretical Models , Protein Conformation , Proteins , Structural Models , Monte Carlo Method , Protein Biosynthesis , Protein Denaturation , Proteins/genetics , Ribosomes/metabolism
ABSTRACT
A long-standing problem of molecular biology is the prediction of globular protein tertiary structure from the primary sequence. In the context of a new, 24-nearest-neighbor lattice model of proteins that includes both alpha and beta-carbon atoms, the requirements for folding to a unique four-member beta-barrel, four-helix bundles and a model alpha/beta-bundle have been explored. A number of distinct situations are examined, but the common requirements for the formation of a unique native conformation are tertiary interactions plus the presence of relatively small (but not irrelevant) intrinsic turn preferences that select out the native conformer from a manifold of compact states. When side-chains are explicitly included, there are many conformations having the same or a slightly greater number of side-chain contacts as in the native conformation, and it is the local intrinsic turn preferences that produce the conformational selectivity on collapse. The local preference for helix or beta-sheet secondary structure may be at odds with the secondary structure ultimately found in the native conformation. The requisite intrinsic turn populations are about 0.3% for beta-proteins, 2% for mixed alpha/beta-proteins and 6% for helix bundles. In addition, an idealized model of an allosteric conformational transition has been examined. Folding occurs predominantly by a sequential on-site assembly mechanism with folding initiating either at a turn or from an isolated helix or beta-strand (where appropriate). For helical and beta-protein models, similar folding pathways were obtained in diamond lattice simulations, using an entirely different set of local Monte Carlo moves. This argues strongly that the results are universal; that is, they are independent of lattice, protein model or the particular realization of Monte Carlo dynamics. 
Overall, these simulations demonstrate that the folding of all known protein motifs can be achieved in the context of a single class of lattice models that includes realistic backbone structures and idealized side-chains.
Subject(s)
Computer Simulation , Molecular Models , Monte Carlo Method , Protein Conformation , Algorithms , Allosteric Site , Protein Denaturation , Structure-Activity Relationship , Thermodynamics
ABSTRACT
In the context of a simplified diamond lattice model of a six-member, Greek key beta-barrel protein that is closely related in topology to plastocyanin, the nature of the folding and unfolding pathways has been investigated using dynamic Monte Carlo techniques. The mechanism of Greek key assembly is best described as punctuated "on site construction". Folding typically starts at or near a beta-turn, and then the beta-strands sequentially form by using existing folded structure as a scaffold onto which subsequent tertiary structure assembles. On average, beta-strands tend to zip up from one tight bend to the next. After the four-member beta-barrel assembles, there is a long pause as the random coil portion of the chain containing the long loop thrashes about trying to find the native state. Thus, there is an entropic barrier that must be surmounted. However, while a given piece of the protein may be folding, another section may be unfolding. A competition therefore exists to assemble a fairly stable intermediate before it dissolves. Folding may initiate at any of the tight turns, but the turn closer to the N terminus seems to be preferred due to well-known excluded volume effects. When the protein first starts to fold, there is a multiplicity of folding pathways, but the number of options is reduced as the system gets closer to the native state. In the early stages, the excluded volume effect exerted by the already assembled protein helps subsequent assembly. Then, near the native conformation, the folded parts reduce the accessible conformational space available to the remaining unfolded sections. Unfolding essentially occurs in reverse. Employing a simple statistical mechanical theory, the configurational free energy along the reaction co-ordinate for this model has been constructed. The free energy surface, in agreement with the simulations, provides the following predictions. 
The transition state is quite near the native state, and consists of five of the six beta-strands being fully assembled, with the remaining long loop plus sixth beta-strand in place, but only partially assembled. It is separated from the beta-barrel intermediate by a free energy barrier of mainly entropic origin and from the native state by a barrier that is primarily energetic in origin. The latter feature is in agreement with the "Cardboard Box" model described by Goldenberg and Creighton but, unlike their model, the transition state is not a high-energy distorted form of the native state. (ABSTRACT TRUNCATED AT 250 WORDS)
Subject(s)
Protein Conformation , Algorithms , Chemical Phenomena , Physical Chemistry , Molecular Models , Molecular Structure , Monte Carlo Method , Plastocyanin , Proteins , Time Factors
ABSTRACT
Dynamic Monte Carlo simulations of the folding pathways of alpha-helical protein motifs have been undertaken in the context of a diamond lattice model of globular proteins. The first question addressed is the nature of the assembly process of an alpha-helical hairpin. While the hairpin could, in principle, be formed via the diffusion-collision-adhesion of isolated preformed helices, this is not the dominant mechanism of assembly found in the simulations. Rather, the helices that form native hairpins are constructed on-site, with folding initiating at or near the turn in almost all cases. Next, the folding/unfolding pathways of four-helix bundles having tight bends and one and two long loops in the native state are explored. Once again, an on-site construction mechanism of folding obtains, with a hairpin forming first, followed by the formation of a three-helix bundle, and finally the fourth helix of the native bundle assembles. Unfolding is essentially the reverse of folding. A simplified analytic theory is developed that reproduces the equilibrium folding transitions obtained from the simulations remarkably well and, for the dominant folding pathway, correctly identifies the intermediates seen in the simulations. The analytic theory provides the free energy along the reaction co-ordinate and identifies the transition state for all three motifs as being quite close to the native state, with three of the four helices assembled, and approximately one turn of the fourth helix in place. The transition state is separated from the native conformation by a free-energy barrier of mainly energetic origin and from the denatured state by a barrier of mainly entropic origin. 
The general features of the folding pathway seen in all variants of the model four-helix bundles are similar to those observed in the folding of beta-barrel, Greek key proteins; this suggests that many of the qualitative aspects of folding are invariant to the particular native state topology and secondary structure.
Subject(s)
Protein Conformation , Algorithms , Chemical Phenomena , Physical Chemistry , Molecular Models , Molecular Structure , Monte Carlo Method , Proteins
ABSTRACT
We describe the most general solution to date of the problem of matching globular protein sequences to the appropriate three-dimensional structures. The screening template, against which sequences are tested, is provided by a protein "structural fingerprint" library based on the contact map and the buried/exposed pattern of residues. Then, a lattice Monte Carlo algorithm validates or dismisses the stability of the proposed fold. Examples of known structural similarities between proteins having weakly related or unrelated sequences are found, such as the globins and phycocyanins, the eight-member alpha/beta fold of triose phosphate isomerase, and even a close structural equivalence between azurin and immunoglobulins.
Subject(s)
Plant Proteins/chemistry , Plastocyanin/chemistry , Protein Conformation , Algorithms , Azurin/chemistry , Bacterial Outer Membrane Proteins/chemistry , Bacterial Proteins/chemistry , Factual Databases , Globins/chemistry , Immunoglobulin lambda-Chains/genetics , Molecular Models , Phycocyanin/chemistry , Sequence Alignment , Structure-Activity Relationship , Thermodynamics
ABSTRACT
The practical exploitation of the vast numbers of sequences in the genome sequence databases is crucially dependent on the ability to identify the function of each sequence. Unfortunately, current methods, including global sequence alignment and local sequence motif identification, are limited by the extent of sequence similarity between sequences of unknown and known function; these methods increasingly fail as the sequence identity diverges into and beyond the twilight zone of sequence identity. To address this problem, a novel method for identification of protein function based directly on the sequence-to-structure-to-function paradigm is described. Descriptors of protein active sites, termed "fuzzy functional forms" or FFFs, are created based on the geometry and conformation of the active site. By way of illustration, the active sites responsible for the disulfide oxidoreductase activity of the glutaredoxin/thioredoxin family and the RNA hydrolytic activity of the T1 ribonuclease family are presented. First, the FFFs are shown to correctly identify their corresponding active sites in a library of exact protein models produced by crystallography or NMR spectroscopy, most of which lack the specified activity. Next, these FFFs are used to screen for active sites in low-to-moderate resolution models produced by ab initio folding or threading prediction algorithms. Again, the FFFs can specifically identify the functional sites of these proteins from their predicted structures. The results demonstrate that low-to-moderate resolution models as produced by state-of-the-art tertiary structure prediction algorithms are sufficient to identify protein active sites. Prediction of a novel function for the gamma subunit of a yeast glycosyl transferase and prediction of the function of two hypothetical yeast proteins whose models were produced via threading are presented. 
This work suggests a means for the large-scale functional screening of genomic sequence databases based on the prediction of structure from sequence, then on the identification of functional active sites in the predicted structure.
Subject(s)
Oxidoreductases , Proteins/chemistry , Ribonucleases/chemistry , Thioredoxins/chemistry , Algorithms , Amino Acid Sequence , Binding Sites/physiology , Databases as Topic , Fungal Proteins/chemistry , Fungal Proteins/physiology , Glutaredoxins , Molecular Models , Molecular Conformation , Molecular Sequence Data , Protein Disulfide Reductase (Glutathione)/chemistry , Protein Folding , Tertiary Protein Structure , Proteins/physiology , Ribonucleases/physiology , Sequence Alignment , Structure-Activity Relationship
ABSTRACT
The MONSSTER (MOdeling of New Structures from Secondary and TErtiary Restraints) method for folding of proteins using a small number of long-distance restraints (which can be up to seven times smaller than the total number of residues) and some knowledge of the secondary structure of regular fragments is described. The method employs a high-coordination lattice representation of the protein chain that incorporates a variety of potentials designed to produce protein-like behaviour. These include statistical preferences for secondary structure, side-chain burial interactions, and a hydrogen-bond potential. Using this algorithm, several globular proteins (1ctf, 2gbl, 2trx, 3fxn, 1mba, 1pcy and 6pti) have been folded to moderate-resolution, native-like compact states. For example, the 68-residue 1ctf molecule having ten loosely defined, long-range restraints was reproducibly obtained with a C alpha-backbone root-mean-square deviation (RMSD) from native of about 4 Å. Flavodoxin with 35 restraints has been folded to structures whose average RMSD is 4.28 Å. Furthermore, using just 20 restraints, myoglobin, which is a 146-residue helical protein, has been folded to structures whose average RMSD from native is 5.65 Å. Plastocyanin with 25 long-range restraints adopts conformations whose average RMSD is 5.44 Å. Possible applications of the proposed approach to the refinement of structures from NMR data, homology model-building and the determination of tertiary structure when the secondary structure and a small number of restraints are predicted are briefly discussed.
Subject(s)
Algorithms , Computer Simulation , Molecular Models , Protein Conformation , Protein Folding , Aprotinin/chemistry , Bacterial Proteins/chemistry , Computer Graphics , Flavodoxin/chemistry , Myoglobin/chemistry , Plastocyanin/chemistry , Secondary Protein Structure , Tertiary Protein Structure , Thioredoxins/chemistry
ABSTRACT
The feasibility of predicting the global fold of small proteins by incorporating predicted secondary and tertiary restraints into ab initio folding simulations has been demonstrated on a test set comprised of 20 non-homologous proteins, of which one was a blind prediction of target 42 in the recent CASP2 contest. These proteins contain from 37 to 100 residues and represent all secondary structural classes and a representative variety of global topologies. Secondary structure restraints are provided by the PHD secondary structure prediction algorithm that incorporates multiple sequence information. Predicted tertiary restraints are derived from multiple sequence alignments via a two-step process. First, seed side-chain contacts are identified from correlated mutation analysis, and then a threading-based algorithm is used to expand the number of these seed contacts. A lattice-based reduced protein model and a folding algorithm designed to incorporate these predicted restraints are described. Depending upon fold complexity, it is possible to assemble native-like topologies whose coordinate root-mean-square deviation from native is between 3.0 Å and 6.5 Å. The requisite level of accuracy in side-chain contact map prediction can be roughly 25% on average, provided that about 60% of the contact predictions are correct within +/-1 residue and 95% of the predictions are correct within +/-4 residues. Precision in tertiary contact prediction is more critical than absolute accuracy. Furthermore, only a subset of the tertiary contacts, on the order of 25% of the total, is sufficient for successful topology assembly. Overall, this study suggests that the use of restraints derived from multiple sequence alignments combined with a fold assembly algorithm holds considerable promise for the prediction of the global topology of small proteins.
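One way to read the accuracy criteria quoted above ("correct within +/-1 residue", "within +/-4 residues") is sketched below in Python. This is an illustrative sketch, not the authors' code: the function name `fraction_within`, the toy contact lists, and the rule that a predicted contact (i, j) counts as correct when some native contact (a, b) satisfies |i - a| <= tol and |j - b| <= tol are all assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

Contact = Tuple[int, int]  # pair of residue indices (i, j)


def fraction_within(predicted: Iterable[Contact],
                    native: Set[Contact],
                    tol: int) -> float:
    """Fraction of predicted contacts within +/-tol residues of a native contact.

    Assumed criterion (hypothetical): a prediction (i, j) is a hit if some
    native contact (a, b) has |i - a| <= tol and |j - b| <= tol.
    """
    predicted = list(predicted)
    if not predicted:
        return 0.0
    hits = 0
    for i, j in predicted:
        if any(abs(i - a) <= tol and abs(j - b) <= tol for a, b in native):
            hits += 1
    return hits / len(predicted)


# Toy data, invented for illustration only.
native_contacts = {(3, 20), (5, 18), (10, 40)}
predicted_contacts = [(3, 20), (6, 18), (12, 44), (30, 60)]

for tol in (0, 1, 4):
    print(tol, fraction_within(predicted_contacts, native_contacts, tol))
```

With tol set to 0 the function reports exact accuracy, so the same routine covers all three figures quoted in the abstract.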
Subject(s)
Protein Folding , Amino Acid Sequence , Chemical Models , Molecular Sequence Data , Monte Carlo Method , Secondary Protein Structure , Tertiary Protein Structure
ABSTRACT
The application of an automated method for the screening of protein activity based on the sequence-to-structure-to-function paradigm is presented for the complete Escherichia coli genome. First, the structure of the protein is identified from its sequence using a threading algorithm, which aligns the sequences to the best matching structure in a structural database and extends sequence analysis well beyond the limits of local sequence identity. Then, the active site is identified in the resulting sequence-to-structure alignment using a "fuzzy functional form" (FFF), a three-dimensional descriptor of the active site of a protein. Here, this sequence-to-structure-to-function concept is applied to analysis of the complete E. coli genome, i.e. all E. coli open reading frames (ORFs) are screened for the thiol-disulfide oxidoreductase activity of the glutaredoxin/thioredoxin protein family. We show that the method can identify the active sites in ten sequences that are known to or proposed to exhibit this activity. Furthermore, oxidoreductase activity is predicted in two other sequences that have not been identified previously. This method distinguishes protein pairs with similar active sites from protein pairs that are just topological cousins, i.e. those having similar global folds, but not necessarily similar active sites. Thus, this method provides a novel approach for extraction of active site and functional information based on three-dimensional structures, rather than simple sequence analysis. Prediction of protein activity is fully automated and easily extendible to new functions. Finally, it is demonstrated here that the method can be applied to complete genome database analysis.
Subject(s)
Escherichia coli/genetics , Bacterial Genome , Oxidoreductases , Protein Disulfide Reductase (Glutathione)/metabolism , Proteins/chemistry , Thioredoxins/chemistry , Algorithms , Automation , Binding Sites , Factual Databases , Escherichia coli/chemistry , Escherichia coli/enzymology , Glutaredoxins , Open Reading Frames/genetics , Protein Conformation , Protein Disulfide Reductase (Glutathione)/chemistry , Protein Disulfide Reductase (Glutathione)/genetics , Protein Folding , Proteins/genetics , Proteins/metabolism , Sequence Alignment , Software , Structure-Activity Relationship , Thioredoxins/genetics , Thioredoxins/metabolism
ABSTRACT
Using a simplified protein model, the equilibrium between different oligomeric species of the wild-type GCN4 leucine zipper and seven of its mutants has been predicted. Over the entire experimental concentration range, agreement with experiment is found in five cases, while in two cases agreement is found over a portion of the concentration range. These studies demonstrate a methodology for predicting coiled coil quaternary structure and allow for the dissection of the interactions responsible for the global fold. In agreement with the conclusion of Harbury et al., the results of the simulations indicate that the pattern of hydrophobic and hydrophilic residues alone is insufficient to define a protein's three-dimensional structure. In addition, these simulations indicate that the degree of chain association is determined by the balance between specific side-chain packing preferences and the entropy reduction associated with side-chain burial in higher-order multimers.
Subject(s)
Computer Simulation , DNA-Binding Proteins , Fungal Proteins/chemistry , Leucine Zippers , Protein Conformation , Protein Kinases/chemistry , Saccharomyces cerevisiae Proteins , Hydrogen Bonding , Monte Carlo Method , Mutation , Protein Folding , Thermodynamics
ABSTRACT
A hierarchical approach is described for the prediction of the three-dimensional structure and folding pathway of the GCN4 leucine zipper. Dimer assembly is simulated by Monte Carlo dynamics. The resulting lowest energy structures undergo cooperative rearrangement of their hydrophobic core leading to side-chain fixation. The coarse-grained structures are further refined using a molecular dynamics annealing protocol. This produces full-atom models with a backbone root-mean-square deviation from the crystal structure of 0.81 Å. Thus, we demonstrate the predictive ability of our approach to yield high-resolution structures of small coiled coils from their sequence.
Subject(s)
Fungal Proteins/chemistry , Leucine Zippers , Protein Folding , Protein Kinases/chemistry , Secondary Protein Structure , Saccharomyces cerevisiae Proteins , Amino Acid Sequence , X-Ray Crystallography , DNA-Binding Proteins/chemistry , Fungal Proteins/metabolism , Molecular Models , Molecular Sequence Data , Monte Carlo Method , Protein Kinases/metabolism
ABSTRACT
The genome-sequencing projects are providing a detailed 'parts list' of life. A key to comprehending this list is understanding the function of each gene and each protein at various levels. Sequence-based methods for function prediction are inadequate because of the multifunctional nature of proteins. However, just knowing the structure of the protein is also insufficient for prediction of multiple functional sites. Structural descriptors for protein functional sites are crucial for unlocking the secrets in both the sequence and structural-genomics projects.
Subject(s)
Genes , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Structure-Activity Relationship
ABSTRACT
Knowledge-based potentials are used widely in protein folding and inverse folding algorithms. Two kinds of derivation methods are used. (1) The interactions in a database of known protein structures are assumed to obey a Boltzmann distribution. (2) The stability of the native folds relative to a manifold of misfolded structures is optimized. Here, a set of previously derived contact and secondary structure propensity potentials, taken as the "true" potentials, are employed to construct an artificial protein structural database from protein fragments. Then, new sets of potentials are derived to see how they are related to the true potentials. Using the Boltzmann distribution method, when the stability of the structures in the database lies within a certain range, both contact potentials and secondary structure propensities can be derived separately with remarkable accuracy. In general, the optimization method was found to be less accurate due to errors in the "excess energy" contribution. When the excess energy terms are kept as a constraint, the true potentials are recovered exactly.
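The first derivation route mentioned above, in which database statistics are assumed to follow a Boltzmann distribution, is commonly written as an inversion of observed contact counts against a reference expectation. The Python sketch below illustrates that generic textbook form only. The function name `boltzmann_contact_energy`, the choice of reference state behind `n_expected`, and the toy counts are assumptions for illustration and are not taken from the paper.

```python
import math


def boltzmann_contact_energy(n_observed: float,
                             n_expected: float,
                             kT: float = 1.0) -> float:
    """Boltzmann-inverted contact energy: e = -kT * ln(N_obs / N_exp).

    n_expected is the count predicted by some reference state; which
    reference state to use is an assumption left to the caller.
    """
    if n_observed <= 0 or n_expected <= 0:
        raise ValueError("counts must be positive to take a logarithm")
    return -kT * math.log(n_observed / n_expected)


# Toy counts (hypothetical): a pair seen twice as often as expected
# comes out attractive, i.e. with negative energy in units of kT.
print(boltzmann_contact_energy(20, 10))
# A pair seen exactly as often as expected has zero energy.
print(boltzmann_contact_energy(10, 10))
```

The result depends entirely on the reference state used to compute the expected count, so two parameter sets built with different reference states are not directly comparable.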
Subject(s)
Databases as Topic , Protein Folding , Proteins/chemistry , Algorithms , Amino Acids/chemistry , Secondary Protein Structure
ABSTRACT
The Z-score of a protein is defined as the energy separation between the native fold and the average of an ensemble of misfolds in the units of the standard deviation of the ensemble. The Z-score is often used as a way of testing the knowledge-based potentials for their ability to recognize the native fold from other alternatives. However, it is not known what range of values the Z-scores should have if one had a correct potential. Here, we offer an estimate of Z-scores extracted from calorimetric measurements of proteins. The energies obtained from these experimental data are compared with those from computer simulations of a lattice model protein. It is suggested that the Z-scores calculated from different knowledge-based potentials are generally too small in comparison with the experimental values.
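The definition in the first sentence of this abstract can be written as a short computation. The Python sketch below is illustrative only: the function name `z_score`, the sign convention (native energy minus ensemble mean, so a well-separated native fold gives a negative value), the use of the population standard deviation, and the toy energies are all assumptions rather than details from the paper.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(e_native: float, misfold_energies: Sequence[float]) -> float:
    """Energy gap between the native fold and the misfold ensemble average,
    in units of the ensemble standard deviation.

    Assumed convention: Z = (E_native - <E_misfold>) / sigma, with sigma
    taken as the population standard deviation of the ensemble.
    """
    sigma = pstdev(misfold_energies)
    if sigma == 0:
        raise ValueError("misfold ensemble has zero energy spread")
    return (e_native - mean(misfold_energies)) / sigma


# Toy energies in arbitrary units, invented for illustration.
misfolds = [-10.0, -12.0, -8.0, -10.0]
print(z_score(-20.0, misfolds))  # about -7.07 under the assumed convention
```

Under the opposite sign convention the same separation would be reported as a positive number, so the convention should be stated whenever Z-scores from different potentials are compared.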
Subject(s)
Protein Conformation , Proteins/chemistry , Chemical Models
ABSTRACT
Various existing derivations of the effective potentials of mean force for the two-body interactions between amino acid side chains in proteins are reviewed and compared to each other. The differences between different parameter sets can be traced to the reference state used to define the zero of energy. Depending on the reference state, the transfer free energy or other pseudo-one-body contributions can be present to various extents in two-body parameter sets. It is, however, possible to compare various derivations directly by concentrating on the "excess" energy, a term that describes the difference between a real protein and an ideal solution of amino acids. Furthermore, the number of protein structures available for analysis allows one to check the consistency of the derivation and the errors by comparing parameters derived from various subsets of the whole database. It is shown that pair interaction preferences are very consistent throughout the database. Independently derived parameter sets have correlation coefficients on the order of 0.8, with a mean difference between equivalent entries of 0.1 kT. Also, the low-quality (low resolution, little or no refinement) structures show similar regularities. There are, however, large differences between interaction parameters derived on the basis of crystallographic structures and structures obtained by NMR refinement. The origin of the latter difference is not yet understood.
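The consistency check described above, comparing equivalent entries of two independently derived parameter sets, can be sketched as follows. This Python example is a hedged illustration: the function name `compare_parameter_sets`, the reading of "mean difference" as the mean absolute difference, and the toy values are assumptions, not the procedure used in the paper.

```python
import math
from typing import Sequence, Tuple


def compare_parameter_sets(a: Sequence[float],
                           b: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson correlation, mean absolute difference) between two
    parameter sets whose entries are listed in the same order."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long sets with at least 2 entries")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant set")
    r = cov / math.sqrt(var_a * var_b)
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / n
    return r, mean_abs_diff


# Toy pair energies in kT (hypothetical values, same pair ordering in both).
set_a = [0.0, 1.0, 2.0]
set_b = [0.1, 1.1, 2.1]
r, d = compare_parameter_sets(set_a, set_b)
print(r, d)
```

A uniform offset between two sets, as in the toy data, leaves the correlation at 1 while still showing up in the mean difference, which is why reporting both quantities is informative.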