Results 1 - 20 of 37
1.
Nature ; 614(7949): 774-780, 2023 02.
Article in English | MEDLINE | ID: mdl-36813896

ABSTRACT

De novo enzyme design has sought to introduce active sites and substrate-binding pockets that are predicted to catalyse a reaction of interest into geometrically compatible native scaffolds [1,2], but has been limited by a lack of suitable protein structures and the complexity of native protein sequence-structure relationships. Here we describe a deep-learning-based 'family-wide hallucination' approach that generates large numbers of idealized protein structures containing diverse pocket shapes and designed sequences that encode them. We use these scaffolds to design artificial luciferases that selectively catalyse the oxidative chemiluminescence of the synthetic luciferin substrates diphenylterazine [3] and 2-deoxycoelenterazine. The designed active sites position an arginine guanidinium group adjacent to an anion that develops during the reaction in a binding pocket with high shape complementarity. For both luciferin substrates, we obtain designed luciferases with high selectivity; the most active of these is a small (13.9 kDa) and thermostable (with a melting temperature higher than 95 °C) enzyme that has a catalytic efficiency on diphenylterazine (kcat/Km = 10^6 M^-1 s^-1) comparable to that of native luciferases, but a much higher substrate specificity. The creation of highly active and specific biocatalysts from scratch with broad applications in biomedicine is a key milestone for computational enzyme design, and our approach should enable generation of a wide range of luciferases and other enzymes.


Subjects
Deep Learning , Luciferases , Biocatalysis , Catalytic Domain , Enzyme Stability , Hot Temperature , Luciferases/chemistry , Luciferases/metabolism , Luciferins/metabolism , Luminescence , Oxidation-Reduction , Substrate Specificity
2.
Nat Methods ; 21(1): 117-121, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37996753

ABSTRACT

Protein-RNA and protein-DNA complexes play critical roles in biology. Despite considerable recent advances in protein structure prediction, the prediction of the structures of protein-nucleic acid complexes without homology to known complexes is a largely unsolved problem. Here we extend the RoseTTAFold machine learning protein-structure-prediction approach to additionally predict nucleic acid and protein-nucleic acid complexes. We develop a single trained network, RoseTTAFoldNA, that rapidly produces three-dimensional structure models with confidence estimates for protein-DNA and protein-RNA complexes. Here we show that confident predictions have considerably higher accuracy than current state-of-the-art methods. RoseTTAFoldNA should be broadly useful for modeling the structure of naturally occurring protein-nucleic acid complexes, and for designing sequence-specific RNA and DNA-binding proteins.


Subjects
Nucleic Acids , RNA/chemistry , DNA-Binding Proteins/chemistry , DNA/chemistry
3.
Nature ; 600(7889): 547-552, 2021 12.
Article in English | MEDLINE | ID: mdl-34853475

ABSTRACT

There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences [1-3]. Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue-residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback-Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-'hallucinated' sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.
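
The optimization target named in this abstract, the contrast between predicted and background inter-residue distance distributions, can be sketched as a mean Kullback-Leibler divergence over residue pairs. What follows is a minimal illustration and not the authors' implementation: the function names, the dictionary layout of the distributions, the `eps` clipping, the toy 4-bin histograms and the Metropolis acceptance rule are all assumptions made for the example.

```python
import math
import random

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions over the same distance bins."""
    return sum(
        max(pi, eps) * math.log(max(pi, eps) / max(qi, eps))
        for pi, qi in zip(p, q)
    )

def contrast(predicted, background):
    """Mean KL divergence over residue pairs.

    predicted, background: dict mapping (i, j) -> list of bin probabilities.
    """
    pairs = list(predicted)
    return sum(kl_divergence(predicted[ij], background[ij]) for ij in pairs) / len(pairs)

def metropolis_accept(old_score, new_score, temperature, rng):
    """Always accept a higher contrast; otherwise accept with Boltzmann probability."""
    if new_score >= old_score:
        return True
    return rng.random() < math.exp((new_score - old_score) / temperature)

# Toy data (hypothetical): a 4-bin distance histogram for a single residue pair.
background = {(0, 1): [0.25, 0.25, 0.25, 0.25]}
featureless = {(0, 1): [0.25, 0.25, 0.25, 0.25]}  # like a random starting sequence
sharp = {(0, 1): [0.97, 0.01, 0.01, 0.01]}        # a confident distance prediction

flat_score = contrast(featureless, background)
sharp_score = contrast(sharp, background)
```

In this sketch a featureless prediction scores zero contrast, while a sharply peaked one scores higher, so a Monte Carlo search that accepts moves via `metropolis_accept` would drift toward sequences with confident predicted distance maps.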


Subjects
Neural Networks, Computer , Proteins , Amino Acid Sequence , Crystallography, X-Ray , Hallucinations , Humans , Protein Conformation , Proteins/chemistry , Proteins/genetics
4.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35641150

ABSTRACT

Mutations in human proteins lead to diseases. The structure of these proteins can help understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We noticed that the model quality was higher and the root mean square deviation (RMSD) lower between AlphaFold and RoseTTAFold models for domains that could be assigned to CATH families as compared to those which could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. When compared to polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, predicted as destabilizing and pathogenic. Usage of models from the two state-of-the-art techniques provides better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.
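
The RMSD used in this abstract to compare two models of the same domain can be illustrated with a short sketch. It assumes the two coordinate sets are already superimposed and list the same atoms in the same order; a full comparison would first compute an optimal superposition, which is omitted here. The function name and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, without refitting."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical C-alpha coordinates (in angstroms) for a three-residue fragment.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]  # shifted 1 Å along z

shift_rmsd = rmsd(model_a, model_b)
```

Here every atom is displaced by exactly 1 Å, so `shift_rmsd` evaluates to 1.0; a lower value between two independently predicted models indicates closer agreement.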


Subjects
Mutation, Missense , Proteins , Databases, Protein , Humans , Models, Molecular , Mutation , Proteins/chemistry , Proteins/genetics
5.
Proc Natl Acad Sci U S A ; 118(11)2021 03 16.
Article in English | MEDLINE | ID: mdl-33712545

ABSTRACT

The protein design problem is to identify an amino acid sequence that folds to a desired structure. Given Anfinsen's thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the desired structure is the lowest energy state. As this calculation involves not only all possible amino acid sequences but also, all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest-energy amino acid sequence for the desired structure, often checking by protein structure prediction in a second step that the desired structure is indeed the lowest-energy conformation for the designed sequence, and typically discarding a large fraction of designed sequences for which this is not the case. Here, we show that by backpropagating gradients through the transform-restrained Rosetta (trRosetta) structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures in a single calculation. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single-point energy estimations in predicting folding and stability of de novo designed proteins. We compare sequence design by conformational landscape optimization with the standard energy-based sequence design methodology in Rosetta and show that the former can result in energy landscapes with fewer alternative energy minima. We show further that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low-resolution trRosetta model serves to disfavor alternative states, and the high-resolution Rosetta model serves to create a deep energy minimum at the design target structure.


Subjects
Neural Networks, Computer , Proteins/chemistry , Models, Molecular , Protein Conformation , Protein Folding , Thermodynamics
6.
Proc Natl Acad Sci U S A ; 117(3): 1496-1503, 2020 01 21.
Article in English | MEDLINE | ID: mdl-31896580

ABSTRACT

The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)- and Continuous Automated Model Evaluation (CAMEO)-derived sets, the method outperforms all previously described structure-prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo-designed proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.


Subjects
Protein Conformation , Sequence Analysis, Protein/methods , Software , Animals , Deep Learning , Humans
7.
Proc Natl Acad Sci U S A ; 117(29): 17003-17010, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32632011

ABSTRACT

Rubicon is a potent negative regulator of autophagy and a potential target for autophagy-inducing therapeutics. Rubicon-mediated inhibition of autophagy requires the interaction of the C-terminal Rubicon homology (RH) domain of Rubicon with Rab7-GTP. Here we report the 2.8-Å crystal structure of the Rubicon RH domain in complex with Rab7-GTP. Our structure reveals a fold for the RH domain built around four zinc clusters. The switch regions of Rab7 insert into pockets on the surface of the RH domain in a mode that is distinct from those of other Rab-effector complexes. Rubicon residues at the dimer interface are required for Rubicon and Rab7 to colocalize in living cells. Mutation of Rubicon RH residues in the Rab7-binding site restores efficient autophagic flux in the presence of overexpressed Rubicon, validating the Rubicon RH domain as a promising therapeutic target.


Subjects
Autophagy-Related Proteins , Autophagy/physiology , rab GTP-Binding Proteins , Autophagy-Related Proteins/chemistry , Autophagy-Related Proteins/metabolism , Autophagy-Related Proteins/physiology , Crystallography, X-Ray , HeLa Cells , Humans , Models, Molecular , Protein Binding , Protein Domains/physiology , rab GTP-Binding Proteins/chemistry , rab GTP-Binding Proteins/metabolism , rab GTP-Binding Proteins/physiology , rab7 GTP-Binding Proteins
8.
Proteins ; 89(12): 1824-1833, 2021 12.
Article in English | MEDLINE | ID: mdl-34324224

ABSTRACT

For CASP14, we developed deep learning-based methods for predicting homo-oligomeric and hetero-oligomeric contacts and used them for oligomer modeling. To build structure models, we developed an oligomer structure generation method that utilizes predicted interchain contacts to guide iterative restrained minimization from random backbone structures. We supplemented this gradient-based fold-and-dock method with template-based and ab initio docking approaches using deep learning-based subunit predictions on 29 assembly targets. These methods produced oligomer models with summed Z-scores 5.5 units higher than the next best group, with the fold-and-dock method having the best relative performance. Over the eight targets for which this method was used, the best of the five submitted models had average oligomer TM-score of 0.71 (average oligomer TM-score of the next best group: 0.64), and explicit modeling of inter-subunit interactions improved modeling of six out of 40 individual domains (ΔGDT-TS > 2.0).
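
For readers unfamiliar with the TM-score quoted in this abstract, the sketch below evaluates the standard single-chain TM-score formula for one fixed superposition. It is an illustration only: the real score is maximized over superpositions, and the oligomer variant used in CASP assessment may be computed differently. The function name and arguments are hypothetical.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: deviations (in angstroms) of the aligned residue pairs.
    target_length: number of residues in the target, used for normalization.
    """
    if target_length <= 15:
        raise ValueError("the d0 formula below assumes targets longer than 15 residues")
    # Length-dependent distance scale; clamped at 0.5 Å for short targets.
    d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

perfect = tm_score([0.0] * 100, 100)    # every residue aligned exactly
half = tm_score([0.0] * 50, 100)        # only half of the target aligned
noisy = tm_score([4.0] * 100, 100)      # every residue off by 4 Å
```

Because each aligned residue contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1, and unaligned residues lower it as much as badly placed ones.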


Subjects
Models, Molecular , Protein Conformation , Proteins , Software , Computational Biology , Databases, Protein , Deep Learning , Protein Binding , Protein Subunits/chemistry , Protein Subunits/metabolism , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein
9.
Proteins ; 89(12): 1722-1733, 2021 12.
Article in English | MEDLINE | ID: mdl-34331359

ABSTRACT

The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template-free and template utilizing versions of trRosetta guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta, and it is faster and requires less computing resources, completing the entire modeling process in a median < 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest in part due to missing inter-domain or inter-chain contacts.


Subjects
Computational Biology/methods , Deep Learning , Protein Structure, Tertiary , Proteins , Software , Humans , Metagenome/genetics , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Sequence Analysis, Protein
10.
Bioinformatics ; 36(1): 41-48, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31173061

ABSTRACT

MOTIVATION: Almost all protein residue contact prediction methods rely on the availability of deep multiple sequence alignments (MSAs). However, many proteins from the poorly populated families do not have a sufficient number of homologs in the conventional UniProt database. Here we aim to solve this issue by exploring the rich sequence data from the metagenome sequencing projects. RESULTS: Based on the improved MSA constructed from the metagenome sequence data, we developed MapPred, a new deep learning-based contact prediction method. MapPred consists of two component methods, DeepMSA and DeepMeta, both trained with the residual neural networks. DeepMSA was inspired by the recent method DeepCov, which was trained on 441 matrices of covariance features. By considering the symmetry of contact map, we reduced the number of matrices to 231, which makes the training more efficient in DeepMSA. Experiments show that DeepMSA outperforms DeepCov by 10-13% in precision. DeepMeta works by combining predicted contacts and other sequence profile features. Experiments on three benchmark datasets suggest that the contribution from the metagenome sequence data is significant with P-values less than 4.04E-17. MapPred is shown to be complementary and comparable to the state-of-the-art methods. The success of MapPred is attributed to three factors: the deeper MSA from the metagenome sequence data, improved feature design in DeepMSA and optimized training by the residual neural networks. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/mappred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
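
The reduction from 441 to 231 matrices reported in this abstract is consistent with counting unordered rather than ordered pairs of residue states. The arithmetic below assumes 21 states (20 amino acids plus a gap), which the abstract does not state explicitly but which matches 441 = 21 × 21; the variable names are illustrative.

```python
n_states = 21  # assumption: 20 amino acids plus one gap state

# One covariance matrix per ordered pair of states (a, b).
full = n_states ** 2

# Treating (a, b) and (b, a) as the same feature keeps only pairs with a <= b.
unique = n_states * (n_states + 1) // 2
unique_pairs = [(a, b) for a in range(n_states) for b in range(a, n_states)]

print(full, unique, len(unique_pairs))
```

Under this assumption the 231 retained matrices are the 21 same-state pairs plus the 210 distinct unordered pairs of different states.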


Subjects
Computational Biology , Metagenome , Neural Networks, Computer , Sequence Analysis, Protein , Algorithms , Computational Biology/methods , Proteins/chemistry , Sequence Alignment , Sequence Analysis, Protein/methods
11.
Proc Natl Acad Sci U S A ; 114(34): 9122-9127, 2017 08 22.
Article in English | MEDLINE | ID: mdl-28784799

ABSTRACT

Residue pairs that directly coevolve in protein families are generally close in protein 3D structures. Here we study the exceptions to this general trend (directly coevolving residue pairs that are distant in protein structures) to determine the origins of evolutionary pressure on spatially distant residues and to understand the sources of error in contact-based structure prediction. Over a set of 4,000 protein families, we find that 25% of directly coevolving residue pairs are separated by more than 5 Å in protein structures and 3% by more than 15 Å. The majority (91%) of directly coevolving residue pairs in the 5-15 Å range are found to be in contact in at least one homologous structure; these exceptions arise from structural variation in the family in the region containing the residues. Thirty-five percent of the exceptions greater than 15 Å are at homo-oligomeric interfaces, 19% arise from family structural variation, and 27% are in repeat proteins likely reflecting alignment errors. Of the remaining long-range exceptions (<1% of the total number of coupled pairs), many can be attributed to close interactions in an oligomeric state. Overall, the results suggest that directly coevolving residue pairs not in repeat proteins are spatially proximal in at least one biologically relevant protein conformation within the family; we find little evidence for direct coupling between residues at spatially separated allosteric and functional sites or for increased direct coupling between residue pairs on putative allosteric pathways connecting them.


Subjects
Amino Acids/chemistry , Evolution, Molecular , Protein Conformation , Proteins/chemistry , Amino Acids/genetics , Amino Acids/metabolism , Binding Sites , Crystallography, X-Ray , Databases, Protein , Models, Molecular , Protein Binding , Protein Domains , Protein Multimerization , Proteins/genetics , Proteins/metabolism
12.
Proteins ; 87(12): 1276-1282, 2019 12.
Article in English | MEDLINE | ID: mdl-31325340

ABSTRACT

Because proteins generally fold to their lowest free energy states, energy-guided refinement in principle should be able to systematically improve the quality of protein structure models generated using homologous structure or co-evolution derived information. However, because of the high dimensionality of the search space, there are far more ways to degrade the quality of a near native model than to improve it, and hence, refinement methods are very sensitive to energy function errors. In the 13th Critical Assessment of techniques for protein Structure Prediction (CASP13), we sought to carry out a thorough search for low energy states in the neighborhood of a starting model using restraints to avoid straying too far. The approach was reasonably successful in improving both regions largely incorrect in the starting models as well as core regions that started out closer to the correct structure. Models with GDT-HA over 70 were obtained for five targets and for one of those, an accuracy of 0.5 Å backbone root-mean-square deviation (RMSD) was achieved. An important current challenge is to improve performance in refining oligomers and larger proteins, for which the search problem remains extremely difficult.


Subjects
Computational Biology/methods , Protein Conformation , Protein Folding , Proteins/chemistry , Algorithms , Models, Molecular , Reproducibility of Results , Thermodynamics
13.
Proteins ; 87(3): 245-253, 2019 03.
Article in English | MEDLINE | ID: mdl-30520123

ABSTRACT

Structural characterization of protein-protein interactions is essential for our ability to study life processes at the molecular level. Computational modeling of protein complexes (protein docking) is important as the source of their structure and as a way to understand the principles of protein interaction. Rapidly evolving comparative docking approaches utilize target/template similarity metrics, which are often based on the protein structure. Although the structural similarity, generally, yields good performance, other characteristics of the interacting proteins (e.g., function, biological process, and localization) may improve the prediction quality, especially in the case of weak target/template structural similarity. For the ranking of a pool of models for each target, we tested scoring functions that quantify similarity of Gene Ontology (GO) terms assigned to target and template proteins in three ontology domains: biological process, molecular function, and cellular component (GO-score). The scoring functions were tested in docking of bound, unbound, and modeled proteins. The results indicate that the combined structural and GO-terms functions improve the scoring, especially in the twilight zone of structural similarity, typical for protein models of limited accuracy.


Subjects
Computational Biology , Gene Ontology , Protein Conformation , Proteins/genetics , Binding Sites/genetics , Databases, Protein , Humans , Models, Molecular , Molecular Docking Simulation , Protein Binding/genetics , Protein Interaction Mapping , Protein Interaction Maps/genetics , Proteins/chemistry , Software , Structural Homology, Protein
14.
Proteins ; 87(12): 1241-1248, 2019 12.
Article in English | MEDLINE | ID: mdl-31444975

ABSTRACT

As a participant in the joint CASP13-CAPRI46 assessment, the ClusPro server debuted its new template-based modeling functionality. The addition of this feature, called ClusPro TBM, was motivated by the previous CASP-CAPRI assessments and by the proven ability of template-based methods to produce higher-quality models, provided templates are available. In prior assessments, ClusPro submissions consisted of models that were produced via free docking of pre-generated homology models. This method was successful in terms of the number of acceptable predictions across targets; however, analysis of results showed that purely template-based methods produced a substantially higher number of medium-quality models for targets for which there were good templates available. The addition of template-based modeling has expanded ClusPro's ability to produce higher accuracy predictions, primarily for homomeric but also for some heteromeric targets. Here we review the newest additions to the ClusPro web server and discuss examples of CASP-CAPRI targets that continue to drive further development. We also describe ongoing work not yet implemented in the server. This includes the development of methods to improve template-based models and the use of co-evolutionary information for data-assisted free docking.


Subjects
Computational Biology , Protein Conformation , Proteins/ultrastructure , Software , Algorithms , Binding Sites/genetics , Databases, Protein , Humans , Molecular Docking Simulation , Molecular Dynamics Simulation , Protein Interaction Mapping , Proteins/chemistry , Proteins/genetics , Structural Homology, Protein
15.
Biophys J ; 115(5): 809-821, 2018 09 04.
Article in English | MEDLINE | ID: mdl-30122295

ABSTRACT

The energy function is the key component of protein modeling methodology. This work presents a semianalytical approach to the development of contact potentials for protein structure modeling. Residue-residue and atom-atom contact energies were derived by maximizing the probability of observing native sequences in a nonredundant set of protein structures. The optimization task was formulated as an inverse statistical mechanics problem applied to the Potts model. Its solution by pseudolikelihood maximization provides consistent estimates of coupling constants at atomic and residue levels. The best performance was achieved when interacting atoms were grouped according to their physicochemical properties. For individual protein structures, the performance of the contact potentials in distinguishing near-native structures from the decoys is similar to the top-performing scoring functions. The potentials also yielded significant improvement in the protein docking success rates. The potentials recapitulated experimentally determined protein stability changes upon point mutations and protein-protein binding affinities. The approach offers a different perspective on knowledge-based potentials and may serve as the basis for their further development.


Subjects
Models, Molecular , Proteins/chemistry , Proteins/metabolism , Likelihood Functions , Point Mutation , Protein Conformation , Protein Stability , Proteins/genetics , Thermodynamics
16.
Proteins ; 86 Suppl 1: 302-310, 2018 03.
Article in English | MEDLINE | ID: mdl-28905425

ABSTRACT

The paper presents analysis of our template-based and free docking predictions in the joint CASP12/CAPRI37 round. A new scoring function for template-based docking was developed, benchmarked on the Dockground resource, and applied to the targets. The results showed that the function successfully discriminates the incorrect docking predictions. In correctly predicted targets, the scoring function was complemented by other considerations, such as consistency of the oligomeric states among templates, similarity of the biological functions, biological interface relevance, etc. The scoring function still does not distinguish well biological from crystal packing interfaces, and needs further development for the docking of bundles of α-helices. In the case of the trimeric targets, sequence-based methods did not find common templates, despite similarity of the structures, suggesting complementary use of structure- and sequence-based alignments in comparative docking. The results showed that if a good docking template is found, an accurate model of the interface can be built even from largely inaccurate models of individual subunits. Free docking however is very sensitive to the quality of the individual models. However, our newly developed contact potential detected approximate locations of the binding sites.


Subjects
Computational Biology/methods , Databases, Protein , Models, Molecular , Protein Conformation , Protein Multimerization , Proteins/chemistry , Software , Humans , Protein Binding , Sequence Analysis, Protein
17.
Proteins ; 85(3): 470-478, 2017 03.
Article in English | MEDLINE | ID: mdl-27701777

ABSTRACT

Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. This fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. We present a comprehensive validation of template-based and free docking on a set of 165 complexes, where each protein model has six levels of structural accuracy, from 1 to 6 Å Cα RMSD. Many template-based docking predictions fall into acceptable quality category, according to the CAPRI criteria, even for highly inaccurate proteins (5-6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy, and the template-based docking is much less sensitive to inaccuracies of protein models than the free docking.


Subjects
Algorithms , Computational Biology/methods , Molecular Docking Simulation/methods , Proteins/chemistry , Software , Amino Acid Motifs , Benchmarking , Binding Sites , Crystallography, X-Ray , Protein Binding , Protein Conformation , Research Design , Thermodynamics
18.
Proteins ; 85(1): 39-45, 2017 01.
Article in English | MEDLINE | ID: mdl-27756103

ABSTRACT

Structural characterization of protein-protein interactions is essential for understanding life processes at the molecular level. However, only a fraction of protein interactions have experimentally resolved structures. Thus, reliable computational methods for structural modeling of protein interactions (protein docking) are important for generating such structures and understanding the principles of protein recognition. Template-based docking techniques that utilize structural similarity between target protein-protein interaction and cocrystallized protein-protein complexes (templates) are gaining popularity due to generally higher reliability than that of the template-free docking. However, the template-based approach lacks explicit penalties for intermolecular penetration, as opposed to the typical free docking where such penalty is inherent due to the shape complementarity paradigm. Thus, template-based docking models are commonly assumed to require special treatment to remove large structural penetrations. In this study, we compared clashes in the template-based and free docking of the same proteins, with crystallographically determined and modeled structures. The results show that for the less accurate protein models, free docking produces fewer clashes than the template-based approach. However, contrary to the common expectation, in acceptable and better quality docking models of unbound crystallographically determined proteins, the clashes in the template-based docking are comparable to those in the free docking, due to the overall higher quality of the template-based docking predictions. This suggests that the free docking refinement protocols can in principle be applied to the template-based docking predictions as well.


Subjects
Molecular Docking Simulation , Proteins/chemistry , Binding Sites , Computational Biology/methods , Crystallography, X-Ray , Protein Binding , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Protein Structure, Secondary , Software , Structural Homology, Protein
19.
Proteins ; 83(9): 1563-70, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25488330

ABSTRACT

Structural characterization of protein-protein interactions is important for understanding life processes. Because of the inherent limitations of experimental techniques, such characterization requires computational approaches. Along with the traditional protein-protein docking (free search for a match between two proteins), comparative (template-based) modeling of protein-protein complexes has been gaining popularity. Its development puts an emphasis on full and partial structural similarity between the target protein monomers and the protein-protein complexes previously determined by experimental techniques (templates). The template-based docking relies on the quality and diversity of the template set. We present a carefully curated, nonredundant library of templates containing 4950 full structures of binary complexes and 5936 protein-protein interfaces extracted from the full structures at 12 Å distance cut-off. Redundancy in the libraries was removed by clustering the PDB structures based on structural similarity. The value of the clustering threshold was determined from the analysis of the clusters and the docking performance on a benchmark set. High structural quality of the interfaces in the template and validation sets was achieved by automated procedures and manual curation. The library is included in the Dockground resource for molecular recognition studies at http://dockground.bioinformatics.ku.edu.


Subjects
Computational Biology/methods , Molecular Docking Simulation , Protein Interaction Mapping/methods , Protein Structure, Tertiary , Proteins/chemistry , Binding Sites , Cluster Analysis , Crystallography, X-Ray , Databases, Protein , Internet , Protein Binding , Proteins/classification , Proteins/metabolism , Reproducibility of Results
20.
Proteins ; 83(5): 891-7, 2015 May.
Article in English | MEDLINE | ID: mdl-25712716

ABSTRACT

Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have predefined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the "real case scenario," as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu.


Subjects
Molecular Docking Simulation/standards , Amino Acid Sequence , Molecular Sequence Data , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Protein Structure, Secondary , Proteins/chemistry , Reference Standards