1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36460624

ABSTRACT

Protein model quality assessment plays an important role in protein structure prediction, protein design and drug discovery. In this work, DeepUMQA2, a substantially improved version of DeepUMQA for protein model quality assessment, is proposed. First, sequence features containing protein co-evolution information and structural features reflecting family information are extracted to complement model-dependent features. Second, a novel backbone network based on triangular multiplication update and axial attention mechanism is designed to enhance information exchange between inter-residue pairs. On the CASP13 and CASP14 datasets, the performance of DeepUMQA2 improves on that of DeepUMQA by 20.5% and 20.4%, respectively (measured by top-1 loss). Moreover, on the three-month CAMEO dataset (11 March to 04 June 2022), DeepUMQA2 outperforms DeepUMQA by 15.5% (measured by local AUC_{0,0.2}) and ranks first among all competing server methods in the CAMEO blind test. Experimental results show that DeepUMQA2 outperforms state-of-the-art model quality assessment methods such as ProQ3D-LDDT, ModFOLD8 and DeepAccNet, and that DeepUMQA2 can select better models than the self-assessments provided by state-of-the-art protein structure prediction methods such as AlphaFold2, RoseTTAFold and I-TASSER.


Subjects
Algorithms; Computational Biology; Computational Biology/methods; Models, Molecular; Neural Networks, Computer; Proteins/chemistry; Protein Conformation
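
The abstract of entry 1 reports results "measured by top 1 loss" without defining it. A minimal sketch, assuming the common definition (the true quality of the best model in the pool minus the true quality of the model ranked first by the predicted scores); the function name and the scores are hypothetical and are not taken from the paper:

```python
def top1_loss(predicted, true):
    """True-quality gap between the best model and the model picked by the QA scores."""
    if len(predicted) != len(true) or not predicted:
        raise ValueError("predicted and true must be non-empty and of equal length")
    # Index of the model the QA method would select (highest predicted score).
    picked = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[picked]


# Hypothetical scores for four candidate models of a single target.
predicted_scores = [0.71, 0.64, 0.80, 0.55]
true_scores = [0.78, 0.60, 0.74, 0.50]
loss = top1_loss(predicted_scores, true_scores)
```

A loss of zero means the method selected the best available model; benchmark papers typically average this value over all targets.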
2.
Proteins ; 91(8): 1089-1096, 2023 08.
Article in English | MEDLINE | ID: mdl-37158708

ABSTRACT

Machine learning research concerning protein structure has surged in popularity in recent years, with promising advances for basic science and drug discovery. Working with macromolecular structure in a machine learning context requires an adequate numerical representation, and researchers have extensively studied representations such as graphs, discretized 3D grids, and distance maps. As part of CASP14, we explored a new and conceptually simple representation in a blind experiment: atoms as points in 3D, each with associated features. These features (initially just the basic element type of each atom) are updated through a series of neural network layers featuring rotation-equivariant convolutions. Starting from all atoms, we further aggregate information at the level of alpha carbons before making a prediction at the level of the entire protein structure. We find that this approach yields competitive results in protein model quality assessment despite its simplicity and despite the fact that it incorporates minimal prior information and is trained on relatively little data. Its performance and generality are particularly noteworthy in an era where highly complex, customized machine learning methods such as AlphaFold 2 have come to dominate protein structure prediction.


Subjects
Neural Networks, Computer; Proteins; Rotation; Proteins/chemistry; Machine Learning; Drug Discovery
3.
Proteins ; 91(12): 1861-1870, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37553848

ABSTRACT

This article reports and analyzes the results of protein complex model accuracy estimation by our methods (DeepUMQA3 and GraphGPSM) in the 15th Critical Assessment of techniques for protein Structure Prediction (CASP15). The new deep learning-based multimeric complex model accuracy estimation methods are based on an ensemble of three-level features coupled with deep residual/graph neural networks. We describe the input multimeric complex model at three levels: overall complex features, intra-monomer features, and inter-monomer features. We designed an overall ultrafast shape recognition (USR) to characterize the relationship between local residues and the overall complex topology, and an inter-monomer USR to characterize the relationship between the residues of one monomer and the topology of other monomers. DeepUMQA3 (Group name: GuijunLab-RocketX) ranked first in the interface residue accuracy estimation of CASP15. The Pearson correlation between the interface residue Local Distance Difference Test (lDDT) predicted by DeepUMQA3 and the real lDDT is 0.570, making it the only method to exceed 0.5. Among the top 5 methods, DeepUMQA3 achieved the highest Pearson correlation of lDDT on 25 out of 39 targets. GraphGPSM (Group name: GuijunLab-PAthreader) has TM-score Pearson correlations greater than 0.9 on 14 targets, showing a good ability to estimate the overall fold accuracy. The DeepUMQA3 server is available at http://zhanglab-bioinf.com/DeepUMQA/ and the GraphGPSM server is available at http://zhanglab-bioinf.com/GraphGPSM/.


Subjects
Deep Learning; Protein Conformation; Computational Biology/methods; Proteins/chemistry; Neural Networks, Computer
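
The abstract of entry 3 builds on ultrafast shape recognition (USR) but does not spell out the overall and inter-monomer variants. As background only, here is a sketch of the classic USR descriptor from the general literature: distances from every point to four reference points, each summarized by three moments. It is an assumption that the paper's variants resemble this; the function names are hypothetical.

```python
import math


def _moments(values):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    cbrt = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), cbrt]


def usr_descriptor(coords):
    """Return the 12-value classic USR descriptor for a list of (x, y, z) points."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    d_ctd = [math.dist(c, centroid) for c in coords]
    closest = coords[d_ctd.index(min(d_ctd))]        # point closest to the centroid
    farthest = coords[d_ctd.index(max(d_ctd))]       # point farthest from the centroid
    d_fct = [math.dist(c, farthest) for c in coords]
    far_from_far = coords[d_fct.index(max(d_fct))]   # point farthest from `farthest`
    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor += _moments([math.dist(c, ref) for c in coords])
    return descriptor


def usr_similarity(a, b):
    """Similarity in (0, 1]; 1.0 means the two descriptors are identical."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


# Hypothetical coordinates standing in for residue positions.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
desc = usr_descriptor(points)
shifted = usr_descriptor([(x + 5, y - 2, z + 1) for x, y, z in points])
```

Because the descriptor uses only distances between points, it is unchanged by translating or rotating the structure, which is why no superposition step is needed.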
4.
Proteins ; 91(12): 1871-1878, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37314190

ABSTRACT

In CASP15, there was a greater emphasis on multimeric modeling than in previous experiments, with assembly structures nearly doubling in number (41 up from 22) since the previous round. CASP15 also included a new estimation of model accuracy (EMA) category in recognition of the importance of objective quality assessment (QA) for quaternary structure models. ModFOLDdock is a multimeric model QA server developed by the McGuffin group at the University of Reading, which brings together a range of single-model, clustering, and deep learning methods to form a consensus of approaches. For CASP15, three variants of ModFOLDdock were developed to optimize for the different facets of the quality estimation problem. The standard ModFOLDdock variant produced predicted scores optimized for positive linear correlations with the observed scores. The ModFOLDdockR variant produced predicted scores optimized for ranking, that is, the top-ranked models have the highest accuracy. In addition, the ModFOLDdockS variant used a quasi-single model approach to score each model on an individual basis. The scores from all three variants achieved strongly positive Pearson correlation coefficients with the CASP observed scores (oligo-lDDT) in excess of 0.70, which were maintained across both homomeric and heteromeric model populations. In addition, at least one of the ModFOLDdock variants was consistently ranked in the top two methods across all three EMA categories. Specifically, for overall global fold prediction accuracy, ModFOLDdock placed second and ModFOLDdockR placed third; for overall interface quality prediction accuracy, ModFOLDdockR, ModFOLDdock, and ModFOLDdockS were placed above all other predictor methods, and ModFOLDdockR and ModFOLDdockS were placed second and third respectively for individual residue confidence scores. The ModFOLDdock server is available at: https://www.reading.ac.uk/bioinf/ModFOLDdock/. 
ModFOLDdock is also available as part of the MultiFOLD docker package: https://hub.docker.com/r/mcguffin/multifold.


Subjects
Proteins; Software; Protein Conformation; Proteins/genetics; Proteins/chemistry; Models, Molecular; Computational Biology
5.
Proteins ; 91(12): 1889-1902, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37357816

ABSTRACT

Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models has proven useful for estimating the quality of protein tertiary structural models, but it has rarely been applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address this gap, we developed a hybrid method (MULTICOM_qa) that combines a pairwise similarity score (PSS) and an interface contact probability score (ICPS), based on deep learning inter-chain contact prediction, for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and performed very well in estimating the global structure accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors for EMA (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA.


Subjects
Deep Learning; Models, Molecular; Proteins/chemistry
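
The abstract of entry 5 combines a multi-model pairwise similarity score (PSS) with a single-model interface contact probability score (ICPS) but does not say how the two are blended. A minimal sketch, assuming PSS is the mean similarity of a model to all other models in the pool and assuming an equal-weight average for the blend; both choices, the function names, and the numbers are illustrative, not the published method:

```python
def pairwise_similarity_scores(similarity):
    """Average similarity of each model to every other model in the pool."""
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def combine_scores(pss, icps, weight=0.5):
    """Blend PSS and ICPS; the equal weighting is an illustrative assumption."""
    return [weight * p + (1.0 - weight) * c for p, c in zip(pss, icps)]


# Hypothetical symmetric similarity matrix (e.g. TM-score-like) for three models.
similarity = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
icps = [0.5, 0.7, 0.2]  # hypothetical interface contact probability scores
pss = pairwise_similarity_scores(similarity)
final = combine_scores(pss, icps)
best_model = max(range(len(final)), key=lambda i: final[i])
```

The consensus term rewards models that agree with the rest of the pool, which is exactly what fails when many poor models resemble one another; the single-model term does not depend on the pool and can offset that bias.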
6.
Softw Syst Model ; 22(1): 13-29, 2023.
Article in English | MEDLINE | ID: mdl-36033973

ABSTRACT

Participatory enterprise modeling is about gathering domain experts and involving them directly in the creation of models, aided by modeling experts. It is meant to increase commitment to and quality of models. This paper presents an exploratory study focusing on the subjective view of the domain experts. We investigated the influence of direct collaboration versus individual modeling, and the influence of model revisions by modeling experts, on psychological ownership and perceived model quality. We chose process modeling as a particular form of enterprise modeling. Our results hint that domain experts working individually with a modeling expert perceive model quality as higher than those working collaboratively, whereas psychological ownership did not show any difference. Revisions changed the subjects' assessments of model quality only. Moreover, we present qualitative results from interviews we conducted with the participants. They reveal interesting insights into how the outcome and the perception of the procedure and the method can be positively influenced in both settings. The interviews also emphasize the special role of the method experts, who are sometimes even considered co-owners of the model.

7.
Proteins ; 90(12): 2091-2102, 2022 12.
Article in English | MEDLINE | ID: mdl-35842895

ABSTRACT

The estimation of protein model accuracy (EMA) or model quality assessment (QA) is important for protein structure prediction. An accurate EMA algorithm can guide the refinement of models or pick the best model or best parts of models from a pool of predicted tertiary structures. We developed two novel methods: MASS2 and LAW, for predicting residue-specific or local qualities of individual models, which incorporate residual neural networks and graph neural networks, respectively. These two methods use similar features extracted from protein models but different architectures of neural networks to predict the local accuracies of single models. MASS2 and LAW participated in the QA category of CASP14, and according to our evaluations based on CASP14 official criteria, MASS2 and LAW are the best and second-best methods based on the Z-scores of ASE/100, AUC, and ULR-1.F1. We also evaluated MASS2, LAW, and the residue-specific predicted deviations (between model and native structure) generated by AlphaFold2 on CASP14 AlphaFold2 tertiary structure (TS) models. LAW achieved comparable or better performances compared to the predicted deviations generated by AlphaFold2 on AlphaFold2 TS models, even though LAW was not trained on any AlphaFold2 TS models. Specifically, LAW performed better on AUC and ULR scores, and AlphaFold2 performed better on ASE scores. This means that AlphaFold2 is better at predicting deviations, but LAW is better at classifying accurate and inaccurate residues and detecting unreliable local regions. MASS2 and LAW can be freely accessed from http://dna.cs.miami.edu/MASS2-CASP14/ and http://dna.cs.miami.edu/LAW-CASP14/, respectively.


Subjects
Computational Biology; Proteins; Computational Biology/methods; Models, Molecular; Proteins/chemistry; Neural Networks, Computer; Algorithms; Protein Conformation
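
The abstract of entry 7 contrasts predicting deviations with classifying residues as accurate or inaccurate, the latter scored by AUC. A minimal sketch of that classification view, assuming a residue counts as accurate when its true deviation is at most 3.8 angstroms; the cutoff, the function name, and the data are assumptions for illustration rather than the official CASP evaluation code:

```python
def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


ACCURATE_CUTOFF = 3.8  # angstroms; assumed threshold for an "accurate" residue

# Hypothetical per-residue deviations for one model (angstroms).
true_dev = [0.9, 2.1, 6.5, 1.4, 9.0]
pred_dev = [1.2, 5.5, 5.0, 1.0, 7.5]

labels = [d <= ACCURATE_CUTOFF for d in true_dev]
scores = [-d for d in pred_dev]  # smaller predicted deviation = more likely accurate
auc = roc_auc(scores, labels)
```

AUC depends only on the ordering of the predicted deviations, so a method can rank residues well without getting the deviations themselves numerically close, which fits the abstract's distinction between AUC and ASE performance.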
8.
J Comput Chem ; 43(17): 1140-1150, 2022 06 30.
Article in English | MEDLINE | ID: mdl-35475517

ABSTRACT

The native structures of proteins, with the notable exception of intrinsically disordered proteins, generally take their most stable conformation under physiological conditions to maintain their structural framework so that their biological function can be properly carried out. Experimentally, the stability of a protein can be measured by several means, among which the pulling experiment using the atomic force microscope (AFM) stands as a unique method. AFM directly measures the resistance to unfolding, which can be quantified from the observed force-extension profile. It has been shown that key features observed in an AFM pulling experiment can be well reproduced by computational molecular dynamics simulations. Here, we applied computational pulling to estimate the accuracy of computational protein structure models under the hypothesis that structural stability would positively correlate with the accuracy, i.e. the closeness to the native structure, of a model. We used in total 4929 structure models for 24 target proteins from the Critical Assessment of Techniques of Structure Prediction (CASP) and investigated whether the magnitude of the break force, that is, the force required to rearrange the model's structure, from the force profile was sufficient information for selecting near-native models. We found that near-native models can be successfully selected by examining their break forces, suggesting that a high break force indeed indicates high stability of models. On the other hand, there were also near-native models that had relatively low peak forces. The mechanisms of the stability exhibited by the break forces were explored and discussed.


Subjects
Molecular Dynamics Simulation; Proteins; Protein Conformation; Proteins/chemistry; Software
9.
Proteins ; 89(12): 1940-1948, 2021 12.
Article in English | MEDLINE | ID: mdl-34324227

ABSTRACT

In CASP, blind testing of model accuracy estimation methods has been conducted on models submitted by tertiary structure prediction servers. In CASP14, model accuracy estimation results were evaluated in terms of both global and local structure accuracy, as in the previous CASPs. Unlike the previous CASPs that did not show pronounced improvements in performance, the best single-model method (from the Baker group) showed an improved performance in CASP14, particularly in evaluating global structure accuracy when compared to both the best single-model methods in previous CASPs and the best multi-model methods in the current CASP. Although the CASP14 experiment on model accuracy estimation did not deal with the structures generated by AlphaFold2, new challenges that have arisen due to the success of AlphaFold2 are discussed.


Subjects
Models, Molecular; Protein Conformation; Proteins; Software; Computational Biology; Proteins/chemistry; Proteins/metabolism; Reproducibility of Results; Sequence Analysis, Protein/methods
10.
Proteins ; 89(12): 1834-1843, 2021 12.
Article in English | MEDLINE | ID: mdl-34176161

ABSTRACT

The goal of CASP experiments is to monitor the progress in the protein structure prediction field. During the 14th CASP edition we aimed to test our capabilities of predicting structures of protein complexes. Our protocol for modeling protein assemblies included both template-based modeling and free docking. Structural templates were identified using sensitive sequence-based searches. If sequence-based searches failed, we performed structure-based template searches using selected CASP server models. In the absence of reliable templates we applied free docking starting from monomers generated by CASP servers. We evaluated and ranked models of protein complexes using an improved version of our protein structure quality assessment method, VoroMQA, taking into account both interaction interface and global structure scores. If reliable templates could be identified, generally accurate models of protein assemblies were generated with the exception of an antibody-antigen interaction. The success of free docking mainly depended on the accuracy of initial subunit models and on the scoring of docking solutions. To put our overall results in perspective, we analyzed our performance in the context of other CASP groups. Although the subunits in our assembly models often were not of the top quality, these models had, overall, the best-predicted intersubunit interfaces according to several accuracy measures. We attribute our relative success primarily to the emphasis on the interaction interface when modeling and scoring.


Subjects
Models, Molecular; Protein Conformation; Proteins; Software; Structural Homology, Protein; Binding Sites; Computational Biology; Molecular Docking Simulation; Protein Interaction Domains and Motifs; Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein
11.
Environ Monit Assess ; 193(4): 238, 2021 Mar 30.
Article in English | MEDLINE | ID: mdl-33783638

ABSTRACT

In the practical application of air protection, diverse dispersion models are used to calculate the concentration of contaminants in the air. They are usually general-purpose in character, which typically makes them sufficient for use in almost all conditions, with the exception of those clearly deviating from the average. This is especially relevant to industrial sites covering large areas that introduce a great amount of heat and mechanical energy into the air. For such cases, the standard models can be extended in order to adapt them to the unusual local diffusion conditions. Then, before being applied in practice, they must undergo validation to document the correctness of their operation. The article describes the validation of an air quality assessment model containing extended procedures that incorporate special factors affecting atmospheric dispersion in the coke industry. A set of statistical indicators, obtained on the basis of an SF6 field experiment, evaluates its performance. A short comparison with some popular general-purpose models and an assessment of the suitability of individual indicators for validation purposes are also presented.


Subjects
Air Pollutants; Air Pollution; Coke; Air Pollutants/analysis; Air Pollution/analysis; Diffusion; Environmental Monitoring; Models, Theoretical
12.
J Pak Med Assoc ; 71(5): 1413-1419, 2021 May.
Article in English | MEDLINE | ID: mdl-34091625

ABSTRACT

OBJECTIVE: To evaluate the effect of education given according to the daily living activities model on arm dysfunction, lymphoedema and quality of life in patients undergoing breast cancer surgery. METHODS: The randomised controlled study was done at a tertiary hospital and comprised patients who underwent breast cancer surgery from November 2017 to October 2018. After randomisation, the intervention group received education through specifically-designed tools, while the control group received routine care. Data was collected using a patient information form, the subjective perception of post-operative functional impairment of the arm scale, the Katz index of daily living activities, the disabilities of the arm, shoulder and hand scale and the short form of the quality of life scale. Three interviews were conducted, at the 1st week, 1st month and 3rd month after surgery. Data was analysed using SPSS 23. RESULTS: Of the 58 subjects, 29 (50%) each were cases and controls. The overall mean age was 48.9±9 years. In the intervention group, the measurements of the upper arm circumference were significantly better than in the control group (p<0.05). There were also significant differences between the groups in terms of the scales and indices used (p<0.05). CONCLUSIONS: The intervention group recovered earlier than the control group.


Subjects
Breast Neoplasms; Lymphedema; Activities of Daily Living; Adult; Arm; Breast Neoplasms/surgery; Humans; Lymphedema/etiology; Middle Aged; Quality of Life
13.
BMC Bioinformatics ; 21(Suppl 4): 246, 2020 Jul 06.
Article in English | MEDLINE | ID: mdl-32631256

ABSTRACT

BACKGROUND: Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models is not available, methods that only need a single model as input are indispensable. RESULTS: We developed MASS, a QA method to predict the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions or statistical potentials that can capture the structural characteristics of a protein model, which can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE and Rosetta when the scores they generated are used as machine learning features. MASS outperforms almost all of the four CASP11 top-performing single-model methods for global quality assessment in terms of all of the four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved performance comparable to that of the leading QA methods in CASP12 and CASP13. CONCLUSIONS: MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/ .


Subjects
Computational Biology/methods; Proteins/chemistry; Models, Molecular
14.
BMC Bioinformatics ; 21(1): 157, 2020 Apr 25.
Article in English | MEDLINE | ID: mdl-32334508

ABSTRACT

BACKGROUND: Quality assessment of protein tertiary structure prediction models, in which structures of the best quality are selected from decoys, is a major challenge in protein structure prediction, and is crucial to determine a model's utility and potential applications. Single-model quality assessment predicts a model's quality from the model itself. In general, the Pearson correlation value of a quality assessment method increases in tandem with the quality of the model pool. However, there is no consensus regarding the best method to select a few good models from a poor-quality model pool. RESULTS: We introduce a novel single-model quality assessment method for poor-quality models that uses simple linear combinations of six features. We perform weighted search and linear regression on a large dataset of models from the 12th Critical Assessment of Protein Structure Prediction (CASP12) and benchmark the results on CASP13 models. We demonstrate that our method achieves outstanding performance on poor-quality models. CONCLUSIONS: The results of assessing poor-quality protein structures with six features indicate that contact prediction and relying on fewer prediction features can improve selection accuracy.


Subjects
Models, Molecular; Proteins/chemistry; Benchmarking; Computational Biology/methods; Protein Conformation
15.
Proteins ; 88(8): 939-947, 2020 08.
Article in English | MEDLINE | ID: mdl-31697420

ABSTRACT

Structures of proteins complexed with other proteins, peptides, or ligands are essential for investigation of molecular mechanisms. However, the experimental structures of protein complexes of interest are often not available. Therefore, computational methods are widely used to predict these structures, and, of those methods, template-based modeling is the most successful. In the rounds 38-45 of the Critical Assessment of PRediction of Interactions (CAPRI), we applied template-based modeling for 9 of 11 protein-protein and protein-peptide interaction targets, resulting in medium and high-quality models for six targets. For the protein-oligosaccharide docking targets, we used constraints derived from template structures, and generated models of at least acceptable quality for most of the targets. Apparently, high flexibility of oligosaccharide molecules was the main cause preventing us from obtaining models of higher quality. We also participated in the CAPRI scoring challenge, the goal of which was to identify the highest quality models from a large pool of decoys. In this experiment, we tested VoroMQA, a scoring method based on interatomic contact areas. The results showed VoroMQA to be quite effective in scoring strongly binding and obligatory protein complexes, but less successful in the case of transient interactions. We extensively used manual intervention in both CAPRI modeling and scoring experiments. This oftentimes allowed us to select the correct templates from available alternatives and to limit the search space during the model scoring.


Subjects
Molecular Docking Simulation; Oligosaccharides/chemistry; Peptides/chemistry; Proteins/chemistry; Software; Amino Acid Sequence; Binding Sites; Humans; Ligands; Oligosaccharides/metabolism; Peptides/metabolism; Protein Binding; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Protein Interaction Domains and Motifs; Protein Interaction Mapping; Proteins/metabolism; Research Design; Structural Homology, Protein
16.
Entropy (Basel) ; 22(3)2020 Mar 17.
Article in English | MEDLINE | ID: mdl-33286116

ABSTRACT

MaxEnt is a popular maximum entropy-based algorithm originally developed for modelling species distribution, but increasingly used for land-cover classification. In this article, we used MaxEnt as a single-class land-cover classifier and explored whether recommended procedures for generating high-quality species distribution models also apply to generating high-accuracy land-cover classifications. We used remote sensing imagery and randomly selected ground-truth points for four types of land cover (built, grass, deciduous, evergreen) to generate 1980 classification maps using MaxEnt. We calculated different discrimination accuracy and model quality metrics to determine whether these metrics were suitable proxies for estimating the accuracy of land-cover classification outcomes. Correlation analysis between model quality metrics showed consistent patterns for the relationships between metrics, but not for all land covers. Relationships between model quality metrics and land-cover classification accuracy were land-cover-dependent. For built cover there were no consistent patterns of correlation for any quality metric, whereas for grass, evergreen and deciduous there was a consistent association between quality metrics and classification accuracy. We recommend evaluating the accuracy of land-cover classification results by using proper discrimination accuracy coefficients (e.g., Kappa, Overall Accuracy), and not placing all confidence in model quality metrics as a reliable indicator of land-cover classification results.
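
The abstract of entry 16 recommends discrimination accuracy coefficients such as Kappa and Overall Accuracy. A minimal sketch of both, computed from a confusion matrix using the standard definitions of overall accuracy and Cohen's kappa; the function names and the counts are hypothetical and not taken from the study:

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohens_kappa(matrix):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts: rows are reference classes, columns are predicted classes
# (e.g. "grass" vs "not grass" for a single-class map).
confusion = [
    [40, 10],
    [5, 45],
]
oa = overall_accuracy(confusion)
kappa = cohens_kappa(confusion)
```

Kappa discounts the agreement expected by chance from the class totals, so it is lower than overall accuracy whenever chance agreement is non-zero.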

17.
Proteins ; 87(12): 1351-1360, 2019 12.
Article in English | MEDLINE | ID: mdl-31436360

ABSTRACT

Scoring model structure is an essential component of protein structure prediction that can affect the prediction accuracy tremendously. Users of protein structure prediction results also need to score models to select the best models for their application studies. In Critical Assessment of techniques for protein Structure Prediction (CASP), model accuracy estimation methods have been tested in a blind fashion by providing models submitted by the tertiary structure prediction servers for scoring. In CASP13, model accuracy estimation results were evaluated in terms of both global and local structure accuracy. Global structure accuracy estimation was evaluated by the quality of the models selected by the global structure scores and by the absolute estimates of the global scores. Residue-wise, local structure accuracy estimations were evaluated by three different measures. A new measure introduced in CASP13 evaluates the ability to predict inaccurately modeled regions that may be improved by refinement. An intensive comparative analysis on CASP13 and the previous CASPs revealed that the tertiary structure models generated by the CASP13 servers show very distinct features. Higher consensus toward models of higher global accuracy appeared even for free modeling targets, and many models of high global accuracy were not well optimized at the atomic level. This is related to the new technology in CASP13, deep learning for tertiary contact prediction. The tertiary model structures generated by deep learning pose a new challenge for EMA (estimation of model accuracy) method developers. Model accuracy estimation itself is also an area where deep learning can potentially have an impact, although current EMA methods have not fully explored that direction.


Subjects
Computational Biology; Models, Molecular; Protein Conformation; Proteins/ultrastructure; Algorithms; Databases, Protein; Deep Learning; Proteins/chemistry; Proteins/genetics; Sequence Analysis, Protein; Software
18.
Proteins ; 87(12): 1222-1232, 2019 12.
Article in English | MEDLINE | ID: mdl-31294859

ABSTRACT

Proteins frequently interact with each other, and the knowledge of structures of the corresponding protein complexes is necessary to understand how they function. Computational methods are increasingly used to provide structural models of protein complexes. Not surprisingly, community-wide Critical Assessment of protein Structure Prediction (CASP) experiments have recently started monitoring the progress in this research area. We participated in CASP13 with the aim to evaluate our current capabilities in modeling of protein complexes and to gain a better understanding of factors that exert the largest impact on these capabilities. To model protein complexes in CASP13, we applied template-based modeling, free docking and hybrid techniques that enabled us to generate models of the topmost quality for 27 of 42 multimers. If templates for protein complexes could be identified, we modeled the structures with reasonable accuracy by straightforward homology modeling. If only partial templates were available, it was nevertheless possible to predict the interaction interfaces correctly or to generate acceptable models for protein complexes by combining template-based modeling with docking. If no templates were available, we used rigid-body docking with limited success. However, in some free docking models, despite the incorrect subunit orientation and missed interface contacts, the approximate location of protein binding sites was identified correctly. Apparently, our overall performance in docking was limited by the quality of monomer models and by the imperfection of scoring methods. The impact of human intervention on our results in modeling of protein complexes was significant indicating the need for improvements of automatic methods.


Subjects
Computational Biology; Multiprotein Complexes/ultrastructure; Protein Conformation; Proteins/ultrastructure; Binding Sites/genetics; Databases, Protein; Humans; Molecular Docking Simulation; Molecular Dynamics Simulation; Multiprotein Complexes/chemistry; Multiprotein Complexes/genetics; Protein Binding/genetics; Protein Interaction Mapping; Protein Multimerization/genetics; Proteins/chemistry; Proteins/genetics; Sequence Analysis, Protein; Structural Homology, Protein
19.
Proteins ; 87(12): 1165-1178, 2019 12.
Article in English | MEDLINE | ID: mdl-30985027

ABSTRACT

Predicting residue-residue distance relationships (eg, contacts) has become the key direction for advancing protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where contact prediction was demonstrated for the first time to consistently enhance the ranking of protein models. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distances when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.


Subjects
Computational Biology , Protein Conformation , Proteins/ultrastructure , Software , Algorithms , Protein Databases , Deep Learning , Molecular Models , Computer Neural Networks , Protein Folding , Tertiary Protein Structure/genetics , Proteins/chemistry , Proteins/genetics , Protein Sequence Analysis
20.
Proteins ; 87(12): 1378-1387, 2019 12.
Article in English | MEDLINE | ID: mdl-31571280

ABSTRACT

Critical blind assessment of structure prediction techniques is crucial for the scientific community to establish the state of the art, identify bottlenecks, and guide future developments. In Critical Assessment of Techniques in Structure Prediction (CASP), human experts assess the performance of participating methods in relation to the difficulty of the prediction task in a biennial experiment on approximately 100 targets. Yet, the development of automated computational modeling methods requires more frequent evaluation cycles and larger sets of data. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements CASP by conducting fully automated blind prediction evaluations based on the weekly pre-release of sequences of those structures that are going to be published in the next release of the Protein Data Bank (PDB). Each week, CAMEO publishes benchmarking results for predictions corresponding to a set of about 20 targets collected during a 4-day prediction window. CAMEO benchmarking data are generated consistently for all methods at the same point in time, enabling developers to cross-validate their methods' performance and to refer to these results in publications. Many successful participants of CASP have used CAMEO, either by directly benchmarking their methods within the system or by comparing their own performance to CAMEO reference data. CAMEO offers a variety of scores reflecting different aspects of structure modeling, for example, binding site accuracy, homo-oligomer interface quality, or accuracy of local model confidence estimates. By introducing the "bestSingleTemplate" method based on structure superpositions as a reference for the accuracy of 3D modeling predictions, CAMEO facilitates objective comparison of techniques and fosters the development of advanced methods.


Subjects
Computational Biology , Protein Conformation , Proteins/ultrastructure , Software , Algorithms , Benchmarking , Binding Sites , Protein Databases , Humans , Molecular Models , Protein Folding , Proteins/chemistry , Proteins/genetics , Protein Sequence Analysis