RESUMO
In CASP12, 2 types of data-assisted protein structure modeling were experimented. Either SAXS experimental data or cross-linking experimental data was provided for a selected number of CASP12 targets that the CASP12 predictor could utilize for better protein structure modeling. We devised 2 separate energy terms for SAXS data and cross-linking data to drive the model structures into more native-like structures that satisfied the given experimental data as much as possible. In CASP11, we successfully performed protein structure modeling using simulated sparse and ambiguously assigned NOE data and/or correct residue-residue contact information, where the only energy term that folded the protein into its native structure was the term which was originated from the given experimental data. However, the 2 types of experimental data provided in CASP12 were far from being sufficient enough to fold the target protein into its native structure because SAXS data provides only the overall shape of the molecule and the cross-linking contact information provides only very low-resolution distance information. For this reason, we combined the SAXS or cross-linking energy term with our regular modeling energy function that includes both the template energy term and the de novo energy terms. By optimizing the newly formulated energy function, we obtained protein models that fit better with provided SAXS data than the X-ray structure of the target. However, the improvement of the model relative to the 1 modeled without the SAXS data, was not significant. Consistent structural improvement was achieved by incorporating cross-linking data into the protein structure modeling.
Assuntos
Biologia Computacional/métodos , Bases de Dados de Proteínas , Modelos Moleculares , Conformação Proteica , Proteínas/química , Espalhamento a Baixo Ângulo , Algoritmos , Humanos , Simulação de Dinâmica Molecular , Difração de Raios XRESUMO
For protein structure modeling in the CASP12 experiment, we have developed a new protocol based on our previous CASP11 approach. The global optimization method of conformational space annealing (CSA) was applied to 3 stages of modeling: multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain re-modeling. For better template selection and model selection, we updated our model quality assessment (QA) method with the newly developed SVMQA (support vector machine for quality assessment). For 3D chain building, we updated our energy function by including restraints generated from predicted residue-residue contacts. New energy terms for the predicted secondary structure and predicted solvent accessible surface area were also introduced. For difficult targets, we proposed a new method, LEEab, where the template term played a less significant role than it did in LEE, complemented by increased contributions from other terms such as the predicted contact term. For TBM (template-based modeling) targets, LEE performed better than LEEab, but for FM targets, LEEab was better. For model refinement, we modified our CASP11 molecular dynamics (MD) based protocol by using explicit solvents and tuning down restraint weights. Refinement results from MD simulations that used a new augmented statistical energy term in the force field were quite promising. Finally, when using inaccurate information (such as the predicted contacts), it was important to use the Lorentzian function for which the maximal penalty arising from wrong information is always bounded.
Assuntos
Biologia Computacional/métodos , Aprendizado de Máquina , Modelos Moleculares , Simulação de Dinâmica Molecular , Conformação Proteica , Dobramento de Proteína , Proteínas/química , Algoritmos , Cristalografia por Raios X , Humanos , Modelos Estatísticos , Domínios e Motivos de Interação entre Proteínas , Análise de Sequência de Proteína , Máquina de Vetores de SuporteRESUMO
We have developed a protein loop structure prediction method by combining a new energy function, which we call EPLM (energy for protein loop modeling), with the conformational space annealing (CSA) global optimization algorithm. The energy function includes stereochemistry, dynamic fragment assembly, distance-scaled finite ideal gas reference (DFIRE), and generalized orientation- and distance-dependent terms. For the conformational search of loop structures, we used the CSA algorithm, which has been quite successful in dealing with various hard global optimization problems. We assessed the performance of EPLM with two widely used loop-decoy sets, Jacobson and RAPPER, and compared the results against the DFIRE potential. The accuracy of model selection from a pool of loop decoys as well as de novo loop modeling starting from randomly generated structures was examined separately. For the selection of a nativelike structure from a decoy set, EPLM was more accurate than DFIRE in the case of the Jacobson set and had similar accuracy in the case of the RAPPER set. In terms of sampling more nativelike loop structures, EPLM outperformed EDFIRE for both decoy sets. This new approach equipped with EPLM and CSA can serve as the state-of-the-art de novo loop modeling method.
Assuntos
Bioquímica/métodos , Modelos Químicos , Proteínas/química , Conformação Proteica , Dobramento de ProteínaRESUMO
For the template-based modeling (TBM) of CASP11 targets, we have developed three new protein modeling protocols (nns for server prediction and LEE and LEER for human prediction) by improving upon our previous CASP protocols (CASP7 through CASP10). We applied the powerful global optimization method of conformational space annealing to three stages of optimization, including multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain remodeling. For more successful fold recognition, a new alignment method called CRFalign was developed. It can incorporate sensitive positional and environmental dependence in alignment scores as well as strong nonlinear correlations among various features. Modifications and adjustments were made to the form of the energy function and weight parameters pertaining to the chain building procedure. For the side-chain remodeling step, residue-type dependence was introduced to the cutoff value that determines the entry of a rotamer to the side-chain modeling library. The improved performance of the nns server method is attributed to successful fold recognition achieved by combining several methods including CRFalign and to the current modeling formulation that can incorporate native-like structural aspects present in multiple templates. The LEE protocol is identical to the nns one except that CASP11-released server models are used as templates. The success of LEE in utilizing CASP11 server models indicates that proper template screening and template clustering assisted by appropriate cluster ranking promises a new direction to enhance protein 3D modeling. Proteins 2016; 84(Suppl 1):221-232. © 2015 Wiley Periodicals, Inc.
Assuntos
Biologia Computacional/estatística & dados numéricos , Modelos Moleculares , Modelos Estatísticos , Proteínas/química , Software , Algoritmos , Sequência de Aminoácidos , Biologia Computacional/métodos , Simulação por Computador , Bases de Dados de Proteínas , Humanos , Internet , Dobramento de Proteína , Domínios e Motivos de Interação entre Proteínas , Estrutura Secundária de Proteína , Alinhamento de Sequência , Homologia Estrutural de Proteína , TermodinâmicaRESUMO
In the template-based modeling (TBM) category of CASP10 experiment, we introduced a new protocol called protein modeling system (PMS) to generate accurate protein structures in terms of side-chains as well as backbone trace. In the new protocol, a global optimization algorithm, called conformational space annealing (CSA), is applied to the three layers of TBM procedure: multiple sequence-structure alignment, 3D chain building, and side-chain re-modeling. For 3D chain building, we developed a new energy function which includes new distance restraint terms of Lorentzian type (derived from multiple templates), and new energy terms that combine (physical) energy terms such as dynamic fragment assembly (DFA) energy, DFIRE statistical potential energy, hydrogen bonding term, etc. These physical energy terms are expected to guide the structure modeling especially for loop regions where no template structures are available. In addition, we developed a new quality assessment method based on random forest machine learning algorithm to screen templates, multiple alignments, and final models. For TBM targets of CASP10, we find that, due to the combination of three stages of CSA global optimizations and quality assessment, the modeling accuracy of PMS improves at each additional stage of the protocol. It is especially noteworthy that the side-chains of the final PMS models are far more accurate than the models in the intermediate steps.