1.
Bioinformatics ; 38(7): 1895-1903, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35134108

ABSTRACT

MOTIVATION: Protein model quality assessment is a key component of protein structure prediction. In recent research, the voxelization feature was used to characterize the local structural information of residues, but it may be insufficient for describing residue-level topological information. Designing features that can further reflect residue-level topology when combined with deep learning methods is therefore crucial to improving the performance of model quality assessment. RESULTS: We developed a deep-learning method, DeepUMQA, based on Ultrafast Shape Recognition (USR) for residue-level single-model quality assessment. Within the framework of a deep residual neural network, the residue-level USR feature was introduced to describe the topological relationship between each residue and the overall structure by calculating the first moment of a set of residue distance sets; it was then combined with 1D, 2D and voxelization features to assess the quality of the model. Experimental results on the CASP13 and CASP14 test datasets and the CAMEO blind test show that USR can supplement the voxelization features to comprehensively characterize residue structure information and significantly improve model assessment accuracy. DeepUMQA ranks among the top state-of-the-art single-model quality assessment methods, including ProQ2, ProQ3, ProQ3D, Ornate, VoroMQA, ProteinGCN, ResNetQA, QDeep, GraphQA, ModFOLD6, ModFOLD7, ModFOLD8, QMEAN3, QMEANDisCo3 and DeepAccNet. AVAILABILITY AND IMPLEMENTATION: The DeepUMQA server is freely available at http://zhanglab-bioinf.com/DeepUMQA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
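
The "first moment of a distance set" can be illustrated with a minimal sketch. It assumes one representative atom per residue (for example Cα) and takes the distance set of a residue to be its distances to all other residues; the function name and this simplification are illustrative assumptions, not the DeepUMQA implementation.

```python
import math


def residue_first_moments(coords):
    """Return, for each residue, the mean distance to every other residue.

    coords: list of (x, y, z) tuples, one representative atom per residue
    (an assumption made for this sketch).
    """
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two residues")
    moments = []
    for i, a in enumerate(coords):
        # First moment = arithmetic mean of the residue's distance set.
        total = sum(math.dist(a, b) for j, b in enumerate(coords) if j != i)
        moments.append(total / (n - 1))
    return moments
```

A residue buried near the centre of the structure gets a small value and a residue on the periphery a large one, which is the kind of residue-to-whole-structure relationship the abstract describes.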


Subjects
Deep Learning , Proteins/chemistry , Neural Networks, Computer , Computational Biology/methods
2.
Bioinformatics ; 38(19): 4513-4521, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35962986

ABSTRACT

MOTIVATION: With the breakthrough of AlphaFold2, the protein structure prediction problem has made remarkable progress through end-to-end deep learning techniques, with which correct folds can be built for nearly all single-domain proteins. However, full-chain modelling appears to have lower average accuracy than modelling of the constituent domains and places higher demands on computing hardware, indicating that the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the accuracy of the predicted full-chain model can be further improved by domain assembly assisted by deep learning. RESULTS: We developed a structural analogue-based protein domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database was constructed for full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, a domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by an energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, for which the average TM-score of the full-chain model after assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that domains often interact in a similar way in their quaternary orientations if they have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. AVAILABILITY AND IMPLEMENTATION: http://zhanglab-bioinf.com/SADA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Humans , Software , Proteins/chemistry , Databases, Protein , Protein Domains
3.
Bioinformatics ; 37(23): 4357-4365, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34245242

ABSTRACT

MOTIVATION: Massive local minima on the protein energy landscape often cause traditional conformational sampling algorithms to be easily trapped in local basin regions, because they find it difficult to overcome high-energy barriers. Also, the lowest energy conformation may not correspond to the native structure due to the inaccuracy of energy models. This study investigates whether these two problems can be alleviated by a sequential niche technique without loss of accuracy. RESULTS: A sequential niche multimodal conformational sampling algorithm for protein structure prediction (SNfold) is proposed in this study. In SNfold, a derating function is designed based on the knowledge learned from the previous sampling and used to construct a series of sampling-guided energy functions. These functions then help the sampling algorithm overcome high-energy barriers and avoid the re-sampling of the explored regions. In inaccurate protein energy models, the high-energy conformation that may correspond to the native structure can be sampled with successively updated sampling-guided energy functions. The proposed SNfold is tested on 300 benchmark proteins, 24 CASP13 and 19 CASP14 FM targets. Results show that SNfold correctly folds (TM-score ≥ 0.5) 231 out of 300 proteins. In particular, compared with Rosetta restrained by distance (Rosetta-dist), SNfold achieves higher average TM-score and improves the sampling efficiency by more than 100 times. On several CASP FM targets, SNfold also shows good performance compared with four state-of-the-art servers in CASP. As a plug-in conformational sampling algorithm, SNfold can be extended to other protein structure prediction methods. AVAILABILITY AND IMPLEMENTATION: The source code and executable versions are freely available at https://github.com/iobio-zjut/SNfold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
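
The idea of a derating function can be sketched as follows. The abstract does not give SNfold's functional form, so this example assumes a simple additive, linearly decaying penalty around each previously explored minimum; `radius`, `height` and both function names are hypothetical and chosen only for illustration.

```python
def derating_penalty(distance, radius, height):
    """Penalty that is largest at an explored minimum and fades to zero at `radius`."""
    if distance >= radius:
        return 0.0
    return height * (1.0 - distance / radius)


def guided_energy(energy, distances_to_explored, radius=1.0, height=10.0):
    """Sampling-guided energy: raw energy plus one penalty per explored minimum.

    distances_to_explored: distance from the current conformation to each
    minimum found in earlier sampling rounds (any structural distance measure).
    """
    return energy + sum(
        derating_penalty(d, radius, height) for d in distances_to_explored
    )
```

Raising the energy of regions that were already visited pushes later sampling rounds over barriers and into unexplored basins, which is the effect the abstract attributes to the successively updated sampling-guided energy functions.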


Subjects
Algorithms , Proteins , Protein Conformation , Proteins/chemistry , Software , Benchmarking
4.
Bioinformatics ; 38(1): 99-107, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34459867

ABSTRACT

MOTIVATION: With the great progress of deep learning-based inter-residue contact/distance prediction, the discrete space formed by fragment assembly cannot satisfy the distance constraints well. Thus, the optimal solution in the continuous space may not be reached. Designing an effective closed-loop continuous dihedral angle optimization strategy that complements discrete fragment assembly is crucial to improving the performance of the distance-assisted fragment assembly method. RESULTS: We propose a de novo protein structure prediction method called IPTDFold, based on closed-loop iterative partition sampling, topology adjustment and residue-level distance deviation optimization. First, local dihedral angle crossover and mutation operators are designed to explore the conformational space extensively and to exchange information between the conformations in the population. Then, a dihedral angle rotation model of the loop region with partial inter-residue distance constraints is constructed, and the rotation angle satisfying the constraints is obtained by a differential evolution algorithm, so as to adjust the relative spatial positions of the secondary structures. Finally, the residue distance deviation is evaluated according to the difference between the conformation and the predicted distance, and the dihedral angle of the residue is optimized with biased probability. The final model is generated by iterating these three steps. IPTDFold is tested on 462 benchmark proteins, 24 FM targets of CASP13 and 20 FM targets of CASP14. Results show that IPTDFold is significantly superior to the distance-assisted fragment assembly method Rosetta_D (Rosetta with distance). In particular, the prediction accuracy of IPTDFold does not decrease as the length of the protein increases. When using the same FastRelax protocol, the prediction accuracy of IPTDFold is significantly superior to that of trRosetta without orientation constraints, and is equivalent to that of the full version of trRosetta. AVAILABILITY AND IMPLEMENTATION: The source code and executable are freely available at https://github.com/iobio-zjut/IPTDFold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Proteins , Computational Biology/methods , Proteins/chemistry , Software , Algorithms , Protein Structure, Secondary , Protein Conformation
5.
Bioinformatics ; 37(23): 4350-4356, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34185079

ABSTRACT

MOTIVATION: The mathematically optimal solution in computational protein folding simulations does not always correspond to the native structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo protein structure folding simulations. RESULTS: A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo protein structure prediction. The protocol consists of three stages. The first is a modal exploration stage, in which a structural similarity evaluation model, DMscore, is designed to control the diversity of conformations, generating a population of diverse structures in different low-energy basins. The second is a modal maintaining stage, in which an adaptive clustering algorithm, MNDcluster, is proposed to divide the populations and merge the modals by adjusting the annealing temperature to locate the promising basins. In the last stage, modal exploitation, a greedy search strategy is used to accelerate the convergence of the modals. Distance constraint information is used to construct the conformation scoring model that guides sampling. MMpred is tested on a large set of 320 non-redundant proteins, where it obtains models with TM-score ≥ 0.5 in 291 cases, which is 28% higher than that of Rosetta guided with the same set of distance constraints. In addition, on the 320 benchmark proteins, the enhanced version of MMpred (E-MMpred) outperforms trRosetta on 167 targets when the best of five models is evaluated. The average TM-score of the best model of E-MMpred is 0.732, which is comparable to trRosetta (0.730). AVAILABILITY AND IMPLEMENTATION: The source code and executable are freely available at https://github.com/iobio-zjut/MMpred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Proteins , Protein Conformation , Computational Biology/methods , Proteins/chemistry , Software , Algorithms
6.
Bioinformatics ; 36(8): 2443-2450, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31860059

ABSTRACT

MOTIVATION: Regions that connect secondary structure elements in a protein are known as loops; a slight change in a loop can have a dramatic effect on the entire topology. This study investigates whether the accuracy of protein structure prediction can be improved using a loop-specific sampling strategy. RESULTS: A novel de novo protein structure prediction method that combines global exploration and loop perturbation is proposed in this study. In the global exploration phase, fragment recombination and assembly are used to explore the massive conformational space and generate native-like topology. In the loop perturbation phase, a loop-specific local perturbation model is designed to improve the accuracy of the conformation and is solved by a differential evolution algorithm. These two phases enable cooperation between global exploration and local exploitation. The filtered contact information is used to construct the conformation selection model for guiding the sampling. The proposed method, CGLFold, is tested on 145 benchmark proteins, 14 free modeling (FM) targets of CASP13 and 29 FM targets of CASP12. The experimental results show that the loop-specific local perturbation can increase the structure diversity and the success rate of conformational updates and gradually improve conformation accuracy. CGLFold obtains models with a template modeling score ≥ 0.5 on 95 standard test proteins, 7 FM targets of CASP13 and 9 FM targets of CASP12. AVAILABILITY AND IMPLEMENTATION: The source code and executable versions are freely available at https://github.com/iobio-zjut/CGLFold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
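
For readers unfamiliar with the template modeling score (TM-score) threshold used in these abstracts, the sketch below evaluates the standard TM-score formula for one fixed superposition. It assumes the per-residue distances between aligned model and native residues are already known; the actual TM-score is the maximum over superpositions, which this sketch does not search for. The 0.5 Å floor on the scale `d0` is a guard added here for short chains.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length <= 15:
        return 0.5  # guard for very short chains (assumption of this sketch)
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, target_length):
    """TM-score for a given superposition.

    distances: distance in angstroms for each aligned residue pair.
    target_length: number of residues in the native (target) structure.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A score of 1.0 means a perfect match, and values of at least 0.5 are the criterion these abstracts use for a correctly folded model.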


Subjects
Algorithms , Proteins , Protein Conformation , Protein Structure, Secondary , Software
7.
IEEE Trans Evol Comput ; 24(3): 536-550, 2020 Jun.
Article in English | MEDLINE | ID: mdl-33603321

ABSTRACT

Various mutation strategies show distinct advantages in differential evolution (DE). The cooperation of multiple strategies in the evolutionary process may be effective. This paper presents an underestimation-assisted global and local cooperative DE to simultaneously enhance the effectiveness and efficiency. In the proposed algorithm, two phases, namely, the global exploration and the local exploitation, are performed in each generation. In the global phase, a set of trial vectors is produced for each target individual by employing multiple strategies with strong exploration capability. Afterward, an adaptive underestimation model with a self-adapted slope control parameter is proposed to evaluate these trial vectors, the best of which is selected as the candidate. In the local phase, the better-based strategies guided by individuals that are better than the target individual are designed. For each individual accepted in the global phase, multiple trial vectors are generated by using these strategies and filtered by the underestimation value. The cooperation between the global and local phases includes two aspects. First, both of them concentrate on generating better individuals for the next generation. Second, the global phase aims to locate promising regions quickly while the local phase serves as a local search for enhancing convergence. Moreover, a simple mechanism is designed to determine the parameter of DE adaptively in the searching process. Finally, the proposed approach is applied to predict the protein 3D structure. Experimental studies on classical benchmark functions, CEC test sets, and protein structure prediction problem show that the proposed approach is superior to the competitors.
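
As background for the mutation strategies mentioned above, here is a minimal sketch of the classic DE/rand/1/bin operator, which produces one trial vector for a target individual. The abstract does not list the strategies used in the proposed algorithm, so treating DE/rand/1/bin as a representative is an assumption; the default values of `f` and `cr` are common choices, not values from the paper.

```python
import random


def de_rand_1_bin(population, target_index, f=0.5, cr=0.9, rng=random):
    """Create one trial vector with DE/rand/1 mutation and binomial crossover.

    population: list of equal-length lists of floats.
    f: differential weight; cr: crossover rate.
    """
    n = len(population)
    if n < 4:
        raise ValueError("DE/rand/1 needs at least four individuals")
    # Three mutually distinct individuals, all different from the target.
    candidates = [i for i in range(n) if i != target_index]
    r1, r2, r3 = rng.sample(candidates, 3)
    target = population[target_index]
    dim = len(target)
    j_rand = rng.randrange(dim)  # guarantees at least one mutant component
    trial = []
    for j in range(dim):
        if j == j_rand or rng.random() < cr:
            mutant = population[r1][j] + f * (population[r2][j] - population[r3][j])
            trial.append(mutant)
        else:
            trial.append(target[j])
    return trial
```

In a multi-strategy scheme such as the one described, several operators of this kind would each produce a trial vector for the same target, and a cheap estimate of their quality would decide which one is evaluated with the true objective.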

8.
Article in English | MEDLINE | ID: mdl-35594218

ABSTRACT

Domain boundary prediction is one of the most important problems in the study of protein structure and function, especially for large proteins. At present, most domain boundary prediction methods have low accuracy and are limited in dealing with multi-domain proteins. In this study, we develop a sequence-based protein domain boundary prediction method, named DomBpred. In DomBpred, the input sequence is first classified as either a single-domain or a multi-domain protein through an effective sequence metric designed on the basis of a constructed single-domain sequence library. For a multi-domain protein, a domain-residue clustering algorithm inspired by the Ising model is proposed to cluster spatially close residues according to inter-residue distance. The unclassified residues and the residues at the edge of each cluster are then tuned by the secondary structure to form potential cut points. Finally, a domain boundary scoring function is proposed to recursively evaluate the potential cut points to generate the domain boundary. DomBpred is tested on the large-scale FUpred test set comprising 2549 proteins. Experimental results show that DomBpred performs better than the state-of-the-art methods in classifying whether protein sequences are composed of single or multiple domains, with a Matthews correlation coefficient of 0.882. Moreover, on 849 multi-domain proteins, the domain boundary distance and normalised domain overlap scores of DomBpred are 0.523 and 0.824, which are 5.0% and 4.2% higher than those of the best comparison method, respectively. Comparison with other methods on the given test set shows that DomBpred outperforms most state-of-the-art sequence-based methods and even achieves better results than the top-level template-based method. The executable program is freely available at https://github.com/iobio-zjut/DomBpred and the online server at http://zhanglab-bioinf.com/DomBpred/.
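
The Matthews correlation coefficient reported for the single-domain versus multi-domain classification is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; returning 0.0 when the denominator vanishes is a common convention adopted here, and no counts from the paper are implied.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # undefined case; 0.0 by convention
    return (tp * tn - fp * fn) / denominator
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect classification), and it stays informative when the two classes differ greatly in size.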


Subjects
Algorithms , Proteins , Protein Domains , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Cluster Analysis
9.
Article in English | MEDLINE | ID: mdl-32750861

ABSTRACT

De novo protein structure prediction is a challenging problem that requires both an accurate energy function and an efficient conformation sampling method. In this study, a de novo structure prediction method, named CoDiFold, is proposed. In CoDiFold, contacts and distance profiles are organically combined into the Rosetta low-resolution energy function to improve the accuracy of energy function. As a result, the correlation between energy and root mean square deviation (RMSD) is improved. In addition, a population-based multi-mutation strategy is designed to balance the exploration and exploitation of conformation space sampling. The average RMSD of the models generated by the proposed protocol is decreased by 49.24 and 45.21 percent in the test set with 43 proteins compared with those of Rosetta and QUARK de novo protocols, respectively. The results also demonstrate that the structures predicted by proposed CoDiFold are comparable to the state-of-the-art methods for the 10 FM targets of CASP13. The source code and executable versions are freely available at http://github.com/iobio-zjut/CoDiFold.


Subjects
Proteins , Software , Algorithms , Models, Molecular , Protein Conformation , Proteins/genetics
10.
Article in English | MEDLINE | ID: mdl-31180869

ABSTRACT

Ab initio protein structure prediction is one of the most challenging problems in computational biology. Multistage algorithms are widely used in ab initio protein structure prediction, and it is important to consider that the computational cost of a multistage algorithm differs between proteins. In this study, a population-based algorithm guided by information entropy (PAIE), which includes exploration and exploitation stages, is proposed for protein structure prediction. In PAIE, an entropy-based stage switch strategy is designed to switch from the exploration stage to the exploitation stage. Torsion angle statistical information is also deduced from the first stage and employed to enhance the exploitation in the second stage. Results indicate an improvement in the performance of protein structure prediction on a benchmark of 30 proteins and 17 other free modeling targets in CASP.
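
An entropy-based stage switch can be sketched as follows. The abstract does not say how PAIE defines its entropy, so this example assumes the Shannon entropy of a histogram of one torsion angle across the population; the bin width, the threshold and both function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width):
    """Shannon entropy (in nats) of a histogram of `values`."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in bins.values())


def should_switch(angles, bin_width=30.0, threshold=1.0):
    """Switch from exploration to exploitation once the population has converged.

    angles: one torsion angle (in degrees) per individual in the population.
    Low entropy means the individuals occupy only a few angle bins.
    """
    return shannon_entropy(angles, bin_width) < threshold
```

Because convergence speed differs from protein to protein, a criterion of this kind lets each run decide for itself when to stop exploring, instead of spending a fixed budget on every target.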


Subjects
Algorithms , Computational Biology/methods , Proteins/chemistry , Entropy , Models, Molecular , Protein Conformation , Protein Folding
11.
IEEE/ACM Trans Comput Biol Bioinform ; 17(3): 1068-1081, 2020.
Article in English | MEDLINE | ID: mdl-30295627

ABSTRACT

Ab initio protein tertiary structure prediction is one of the long-standing problems in structural bioinformatics. With the help of residue-residue contact and secondary structure prediction information, the accuracy of ab initio structure prediction can be enhanced. In this study, an improved differential evolution with secondary structure and residue-residue contact information referred to as SCDE is proposed for protein structure prediction. In SCDE, two score models based on secondary structure and contact information are proposed, and two selection strategies, namely, secondary structure-based selection strategy and contact-based selection strategy, are designed to guide conformation space search. A probability distribution function is designed to balance these two selection strategies. Experimental results on a benchmark dataset with 28 proteins and four free model targets in CASP12 demonstrate that the proposed SCDE is effective and efficient.


Subjects
Computational Biology/methods , Protein Structure, Secondary , Proteins , Algorithms , Databases, Protein , Proteins/chemistry , Proteins/genetics
12.
IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1419-1429, 2020.
Article in English | MEDLINE | ID: mdl-30668479

ABSTRACT

Accurately identifying DNA-binding proteins (DBPs) from protein sequence information is an important but challenging task for protein function annotation. In this paper, we establish a novel computational method, named TargetDBP, for accurately targeting DBPs from primary sequences. In TargetDBP, four single-view features, i.e., AAC (Amino Acid Composition), PsePSSM (Pseudo Position-Specific Scoring Matrix), PsePRSA (Pseudo Predicted Relative Solvent Accessibility), and PsePPDBS (Pseudo Predicted Probabilities of DNA-Binding Sites), are first extracted as four different base features. Second, a differential evolution algorithm is employed to learn the weights of the four base features. Using the learned weights, we form the original super feature as a weighted combination of these base features. An excellent subset of the super feature is then selected using the feature selection algorithm SVM-RFE+CBR (Support Vector Machine Recursive Feature Elimination with Correlation Bias Reduction). Finally, the prediction model is learned by applying a support vector machine to the selected feature subset. We also construct a new gold-standard, non-redundant benchmark dataset from the PDB database to evaluate and compare the proposed TargetDBP with other existing predictors. On this new dataset, TargetDBP achieves higher performance than other state-of-the-art predictors. The TargetDBP web server and datasets are freely available at http://csbio.njust.edu.cn/bioinf/targetdbp/ for academic use.
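
Two of the steps above are simple enough to sketch: the AAC feature and the weighted combination of base features. The example assumes that the combination scales each base feature vector by its weight and concatenates the results, which the abstract does not state explicitly; the function names are illustrative, and the weights would come from the differential evolution step.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """AAC: fraction of each standard amino acid in the sequence (20 values)."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def combine_weighted(features, weights):
    """Scale each base feature vector by its weight and concatenate them."""
    if len(features) != len(weights):
        raise ValueError("one weight per base feature is required")
    combined = []
    for vector, weight in zip(features, weights):
        combined.extend(weight * x for x in vector)
    return combined
```

For example, `combine_weighted([aac, pse_pssm], [0.3, 0.7])` would yield a single vector in which the second feature group carries more influence in a distance-based or kernel-based classifier; `pse_pssm` here stands for a feature vector computed elsewhere.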


Subjects
Computational Biology/methods , DNA-Binding Proteins , Machine Learning , Sequence Analysis, Protein/methods , Algorithms , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Databases, Protein , Position-Specific Scoring Matrices , Support Vector Machine
13.
IEEE/ACM Trans Comput Biol Bioinform ; 17(6): 2119-2130, 2020.
Article in English | MEDLINE | ID: mdl-31107659

ABSTRACT

De novo protein structure prediction can be treated as a conformational space optimization problem under the guidance of an energy function. However, designing an accurate energy function that ensures low-energy conformations are close to native structures remains a challenge. Fortunately, recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of evolutionary algorithms. In TDFO, a similarity model is first designed using feature information extracted from distance profiles by the bisecting K-means algorithm. A similarity model-based selection strategy is then developed to guide the conformation search and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is proposed to strike a trade-off between the exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of the test proteins.


Subjects
Algorithms , Computational Biology/methods , Protein Conformation , Proteins/chemistry , Artificial Intelligence , Models, Molecular , Mutation/genetics , Proteins/genetics
14.
IEEE Trans Cybern ; 49(4): 1353-1364, 2019 Apr.
Article in English | MEDLINE | ID: mdl-29994744

ABSTRACT

The performance of differential evolution (DE) depends heavily on the mutation strategy. However, it is difficult to choose a suitable mutation strategy for a specific problem or for different running stages. This paper proposes an underestimation-based multimutation strategy (UMS) for DE. In the UMS, a set of candidate offspring is generated simultaneously for each target individual by using multiple mutation strategies. Then a cheap abstract convex underestimation model is built from some selected individuals to obtain the underestimation value of each candidate offspring. According to the quality of each candidate offspring as measured by the underestimation value, the most promising candidate solution is chosen as the offspring. Compared with existing probability-based multimutation techniques, no mutation strategies are lost during the search process, because each mutation strategy has the same probability of generating a candidate solution. Moreover, no extra function evaluations are incurred, because the candidate solutions are filtered by the underestimation value. The UMS is integrated into several DE variants and compared with their original algorithms and several advanced DE approaches on the CEC 2013 and 2014 benchmark sets. Additionally, a well-known real-world problem is used to evaluate the performance of the UMS. Experimental results show that the proposed UMS can improve the performance of the advanced DE variants.

15.
IEEE Trans Nanobioscience ; 18(4): 567-577, 2019 10.
Article in English | MEDLINE | ID: mdl-31180866

ABSTRACT

Protein structure prediction has been a long-standing problem for the past decades. In particular, the loop region structure remains an obstacle to forming an accurate protein tertiary structure because of its flexibility. In this study, a Rama torsion angle and secondary structure feature-guided differential evolution algorithm, named RSDE, is proposed to predict three-dimensional structure with exploitation of the loop region structure. In RSDE, the structure of the loop region is improved by two operators: a loop-based cross operator, which interchanges the configuration of a randomly selected loop region between individuals, and a loop-based mutate operator, which incorporates the torsion angle feature into conformational sampling. A stochastic ranking selective strategy is designed to select conformations with low energy and near-native structure. Moreover, a conformational resampling method, which uses previously learned knowledge to guide subsequent sampling, is proposed to improve the sampling efficiency. Experiments on a total of 28 test proteins reveal that the proposed RSDE is effective and can obtain native-like models.


Subjects
Models, Molecular , Protein Conformation
16.
IEEE Trans Cybern ; 47(9): 2730-2741, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28613195

ABSTRACT

In differential evolution (DE), different strategies applied in different evolutionary stages may be more effective than a single strategy used in the entire evolutionary process. However, it is not trivial to appropriately determine the evolutionary stage. In this paper, we present an abstract convex underestimation-assisted multistage DE. In the proposed algorithm, the underestimation is calculated through the supporting vectors of some neighboring individuals. Based on the variation of the average underestimation error (UE), the evolutionary process is divided into three stages. Each stage includes a pool of suitable candidate strategies. At the beginning of each generation, the evolutionary stage is first estimated according to the average UE of the previous generation. Subsequently, a strategy is automatically chosen from the corresponding candidate pool to create a mutant vector. In addition, a centroid-based strategy which utilizes the information of multiple superior individuals is designed to balance the population diversity and convergence speed in the second stage. Experiments are conducted on 23 widely used test functions, CEC 2013, and CEC 2014 benchmark sets to demonstrate the performance of the proposed algorithm. The results reveal that the proposed algorithm exhibits better performance compared with several advanced DE variants and some non-DE approaches.

17.
IEEE Trans Nanobioscience ; 16(7): 618-633, 2017 10.
Article in English | MEDLINE | ID: mdl-28885157

ABSTRACT

Protein structure prediction can be considered a multimodal optimization problem of sampling the protein conformational space associated with an extremely complex energy landscape. To address this problem, a conformational space sampling method using multi-subpopulation differential evolution, MDE, is proposed. MDE first generates a given number of modals of interest under an ultrafast shape recognition-based modal identification protocol, which initially regards each individual as one modal. Differential evolution is then used to keep the preserved modals alive during the evolution process. Meanwhile, for modal enhancement, a local descent direction along which to sample is constructed using the abstract convex underestimation technique, which enhances sampling in lower-energy regions. Through this evolutionary sampling process, several clusters are obtained, each containing a series of conformations in proportion to the energy score. Representative conformations in the generated clusters can be picked directly as decoy conformations for further refinement, with no extra clustering operation needed. A total of 20 target proteins are tested: ten for comparison with Rosetta and three evolutionary algorithms, and ten easy/hard CASP11 targets to further verify the effectiveness of MDE. Test results show that MDE has strong sampling ability and can effectively obtain near-native conformations.


Subjects
Models, Molecular , Models, Statistical , Protein Conformation , Proteins/chemistry , Sequence Analysis, Protein/methods , Computational Biology
18.
IEEE/ACM Trans Comput Biol Bioinform ; 14(6): 1288-1301, 2017.
Article in English | MEDLINE | ID: mdl-28113726

ABSTRACT

De novo protein structure prediction aims to search for low-energy conformations as it follows the thermodynamics hypothesis that places native conformations at the global minimum of the protein energy surface. However, the native conformation is not necessarily located in the lowest-energy regions owing to the inaccuracies of the energy model. This study presents a differential evolution algorithm using distance profile-based selection strategy to sample conformations with reasonable structure effectively. In the proposed algorithm, besides energy, the residue-residue distance is considered another measure of the conformation. The average distance errors of decoys between the distance of each residue pair and the corresponding distance in the distance profiles are first calculated when the trial conformation yields a larger energy value than that of the target. Then, the distance acceptance probability of the trial conformation is designed based on distance profiles if the trial conformation obtains a lower average distance error compared with that of the target conformation. The trial conformation is accepted to the next generation in accordance with its distance acceptance probability. By using the dual constraints of energy and distance in guiding sampling, the algorithm can sample conformations with lower energies and more reasonable structures. Experimental results of 28 benchmark proteins show that the proposed algorithm can effectively predict near-native protein structures.
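
The selection rule described above can be sketched as follows. The abstract does not give the formula for the distance acceptance probability, so the exponential form and the `scale` parameter below are assumptions made purely for illustration, as are the function names.

```python
import math
import random


def average_distance_error(model_distances, profile_distances):
    """Mean absolute deviation between model distances and profile distances."""
    pairs = list(zip(model_distances, profile_distances))
    return sum(abs(m - p) for m, p in pairs) / len(pairs)


def accept_trial(trial_energy, target_energy, trial_error, target_error,
                 scale=1.0, rng=random):
    """Decide whether the trial conformation replaces the target.

    A trial with lower or equal energy is always accepted. A higher-energy
    trial can still be accepted, with some probability, if it agrees better
    with the distance profile than the target does.
    """
    if trial_energy <= target_energy:
        return True
    if trial_error >= target_error:
        return False
    # Hypothetical form: larger improvement in distance error, higher chance.
    probability = 1.0 - math.exp(-(target_error - trial_error) / scale)
    return rng.random() < probability
```

The second branch is what lets sampling retain conformations that the energy model ranks poorly but that look structurally more reasonable, which addresses the energy inaccuracy discussed in the abstract.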


Subjects
Computational Biology/methods , Protein Conformation , Proteins/chemistry , Algorithms , Databases, Protein , Models, Molecular
19.
Article in English | MEDLINE | ID: mdl-26552093

ABSTRACT

To address the search problem of the protein conformational space in ab initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) within the framework of evolutionary algorithms is proposed. Computing such conformations, which is essential for associating structural and functional information with gene sequences, is challenging because of the high dimensionality and rugged energy surface of the protein conformational space. As a consequence, the dimension of the protein conformational space should be reduced to a proper level. In this paper, the high-dimensional original conformational space is converted by a feature extraction technique into a feature space whose dimension is considerably reduced. The underestimate space can then be constructed according to abstract convex theory. Thus, the entropy effect caused by searching in the high-dimensional conformational space can be avoided through this conversion. The tight lower bound estimate is used to guide the search direction, and invalid search regions that do not contain the global optimal solution can be eliminated in advance. Moreover, instead of expensively calculating the energy of conformations in the original conformational space, the estimate value is used to judge whether a conformation is worth exploring, which reduces the evaluation time, lowers the computational cost and makes the search more efficient. Additionally, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling in the conformational space. The proposed method provides a novel technique for solving the search problem of the protein conformational space. Twenty small-to-medium structurally diverse proteins were tested, and the proposed ACUE method was compared with ItFix, HEA, Rosetta and the developed method LEDE without underestimate information. Test results show that the ACUE method can obtain near-native protein structures more rapidly and more efficiently.


Subjects
Algorithms , Models, Chemical , Models, Molecular , Pattern Recognition, Automated/methods , Proteins/chemistry , Proteins/ultrastructure , Sequence Analysis, Protein/methods , Computer Simulation , Models, Statistical , Monte Carlo Method , Protein Conformation