Search | BVS (Virtual Health Library) Search Portal

Recent Advances and Challenges in Protein Structure Prediction.

Peng, Chun-Xiang; Liang, Fang; Xia, Yu-Hao; Zhao, Kai-Long; Hou, Ming-Hua; Zhang, Gui-Jun.

J Chem Inf Model ; 64(1): 76-95, 2024 01 08.

Article in English | MEDLINE | ID: mdl-38109487

ABSTRACT

Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, some challenges and limitations remain. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be made.

Subjects

Artificial Intelligence; Drug Discovery; Protein Folding; Research Design

Structural analogue-based protein structure domain assembly assisted by deep learning.

Peng, Chun-Xiang; Zhou, Xiao-Gen; Xia, Yu-Hao; Liu, Jun; Hou, Ming-Hua; Zhang, Gui-Jun.

Bioinformatics ; 38(19): 4513-4521, 2022 09 30.

Article in English | MEDLINE | ID: mdl-35962986

ABSTRACT

MOTIVATION: With the breakthrough of AlphaFold2, the protein structure prediction problem has made remarkable progress through deep learning end-to-end techniques, in which correct folds could be built for nearly all single-domain proteins. However, full-chain modelling appears to have lower average accuracy than modelling of the constituent domains and places higher demands on computing hardware, indicating that the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the accuracy of the predicted full-chain model can be further improved by domain assembly assisted by deep learning. RESULTS: In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database was constructed for full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, the domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by an energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with the state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, where the average TM-score of the full-chain model after assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that domains often interact in a similar way in their quaternary orientations if they have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. AVAILABILITY AND IMPLEMENTATION: http://zhanglab-bioinf.com/SADA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
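The TM-score quoted in the abstract above can be illustrated with a minimal sketch. It evaluates the standard TM-score formula for one fixed superposition, given the distances between aligned residue pairs. The function name, the input format, and the 0.5 Å floor on d0 are assumptions made for this illustration; a full TM-score program also searches for the superposition that maximises the score, which is not done here.

```python
def tm_score_from_distances(distances, target_length):
    """Evaluate the TM-score formula for one fixed superposition.

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the target (reference) structure.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumption: the formula is not usable for very short chains
    d0 = max(d0, 0.5)  # assumption: floor on d0 to keep the scale positive
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect superposition of all 100 residues scores 1.0.
print(tm_score_from_distances([0.0] * 100, 100))
```

Because the sum is divided by the target length rather than by the number of aligned pairs, unaligned residues lower the score, which is why the measure is suited to comparing full-chain models.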

Subjects

Deep Learning; Humans; Software; Proteins/chemistry; Databases, Protein; Protein Domains

A sequential niche multimodal conformational sampling algorithm for protein structure prediction.

Xia, Yu-Hao; Peng, Chun-Xiang; Zhou, Xiao-Gen; Zhang, Gui-Jun.

Bioinformatics ; 37(23): 4357-4365, 2021 12 07.

Article in English | MEDLINE | ID: mdl-34245242

ABSTRACT

MOTIVATION: Massive local minima on the protein energy landscape often cause traditional conformational sampling algorithms to be easily trapped in local basin regions, because they find it difficult to overcome high-energy barriers. Also, the lowest energy conformation may not correspond to the native structure due to the inaccuracy of energy models. This study investigates whether these two problems can be alleviated by a sequential niche technique without loss of accuracy. RESULTS: A sequential niche multimodal conformational sampling algorithm for protein structure prediction (SNfold) is proposed in this study. In SNfold, a derating function is designed based on the knowledge learned from the previous sampling and used to construct a series of sampling-guided energy functions. These functions then help the sampling algorithm overcome high-energy barriers and avoid the re-sampling of the explored regions. In inaccurate protein energy models, the high-energy conformation that may correspond to the native structure can be sampled with successively updated sampling-guided energy functions. The proposed SNfold is tested on 300 benchmark proteins, 24 CASP13 and 19 CASP14 FM targets. Results show that SNfold correctly folds (TM-score ≥ 0.5) 231 out of 300 proteins. In particular, compared with Rosetta restrained by distance (Rosetta-dist), SNfold achieves higher average TM-score and improves the sampling efficiency by more than 100 times. On several CASP FM targets, SNfold also shows good performance compared with four state-of-the-art servers in CASP. As a plug-in conformational sampling algorithm, SNfold can be extended to other protein structure prediction methods. AVAILABILITY AND IMPLEMENTATION: The source code and executable versions are freely available at https://github.com/iobio-zjut/SNfold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
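The sequential niche idea described in the abstract above can be sketched on a toy problem. This is not SNfold's derating function or energy model: the one-dimensional double-well energy, the additive penalty, the grid of starting points, and all parameter values are assumptions chosen only to show how a guided energy steers the next round of sampling away from an explored basin.

```python
def energy(x):
    # Hypothetical double-well energy: global minimum near x = -1.04,
    # higher local minimum near x = 0.96.
    return (x * x - 1.0) ** 2 + 0.3 * x


def derating_penalty(x, centre, radius=1.0, height=2.0):
    # Additive penalty that is largest at a previously found minimum
    # and fades to zero at the niche radius (an assumed form).
    distance = abs(x - centre)
    if distance >= radius:
        return 0.0
    return height * (1.0 - distance / radius) ** 2


def local_descent(func, x, step=0.1, tol=1e-6):
    # Derivative-free descent: move while a neighbour is lower,
    # otherwise halve the step until it falls below tol.
    while step > tol:
        for candidate in (x - step, x + step):
            if func(candidate) < func(x):
                x = candidate
                break
        else:
            step *= 0.5
    return x


def sequential_niche(n_rounds=2):
    grid = [-2.0 + 0.1 * i for i in range(41)]
    found = []
    for _ in range(n_rounds):
        centres = tuple(found)

        def guided(x, centres=centres):
            # Sampling-guided energy: raw energy plus penalties at found minima.
            return energy(x) + sum(derating_penalty(x, c) for c in centres)

        start = min(grid, key=guided)
        found.append(local_descent(guided, start))
    return found


print(sequential_niche())
```

In this sketch the first round settles in the lower well, and the penalty added around it makes the second round start from, and converge in, the other well.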

Subjects

Algorithms; Proteins; Protein Conformation; Proteins/chemistry; Software; Benchmarking

Underestimation-Assisted Global-Local Cooperative Differential Evolution and the Application to Protein Structure Prediction.

Zhou, Xiao-Gen; Peng, Chun-Xiang; Liu, Jun; Zhang, Yang; Zhang, Gui-Jun.

IEEE Trans Evol Comput ; 24(3): 536-550, 2020 Jun.

Article in English | MEDLINE | ID: mdl-33603321

ABSTRACT

Various mutation strategies show distinct advantages in differential evolution (DE). The cooperation of multiple strategies in the evolutionary process may be effective. This paper presents an underestimation-assisted global and local cooperative DE to simultaneously enhance the effectiveness and efficiency. In the proposed algorithm, two phases, namely, the global exploration and the local exploitation, are performed in each generation. In the global phase, a set of trial vectors is produced for each target individual by employing multiple strategies with strong exploration capability. Afterward, an adaptive underestimation model with a self-adapted slope control parameter is proposed to evaluate these trial vectors, the best of which is selected as the candidate. In the local phase, the better-based strategies guided by individuals that are better than the target individual are designed. For each individual accepted in the global phase, multiple trial vectors are generated by using these strategies and filtered by the underestimation value. The cooperation between the global and local phases includes two aspects. First, both of them concentrate on generating better individuals for the next generation. Second, the global phase aims to locate promising regions quickly while the local phase serves as a local search for enhancing convergence. Moreover, a simple mechanism is designed to determine the parameter of DE adaptively in the searching process. Finally, the proposed approach is applied to predict the protein 3D structure. Experimental studies on classical benchmark functions, CEC test sets, and protein structure prediction problem show that the proposed approach is superior to the competitors.
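For readers unfamiliar with differential evolution, the baseline that the abstract above builds on can be sketched as follows. This is the classic DE/rand/1/bin scheme applied to a sphere function, not the underestimation-assisted global-local variant proposed in the paper; the function name, parameter values, and test objective are assumptions for illustration.

```python
import random


def differential_evolution(objective, bounds, pop_size=30, generations=300,
                           f_scale=0.5, cross_rate=0.9, seed=0):
    """Classic DE/rand/1/bin minimiser (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    fitness = [objective(ind) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cross_rate:
                    value = population[a][d] + f_scale * (
                        population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = population[i][d]
                trial.append(value)
            trial_fitness = objective(trial)
            # Greedy selection: keep the trial if it is not worse.
            if trial_fitness <= fitness[i]:
                population[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    return population[best], fitness[best]


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 5)
print(best_f)
```

The paper's approach would replace the single mutation strategy with several exploration and exploitation strategies and would screen trial vectors with an underestimation model before evaluating them; neither is attempted in this sketch.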

DomBpred: Protein Domain Boundary Prediction Based on Domain-Residue Clustering Using Inter-Residue Distance.

Yu, Zhong-Ze; Peng, Chun-Xiang; Liu, Jun; Zhang, Biao; Zhou, Xiao-Gen; Zhang, Gui-Jun.

IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 912-922, 2023.

Article in English | MEDLINE | ID: mdl-35594218

ABSTRACT

Domain boundary prediction is one of the most important problems in the study of protein structure and function, especially for large proteins. At present, most domain boundary prediction methods have low accuracy and are limited in handling multi-domain proteins. In this study, we develop a sequence-based protein domain boundary prediction method, named DomBpred. In DomBpred, the input sequence is first classified as either a single-domain or a multi-domain protein using a sequence metric based on a constructed single-domain sequence library. For a multi-domain protein, a domain-residue clustering algorithm inspired by the Ising model is proposed to cluster spatially close residues according to inter-residue distance. The unclassified residues and the residues at the edges of the clusters are then adjusted using the secondary structure to form potential cut points. Finally, a domain boundary scoring function is proposed to recursively evaluate the potential cut points and generate the domain boundaries. DomBpred is tested on the large-scale FUpred test set comprising 2549 proteins. Experimental results show that DomBpred performs better than the state-of-the-art methods in classifying whether protein sequences are composed of single or multiple domains, with a Matthews correlation coefficient of 0.882. Moreover, on 849 multi-domain proteins, the domain boundary distance and normalised domain overlap scores of DomBpred are 0.523 and 0.824, respectively, which are 5.0% and 4.2% higher than those of the best comparison method. Comparison with other methods on the given test set shows that DomBpred outperforms most state-of-the-art sequence-based methods and even achieves better results than the top-level template-based method. The executable program is freely available at https://github.com/iobio-zjut/DomBpred and the online server at http://zhanglab-bioinf.com/DomBpred/.
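The Matthews correlation coefficient reported above for the single-domain versus multi-domain classification can be computed from a confusion matrix as in the sketch below. The function name and the convention of returning 0.0 when the denominator is zero are assumptions; the counts in the usage lines are made up and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    tp, tn, fp, fn: counts of true positives, true negatives,
    false positives and false negatives.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator


# Illustrative counts only: perfect, chance-level and inverted classifiers.
print(matthews_corrcoef(50, 50, 0, 0))
print(matthews_corrcoef(25, 25, 25, 25))
print(matthews_corrcoef(0, 0, 50, 50))
```

The coefficient ranges from -1 to 1 and stays informative when the two classes are of unequal size, which is relevant when single-domain and multi-domain proteins are not equally represented in a test set.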

Subjects

Algorithms; Proteins; Protein Domains; Proteins/genetics; Proteins/chemistry; Amino Acid Sequence; Cluster Analysis

De novo Protein Structure Prediction by Coupling Contact With Distance Profile.

Peng, Chun-Xiang; Zhou, Xiao-Gen; Zhang, Gui-Jun.

IEEE/ACM Trans Comput Biol Bioinform ; 19(1): 395-406, 2022.

Article in English | MEDLINE | ID: mdl-32750861

ABSTRACT

De novo protein structure prediction is a challenging problem that requires both an accurate energy function and an efficient conformation sampling method. In this study, a de novo structure prediction method, named CoDiFold, is proposed. In CoDiFold, contacts and distance profiles are organically combined into the Rosetta low-resolution energy function to improve the accuracy of energy function. As a result, the correlation between energy and root mean square deviation (RMSD) is improved. In addition, a population-based multi-mutation strategy is designed to balance the exploration and exploitation of conformation space sampling. The average RMSD of the models generated by the proposed protocol is decreased by 49.24 and 45.21 percent in the test set with 43 proteins compared with those of Rosetta and QUARK de novo protocols, respectively. The results also demonstrate that the structures predicted by proposed CoDiFold are comparable to the state-of-the-art methods for the 10 FM targets of CASP13. The source code and executable versions are freely available at http://github.com/iobio-zjut/CoDiFold.
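The RMSD figure used in the abstract above can be illustrated with a short sketch. It assumes the two coordinate sets are already superimposed and listed in the same atom order; the optimal superposition step (for example the Kabsch algorithm) is deliberately left out. The helper names and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-superimposed structures.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples in angstroms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def percent_decrease(baseline, improved):
    # Relative reduction of a metric with respect to a baseline value.
    return 100.0 * (baseline - improved) / baseline


# Hypothetical three-atom structure and a copy shifted by (3, 4, 0).
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(x + 3.0, y + 4.0, z) for x, y, z in native]
print(rmsd(native, model))
```

A percentage such as those quoted in the abstract corresponds to `percent_decrease` applied to the average RMSD of the baseline protocol and of the proposed one.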

Subjects

Proteins; Software; Algorithms; Models, Molecular; Protein Conformation; Proteins/genetics

[Sequence comparison of the hemagglutinin gene of the duck-origin H9N2 subtype avian influenza viruses].

Wan, Chun-He; Fu, Guang-Hua; Cheng, Long-Fei; Shi, Shao-Hua; Chen, Hong-Mei; Peng, Chun-Xiang; Lin, Fang; Lin, Jian-Sheng; Huang, Yu.

Bing Du Xue Bao ; 28(2): 158-64, 2012 Mar.

Article in Chinese | MEDLINE | ID: mdl-22519178

ABSTRACT

To characterize the phylogenetic evolution, the cleavage-site motif of the HA protein, and the variation at the receptor binding sites of the hemagglutinin gene of duck-origin H9N2 subtype avian influenza viruses (AIV), sequence alignment and phylogenetic analysis were performed with the Neighbor-Joining method in MEGA 4.1. The results revealed that the duck-origin H9N2 AIVs originated from the CK/BJ/1/94-like and North-Ame-like lineages. All duck-origin H9N2 AIVs from mainland China belonged to the CK/BJ/1/94-like lineage and formed multiple genotypes through complicated reassortment, while duck-origin H9N2 AIVs isolated in other parts of Asia, America and Europe, such as Korea, Japan, Alberta, Austria, Switzerland and Iran, belonged to the North-Ame-like phylogenetic lineage. The amino acids at positions 183, 190 and 226 of the receptor binding sites of the North-Ame-like isolates were highly conserved as H, E and Q, respectively. In contrast, the duck-origin H9N2 AIV isolates from mainland China had N at position 183; A, T or V at position 190; and L or Q at position 226, the same as the chicken-origin H9N2 AIVs from mainland China. The finding that most newly isolated chicken-origin H9N2 AIVs in Fujian Province in Southern China had L at position 226 underscores the higher risk of cross-infection between chicken-origin and duck-origin H9N2 AIV in China.

Subjects

Hemagglutinin Glycoproteins, Influenza Virus/chemistry; Hemagglutinin Glycoproteins, Influenza Virus/genetics; Influenza A Virus, H9N2 Subtype/genetics; Influenza in Birds/virology; Poultry Diseases/virology; Animals; China; Ducks; Influenza A Virus, H9N2 Subtype/chemistry; Influenza A Virus, H9N2 Subtype/classification; Influenza A Virus, H9N2 Subtype/isolation & purification; Influenza A virus/chemistry; Influenza A virus/classification; Influenza A virus/genetics; Molecular Sequence Data; Phylogeny; Sequence Alignment

Epidemiological investigation and genome analysis of duck circovirus in Southern China.

Wan, Chun-He; Fu, Guang-Hua; Shi, Shao-Hua; Cheng, Long-Fei; Chen, Hong-Mei; Peng, Chun-Xiang; Lin, Su; Huang, Yu.

Virol Sin ; 26(5): 289-96, 2011 Oct.

Article in English | MEDLINE | ID: mdl-21979568

ABSTRACT

Duck circovirus (DuCV), a potential immunosuppressive virus, was investigated in Southern China from March 2006 to December 2009 by using a polymerase chain reaction (PCR) based method. In this study, a total of 138 sick or dead duck samples from 18 different farms were examined with an average DuCV infection rate of ~35%. It was found that ducks between the ages of 40 and 60 days were more susceptible to DuCV. There was no evidence showing that the DuCV virus was capable of vertical transmission. Farms with positive PCR results exhibited no regularly apparent clinical abnormalities such as feathering disorders, growth retardation or lower-than-average weight. The complete genomes of 9 strains from Fujian Province and 1 from Zhejiang Province were sequenced and analyzed. The 10 DuCV genomes, compared with other genomes downloaded from GenBank, ranged in size from 1988 to 1996 base pairs, with sequence identities ranging from 83.2% to 99.8%. Phylogenetic analysis based on genome sequences demonstrated that DuCVs can be divided into two distinct genotypes, Group I (the Euro-USA lineage) and Group II (the Taiwan lineage), with approximately 10.0% genetic difference between the two types. Molecular epidemiological data suggest there is no obvious difference among DuCV strains isolated from different geographic locations or different species, including Duck, Muscovy duck, Mule duck, Cheery duck, Mulard duck and Pekin duck.

Subjects

Circoviridae Infections/veterinary; Circovirus/classification; Circovirus/genetics; DNA, Viral/genetics; Genome, Viral; Poultry Diseases/virology; Animals; China; Circoviridae Infections/virology; Circovirus/isolation & purification; Cluster Analysis; DNA, Viral/chemistry; Ducks; Genotype; Molecular Sequence Data; Phylogeny; Sequence Analysis, DNA; Sequence Homology

[Genome cloning and sequence analysis of duck circovirus].

Fu, Guang-Hua; Cheng, Long-Fei; Shi, Shao-Hua; Peng, Chun-Xiang; Chen, Hong-Mei; Huang, Yu.

Bing Du Xue Bao ; 24(2): 138-43, 2008 Jun.

Article in Chinese | MEDLINE | ID: mdl-18533346

ABSTRACT

To reveal the molecular biological characteristics of genome of circovirus in infected ducks, two nucleotide fragments were amplified by overlapping PCRs using DNA extracted from various tissues of ducks. After they had been assembled together, the nucleotide components, the genome organization and the phylogenetic scale of the sequence were analyzed. The results showed that the obtained sequence is a circular DNA with a total length of 1995nt. It contains 6 open reading frames (ORFs), and shares a high identity of 97.4% with the MuDCV circovirus sequence presented in GenBank (AY228555). These results indicate that the amplified product stems from duck circovirus sequence.

Subjects

Circovirus/genetics; Ducks/virology; Animals; Base Sequence; Circovirus/classification; Cloning, Molecular; Genetic Variation; Molecular Sequence Data; Phylogeny; Polymerase Chain Reaction; Sequence Analysis, DNA
