Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 133
Filtrar
Más filtros

Bases de datos
Tipo del documento
Intervalo de año de publicación
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38517699

RESUMEN

The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate protein complex atomic structures from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps through an iterative assembly procedure intertwining chain- and domain-level matching and fitting for predicted chain models. The method was carefully evaluated on 27 cryo-electron tomography (cryo-ET) maps and 16 single-particle EM maps, where DEMO-EM2 models achieved an average TM-score of 0.92, outperforming those of state-of-the-art methods. The results demonstrate an efficient method that enables the rapid and reliable solution of challenging cryo-EM structure modeling problems.


Asunto(s)
Microscopía por Crioelectrón , Microscopía por Crioelectrón/métodos , Modelos Moleculares , Conformación Proteica
2.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38600663

RESUMEN

Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in the protein structure prediction, we explored whether the representation of structural sequence profile can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign utilizes ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. Combined with structural pre-trained knowledge and geometric features, they are further fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms the state-of-the-art methods, such as ProteinMPNN, Pifold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on CATH 4.2 benchmark, respectively. Encouraging results also have been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis conducted by the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can well reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.


Asunto(s)
Redes Neurales de la Computación , Proteínas , Alineación de Secuencia , Secuencia de Aminoácidos , Proteínas/química , Análisis de Secuencia de Proteína/métodos
3.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36460624

RESUMEN

Protein model quality assessment plays an important role in protein structure prediction, protein design and drug discovery. In this work, DeepUMQA2, a substantially improved version of DeepUMQA for protein model quality assessment, is proposed. First, sequence features containing protein co-evolution information and structural features reflecting family information are extracted to complement model-dependent features. Second, a novel backbone network based on triangular multiplication update and axial attention mechanism is designed to enhance information exchange between inter-residue pairs. On CASP13 and CASP14 datasets, the performance of DeepUMQA2 increases by 20.5 and 20.4% compared with DeepUMQA, respectively (measured by top 1 loss). Moreover, on the three-month CAMEO dataset (11 March to 04 June 2022), DeepUMQA2 outperforms DeepUMQA by 15.5% (measured by local AUC0,0.2) and ranks first among all competing server methods in CAMEO blind test. Experimental results show that DeepUMQA2 outperforms state-of-the-art model quality assessment methods, such as ProQ3D-LDDT, ModFOLD8, and DeepAccNet and DeepUMQA2 can select more suitable best models than state-of-the-art protein structure methods, such as AlphaFold2, RoseTTAFold and I-TASSER, provided themselves.


Asunto(s)
Algoritmos , Biología Computacional , Biología Computacional/métodos , Modelos Moleculares , Redes Neurales de la Computación , Proteínas/química , Conformación Proteica
4.
Brief Bioinform ; 24(4)2023 07 20.
Artículo en Inglés | MEDLINE | ID: mdl-37317619

RESUMEN

The scoring models used for protein structure modeling and ranking are mainly divided into unified field and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, the modeling accuracy still cannot meet the requirements to a certain extent. Especially, accurate modeling of multi-domain and orphan proteins remains a challenge. Therefore, an accurate and efficient protein scoring model should be developed urgently to guide the protein structure folding or ranking through deep learning. In this work, we propose a protein structure global scoring model based on equivariant graph neural network (EGNN), named GraphGPSM, to guide protein structure modeling and ranking. We construct an EGNN architecture, and a message passing mechanism is designed to update and transmit information between nodes and edges of the graph. Finally, the global score of the protein model is output through a multilayer perceptron. Residue-level ultrafast shape recognition is used to describe the relationship between residues and the overall structure topology, and distance and direction encoded by Gaussian radial basis functions are designed to represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles and inter-residue distance and orientations to represent the protein model and embedded into the nodes and edges of the graph neural network. The experimental results on the CASP13, CASP14 and CAMEO test sets show that the scores of our developed GraphGPSM have a strong correlation with the TM-score of the models, which are significantly better than those of the unified field score function REF2015 and the state-of-the-art local lDDT-based scoring models ModFOLD8, ProQ3D and DeepAccNet, etc. The modeling experimental results on 484 test proteins demonstrate that GraphGPSM can greatly improve the modeling accuracy. GraphGPSM is further used to model 35 orphan proteins and 57 multi-domain proteins. The results show that the average TM-score of the models predicted by GraphGPSM is 13.2 and 7.1% higher than that of the models predicted by AlphaFold2. GraphGPSM also participates in CASP15 and achieves competitive performance in global accuracy estimation.


Asunto(s)
Algoritmos , Proteínas , Conformación Proteica , Bases de Datos de Proteínas , Proteínas/química , Redes Neurales de la Computación
5.
Brief Bioinform ; 24(3)2023 05 19.
Artículo en Inglés | MEDLINE | ID: mdl-36920090

RESUMEN

AlphaFold2 achieved a breakthrough in protein structure prediction through the end-to-end deep learning method, which can predict nearly all single-domain proteins at experimental resolution. However, the prediction accuracy of full-chain proteins is generally lower than that of single-domain proteins because of the incorrect interactions between domains. In this work, we develop an inter-domain distance prediction method, named DeepIDDP. In DeepIDDP, we design a neural network with attention mechanisms, where two new inter-domain features are used to enhance the ability to capture the interactions between domains. Furthermore, we propose a data enhancement strategy termed DPMSA, which is employed to deal with the absence of co-evolutionary information on targets. We integrate DeepIDDP into our previously developed domain assembly method SADA, termed SADA-DeepIDDP. Tested on a given multi-domain benchmark dataset, the accuracy of SADA-DeepIDDP inter-domain distance prediction is 11.3% and 21.6% higher than trRosettaX and trRosetta, respectively. The accuracy of the domain assembly model is 2.5% higher than that of SADA. Meanwhile, we reassemble 68 human multi-domain protein models with TM-score ≤ 0.80 from the AlphaFold protein structure database, where the average TM-score is improved by 11.8% after the reassembly by our method. The online server is at http://zhanglab-bioinf.com/DeepIDDP/.


Asunto(s)
Algoritmos , Aprendizaje Profundo , Humanos , Redes Neurales de la Computación , Proteínas/química , Bases de Datos de Proteínas , Biología Computacional
6.
Brief Bioinform ; 25(1)2023 11 22.
Artículo en Inglés | MEDLINE | ID: mdl-38018909

RESUMEN

Model quality evaluation is a crucial part of protein structural biology. How to distinguish high-quality models from low-quality models, and to assess which high-quality models have relatively incorrect regions for improvement, are remain a challenge. More importantly, the quality assessment of multimer models is a hot topic for structure prediction. In this study, we propose GraphCPLMQA, a novel approach for evaluating residue-level model quality that combines graph coupled networks and embeddings from protein language models. The GraphCPLMQA consists of a graph encoding module and a transform-based convolutional decoding module. In encoding module, the underlying relational representations of sequence and high-dimensional geometry structure are extracted by protein language models with Evolutionary Scale Modeling. In decoding module, the mapping connection between structure and quality is inferred by the representations and low-dimensional features. Specifically, the triangular location and residue level contact order features are designed to enhance the association between the local structure and the overall topology. Experimental results demonstrate that GraphCPLMQA using single-sequence embedding achieves the best performance compared with the CASP15 residue-level interface evaluation methods among 9108 models in the local residue interface test set of CASP15 multimers. In CAMEO blind test (20 May 2022 to 13 August 2022), GraphCPLMQA ranked first compared with other servers (https://www.cameo3d.org/quality-estimation). GraphCPLMQA also outperforms state-of-the-art methods on 19, 035 models in CASP13 and CASP14 monomer test set.


Asunto(s)
Biología Computacional , Redes Neurales de la Computación , Biología Computacional/métodos , Proteínas/química , Lenguaje
7.
Brief Bioinform ; 23(2)2022 03 10.
Artículo en Inglés | MEDLINE | ID: mdl-35152277

RESUMEN

With the rapid progress of deep learning in cryo-electron microscopy and protein structure prediction, improving the accuracy of the protein structure model by using a density map and predicted contact/distance map through deep learning has become an urgent need for robust methods. Thus, designing an effective protein structure optimization strategy based on the density map and predicted contact/distance map is critical to improving the accuracy of structure refinement. In this article, a protein structure optimization method based on the density map and predicted contact/distance map by deep-learning technology was proposed in accordance with the result of matching between the density map and the initial model. Physics- and knowledge-based energy functions, integrated with Cryo-EM density map data and deep-learning data, were used to optimize the protein structure in the simulation. The dynamic confidence score was introduced to the iterative process for choosing whether it is a density map or a contact/distance map to dominate the movement in the simulation to improve the accuracy of refinement. The protocol was tested on a large set of 224 non-homologous membrane proteins and generated 214 structural models with correct folds, where 4.5% of structural models were generated from structural models with incorrect folds. Compared with other state-of-the-art methods, the major advantage of the proposed methods lies in the skills for using density map and contact/distance map in the simulation, as well as the new energy function in the re-assembly simulations. Overall, the results demonstrated that this strategy is a valuable approach and ready to use for atomic-level structure refinement using cryo-EM density map and predicted contact/distance map.


Asunto(s)
Aprendizaje Profundo , Microscopía por Crioelectrón/métodos , Proteínas de la Membrana , Modelos Moleculares , Conformación Proteica
8.
Brief Bioinform ; 23(3)2022 05 13.
Artículo en Inglés | MEDLINE | ID: mdl-35284936

RESUMEN

Although remarkable achievements, such as AlphaFold2, have been made in end-to-end structure prediction, fragment libraries remain essential for de novo protein structure prediction, which can help explore and understand the protein-folding mechanism. In this work, we developed a variable-length fragment library (VFlib). In VFlib, a master structure database was first constructed from the Protein Data Bank through sequence clustering. The hidden Markov model (HMM) profile of each protein in the master structure database was generated by HHsuite, and the secondary structure of each protein was calculated by DSSP. For the query sequence, the HMM-profile was first constructed. Then, variable-length fragments were retrieved from the master structure database through dynamically variable-length profile-profile comparison. A complete method for chopping the query HMM-profile during this process was proposed to obtain fragments with increased diversity. Finally, secondary structure information was used to further screen the retrieved fragments to generate the final fragment library of specific query sequence. The experimental results obtained with a set of 120 nonredundant proteins show that the global precision and coverage of the fragment library generated by VFlib were 55.04% and 94.95% at the RMSD cutoff of 1.5 Å, respectively. Compared with the benchmark method of NNMake, the global precision of our fragment library had increased by 62.89% with equivalent coverage. Furthermore, the fragments generated by VFlib and NNMake were used to predict structure models through fragment assembly. Controlled experimental results demonstrate that the average TM-score of VFlib was 16.00% higher than that of NNMake.


Asunto(s)
Pliegue de Proteína , Proteínas , Algoritmos , Análisis por Conglomerados , Bases de Datos de Proteínas , Estructura Secundaria de Proteína , Proteínas/química
9.
Brief Bioinform ; 23(1)2022 01 17.
Artículo en Inglés | MEDLINE | ID: mdl-34849573

RESUMEN

Meta contact, which combines different contact maps into one to improve contact prediction accuracy and effectively reduce the noise from a single contact map, is a widely used method. However, protein structure prediction using meta contact cannot fully exploit the information carried by original contact maps. In this work, a multi contact-based folding method under the evolutionary algorithm framework, MultiCFold, is proposed. In MultiCFold, the thorough information of different contact maps is directly used by populations to guide protein structure folding. In addition, noncontact is considered as an effective supplement to contact information and can further assist protein folding. MultiCFold is tested on a set of 120 nonredundant proteins, and the average TM-score and average RMSD reach 0.617 and 5.815 Å, respectively. Compared with the meta contact-based method, MetaCFold, average TM-score and average RMSD have a 6.62 and 8.82% improvement. In particular, the import of noncontact information increases the average TM-score by 6.30%. Furthermore, MultiCFold is compared with four state-of-the-art methods of CASP13 on the 24 FM targets, and results show that MultiCFold is significantly better than other methods after the full-atom relax procedure.


Asunto(s)
Pliegue de Proteína , Proteínas , Algoritmos , Biología Computacional/métodos , Modelos Moleculares , Conformación Proteica , Proteínas/química
10.
Bioinformatics ; 39(10)2023 Oct 03.
Artículo en Inglés | MEDLINE | ID: mdl-37740296

RESUMEN

MOTIVATION: Model quality assessment is a crucial part of protein structure prediction and a gateway to proper usage of models in biomedical applications. Many methods have been proposed for assessing the quality of structural models of protein monomers, but few methods for evaluating protein complex models. As protein complex structure prediction becomes a new challenge, there is an urgent need for model quality assessment methods that can accurately assess the accuracy of interface residues of complex structures. RESULTS: Here, we present DeepUMQA3, a web server for evaluating the accuracy of interface residues of protein complex structures using deep neural networks. For an input complex structure, features are extracted from three levels of overall complex, intra-monomer, and inter-monomer, and an improved deep residual neural network is used to predict per-residue lDDT and interface residue accuracy. DeepUMQA3 ranks first in the blind test of interface residue accuracy estimation in CASP15, with Pearson, Spearman, and AUC of 0.564, 0.535, and 0.755 under the lDDT measurement, which are 17.6%, 23.6%, and 10.9% higher than the second best method, respectively. DeepUMQA3 can also assess the accuracy of all residues in the entire complex and distinguish high- and low-precision residues. AVAILABILITY AND IMPLEMENTATION: The web sever of DeepUMQA3 are freely available at http://zhanglab-bioinf.com/DeepUMQA_server/.

11.
PLoS Comput Biol ; 19(9): e1011438, 2023 09.
Artículo en Inglés | MEDLINE | ID: mdl-37695768

RESUMEN

The study of protein folding mechanism is a challenge in molecular biology, which is of great significance for revealing the movement rules of biological macromolecules, understanding the pathogenic mechanism of folding diseases, and designing protein engineering materials. Based on the hypothesis that the conformational sampling trajectory contain the information of folding pathway, we propose a protein folding pathway prediction algorithm named Pathfinder. Firstly, Pathfinder performs large-scale sampling of the conformational space and clusters the decoys obtained in the sampling. The heterogeneous conformations obtained by clustering are named seed states. Then, a resampling algorithm that is not constrained by the local energy basin is designed to obtain the transition probabilities of seed states. Finally, protein folding pathways are inferred from the maximum transition probabilities of seed states. The proposed Pathfinder is tested on our developed test set (34 proteins). For 11 widely studied proteins, we correctly predicted their folding pathways and specifically analyzed 5 of them. For 13 proteins, we predicted their folding pathways to be further verified by biological experiments. For 6 proteins, we analyzed the reasons for the low prediction accuracy. For the other 4 proteins without biological experiment results, potential folding pathways were predicted to provide new insights into protein folding mechanism. The results reveal that structural analogs may have different folding pathways to express different biological functions, homologous proteins may contain common folding pathways, and α-helices may be more prone to early protein folding than ß-strands.


Asunto(s)
Algoritmos , Biología Molecular , Análisis por Conglomerados , Conformación Molecular , Pliegue de Proteína
12.
J Chem Inf Model ; 64(1): 76-95, 2024 Jan 08.
Artículo en Inglés | MEDLINE | ID: mdl-38109487

RESUMEN

Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in the field of biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, there are still some challenges and limitations. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be conducted.


Asunto(s)
Inteligencia Artificial , Descubrimiento de Drogas , Pliegue de Proteína , Proyectos de Investigación
13.
Nucleic Acids Res ; 50(W1): W235-W245, 2022 07 05.
Artículo en Inglés | MEDLINE | ID: mdl-35536281

RESUMEN

Most proteins in nature contain multiple folding units (or domains). The revolutionary success of AlphaFold2 in single-domain structure prediction showed potential to extend deep-learning techniques for multi-domain structure modeling. This work presents a significantly improved method, DEMO2, which integrates analogous template structural alignments with deep-learning techniques for high-accuracy domain structure assembly. Starting from individual domain models, inter-domain spatial restraints are first predicted with deep residual convolutional networks, where full-length structure models are assembled using L-BFGS simulations under the guidance of a hybrid energy function combining deep-learning restraints and analogous multi-domain template alignments searched from the PDB. The output of DEMO2 contains deep-learning inter-domain restraints, top-ranked multi-domain structure templates, and up to five full-length structure models. DEMO2 was tested on a large-scale benchmark and the blind CASP14 experiment, where DEMO2 was shown to significantly outperform its predecessor and the state-of-the-art protein structure prediction methods. By integrating with new deep-learning techniques, DEMO2 should help fill the rapidly increasing gap between the improved ability of tertiary structure determination and the high demand for the high-quality multi-domain protein structures. The DEMO2 server is available at https://zhanggroup.org/DEMO/.


Asunto(s)
Algoritmos , Aprendizaje Profundo , Proteínas/química , Conformación Proteica , Biología Computacional/métodos , Programas Informáticos
14.
Proteins ; 91(12): 1861-1870, 2023 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-37553848

RESUMEN

This article reports and analyzes the results of protein complex model accuracy estimation by our methods (DeepUMQA3 and GraphGPSM) in the 15th Critical Assessment of techniques for protein Structure Prediction (CASP15). The new deep learning-based multimeric complex model accuracy estimation methods are proposed based on the ensemble of three-level features coupling with deep residual/graph neural networks. For the input multimeric complex model, we describe it from three levels: overall complex features, intra-monomer features, and inter-monomer features. We designed an overall ultrafast shape recognition (USR) to characterize the relationship between local residues and the overall complex topology, and an inter-monomer USR to characterize the relationship between the residues of one monomer and the topology of other monomers. DeepUMQA3 (Group name: GuijunLab-RocketX) ranked first in the interface residue accuracy estimation of CASP15. The Pearson correlation between the interface residue Local Distance Difference Test (lDDT) predicted by DeepUMQA3 and the real lDDT is 0.570, the only method that exceeds 0.5. Among the top 5 methods, DeepUMQA3 achieved the highest Pearson correlation of lDDT on 25 out of 39 targets. GraphGPSM (Group name: GuijunLab-PAthreader) has TM-score Pearson correlations greater than 0.9 on 14 targets, showing a good ability to estimate the overall fold accuracy. The DeepUMQA3 server is available at http://zhanglab-bioinf.com/DeepUMQA/ and the GraphGPSM server is available at http://zhanglab-bioinf.com/GraphGPSM/.


Asunto(s)
Aprendizaje Profundo , Conformación Proteica , Biología Computacional/métodos , Proteínas/química , Redes Neurales de la Computación
15.
Brief Bioinform ; 22(6)2021 11 05.
Artículo en Inglés | MEDLINE | ID: mdl-34355233

RESUMEN

Advances in the prediction of the inter-residue distance for a protein sequence have increased the accuracy to predict the correct folds of proteins with distance information. Here, we propose a distance-guided protein folding algorithm based on generalized descent direction, named GDDfold, which achieves effective structural perturbation and potential minimization in two stages. In the global stage, random-based direction is designed using evolutionary knowledge, which guides conformation population to cross potential barriers and explore conformational space rapidly in a large range. In the local stage, locally rugged potential landscape can be explored with the aid of conjugate-based direction integrated into a specific search strategy, which can improve the exploitation ability. GDDfold is tested on 347 proteins of a benchmark set, 24 template-free modeling (FM) approaches targets of CASP13 and 20 FM targets of CASP14. Results show that GDDfold correctly folds [template modeling (TM) score ≥ = 0.5] 316 out of 347 proteins, where 65 proteins have TM scores that are greater than 0.8, and significantly outperforms Rosetta-dist (distance-assisted fragment assembly method) and L-BFGSfold (distance geometry optimization method). On CASP FM targets, GDDfold is comparable with five state-of-the-art full-version methods, namely, Quark, RaptorX, Rosetta, MULTICOM and trRosetta in the CASP 13 and 14 server groups.


Asunto(s)
Biología Computacional/métodos , Pliegue de Proteína , Proteínas/química , Algoritmos , Conformación Proteica
16.
Bioinformatics ; 38(7): 1895-1903, 2022 03 28.
Artículo en Inglés | MEDLINE | ID: mdl-35134108

RESUMEN

MOTIVATION: Protein model quality assessment is a key component of protein structure prediction. In recent research, the voxelization feature was used to characterize the local structural information of residues, but it may be insufficient for describing residue-level topological information. Design features that can further reflect residue-level topology when combined with deep learning methods are therefore crucial to improve the performance of model quality assessment. RESULTS: We developed a deep-learning method, DeepUMQA, based on Ultrafast Shape Recognition (USR) for the residue-level single-model quality assessment. In the framework of the deep residual neural network, the residue-level USR feature was introduced to describe the topological relationship between the residue and overall structure by calculating the first moment of a set of residue distance sets and then combined with 1D, 2D and voxelization features to assess the quality of the model. Experimental results on the CASP13, CASP14 test datasets and CAMEO blind test show that USR could supplement the voxelization features to comprehensively characterize residue structure information and significantly improve model assessment accuracy. The performance of DeepUMQA ranks among the top during the state-of-the-art single-model quality assessment methods, including ProQ2, ProQ3, ProQ3D, Ornate, VoroMQA, ProteinGCN, ResNetQA, QDeep, GraphQA, ModFOLD6, ModFOLD7, ModFOLD8, QMEAN3, QMEANDisCo3 and DeepAccNet. AVAILABILITY AND IMPLEMENTATION: The DeepUMQA server is freely available at http://zhanglab-bioinf.com/DeepUMQA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Aprendizaje Profundo , Proteínas/química , Redes Neurales de la Computación , Biología Computacional/métodos
17.
Bioinformatics ; 38(2): 556-558, 2022 01 03.
Artículo en Inglés | MEDLINE | ID: mdl-34546290

RESUMEN

MOTIVATION: Accurately identifying protein-ATP binding poses is significantly valuable for both basic structure biology and drug discovery. Although many docking methods have been designed, most of them require a user-defined binding site and are difficult to achieve a high-quality protein-ATP docking result. It is critical to develop a protein-ATP-specific blind docking method without user-defined binding sites. RESULTS: Here, we present ATPdock, a template-based method for docking ATP into protein. For each query protein, if no pocket site is given, ATPdock first identifies its most potential pocket using ATPbind, an ATP-binding site predictor; then, the template pocket, which is most similar to the given or identified pocket, is searched from the database of pocket-ligand structures using APoc, a pocket structural alignment tool; thirdly, the rough docking pose of ATP (rdATP) is generated using LS-align, a ligand structural alignment tool, to align the initial ATP pose to the template ligand corresponding to template pocket; finally, the Metropolis Monte Carlo simulation is used to fine-tune the rdATP under the guidance of AutoDock Vina energy function. Benchmark tests show that ATPdock significantly outperforms other state-of-the-art methods in docking accuracy. AVAILABILITY AND IMPLEMENTATION: https://jun-csbio.github.io/atpdock/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Adenosina Trifosfato , Proteínas , Ligandos , Proteínas/química , Sitios de Unión , Unión Proteica , Adenosina Trifosfato/metabolismo , Simulación del Acoplamiento Molecular
18.
Bioinformatics ; 38(19): 4513-4521, 2022 09 30.
Artículo en Inglés | MEDLINE | ID: mdl-35962986

RESUMEN

MOTIVATION: With the breakthrough of AlphaFold2, the protein structure prediction problem has made remarkable progress through deep learning end-to-end techniques, in which correct folds could be built for nearly all single-domain proteins. However, the full-chain modelling appears to be lower on average accuracy than that for the constituent domains and requires higher demand on computing hardware, indicating the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the predicted accuracy of the full-chain model can be further improved by domain assembly assisted by deep learning. RESULTS: In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database was constructed for the full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, the domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by the energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with the state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, where the average TM-score of the full-chain model after the assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that the domains often interact in the similar way in the quaternary orientations if the domains have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. AVAILABILITY AND IMPLEMENTATION: http://zhanglab-bioinf.com/SADA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Aprendizaje Profundo , Humanos , Programas Informáticos , Proteínas/química , Bases de Datos de Proteínas , Dominios Proteicos
19.
Anal Biochem ; 670: 115132, 2023 06 01.
Artículo en Inglés | MEDLINE | ID: mdl-36997014

RESUMEN

Accurate identification of protein-protein interaction (PPI) sites is significantly important for understanding the mechanism of life and developing new drugs. However, it is expensive and time-consuming to identify PPI sites using wet-lab experiments. Developing computational methods is a new road to identify PPI sites, which can accelerate the procedure of PPI-related research. In this study, we propose a novel deep learning-based method (called D-PPIsite) to improve the accuracy of sequence-based PPI site prediction. In D-PPIsite, four discriminative sequence-driven features, i.e., position specific scoring matrix, relative solvent accessibility, position information and physical properties, are employed to feed into a well-designed deep learning module, consisting of convolutional, squeeze and excitation, and fully connected layers, to learn a prediction model. To reduce the risk of a single prediction model getting stuck in local optima, multiple prediction models with different initialization parameters are selected and integrated into one final model using the mean ensemble strategy. Experimental results on five independent testing data sets demonstrate that the proposed D-PPIsite can achieve an average accuracy of 80.2% and precision of 36.9%, covering 53.5% of all PPI sites while achieving the average Matthews correlation coefficient value (0.330) that is significantly higher than most of existing state-of-the-art prediction methods. We implement a new standalone-version predictor for predicting PPI sites, which is freely available at https://github.com/MingDongup/D-PPIsite for academic use.


Asunto(s)
Redes Neurales de la Computación , Proteínas
20.
J Chem Inf Model ; 63(20): 6451-6461, 2023 10 23.
Artículo en Inglés | MEDLINE | ID: mdl-37788318

RESUMEN

With the development of deep learning, almost all single-domain proteins can be predicted at experimental resolution. However, the structure prediction of multi-domain proteins remains a challenge. Achieving end-to-end protein domain assembly and further improving the accuracy of the full-chain modeling by accurately predicting inter-domain orientation while improving the assembly efficiency will provide significant insights into structure-based drug discovery. In this work, we propose an End-to-End Domain Assembly method based on deep learning, named E2EDA. We first develop RMNet, an EfficientNetV2-based deep learning model that fuses multiple features using an attention mechanism to predict inter-domain rigid motion. Then, the predicted rigid motions are transformed into inter-domain spatial transformations to directly assemble the full-chain model. Finally, the scoring strategy RMscore is designed to select the best model from multiple assembled models. The experimental results show that the average TM-score of the model assembled by E2EDA on the benchmark set (282) is 0.827, which is better than those of other domain assembly methods SADA (0.792) and DEMO (0.730). Meanwhile, on our constructed multi-domain data set from AlphaFold DB, the model reassembled by E2EDA is 7.0% higher in TM-score compared to the full-chain model predicted by AlphaFold2, indicating that E2EDA can capture more accurate inter-domain orientations to improve the quality of the model predicted by AlphaFold2. Furthermore, compared to SADA and AlphaFold2, E2EDA reduced the average runtime on the benchmark by 64.7% and 19.2%, respectively, indicating that E2EDA can significantly improve assembly efficiency through an end-to-end approach. The online server is available at http://zhanglab-bioinf.com/E2EDA.


Asunto(s)
Aprendizaje Profundo , Dominios Proteicos , Proteínas/química
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA