Results 1 - 20 of 46
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38517699

ABSTRACT

The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate protein complex atomic structures from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps through an iterative assembly procedure intertwining chain- and domain-level matching and fitting for predicted chain models. The method was carefully evaluated on 27 cryo-electron tomography (cryo-ET) maps and 16 single-particle EM maps, where DEMO-EM2 models achieved an average TM-score of 0.92, outperforming those of state-of-the-art methods. The results demonstrate an efficient method that enables the rapid and reliable solution of challenging cryo-EM structure modeling problems.
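The TM-score quoted here (and in most records below) can be illustrated with a minimal Python sketch. It assumes the two structures are already superimposed and that the distance in Å between each aligned residue pair is known; the full TM-score additionally maximizes over superpositions, which this sketch does not attempt. The function names and the 0.5 Å floor on `d0` are illustrative assumptions (the floor follows a convention found in common implementations), not something stated in the abstract.

```python
def d0_for_length(target_length):
    """Length-dependent distance scale d0 (in Å) used by the TM-score."""
    if target_length <= 15:
        # Assumption: the cube-root formula is undefined here, so use a small floor.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (Å) between aligned residue pairs.
    target_length: number of residues in the target (native) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = d0_for_length(target_length)
    # Each aligned pair contributes a value in (0, 1]; unaligned residues contribute 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A perfect match (all distances zero, every residue aligned) scores 1.0, and scores above roughly 0.5 are the threshold several abstracts below use for a "correct fold".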


Subjects
Cryoelectron Microscopy , Cryoelectron Microscopy/methods , Models, Molecular , Protein Conformation
2.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34849573

ABSTRACT

Meta contact, which combines different contact maps into one to improve contact prediction accuracy and reduce the noise from any single contact map, is a widely used method. However, protein structure prediction using meta contact cannot fully exploit the information carried by the original contact maps. In this work, a multi contact-based folding method under the evolutionary algorithm framework, MultiCFold, is proposed. In MultiCFold, the full information of the different contact maps is used directly by populations to guide protein structure folding. In addition, noncontact is considered as an effective supplement to contact information and can further assist protein folding. MultiCFold is tested on a set of 120 nonredundant proteins, and the average TM-score and average RMSD reach 0.617 and 5.815 Å, respectively. Compared with the meta contact-based method, MetaCFold, the average TM-score and average RMSD improve by 6.62% and 8.82%, respectively. In particular, the introduction of noncontact information increases the average TM-score by 6.30%. Furthermore, MultiCFold is compared with four state-of-the-art methods of CASP13 on the 24 FM targets, and results show that MultiCFold is significantly better than other methods after the full-atom relax procedure.


Subjects
Protein Folding , Proteins , Algorithms , Computational Biology/methods , Models, Molecular , Protein Conformation , Proteins/chemistry
3.
Nucleic Acids Res ; 50(W1): W235-W245, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35536281

ABSTRACT

Most proteins in nature contain multiple folding units (or domains). The revolutionary success of AlphaFold2 in single-domain structure prediction showed potential to extend deep-learning techniques for multi-domain structure modeling. This work presents a significantly improved method, DEMO2, which integrates analogous template structural alignments with deep-learning techniques for high-accuracy domain structure assembly. Starting from individual domain models, inter-domain spatial restraints are first predicted with deep residual convolutional networks, where full-length structure models are assembled using L-BFGS simulations under the guidance of a hybrid energy function combining deep-learning restraints and analogous multi-domain template alignments searched from the PDB. The output of DEMO2 contains deep-learning inter-domain restraints, top-ranked multi-domain structure templates, and up to five full-length structure models. DEMO2 was tested on a large-scale benchmark and the blind CASP14 experiment, where DEMO2 was shown to significantly outperform its predecessor and the state-of-the-art protein structure prediction methods. By integrating with new deep-learning techniques, DEMO2 should help fill the rapidly increasing gap between the improved ability of tertiary structure determination and the high demand for the high-quality multi-domain protein structures. The DEMO2 server is available at https://zhanggroup.org/DEMO/.


Subjects
Algorithms , Deep Learning , Proteins/chemistry , Protein Conformation , Computational Biology/methods , Software
4.
Nucleic Acids Res ; 50(W1): W454-W464, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35420129

ABSTRACT

Deep learning techniques have significantly advanced the field of protein structure prediction. LOMETS3 (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is a new generation meta-server approach to template-based protein structure prediction and function annotation, which integrates newly developed deep learning threading methods. For the first time, we have extended LOMETS3 to handle multi-domain proteins and to construct full-length models with gradient-based optimizations. Starting from a FASTA-formatted sequence, LOMETS3 performs four steps: domain boundary prediction, domain-level template identification, full-length template/model assembly and structure-based function prediction. The output of LOMETS3 contains (i) top-ranked templates from LOMETS3 and its component threading programs, (ii) up to 5 full-length structure models constructed by L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm) optimization, (iii) the 10 closest Protein Data Bank (PDB) structures to the target, (iv) structure-based functional predictions, (v) domain partition and assembly results, and (vi) the domain-level threading results, including items (i)-(iii) for each identified domain. LOMETS3 was tested in large-scale benchmarks and the blind CASP14 (14th Critical Assessment of Structure Prediction) experiment, where the overall template recognition and function prediction accuracy significantly exceeded that of its predecessors and other state-of-the-art threading approaches, especially for hard targets without homologous templates in the PDB. With these improvements, LOMETS3 should help significantly advance the capability of the broader biomedical community for template-based protein structure and function modelling.


Subjects
Deep Learning , Proteins , Algorithms , Protein Conformation , Proteins/chemistry , Sequence Alignment , Sequence Analysis, Protein/methods , Software , Models, Chemical
5.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34355233

ABSTRACT

Advances in predicting inter-residue distances from a protein sequence have increased the accuracy with which correct protein folds can be predicted from distance information. Here, we propose a distance-guided protein folding algorithm based on generalized descent direction, named GDDfold, which achieves effective structural perturbation and potential minimization in two stages. In the global stage, a random-based direction is designed using evolutionary knowledge, which guides the conformation population to cross potential barriers and explore conformational space rapidly over a large range. In the local stage, the locally rugged potential landscape can be explored with the aid of a conjugate-based direction integrated into a specific search strategy, which can improve the exploitation ability. GDDfold is tested on 347 proteins of a benchmark set, 24 template-free modeling (FM) targets of CASP13 and 20 FM targets of CASP14. Results show that GDDfold correctly folds [template modeling (TM) score ≥ 0.5] 316 out of 347 proteins, where 65 proteins have TM scores that are greater than 0.8, and significantly outperforms Rosetta-dist (distance-assisted fragment assembly method) and L-BFGSfold (distance geometry optimization method). On CASP FM targets, GDDfold is comparable with five state-of-the-art full-version methods, namely, Quark, RaptorX, Rosetta, MULTICOM and trRosetta in the CASP 13 and 14 server groups.


Subjects
Computational Biology/methods , Protein Folding , Proteins/chemistry , Algorithms , Protein Conformation
6.
Bioinformatics ; 38(7): 1895-1903, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35134108

ABSTRACT

MOTIVATION: Protein model quality assessment is a key component of protein structure prediction. In recent research, the voxelization feature was used to characterize the local structural information of residues, but it may be insufficient for describing residue-level topological information. Designing features that can further reflect residue-level topology when combined with deep learning methods is therefore crucial to improve the performance of model quality assessment. RESULTS: We developed a deep-learning method, DeepUMQA, based on Ultrafast Shape Recognition (USR) for residue-level single-model quality assessment. In the framework of the deep residual neural network, the residue-level USR feature was introduced to describe the topological relationship between the residue and the overall structure by calculating the first moment of a set of residue distance sets, and was then combined with 1D, 2D and voxelization features to assess the quality of the model. Experimental results on the CASP13 and CASP14 test datasets and the CAMEO blind test show that USR could supplement the voxelization features to comprehensively characterize residue structure information and significantly improve model assessment accuracy. The performance of DeepUMQA ranks among the top of the state-of-the-art single-model quality assessment methods, including ProQ2, ProQ3, ProQ3D, Ornate, VoroMQA, ProteinGCN, ResNetQA, QDeep, GraphQA, ModFOLD6, ModFOLD7, ModFOLD8, QMEAN3, QMEANDisCo3 and DeepAccNet. AVAILABILITY AND IMPLEMENTATION: The DeepUMQA server is freely available at http://zhanglab-bioinf.com/DeepUMQA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
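As a rough illustration of a "first moment of residue distance sets", the Python sketch below computes, for every residue, the mean distance to all other residues. It assumes one 3D coordinate per residue (for example the Cα atom) and assumes that the distance set of a residue is simply its distances to every other residue; the abstract does not specify which reference points DeepUMQA uses, so this is a hypothetical simplification and `residue_first_moments` is an illustrative name.

```python
import math


def residue_first_moments(coords):
    """Mean distance from each residue to all other residues.

    coords: list of (x, y, z) tuples, one per residue.
    Returns a list with one first moment (in the same units) per residue.
    """
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two residues")
    moments = []
    for i, ci in enumerate(coords):
        # Distance set of residue i: distances to every other residue.
        total = sum(math.dist(ci, cj) for j, cj in enumerate(coords) if j != i)
        moments.append(total / (n - 1))
    return moments
```

Residues near the centre of a structure get a smaller value than residues at the periphery, which is the kind of topological signal a per-residue shape descriptor is meant to capture.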


Subjects
Deep Learning , Proteins/chemistry , Neural Networks, Computer , Computational Biology/methods
7.
Bioinformatics ; 38(19): 4513-4521, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35962986

ABSTRACT

MOTIVATION: With the breakthrough of AlphaFold2, the protein structure prediction problem has made remarkable progress through deep learning end-to-end techniques, in which correct folds could be built for nearly all single-domain proteins. However, full-chain modelling appears to be less accurate on average than modelling of the constituent domains and places higher demands on computing hardware, indicating that the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the predicted accuracy of the full-chain model can be further improved by domain assembly assisted by deep learning. RESULTS: In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database was constructed for the full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, the domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by the energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with the state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, where the average TM-score of the full-chain model after the assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that domains often interact in a similar way in their quaternary orientations if they have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. AVAILABILITY AND IMPLEMENTATION: http://zhanglab-bioinf.com/SADA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Humans , Software , Proteins/chemistry , Databases, Protein , Protein Domains
8.
Cell Mol Life Sci ; 79(3): 176, 2022 Mar 05.
Article in English | MEDLINE | ID: mdl-35247097

ABSTRACT

The brain-expressed ubiquilins (UBQLNs) 1, 2 and 4 are a family of ubiquitin adaptor proteins that participate broadly in protein quality control (PQC) pathways, including the ubiquitin proteasome system (UPS). One family member, UBQLN2, has been implicated in numerous neurodegenerative diseases including ALS/FTD. UBQLN2 typically resides in the cytoplasm but in disease can translocate to the nucleus, as in Huntington's disease where it promotes the clearance of mutant Huntingtin. How UBQLN2 translocates to the nucleus and clears aberrant nuclear proteins, however, is not well understood. In a mass spectrometry screen to discover UBQLN2 interactors, we identified a family of small (13 kDa), highly homologous uncharacterized proteins, RTL8, and confirmed the interaction between UBQLN2 and RTL8 both in vitro using recombinant proteins and in vivo using mouse brain tissue. Under endogenous and overexpressed conditions, RTL8 localizes to nucleoli. When co-expressed with UBQLN2, RTL8 promotes nuclear translocation of UBQLN2. RTL8 also facilitates UBQLN2's nuclear translocation during heat shock. UBQLN2 and RTL8 colocalize within ubiquitin-enriched subnuclear structures containing PQC components. The robust effect of RTL8 on the nuclear translocation and subnuclear localization of UBQLN2 does not extend to the other brain-expressed ubiquilins, UBQLN1 and UBQLN4. Moreover, compared to UBQLN1 and UBQLN4, UBQLN2 preferentially stabilizes RTL8 levels in human cell lines and in mouse brain, supporting functional heterogeneity among UBQLNs. As a novel UBQLN2 interactor that recruits UBQLN2 to specific nuclear compartments, RTL8 may regulate UBQLN2 function in nuclear protein quality control.


Subjects
Adaptor Proteins, Signal Transducing/metabolism , Membrane Proteins/metabolism , Adaptor Proteins, Signal Transducing/deficiency , Adaptor Proteins, Signal Transducing/genetics , Amino Acid Sequence , Animals , Autophagy-Related Proteins/deficiency , Autophagy-Related Proteins/genetics , Autophagy-Related Proteins/metabolism , Brain/metabolism , Carrier Proteins/genetics , Carrier Proteins/metabolism , Cell Nucleolus/metabolism , HEK293 Cells , Humans , Membrane Proteins/chemistry , Membrane Proteins/genetics , Mice , Mice, Knockout , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Protein Binding , Protein Isoforms/chemistry , Protein Isoforms/genetics , Protein Isoforms/metabolism , Recombinant Proteins/biosynthesis , Recombinant Proteins/chemistry , Recombinant Proteins/isolation & purification , Sequence Alignment , Temperature , Ubiquitin/metabolism
9.
Bioinformatics ; 37(23): 4357-4365, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34245242

ABSTRACT

MOTIVATION: Massive local minima on the protein energy landscape often cause traditional conformational sampling algorithms to be easily trapped in local basin regions, because they find it difficult to overcome high-energy barriers. Also, the lowest energy conformation may not correspond to the native structure due to the inaccuracy of energy models. This study investigates whether these two problems can be alleviated by a sequential niche technique without loss of accuracy. RESULTS: A sequential niche multimodal conformational sampling algorithm for protein structure prediction (SNfold) is proposed in this study. In SNfold, a derating function is designed based on the knowledge learned from the previous sampling and used to construct a series of sampling-guided energy functions. These functions then help the sampling algorithm overcome high-energy barriers and avoid the re-sampling of the explored regions. In inaccurate protein energy models, the high-energy conformation that may correspond to the native structure can be sampled with successively updated sampling-guided energy functions. The proposed SNfold is tested on 300 benchmark proteins, 24 CASP13 and 19 CASP14 FM targets. Results show that SNfold correctly folds (TM-score ≥ 0.5) 231 out of 300 proteins. In particular, compared with Rosetta restrained by distance (Rosetta-dist), SNfold achieves higher average TM-score and improves the sampling efficiency by more than 100 times. On several CASP FM targets, SNfold also shows good performance compared with four state-of-the-art servers in CASP. As a plug-in conformational sampling algorithm, SNfold can be extended to other protein structure prediction methods. AVAILABILITY AND IMPLEMENTATION: The source code and executable versions are freely available at https://github.com/iobio-zjut/SNfold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
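The idea of a derating function that steers sampling away from already explored basins can be sketched on a toy one-dimensional energy. The Python example below is a hypothetical illustration of the general sequential niche idea only: the double-well `raw_energy`, the linear penalty shape, and the `radius` and `height` parameters are all assumptions made for the example and are not SNfold's actual derating function or energy model.

```python
def raw_energy(x):
    """Toy double-well energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def derating_penalty(x, center, radius=0.5, height=1.0):
    """Penalty that is largest at a previously found minimum and fades to 0 at `radius`."""
    d = abs(x - center)
    return height * (1.0 - d / radius) if d < radius else 0.0


def guided_energy(x, found_minima, radius=0.5, height=1.0):
    """Sampling-guided energy: raw energy plus a penalty around every explored minimum."""
    return raw_energy(x) + sum(
        derating_penalty(x, c, radius, height) for c in found_minima
    )
```

After the basin at x = +1 has been recorded, `guided_energy(1.0, [1.0])` is raised to 1.0 while `guided_energy(-1.0, [1.0])` stays at 0.0, so a subsequent sampling run is pushed toward the unexplored basin instead of revisiting the first one.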


Subjects
Algorithms , Proteins , Protein Conformation , Proteins/chemistry , Software , Benchmarking
10.
Bioinformatics ; 38(1): 99-107, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34459867

ABSTRACT

MOTIVATION: With the great progress of deep learning-based inter-residue contact/distance prediction, the discrete space formed by fragment assembly cannot satisfy the distance constraint well. Thus, the optimal solution of the continuous space may not be achieved. Designing an effective closed-loop continuous dihedral angle optimization strategy that complements the discrete fragment assembly is crucial to improve the performance of the distance-assisted fragment assembly method. RESULTS: In this article, we proposed a de novo protein structure prediction method called IPTDFold based on closed-loop iterative partition sampling, topology adjustment and residue-level distance deviation optimization. First, local dihedral angle crossover and mutation operators are designed to explore the conformational space extensively and achieve information exchange between the conformations in the population. Then, the dihedral angle rotation model of the loop region with partial inter-residue distance constraints is constructed, and the rotation angle satisfying the constraints is obtained by a differential evolution algorithm, so as to adjust the spatial position relationship between the secondary structures. Finally, the residue distance deviation is evaluated according to the difference between the conformation and the predicted distance, and the dihedral angle of the residue is optimized with biased probability. The final model is generated by iterating the above three steps. IPTDFold is tested on 462 benchmark proteins, 24 FM targets of CASP13 and 20 FM targets of CASP14. Results show that IPTDFold is significantly superior to the distance-assisted fragment assembly method Rosetta_D (Rosetta with distance). In particular, the prediction accuracy of IPTDFold does not decrease as the length of the protein increases. When using the same FastRelax protocol, the prediction accuracy of IPTDFold is significantly superior to that of trRosetta without orientation constraints, and is equivalent to that of the full version of trRosetta. AVAILABILITY AND IMPLEMENTATION: The source code and executable are freely available at https://github.com/iobio-zjut/IPTDFold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Proteins , Computational Biology/methods , Proteins/chemistry , Software , Algorithms , Protein Structure, Secondary , Protein Conformation
11.
Bioinformatics ; 37(23): 4350-4356, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34185079

ABSTRACT

MOTIVATION: The mathematically optimal solution in computational protein folding simulations does not always correspond to the native structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo protein structure folding simulations. RESULTS: A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo protein structure prediction. The protocol consists of three stages: The first is a modal exploration stage, in which a structural similarity evaluation model DMscore is designed to control the diversity of conformations, generating a population of diverse structures in different low-energy basins. The second is a modal maintaining stage, where an adaptive clustering algorithm MNDcluster is proposed to divide the populations and merge the modal by adjusting the annealing temperature to locate the promising basins. In the last stage of modal exploitation, a greedy search strategy is used to accelerate the convergence of the modal. Distance constraint information is used to construct the conformation scoring model to guide sampling. MMpred is tested on a large set of 320 non-redundant proteins, where MMpred obtains models with TM-score≥0.5 on 291 cases, which is 28% higher than that of Rosetta guided with the same set of distance constraints. In addition, on 320 benchmark proteins, the enhanced version of MMpred (E-MMpred) has 167 targets better than trRosetta when the best of five models are evaluated. The average TM-score of the best model of E-MMpred is 0.732, which is comparable to trRosetta (0.730). AVAILABILITY AND IMPLEMENTATION: The source code and executable are freely available at https://github.com/iobio-zjut/MMpred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Proteins , Protein Conformation , Computational Biology/methods , Proteins/chemistry , Software , Algorithms
12.
PLoS Comput Biol ; 17(3): e1008865, 2021 03.
Article in English | MEDLINE | ID: mdl-33770072

ABSTRACT

The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.
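The "top-L" and "top-L/5 long-range contact precision" figures can be made concrete with a short Python sketch. It assumes the convention commonly used in CASP evaluations that a long-range pair has a sequence separation of at least 24 residues, and that precision is the fraction of the top-ranked predictions that are true contacts; the function name, argument layout, and the choice to divide by the number of predictions actually available (rather than by L/k) are illustrative assumptions.

```python
def top_long_range_precision(predicted, native_contacts, length, k=1, min_separation=24):
    """Precision of the top L/k long-range contact predictions.

    predicted: iterable of (i, j, probability) residue-pair predictions.
    native_contacts: set of (i, j) true contacts with i < j.
    length: sequence length L.
    k: 1 for top-L, 5 for top-L/5, and so on.
    """
    # Keep only long-range pairs and normalize each pair so that i < j.
    long_range = [
        (p, min(i, j), max(i, j))
        for i, j, p in predicted
        if abs(i - j) >= min_separation
    ]
    # Rank by predicted probability, highest first.
    long_range.sort(key=lambda item: item[0], reverse=True)
    top = long_range[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in native_contacts)
    return hits / len(top)
```

For a 10-residue toy case with `k=5`, only the two highest-scoring long-range pairs are evaluated, so one hit and one miss give a precision of 0.5.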


Subjects
Neural Networks, Computer , Proteins , Sequence Analysis, Protein/methods , Computational Biology , Protein Conformation , Protein Folding , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results
13.
Proc Natl Acad Sci U S A ; 116(32): 15930-15938, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31341084

ABSTRACT

Most proteins exist with multiple domains in cells for cooperative functionality. However, structural biology and protein folding methods are often optimized for single-domain structures, resulting in a rapidly growing gap between the improved capability for tertiary structure determination and high demand for multidomain structure models. We have developed a pipeline, termed DEMO, for constructing multidomain protein structures by docking-based domain assembly simulations, with interdomain orientations determined by the distance profiles from analogous templates as detected through domain-level structure alignments. The pipeline was tested on a comprehensive benchmark set of 356 proteins consisting of 2-7 continuous and discontinuous domains, for which DEMO generated models with correct global fold (TM-score > 0.5) for 86% of cases with continuous domains and for 100% of cases with discontinuous domain structures, starting from randomly oriented target-domain structures. DEMO was also applied to reassemble multidomain targets in the CASP12 and CASP13 experiments using domain structures excised from the top server predictions, where the full-length DEMO models showed a significantly improved quality over the original server models. Finally, sparse restraints of mass spectrometry-generated cross-linking data and cryo-EM density maps are incorporated into DEMO, resulting in improvements in the average TM-score by 6.3% and 12.5%, respectively. The results demonstrate an efficient approach to assembling multidomain structures, which can be easily used for automated, genome-scale multidomain protein structure assembly.


Subjects
Proteins/chemistry , Cross-Linking Reagents/chemistry , Cryoelectron Microscopy , Databases, Protein , Models, Molecular , Protein Domains , Software
14.
Proteins ; 89(12): 1911-1921, 2021 12.
Article in English | MEDLINE | ID: mdl-34382712

ABSTRACT

This article reports and analyzes the results of protein contact and distance prediction by our methods in the 14th Critical Assessment of techniques for protein Structure Prediction (CASP14). A new deep learning-based contact/distance predictor was employed based on the ensemble of two complementary coevolution features coupled with deep residual networks. We also improved our multiple sequence alignment (MSA) generation protocol with wholesale meta-genome sequence databases. On 22 CASP14 free modeling (FM) targets, the proposed model achieved a top-L/5 long-range precision of 63.8% and a mean distance bin error of 1.494. Based on the predicted distance potentials, 11 out of 22 FM targets and all of the 14 FM/template-based modeling (TBM) targets have correctly predicted folds (TM-score >0.5), suggesting that our approach can provide reliable distance potentials for ab initio protein folding.


Subjects
Deep Learning , Models, Molecular , Proteins , Sequence Alignment/methods , Software , Computational Biology , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein
15.
Proteins ; 89(12): 1734-1751, 2021 12.
Article in English | MEDLINE | ID: mdl-34331351

ABSTRACT

In this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built based on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi-domain proteins due to low accuracy in inter-domain distance prediction and modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain and intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may be helpful to address these issues.


Subjects
Deep Learning , Hydrogen Bonding , Models, Molecular , Proteins , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Computational Biology , Protein Conformation , Protein Folding , Proteins/chemistry , Proteins/metabolism , Software
16.
Bioinformatics ; 36(12): 3749-3757, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32227201

ABSTRACT

MOTIVATION: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. RESULTS: We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins. AVAILABILITY AND IMPLEMENTATION: https://zhanglab.ccmb.med.umich.edu/FUpred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
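The core idea of choosing a boundary that keeps contacts inside domains and out of the interface can be sketched for the simplest case of one cut in a continuous two-domain chain. The Python example below is a hypothetical illustration: the score (inter-domain contacts divided by the product of the two domain lengths) is an assumed stand-in and is not FUpred's published scoring function, and discontinuous domains are not handled.

```python
def best_two_domain_cut(contacts, length, min_domain_len=2):
    """Find the single cut position with the lowest inter-domain contact density.

    contacts: set of (i, j) residue pairs (0-based) predicted to be in contact.
    length: number of residues in the chain.
    Returns (cut, score); the domains are residues [0, cut) and [cut, length).
    """
    best_cut, best_score = None, float("inf")
    for cut in range(min_domain_len, length - min_domain_len + 1):
        # A contact is inter-domain when its two residues fall on opposite sides of the cut.
        inter = sum(1 for i, j in contacts if (i < cut) != (j < cut))
        # Assumed normalization: divide by the number of possible inter-domain pairs.
        score = inter / (cut * (length - cut))
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score
```

On a six-residue toy map with two tightly connected triplets joined by a single contact, the lowest-density cut falls between residues 2 and 3, which is where the two blocks of contacts separate.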


Subjects
Deep Learning , Algorithms , Computational Biology , Neural Networks, Computer , Protein Domains , Software
17.
Bioinformatics ; 36(8): 2443-2450, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31860059

ABSTRACT

MOTIVATION: Regions that connect secondary structure elements in a protein are known as loops; even a slight change in a loop can have a dramatic effect on the entire topology. This study investigates whether the accuracy of protein structure prediction can be improved using a loop-specific sampling strategy. RESULTS: A novel de novo protein structure prediction method that combines global exploration and loop perturbation is proposed in this study. In the global exploration phase, fragment recombination and assembly are used to explore the massive conformational space and generate native-like topology. In the loop perturbation phase, a loop-specific local perturbation model is designed to improve the accuracy of the conformation and is solved by a differential evolution algorithm. These two phases enable cooperation between global exploration and local exploitation. The filtered contact information is used to construct the conformation selection model for guiding the sampling. The proposed CGLFold is tested on 145 benchmark proteins, 14 free modeling (FM) targets of CASP13 and 29 FM targets of CASP12. The experimental results show that the loop-specific local perturbation can increase the structure diversity and success rate of conformational update and gradually improve conformation accuracy. CGLFold obtains models with a template modeling score ≥ 0.5 on 95 standard test proteins, 7 FM targets of CASP13 and 9 FM targets of CASP12. AVAILABILITY AND IMPLEMENTATION: The source code and executable versions are freely available at https://github.com/iobio-zjut/CGLFold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms ; Proteins ; Protein Conformation ; Protein Structure, Secondary ; Software
18.
J Proteome Res ; 19(4): 1351-1360, 2020 04 03.
Article in English | MEDLINE | ID: mdl-32200634

ABSTRACT

As infection with the 2019-nCoV coronavirus is quickly developing into a global pneumonia epidemic, careful analysis of its transmission and cellular mechanisms is sorely needed. In this Communication, we first analyzed two recent studies that concluded that snakes are the intermediate hosts of 2019-nCoV and that the 2019-nCoV spike protein insertions share a unique similarity to HIV-1. However, the reimplementation of the analyses, built on larger scale data sets using state-of-the-art bioinformatics methods and databases, presents clear evidence that rebuts these conclusions. Next, using metagenomic samples from Manis javanica, we assembled a draft genome of the 2019-nCoV-like coronavirus, which shows 73% coverage and 91% sequence identity to the 2019-nCoV genome. In particular, the alignments of the spike surface glycoprotein receptor binding domain revealed four times more variations in the bat coronavirus RaTG13 than in the Manis coronavirus compared with 2019-nCoV, suggesting the pangolin as a missing link in the transmission of 2019-nCoV from bats to humans.


Subjects
Betacoronavirus/genetics ; Coronavirus Infections/virology ; Genome, Viral/genetics ; Host-Pathogen Interactions ; Models, Molecular ; Pneumonia, Viral/virology ; Spike Glycoprotein, Coronavirus/chemistry ; Spike Glycoprotein, Coronavirus/genetics ; Amino Acid Sequence ; Animals ; Betacoronavirus/classification ; COVID-19 ; Eutheria/virology ; HIV-1/genetics ; Humans ; Metagenome ; Pandemics ; Protein Structure, Tertiary ; SARS-CoV-2 ; Sequence Alignment ; Sequence Analysis, Protein ; Snakes/virology
19.
IEEE Trans Evol Comput ; 24(3): 536-550, 2020 Jun.
Article in English | MEDLINE | ID: mdl-33603321

ABSTRACT

Various mutation strategies show distinct advantages in differential evolution (DE). The cooperation of multiple strategies in the evolutionary process may be effective. This paper presents an underestimation-assisted global and local cooperative DE to simultaneously enhance the effectiveness and efficiency. In the proposed algorithm, two phases, namely, the global exploration and the local exploitation, are performed in each generation. In the global phase, a set of trial vectors is produced for each target individual by employing multiple strategies with strong exploration capability. Afterward, an adaptive underestimation model with a self-adapted slope control parameter is proposed to evaluate these trial vectors, the best of which is selected as the candidate. In the local phase, the better-based strategies guided by individuals that are better than the target individual are designed. For each individual accepted in the global phase, multiple trial vectors are generated by using these strategies and filtered by the underestimation value. The cooperation between the global and local phases includes two aspects. First, both of them concentrate on generating better individuals for the next generation. Second, the global phase aims to locate promising regions quickly while the local phase serves as a local search for enhancing convergence. Moreover, a simple mechanism is designed to determine the parameter of DE adaptively in the searching process. Finally, the proposed approach is applied to predict the protein 3D structure. Experimental studies on classical benchmark functions, CEC test sets, and protein structure prediction problem show that the proposed approach is superior to the competitors.
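
For readers unfamiliar with differential evolution, the baseline that the abstract above builds on can be sketched in a few lines of Python. This is the classic DE/rand/1/bin scheme only; it does not implement the paper's underestimation model, its better-based local strategies, or its adaptive parameter mechanism. The function names and the default values of `pop_size`, `F`, `CR`, and `generations` are assumptions chosen for illustration.

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over box bounds with classic DE/rand/1/bin (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fitness = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than the target.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover: at least one coordinate comes from the mutant.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial vector if it is no worse.
            f_trial = f(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]

def sphere(x):
    """Simple benchmark function with its minimum of 0 at the origin."""
    return float(np.sum(x ** 2))

if __name__ == "__main__":
    x_best, f_best = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
    print(x_best, f_best)
```

In this baseline a single trial vector is generated per target individual and accepted greedily; the approach described in the abstract instead generates several trial vectors per individual and screens them before selection.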

20.
Nat Commun ; 15(1): 3187, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622116

ABSTRACT

Transcription is crucial for the expression of genetic information, and its efficient and accurate termination is required in all living organisms. Rho-dependent termination can rapidly terminate unwanted premature RNAs and plays important roles in bacterial adaptation to changing environments. Although Rho was discovered about five decades ago, the regulation mechanisms of Rho-dependent termination are still not fully elucidated. Here we report that Rof is a conserved antiterminator and determine the cryogenic electron microscopy structure of the Rho-Rof antitermination complex. Rof binds to the open-ring Rho hexamer and inhibits the initiation of Rho-dependent termination. Rof's N-terminal α-helix undergoes conformational changes upon binding with Rho, and is key in facilitating Rof-Rho interactions. Rof binds to Rho's primary binding site (PBS) and excludes Rho from binding with PBS ligand RNA at the initiation step. Further in vivo analyses in Salmonella Typhimurium show that Rof is required for virulence gene expression and host cell invasion, unveiling a physiological function of Rof and transcription termination in bacterial pathogenesis.


Subjects
Rho Factor ; Transcription Factors ; Transcription Factors/metabolism ; Virulence/genetics ; Rho Factor/genetics ; Rho Factor/metabolism ; Gene Expression Regulation, Bacterial ; Transcription, Genetic ; Bacteria/genetics ; Salmonella typhimurium/genetics ; Salmonella typhimurium/metabolism