Search | BVS Regional Portal

1.

Diffusion models in bioinformatics and computational biology.

Guo, Zhiye; Liu, Jian; Wang, Yanli; Chen, Mengrui; Wang, Duolin; Xu, Dong; Cheng, Jianlin.

Nat Rev Bioeng ; 2(2): 136-154, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38576453

ABSTRACT

Denoising diffusion models embody a type of generative artificial intelligence that can be applied in computer vision, natural language processing and bioinformatics. In this Review, we introduce the key concepts and theoretical foundations of three diffusion modelling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks and score stochastic differential equations). We then explore their applications in bioinformatics and computational biology, including protein design and generation, drug and small-molecule design, protein-ligand interaction modelling, cryo-electron microscopy image data analysis and single-cell data analysis. Finally, we highlight open-source diffusion model tools and consider the future applications of diffusion models in bioinformatics.
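Of the three frameworks named above, denoising diffusion probabilistic models (DDPMs) are the simplest to illustrate. The sketch below is not taken from the Review. It draws a noised sample x_t directly from x_0 with the closed form q(x_t | x_0) = N(sqrt(ᾱ_t)·x_0, (1 − ᾱ_t)·I), where ᾱ_t is the cumulative product of (1 − β_s). The linear β schedule (1e-4 to 0.02 over 1,000 steps) follows the original DDPM defaults and is an assumption here, and the 50 × 3 array is a hypothetical stand-in for 3D coordinates such as Cα positions.

```python
import numpy as np

def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Per-step noise variances beta_1..beta_T, linearly spaced (assumed DDPM defaults)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0: np.ndarray, t: int, betas: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot (t is 0-based)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]  # cumulative product of (1 - beta) up to step t
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
x0 = rng.standard_normal((50, 3))       # hypothetical stand-in for 50 3D coordinates
x_early = q_sample(x0, 10, betas, rng)  # lightly noised
x_late = q_sample(x0, 999, betas, rng)  # almost pure Gaussian noise
```

At t = 10 the sample is still nearly identical to x_0, while at the last step essentially all signal has been replaced by noise; a trained denoising network learns to reverse this process step by step.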

2.

Enhancing alphafold-multimer-based protein complex structure prediction with MULTICOM in CASP15.

Liu, Jian; Guo, Zhiye; Wu, Tianqi; Roy, Raj S; Quadir, Farhan; Chen, Chen; Cheng, Jianlin.

Commun Biol ; 6(1): 1140, 2023 11 10.

Article in English | MEDLINE | ID: mdl-37949999

ABSTRACT

To enhance the AlphaFold-Multimer-based protein complex structure prediction, we developed a quaternary structure prediction system (MULTICOM) to improve the input fed to AlphaFold-Multimer and evaluate and refine its outputs. MULTICOM samples diverse multiple sequence alignments (MSAs) and templates for AlphaFold-Multimer to generate structural predictions by using both traditional sequence alignments and Foldseek-based structure alignments, ranks structural predictions through multiple complementary metrics, and refines the structural predictions via a Foldseek structure alignment-based refinement method. The MULTICOM system with different implementations was blindly tested in the assembly structure prediction in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 as both server and human predictors. MULTICOM_qa ranked 3rd among 26 CASP15 server predictors and MULTICOM_human ranked 7th among 87 CASP15 server and human predictors. The average TM-score of the first predictions submitted by MULTICOM_qa for CASP15 assembly targets is ~0.76, 5.3% higher than ~0.72 of the standard AlphaFold-Multimer. The average TM-score of the best of top 5 predictions submitted by MULTICOM_qa is ~0.80, about 8% higher than ~0.74 of the standard AlphaFold-Multimer. Moreover, the Foldseek Structure Alignment-based Multimer structure Generation (FSAMG) method outperforms the widely used sequence alignment-based multimer structure generation.

Subjects

Benchmarking , Proteins , Humans , Proteins/chemistry , Sequence Alignment
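The results above are reported as TM-scores. As a minimal sketch, assuming the model has already been superposed on the native structure and its residues paired with native residues, the TM-score formula can be computed as below. The reference tools additionally search for the superposition (and, for complexes, the chain mapping) that maximises the score, which this sketch omits; the 0.5 Å floor on d0 for very short chains is an assumption of this sketch.

```python
from typing import Sequence

def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (in Angstrom) of the TM-score."""
    if length <= 21:
        return 0.5  # floor for very short chains (assumption of this sketch)
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: model-to-native distances (Angstrom) of the aligned residue pairs.
    Unaligned native residues contribute nothing but still count in target_length.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Scores range from 0 to 1; a score of 0.5 arises, for example, when every residue of a 100-residue target sits exactly d0 from its native position.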

3.

Impact of AlphaFold on structure prediction of protein complexes: The CASP15-CAPRI experiment.

Lensink, Marc F; Brysbaert, Guillaume; Raouraoua, Nessim; Bates, Paul A; Giulini, Marco; Honorato, Rodrigo V; van Noort, Charlotte; Teixeira, Joao M C; Bonvin, Alexandre M J J; Kong, Ren; Shi, Hang; Lu, Xufeng; Chang, Shan; Liu, Jian; Guo, Zhiye; Chen, Xiao; Morehead, Alex; Roy, Raj S; Wu, Tianqi; Giri, Nabin; Quadir, Farhan; Chen, Chen; Cheng, Jianlin; Del Carpio, Carlos A; Ichiishi, Eichiro; Rodriguez-Lumbreras, Luis A; Fernandez-Recio, Juan; Harmalkar, Ameya; Chu, Lee-Shin; Canner, Sam; Smanta, Rituparna; Gray, Jeffrey J; Li, Hao; Lin, Peicong; He, Jiahua; Tao, Huanyu; Huang, Sheng-You; Roel-Touris, Jorge; Jimenez-Garcia, Brian; Christoffer, Charles W; Jain, Anika J; Kagaya, Yuki; Kannan, Harini; Nakamura, Tsukasa; Terashi, Genki; Verburgt, Jacob C; Zhang, Yuanyuan; Zhang, Zicong; Fujuta, Hayato; Sekijima, Masakazu.

Proteins ; 91(12): 1658-1683, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37905971

ABSTRACT

We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homo-trimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integration of advanced modeling tools, enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.

Subjects

Algorithms , Protein Interaction Mapping , Protein Interaction Mapping/methods , Protein Conformation , Protein Binding , Molecular Docking Simulation , Computational Biology/methods , Software
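The CAPRI evaluation above consolidates the fraction of native contacts (Fnat), interface RMSD (iRMS) and ligand RMSD (LRMS) into DockQ. A minimal sketch of the DockQ formula as published by Basu and Wallner (2016), with the score thresholds that paper proposes for the incorrect, acceptable, medium and high classes. The helper that counts acceptable-or-better models among a group's first five submissions only illustrates the counting step; the weights of the Round 54 ranking score are not given in the abstract and are not modelled.

```python
from typing import Sequence

def dockq(fnat: float, irms: float, lrms: float) -> float:
    """DockQ = mean of Fnat and two RMSD terms scaled by 1.5 A (iRMS) and 8.5 A (LRMS)."""
    def scaled(rms: float, d: float) -> float:
        return 1.0 / (1.0 + (rms / d) ** 2)
    return (fnat + scaled(irms, 1.5) + scaled(lrms, 8.5)) / 3.0

def capri_class(score: float) -> str:
    """Quality class from the DockQ thresholds 0.23 / 0.49 / 0.80."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"

def count_acceptable_or_better(scores_in_submission_order: Sequence[float]) -> int:
    """Number of acceptable-or-better models among the first five submitted."""
    return sum(1 for s in scores_in_submission_order[:5] if capri_class(s) != "incorrect")
```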

4.

Improving AlphaFold2-based protein tertiary structure prediction with MULTICOM in CASP15.

Liu, Jian; Guo, Zhiye; Wu, Tianqi; Roy, Raj S; Chen, Chen; Cheng, Jianlin.

Commun Chem ; 6(1): 188, 2023 Sep 07.

Article in English | MEDLINE | ID: mdl-37679431

ABSTRACT

Since the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14), AlphaFold2 has become the standard method for protein tertiary structure prediction. One remaining challenge is to further improve its predictions. We developed a new version of the MULTICOM system to sample diverse multiple sequence alignments (MSAs) and structural templates to improve the input for AlphaFold2 to generate structural models. The models are then ranked by both the pairwise model similarity and AlphaFold2 self-reported model quality score. The top ranked models are refined by a novel structure alignment-based refinement method powered by Foldseek. Moreover, for a monomer target that is a subunit of a protein assembly (complex), MULTICOM integrates tertiary and quaternary structure predictions to account for tertiary structural changes induced by protein-protein interaction. The system participated in tertiary structure prediction in the 2022 CASP15 experiment. Our server predictor MULTICOM_refine ranked 3rd among 47 CASP15 server predictors and our human predictor MULTICOM ranked 7th among all 132 human and server predictors. The average GDT-TS score and TM-score of the first structural models that MULTICOM_refine predicted for 94 CASP15 domains are ~0.80 and ~0.92, 9.6% and 8.2% higher, respectively, than the ~0.73 and ~0.85 of the standard AlphaFold2 predictor.
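GDT-TS, reported above alongside TM-score, is the mean fraction of residues whose model-to-native distance falls within 1, 2, 4 and 8 Å. A minimal sketch for a single, fixed superposition follows; LGA, the program used in CASP, searches an optimal superposition per cutoff, so real values can be higher than this sketch yields. The relative_gain helper reproduces the percentage arithmetic in the abstract.

```python
from typing import Sequence

def gdt_ts(distances: Sequence[float], target_length: int) -> float:
    """GDT-TS for one fixed superposition: mean fraction of target residues
    within 1, 2, 4 and 8 Angstrom of their native positions."""
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [sum(1 for d in distances if d <= c) / target_length for c in cutoffs]
    return sum(fractions) / len(cutoffs)

def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, e.g. 0.05 for +5%."""
    return (new - baseline) / baseline
```

For example, relative_gain(0.80, 0.73) ≈ 0.096 and relative_gain(0.92, 0.85) ≈ 0.082, consistent with the reported 9.6% and 8.2% improvements.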

5.

Single-cell Hi-C data enhancement with deep residual and generative adversarial networks.

Wang, Yanli; Guo, Zhiye; Cheng, Jianlin.

Bioinformatics ; 39(8)2023 08 01.

Article in English | MEDLINE | ID: mdl-37498561

ABSTRACT

MOTIVATION: The spatial genome organization of a eukaryotic cell is important for its function. The development of single-cell technologies for probing the 3D genome conformation, especially single-cell chromosome conformation capture techniques, has enabled us to understand genome function better than before. However, due to the extreme sparsity and high noise of single-cell Hi-C data, it is still difficult to study genome structure and function using the Hi-C data of a single cell. RESULTS: In this work, we developed a deep learning method ScHiCEDRN based on deep residual networks and generative adversarial networks for the imputation and enhancement of Hi-C data of a single cell. In terms of both image evaluation and Hi-C reproducibility metrics, ScHiCEDRN outperforms the four deep learning methods (DeepHiC, HiCPlus, HiCSR, and Loopenhance) on enhancing the raw single-cell Hi-C data of human and Drosophila. The experiments also show that it can generate single-cell Hi-C data more suitable for identifying topologically associating domain boundaries and reconstructing 3D chromosome structures than the existing methods. Moreover, ScHiCEDRN's performance generalizes well across different single cells and cell types, and it can be applied to improving population Hi-C data. AVAILABILITY AND IMPLEMENTATION: The source code of ScHiCEDRN is available at the GitHub repository: https://github.com/BioinfoMachineLearning/ScHiCEDRN.

Subjects

Chromosomes , Genome , Humans , Reproducibility of Results , Chromosome Structures , Software , Chromatin

6.

Enhancing AlphaFold-Multimer-based Protein Complex Structure Prediction with MULTICOM in CASP15.

Liu, Jian; Guo, Zhiye; Wu, Tianqi; Roy, Raj S; Quadir, Farhan; Chen, Chen; Cheng, Jianlin.

bioRxiv ; 2023 May 18.

Article in English | MEDLINE | ID: mdl-37293073

ABSTRACT

AlphaFold-Multimer has emerged as the state-of-the-art tool for predicting the quaternary structure of protein complexes (assemblies or multimers) since its release in 2021. To further enhance the AlphaFold-Multimer-based complex structure prediction, we developed a new quaternary structure prediction system (MULTICOM) to improve the input fed to AlphaFold-Multimer and evaluate and refine the outputs generated by AlphaFold2-Multimer. Specifically, MULTICOM samples diverse multiple sequence alignments (MSAs) and templates for AlphaFold-Multimer to generate structural models by using both traditional sequence alignments and new Foldseek-based structure alignments, ranks structural models through multiple complementary metrics, and refines the structural models via a Foldseek structure alignment-based refinement method. The MULTICOM system with different implementations was blindly tested in the assembly structure prediction in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 as both server and human predictors. Our server (MULTICOM_qa) ranked 3rd among 26 CASP15 server predictors and our human predictor (MULTICOM_human) ranked 7th among 87 CASP15 server and human predictors. The average TM-score of the first models predicted by MULTICOM_qa for CASP15 assembly targets is ~0.76, 5.3% higher than ~0.72 of the standard AlphaFold-Multimer. The average TM-score of the best of top 5 models predicted by MULTICOM_qa is ~0.80, about 8% higher than ~0.74 of the standard AlphaFold-Multimer. Moreover, the novel Foldseek Structure Alignment-based Model Generation (FSAMG) method based on AlphaFold-Multimer outperforms the widely used sequence alignment-based model generation. The source code of MULTICOM is available at: https://github.com/BioinfoMachineLearning/MULTICOM3.

7.

Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15.

Roy, Raj S; Liu, Jian; Giri, Nabin; Guo, Zhiye; Cheng, Jianlin.

Proteins ; 91(12): 1889-1902, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37357816

ABSTRACT

Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models has proven useful for estimating the quality of protein tertiary structural models, but it has been rarely applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address the gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on the deep learning inter-chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and performed very well in estimating the global structure accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) for EMA are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA.

Subjects

Deep Learning , Models, Molecular , Proteins/chemistry
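The quantities reported in this abstract can be sketched in a few lines. Assuming a precomputed matrix of pairwise similarities between one target's models (for example, TM-scores), PSS is each model's mean similarity to all other models, the ranking loss is the true quality of the best model minus the true quality of the model ranked first, and the per-target correlation is a Pearson coefficient. How MULTICOM_qa actually weights PSS against ICPS is not stated in the abstract, so combined_score is a hypothetical equal-weight blend.

```python
import numpy as np

def pairwise_similarity_score(sim: np.ndarray) -> np.ndarray:
    """PSS: mean similarity of each model to every OTHER model of the same target.
    sim is an (n, n) symmetric matrix, e.g. pairwise TM-scores between models."""
    n = sim.shape[0]
    return (sim.sum(axis=1) - np.diag(sim)) / (n - 1)  # drop self-similarity

def combined_score(pss: np.ndarray, icps: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Hypothetical linear blend of the multi-model PSS and a single-model ICPS."""
    return w * pss + (1.0 - w) * icps

def ranking_loss(predicted: np.ndarray, true: np.ndarray) -> float:
    """True quality of the best model minus true quality of the top-ranked model (0 is ideal)."""
    return float(true.max() - true[np.argmax(predicted)])

def per_target_correlation(predicted: np.ndarray, true: np.ndarray) -> float:
    """Pearson correlation between predicted and true quality scores of one target's models."""
    return float(np.corrcoef(predicted, true)[0, 1])
```

Averaging ranking_loss and per_target_correlation over all targets gives figures of the kind quoted above (0.14 and 0.66).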

8.

Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph transformer.

Wu, Tianqi; Guo, Zhiye; Cheng, Jianlin.

Bioinformatics ; 39(5)2023 05 04.

Article in English | MEDLINE | ID: mdl-37144951

ABSTRACT

MOTIVATION: State-of-the-art protein structure prediction methods such as AlphaFold are being widely used to predict structures of uncharacterized proteins in biomedical research. There is a significant need to further improve the quality and nativeness of the predicted structures to enhance their usability. In this work, we develop ATOMRefine, a deep learning-based, end-to-end, all-atom protein structural model refinement method. It uses an SE(3)-equivariant graph transformer network to directly refine protein atomic coordinates in a predicted tertiary structure represented as a molecular graph. RESULTS: The method is first trained and tested on the structural models in AlphaFoldDB whose experimental structures are known, and then blindly tested on 69 CASP14 regular targets and 7 CASP14 refinement targets. ATOMRefine improves the quality of both backbone atoms and all-atom conformation of the initial structural models generated by AlphaFold. It also performs better than two state-of-the-art refinement methods in multiple evaluation metrics, including an all-atom model quality score, the MolProbity score, which is based on the analysis of all-atom contacts, bond lengths, atom clashes, torsion angles, and side-chain rotamers. As ATOMRefine can refine a protein structure quickly, it provides a viable, fast solution for improving protein geometry and fixing structural errors of predicted structures through direct coordinate refinement. AVAILABILITY AND IMPLEMENTATION: The source code of ATOMRefine is available in the GitHub repository (https://github.com/BioinfoMachineLearning/ATOMRefine). All the required data for training and testing are available at https://doi.org/10.5281/zenodo.6944368.

Subjects

Proteins , Software , Proteins/chemistry , Molecular Conformation

9.

Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15.

Roy, Raj S; Liu, Jian; Giri, Nabin; Guo, Zhiye; Cheng, Jianlin.

bioRxiv ; 2023 Mar 12.

Article in English | MEDLINE | ID: mdl-36945536

ABSTRACT

Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models has proven useful for estimating the quality of protein tertiary structural models, but it has been rarely applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address the gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on the deep learning inter-chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and ranked first out of 24 predictors in estimating the global accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) for EMA are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA. The source code of MULTICOM_qa is available at https://github.com/BioinfoMachineLearning/MULTICOM_qa .

10.

Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks.

Guo, Zhiye; Liu, Jian; Skolnick, Jeffrey; Cheng, Jianlin.

Nat Commun ; 13(1): 6963, 2022 11 15.

Article in English | MEDLINE | ID: mdl-36379943

ABSTRACT

Residue-residue distance information is useful for predicting tertiary structures of protein monomers or quaternary structures of protein complexes. Many deep learning methods have been developed to predict intra-chain residue-residue distances of monomers accurately, but few methods can accurately predict inter-chain residue-residue distances of complexes. We develop a deep learning method CDPred (i.e., Complex Distance Prediction) based on the 2D attention-powered residual network to address the gap. Tested on two homodimer datasets, CDPred achieves the precision of 60.94% and 42.93% for top L/5 inter-chain contact predictions (L: length of the monomer in homodimer), respectively, substantially higher than DeepHomo's 37.40% and 23.08% and GLINTER's 48.09% and 36.74%. Tested on the two heterodimer datasets, the top Ls/5 inter-chain contact prediction precision (Ls: length of the shorter monomer in heterodimer) of CDPred is 47.59% and 22.87% respectively, surpassing GLINTER's 23.24% and 13.49%. Moreover, the prediction of CDPred is complementary to that of AlphaFold2-multimer.

Subjects

Computational Biology , Neural Networks, Computer , Proteins/chemistry
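Top-L/5 precision, as used above, takes the L/5 inter-chain residue pairs with the highest predicted contact probability and reports the fraction that are true contacts. A minimal sketch, assuming a predicted probability matrix (chain A rows × chain B columns) and a binary native contact map built with whatever distance threshold the benchmark uses; top_l5_k takes L as the shorter chain length, which equals the monomer length for a homodimer. These helper names are hypothetical, not CDPred's code.

```python
import numpy as np

def top_k_interchain_precision(prob: np.ndarray, true_contacts: np.ndarray, k: int) -> float:
    """Fraction of the k highest-probability inter-chain pairs that are native contacts.

    prob[i, j]: predicted contact probability between residue i of chain A and residue j of chain B.
    true_contacts[i, j]: 1 if that pair is in contact in the native complex, else 0.
    """
    top = np.argsort(prob, axis=None)[::-1][:k]  # flat indices, highest probability first
    return float(true_contacts.ravel()[top].mean())

def top_l5_k(len_a: int, len_b: int) -> int:
    """k = L/5, with L the shorter chain length (the monomer length for a homodimer)."""
    return max(1, min(len_a, len_b) // 5)
```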

11.

Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps.

Mahmud, Sajid; Guo, Zhiye; Quadir, Farhan; Liu, Jian; Cheng, Jianlin.

BMC Bioinformatics ; 23(1): 283, 2022 Jul 19.

Article in English | MEDLINE | ID: mdl-35854211

ABSTRACT

Information about the domain architecture of proteins is useful for studying protein structure and function. However, accurate prediction of protein domain boundaries (i.e., sequence regions separating two domains) from sequence remains a significant challenge. In this work, we develop a deep learning method based on multi-head U-Nets (called DistDom) to predict protein domain boundaries utilizing 1D sequence features and predicted 2D inter-residue distance map as input. The 1D features contain the evolutionary and physicochemical information of protein sequences, whereas the 2D distance map includes the structural information of proteins that was rarely used in domain boundary prediction before. The 1D and 2D features are processed by the 1D and 2D U-Nets respectively to generate hidden features. The hidden features are then used by the multi-head attention to predict the probability of each residue of a protein being in a domain boundary, leveraging both local and global information in the features. The residue-level domain boundary predictions can be used to classify proteins as single-domain or multi-domain proteins. DistDom classifies the CASP14 single-domain and multi-domain targets with an accuracy of 75.9%, 13.28% more accurate than the state-of-the-art method. Tested on the CASP14 multi-domain protein targets with expert annotated domain boundaries, the average per-target F1 measure score of the domain boundary prediction by DistDom is 0.263, 29.56% higher than the state-of-the-art method.

Subjects

Algorithms , Proteins , Amino Acid Sequence , Protein Domains , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein
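The step of turning per-residue boundary probabilities into a single-domain or multi-domain call can be sketched as follows: consecutive residues above a probability threshold are grouped into boundary segments, and a protein with at least one segment is called multi-domain. The 0.5 threshold, the minimum segment length and the function names are hypothetical settings for illustration, not DistDom's published parameters.

```python
from typing import List, Optional, Sequence, Tuple

def boundary_segments(probs: Sequence[float], threshold: float = 0.5,
                      min_len: int = 1) -> List[Tuple[int, int]]:
    """Group consecutive residues with boundary probability >= threshold into
    inclusive, 0-based (start, end) segments of at least min_len residues."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                      # a new boundary run begins
        elif p < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None                   # the run has ended
    if start is not None and len(probs) - start >= min_len:
        segments.append((start, len(probs) - 1))  # run reaching the C-terminus
    return segments

def is_multi_domain(probs: Sequence[float], threshold: float = 0.5, min_len: int = 1) -> bool:
    """Call a protein multi-domain if any boundary segment is predicted."""
    return len(boundary_segments(probs, threshold, min_len)) > 0
```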

12.

Melatonin Alleviates Venous Dysfunction in a Mouse Model of Iliac Vein Occlusion.

Guo, Zhiye; Du, Xiaolong; Zhou, Yu; Xu, Dandan; Xu, Xingyu; Lu, Shan; Ran, Feng.

Front Immunol ; 13: 870981, 2022.

Article in English | MEDLINE | ID: mdl-35585973

ABSTRACT

The iliac vein can be severely stenosed and occluded due to thrombosis, tumor compression, or an anatomical abnormality. Such occlusion can result in limb swelling, venous claudication, and persistent leg ulcers. Its devastating sequelae heavily impact patients' lifestyles and the social economy. Due to the lack of a stable and easy-to-operate iliac vein occlusion (IVO) model, its underlying molecular mechanism and pathophysiological process have not been completely understood. Melatonin (MLT) plays a critical role in anti-inflammation, but its potential protective effect on venous dysfunction induced by IVO has not been revealed. In this study, a mouse model of IVO was established to study the effects of MLT on injured veins. Laser speckle imaging and Evans blue assays showed that MLT inhibited venous permeability in the IVO mouse model. Furthermore, MLT suppressed inflammation of the tissues surrounding the affected vein by inhibiting the mRNA levels of TNF-α, IL-1α, and MCP-1. In addition, zonula occludens protein-1 (ZO-1) staining showed that MLT inhibited endothelial injury. Taken together, we elucidated the therapeutic effect of MLT on vascular dysfunction induced by IVO, mainly by inhibiting the TNF-α, IL-1α, and MCP-1 mRNA levels, improving endothelial function, and inhibiting vascular leakage.

Subjects

Melatonin , Vascular Diseases , Animals , Disease Models, Animal , Humans , Iliac Vein , Melatonin/pharmacology , Melatonin/therapeutic use , Mice , RNA, Messenger , Tumor Necrosis Factor-alpha , Vascular Diseases/drug therapy

13.

Distance-based reconstruction of protein quaternary structures from inter-chain contacts.

Soltanikazemi, Elham; Quadir, Farhan; Roy, Raj S; Guo, Zhiye; Cheng, Jianlin.

Proteins ; 90(3): 720-731, 2022 03.

Article in English | MEDLINE | ID: mdl-34716620

ABSTRACT

Predicting the quaternary structure of protein complexes is an important problem. Inter-chain residue-residue contact prediction can provide useful information to guide the ab initio reconstruction of quaternary structures. However, few methods have been developed to build quaternary structures from predicted inter-chain contacts. Here, we develop the first method based on gradient descent optimization (GD) to build quaternary structures of protein dimers utilizing inter-chain contacts as distance restraints. We evaluate GD on several datasets of homodimers and heterodimers using true/predicted contacts and monomer structures as input. GD consistently performs better than both simulated annealing and Markov Chain Monte Carlo simulation. Starting from an arbitrary quaternary structure randomly initialized from the tertiary structures of protein chains and using true inter-chain contacts as input, GD can reconstruct high-quality structural models for homodimers and heterodimers with average TM-score ranging from 0.92 to 0.99 and average interface root mean square distance from 0.72 Å to 1.64 Å. On a dataset of 115 homodimers, using predicted inter-chain contacts as restraints, the average TM-score of the structural models built by GD is 0.76. For 46% of the homodimers, high-quality structural models with TM-score ≥ 0.9 are reconstructed from predicted contacts. There is a strong correlation between the quality of the reconstructed models and the precision and recall of predicted contacts. Only a moderate precision or recall of inter-chain contact prediction is needed to build good structural models for most homodimers. Moreover, GD improves the quality of quaternary structures predicted by AlphaFold2 on a Critical Assessment of Techniques for Protein Structure Prediction-Critical Assessment of Predicted Interactions (CASP-CAPRI) dataset.

Subjects

Proteins/chemistry , Computational Biology , Databases, Protein , Molecular Docking Simulation , Monte Carlo Method , Protein Binding , Protein Multimerization , Protein Structure, Quaternary
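To make reconstruction from inter-chain contacts concrete, here is a minimal rigid-body variant written for illustration, not the published GD implementation. Chain A stays fixed; chain B is rotated about its centroid (rotation vector via Rodrigues' formula) and translated, and gradient descent minimises a flat-bottom penalty on contacted pairs farther apart than 8 Å plus a penalty on any inter-chain pair closer than 3.5 Å. Both distances, the finite-difference gradient and the backtracking step-size rule are assumptions of this sketch.

```python
import numpy as np

def rotation_matrix(rotvec: np.ndarray) -> np.ndarray:
    """Rodrigues' formula: rotation by |rotvec| radians about the axis rotvec/|rotvec|."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rotvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def transform(coords: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Rotate coords about their centroid by params[:3], then translate by params[3:]."""
    centroid = coords.mean(axis=0)
    return (coords - centroid) @ rotation_matrix(params[:3]).T + centroid + params[3:]

def restraint_loss(params, coords_a, coords_b, contacts, upper=8.0, clash=3.5):
    """Flat-bottom contact penalty plus a clash penalty (both distances hypothetical)."""
    moved = transform(coords_b, params)
    d = np.linalg.norm(coords_a[contacts[:, 0]] - moved[contacts[:, 1]], axis=1)
    contact_term = np.sum(np.clip(d - upper, 0.0, None) ** 2)
    all_d = np.linalg.norm(coords_a[:, None, :] - moved[None, :, :], axis=2)
    clash_term = np.sum(np.clip(clash - all_d, 0.0, None) ** 2)
    return float(contact_term + clash_term)

def numerical_grad(f, x, eps=1e-4):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        g[k] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return g

def dock_by_gradient_descent(coords_a, coords_b, contacts, steps=500, lr=0.01):
    """Gradient descent on 6 rigid-body parameters with a simple backtracking step size."""
    params = np.zeros(6)  # [rotation vector (3), translation (3)]
    f = lambda p: restraint_loss(p, coords_a, coords_b, contacts)
    loss = f(params)
    for _ in range(steps):
        candidate = params - lr * numerical_grad(f, params)
        cand_loss = f(candidate)
        if cand_loss < loss:   # accept and cautiously enlarge the step
            params, loss = candidate, cand_loss
            lr *= 1.1
        else:                  # reject and shrink the step
            lr *= 0.5
    return params, loss
```

Accepting a step only when the loss decreases keeps the sketch stable without tuning; in practice an automatic-differentiation framework would replace the finite differences.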

14.

Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14.

Liu, Jian; Wu, Tianqi; Guo, Zhiye; Hou, Jie; Cheng, Jianlin.

Proteins ; 90(1): 58-72, 2022 01.

Article in English | MEDLINE | ID: mdl-34291486

ABSTRACT

Substantial progress in protein structure prediction has been made by utilizing deep learning and residue-residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and ranked third out of 136 predictors in inter-domain structure prediction. The results demonstrate that the template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, the template-free modeling performs better than the template-based modeling on not only hard targets but also the targets that have homologous templates. The performance of the template-free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.

Subjects

Deep Learning , Models, Molecular , Proteins/chemistry , Algorithms , Humans , Protein Structure, Tertiary/genetics , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein , Software

15.

MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction.

Wu, Tianqi; Liu, Jian; Guo, Zhiye; Hou, Jie; Cheng, Jianlin.

Sci Rep ; 11(1): 13155, 2021 06 23.

Article in English | MEDLINE | ID: mdl-34162922

ABSTRACT

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system-MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which are much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% regular domains and 55% hard domains) if only one prediction is made for a domain. The success rate is increased to 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method even on TBM targets that TBM methods used to dominate and therefore provides a uniform structure modeling approach to any protein. 
Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0 .

Subjects

Deep Learning , Protein Structure, Tertiary , Software , Amino Acid Sequence , Models, Molecular , Sequence Alignment , Structure-Activity Relationship

16.

Correction to: DeepDist: real-value inter-residue distance prediction with deep residual convolutional network.

Wu, Tianqi; Guo, Zhiye; Hou, Jie; Cheng, Jianlin.

BMC Bioinformatics ; 22(1): 354, 2021 Jun 29.

Article in English | MEDLINE | ID: mdl-34187363

17.

Improving deep learning-based protein distance prediction in CASP14.

Guo, Zhiye; Wu, Tianqi; Liu, Jian; Hou, Jie; Cheng, Jianlin.

Bioinformatics ; 37(19): 3190-3196, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-33961009

ABSTRACT

MOTIVATION: Accurate prediction of residue-residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. RESULTS: Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. 
However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps. AVAILABILITY AND IMPLEMENTATION: The software package, source code and data of DeepDist2 are freely available at https://github.com/multicom-toolbox/deepdist and https://zenodo.org/record/4712084#.YIIM13VKhQM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
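The top-L/5 long-range precision and the average-probability confidence signal described in this abstract can be sketched as follows. This is a minimal illustration, not the DeepDist2 code: it assumes a square matrix of predicted contact probabilities (only the upper triangle is read), a matrix of true distances in Å, the common CASP convention that "long-range" means a sequence separation of at least 24 residues, and an 8 Å contact cutoff. All function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def top_long_range_pairs(prob: Matrix, k_div: int = 5,
                         min_sep: int = 24) -> List[Tuple[int, int]]:
    """Return the top L/k_div pairs (i < j, j - i >= min_sep) ranked by predicted contact probability."""
    L = len(prob)
    pairs = [(i, j) for i in range(L) for j in range(i + min_sep, L)]
    if not pairs:
        raise ValueError("sequence too short for the requested separation")
    # Highest predicted probability first; only the upper triangle is read.
    pairs.sort(key=lambda p: prob[p[0]][p[1]], reverse=True)
    return pairs[:max(1, L // k_div)]


def precision_top(prob: Matrix, true_dist: Matrix, k_div: int = 5,
                  min_sep: int = 24, cutoff: float = 8.0) -> float:
    """Fraction of the selected pairs that are true contacts (true distance < cutoff, in Å)."""
    top = top_long_range_pairs(prob, k_div, min_sep)
    hits = sum(1 for i, j in top if true_dist[i][j] < cutoff)
    return hits / len(top)


def mean_top_probability(prob: Matrix, k_div: int = 5, min_sep: int = 24) -> float:
    """Average predicted probability of the selected pairs: a confidence score that needs no native structure."""
    top = top_long_range_pairs(prob, k_div, min_sep)
    return sum(prob[i][j] for i, j in top) / len(top)
```

Because `mean_top_probability` uses only the prediction, it can be computed on a blind target; if it correlates strongly with `precision_top`, as the abstract reports, it is a practical way to choose among candidate distance maps.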

18.

Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14.

Chen, Xiao; Liu, Jian; Guo, Zhiye; Wu, Tianqi; Hou, Jie; Cheng, Jianlin.

Sci Rep ; 11(1): 10943, 2021 05 25.

Article in English | MEDLINE | ID: mdl-34035363

ABSTRACT

Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage the improved inter-residue distance predictions to enhance EMA, during the 2020 CASP14 experiment, we integrated several new inter-residue distance features with the existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. According to the evaluation of performance in selecting the best model from the models of CASP14 targets, our three multi-model EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieve average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods are ranked first, second, and third out of all 68 CASP14 predictors. MULTICOM-DEEP, the single-model EMA predictor, is ranked within the top 10 among all the single-model EMA methods according to GDT-TS loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potential.
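The GDT-TS "loss" in this abstract measures how much model quality is given up by trusting an EMA method's ranking. A minimal sketch, assuming the usual CASP definition (GDT-TS of the best model in the pool minus GDT-TS of the model the predictor ranks first, averaged over targets) and that true GDT-TS values on a 0 to 1 scale have already been computed by a structure comparison tool; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def gdt_ts_loss(true_gdt: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Loss for one target: best achievable GDT-TS minus the GDT-TS of the top-ranked model."""
    chosen = max(predicted, key=predicted.get)  # model the EMA method ranks first
    return max(true_gdt.values()) - true_gdt[chosen]


def average_gdt_ts_loss(targets: List[Tuple[Dict[str, float], Dict[str, float]]]) -> float:
    """Mean per-target loss over (true_gdt, predicted) pairs; 0.0 means the best model was always picked."""
    losses = [gdt_ts_loss(true_gdt, predicted) for true_gdt, predicted in targets]
    return sum(losses) / len(losses)
```

Under this definition a lower average loss is better, which is why losses of 0.073 to 0.081 can rank first to third.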

Subjects

Caspases/metabolism , Computational Biology , Deep Learning , Models, Molecular , Protein Conformation

19.

Combination of deep neural network with attention mechanism enhances the explainability of protein contact prediction.

Chen, Chen; Wu, Tianqi; Guo, Zhiye; Cheng, Jianlin.

Proteins ; 89(6): 697-707, 2021 06.

Article in English | MEDLINE | ID: mdl-33538038

ABSTRACT

Deep learning has emerged as a revolutionary technology for protein residue-residue contact prediction since the 2012 CASP10 competition. Considerable advancements in the predictive power of the deep learning-based contact predictions have been achieved since then. However, little effort has been put into interpreting the black-box deep learning methods. Algorithms that can interpret the relationship between predicted contact maps and the internal mechanism of the deep learning architectures are needed to explore the essential components of contact inference and improve their explainability. In this study, we present an attention-based convolutional neural network for protein contact prediction, which consists of two attention mechanism-based modules: sequence attention and regional attention. Our benchmark results on the CASP13 free-modeling targets demonstrate that the two attention modules added on top of existing typical deep learning models exhibit a complementary effect that contributes to prediction improvements. More importantly, the inclusion of the attention mechanism provides interpretable patterns that contain useful insights into the key fold-determining residues in proteins. We expect the attention-based model can provide a reliable and practically interpretable technique that helps break the current bottlenecks in explaining deep neural networks for contact prediction. The source code of our method is available at https://github.com/jianlin-cheng/InterpretContactMap.

Subjects

Computational Biology/statistics & numerical data , Deep Learning , Proteins/chemistry , Software , Benchmarking , Binding Sites , Databases, Protein , Humans , Protein Binding , Protein Conformation , Protein Interaction Domains and Motifs , Proteins/metabolism , Research Design , Sequence Alignment , Sequence Analysis, Protein

20.

DeepDist: real-value inter-residue distance prediction with deep residual convolutional network.

Wu, Tianqi; Guo, Zhiye; Hou, Jie; Cheng, Jianlin.

BMC Bioinformatics ; 22(1): 30, 2021 Jan 25.

Article in English | MEDLINE | ID: mdl-33494711

ABSTRACT

BACKGROUND: Driven by deep learning, inter-residue contact/distance prediction has improved significantly and has substantially enhanced ab initio protein structure prediction. Currently, most distance prediction methods classify inter-residue distances into multiple distance intervals instead of directly predicting real-value distances. The output of the former has to be converted into real-value distances to be used in tertiary structure prediction. RESULTS: To explore the potential of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. Tested on 43 CASP13 hard domains, DeepDist achieves comparable performance in real-value distance prediction and multi-class distance prediction. The average mean square error (MSE) of DeepDist's real-value distance prediction is 0.896 Å² when predicted distances ≥ 16 Å are filtered out, which is lower than the 1.003 Å² of DeepDist's multi-class distance prediction. When distance predictions are converted into contact predictions at an 8 Å threshold (the standard threshold in the field), the precision of the top L/5 and L/2 contact predictions of DeepDist's multi-class distance prediction is 79.3% and 66.1%, respectively, higher than the 78.6% and 64.5% of its real-value distance prediction and the best results in the CASP13 experiment. CONCLUSIONS: DeepDist can predict inter-residue distances well and improves binary contact prediction over the existing state-of-the-art methods. Moreover, the predicted real-value distances can be used directly to reconstruct protein tertiary structures better than multi-class distance predictions because of their lower MSE.
Finally, we demonstrate that predicting the real-value distance map and multi-class distance map at the same time performs better than predicting real-value distances alone.
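This abstract notes that interval (multi-class) predictions must be converted before use, reports an MSE computed after filtering out predicted distances ≥ 16 Å, and converts distances into contacts at 8 Å. A minimal sketch of those three steps follows. It is not DeepDist's code: the bin centers and edges are hypothetical placeholders, the probability-weighted mean over bins is only one common conversion, and the MSE is taken over the upper triangle of a symmetric map.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def expected_distance(bin_probs: Sequence[float], bin_centers: Sequence[float]) -> float:
    """Collapse a multi-class distance distribution into a real value (probability-weighted bin centers, Å)."""
    total = sum(bin_probs)
    return sum(p * c for p, c in zip(bin_probs, bin_centers)) / total


def contact_probability(bin_probs: Sequence[float], bin_upper_edges: Sequence[float],
                        cutoff: float = 8.0) -> float:
    """Probability mass of the bins lying entirely below the contact cutoff (Å)."""
    return sum(p for p, hi in zip(bin_probs, bin_upper_edges) if hi <= cutoff)


def filtered_mse(pred: Matrix, true: Matrix, max_pred: float = 16.0) -> float:
    """MSE (Å^2) over pairs i < j whose predicted distance is below max_pred Å."""
    n = len(pred)
    errs = [(pred[i][j] - true[i][j]) ** 2
            for i in range(n) for j in range(i + 1, n)
            if pred[i][j] < max_pred]
    if not errs:
        raise ValueError("no residue pairs pass the distance filter")
    return sum(errs) / len(errs)
```

For a real-valued map, the contact call reduces to `pred[i][j] < 8.0`, so pairs can be ranked by ascending predicted distance before taking the top L/5 or L/2.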

Subjects

Algorithms , Computational Biology , Proteins , Models, Molecular , Proteins/chemistry
