MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction.

Wu, Tianqi; Liu, Jian; Guo, Zhiye; Hou, Jie; Cheng, Jianlin

Affiliation

Wu T; Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.
Liu J; Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.
Guo Z; Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.
Hou J; Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA.
Cheng J; Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA. chengji@missouri.edu.

Sci Rep; 11(1): 13155, 2021 Jun 23.

Article in En | MEDLINE | ID: mdl-34162922

ABSTRACT

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, few comprehensive open-source protein structure prediction packages are publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, which integrates template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, and is much faster and more accurate than the template-based modeling in MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as a template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% of regular domains and 55% of hard domains) if only one prediction is made for a domain. The success rate increases by 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to that of the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method, even on TBM targets that TBM methods used to dominate, and therefore provides a uniform structure modeling approach for any protein. Finally, on the 38 CASP14 FM and FM/TBM hard domains, the MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0 .
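The domain counts and success rates quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, the percentages are treated as rounded values, and the pooled average TM-score is a derived quantity that the abstract itself does not report.

```python
# Figures quoted in the abstract (CASP14 server results).
N_REGULAR, N_HARD = 58, 38            # TBM domains; FM and FM/TBM domains
TM_REGULAR, TM_HARD = 0.720, 0.514    # average TM-scores per category
RATE_REGULAR, RATE_HARD = 0.95, 0.55  # top-1 correct-fold rates (rounded)


def correct_fold_count(n_domains, rate):
    """Approximate number of correctly folded domains from a rounded rate."""
    return round(n_domains * rate)


def pooled_mean(values, weights):
    """Weighted mean, used here to pool the per-category average TM-scores."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


regular_ok = correct_fold_count(N_REGULAR, RATE_REGULAR)
hard_ok = correct_fold_count(N_HARD, RATE_HARD)
total_ok = regular_ok + hard_ok

# Derived value (an assumption of this sketch, not stated in the abstract).
overall_tm = pooled_mean([TM_REGULAR, TM_HARD], [N_REGULAR, N_HARD])

print(f"correct folds: {regular_ok} regular + {hard_ok} hard = {total_ok}")
print(f"pooled average TM-score over {N_REGULAR + N_HARD} domains: {overall_tm:.3f}")
```

Under these assumptions the per-category counts sum to the 76 correctly folded domains stated in the abstract.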

Subjects

Deep Learning; Protein Structure, Tertiary; Software; Amino Acid Sequence; Models, Molecular; Sequence Alignment; Structure-Activity Relationship

Full text: 1 Database: MEDLINE Main subject: Software / Protein Structure, Tertiary / Deep Learning Study type: Prognostic_studies / Risk_factors_studies Language: En Publication year: 2021 Document type: Article