Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14.

Anishchenko, Ivan; Baek, Minkyung; Park, Hahnbeom; Hiranuma, Naozumi; Kim, David E; Dauparas, Justas; Mansoor, Sanaa; Humphreys, Ian R; Baker, David

Affiliation

Anishchenko I; Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA.
Baek M; Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA.
Park H; Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA.
Hiranuma N; Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA.
Kim DE; Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Washington, USA.
Dauparas J; Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA.
Mansoor S; Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA.
Humphreys IR; Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA.
Baker D; Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington, USA.

Proteins; 89(12): 1722-1733, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34331359

ABSTRACT

The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline, guided by the DeepAccNet accuracy predictor, that recombines models generated by template-free and template-utilizing versions of trRosetta. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta; it is also faster and requires fewer computing resources, completing the entire modeling process in a median time of under 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest, in part because of missing inter-domain or inter-chain contacts.

Subject(s)

Computational Biology/methods; Deep Learning; Protein Structure, Tertiary; Proteins; Software; Humans; Metagenome/genetics; Proteins/chemistry; Proteins/genetics; Proteins/metabolism; Sequence Analysis, Protein

Keywords

Rosetta; deep learning; metagenomes; protein structure prediction; refinement

Full text: 1
Collection: 01-international
Database: MEDLINE
Main subject: Software / Proteins / Protein Structure, Tertiary / Computational Biology / Deep Learning
Type of study: Prognostic_studies / Risk_factors_studies
Limits: Humans
Language: En
Journal: Proteins
Journal subject: BIOCHEMISTRY
Year: 2021
Document type: Article
Country of affiliation: United States