Transferable deep generative modeling of intrinsically disordered protein conformations.
Janson, Giacomo; Feig, Michael.
Affiliation
  • Janson G; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America.
  • Feig M; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America.
PLoS Comput Biol ; 20(5): e1012144, 2024 May.
Article in English | MEDLINE | ID: mdl-38781245
ABSTRACT
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent from the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity to the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
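The sampling scheme described in the abstract (draw an encoding by reverse diffusion in the latent space, then decode it to a 3D structure) can be illustrated with a minimal Python sketch. This is not idpSAM's code. The closed-form noise predictor, the cumulative-sum decoder, the linear noise schedule, and every function name and parameter value below are assumptions chosen for illustration; they stand in for the trained transformer networks, which the abstract does not specify in detail.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule (an assumed choice); returns betas and cumulative alphas."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def toy_noise_predictor(x_t, alpha_bar, mu=0.5, sigma=0.3):
    """Hypothetical stand-in for a trained denoising network.

    Returns the exact E[noise | x_t] when every clean latent value is drawn
    from N(mu, sigma^2), so the sampler below has a known target distribution.
    """
    var_t = alpha_bar * sigma ** 2 + (1.0 - alpha_bar)
    return math.sqrt(1.0 - alpha_bar) * (x_t - math.sqrt(alpha_bar) * mu) / var_t


def sample_latent(num_residues, latent_dim, betas, alpha_bars, rng):
    """DDPM-style ancestral sampling: start from Gaussian noise, denoise step by step."""
    z = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
         for _ in range(num_residues)]
    for t in reversed(range(len(betas))):
        beta, a_bar = betas[t], alpha_bars[t]
        noise_scale = math.sqrt(beta) if t > 0 else 0.0  # no noise on the last step
        for row in z:
            for k in range(latent_dim):
                eps_hat = toy_noise_predictor(row[k], a_bar)
                mean = (row[k] - beta / math.sqrt(1.0 - a_bar) * eps_hat) / math.sqrt(1.0 - beta)
                row[k] = mean + noise_scale * rng.gauss(0.0, 1.0)
    return z


def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder.

    Treats each residue's 3-value latent as a displacement from the previous
    residue and accumulates the displacements into a chain of 3D points.
    """
    coords, pos = [], [0.0, 0.0, 0.0]
    for row in z:
        pos = [p + d for p, d in zip(pos, row)]
        coords.append(tuple(pos))
    return coords


def generate_ensemble(num_conformations, num_residues, seed=0):
    """Sample independent latents and decode each one into a conformation."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps=200)
    return [toy_decoder(sample_latent(num_residues, 3, betas, alpha_bars, rng))
            for _ in range(num_conformations)]


ensemble = generate_ensemble(num_conformations=5, num_residues=10)
```

Each conformation is generated independently of the others, which is what allows a generative model of this kind to produce an ensemble without running a simulation trajectory. In a real latent diffusion model the two toy functions would be replaced by trained networks conditioned on the amino acid sequence.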
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Protein Conformation / Neural Networks, Computer / Computational Biology / Intrinsically Disordered Proteins Language: En Journal: PLoS Comput Biol Journal subject: BIOLOGY / MEDICAL INFORMATICS Year: 2024 Document type: Article Affiliation country: United States
