GENERALIST: A latent space based generative model for protein sequence families.
Akl, Hoda; Emison, Brooke; Zhao, Xiaochuan; Mondal, Arup; Perez, Alberto; Dixit, Purushottam D.
Affiliations
  • Akl H; Department of Physics, University of Florida, Gainesville, Florida, United States of America.
  • Emison B; Department of Biomedical Engineering, Yale University, New Haven, Connecticut, United States of America.
  • Zhao X; Department of Physics, University of Florida, Gainesville, Florida, United States of America.
  • Mondal A; Department of Chemistry, University of Florida, Gainesville, Florida, United States of America.
  • Perez A; Department of Chemistry, University of Florida, Gainesville, Florida, United States of America.
  • Dixit PD; Department of Biomedical Engineering, Yale University, New Haven, Connecticut, United States of America.
PLoS Comput Biol ; 19(11): e1011655, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38011273
ABSTRACT
Generative models of protein sequence families are an important tool in the repertoire of protein scientists and engineers alike. However, state-of-the-art generative approaches face inference, accuracy, and overfitting-related obstacles when modeling moderately sized to large proteins and/or protein families with low sequence coverage. Here, we present a simple-to-learn, tunable, and accurate generative model, GENERALIST: GENERAtive nonLInear tenSor-factorizaTion for protein sequences. GENERALIST accurately captures several high-order summary statistics of amino acid covariation. GENERALIST also predicts conservative locally optimal sequences that are likely to fold into stable 3D structures. Importantly, unlike current methods, the density of sequences in GENERALIST-modeled sequence ensembles closely resembles that of the corresponding natural ensembles. Finally, GENERALIST embeds protein sequences in an informative latent space. GENERALIST will be an important tool to study protein sequence variability.
Subjects

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Proteins / Amino Acids | Language: En | Journal: PLoS Comput Biol | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Publication year: 2023 | Document type: Article | Country of affiliation: United States
