Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations.
Diaz, Daniel J; Gong, Chengyue; Ouyang-Zhang, Jeffrey; Loy, James M; Wells, Jordan; Yang, David; Ellington, Andrew D; Dimakis, Alexandros G; Klivans, Adam R.
Affiliation
  • Diaz DJ; UT Austin, Department of Computer Science, Austin, TX, 78712, USA. dannyjdiaz305@gmail.com.
  • Gong C; Intelligent Proteins, LLC, Austin, TX, 78712, USA. dannyjdiaz305@gmail.com.
  • Ouyang-Zhang J; UT Austin, Department of Chemistry, Austin, TX, 78712, USA. dannyjdiaz305@gmail.com.
  • Loy JM; UT Austin, Department of Computer Science, Austin, TX, 78712, USA.
  • Wells J; UT Austin, Department of Computer Science, Austin, TX, 78712, USA.
  • Yang D; Intelligent Proteins, LLC, Austin, TX, 78712, USA.
  • Ellington AD; UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA.
  • Dimakis AG; UT Austin, McKetta Department of Chemical Engineering, Austin, TX, 78712, USA.
  • Klivans AR; UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA.
Nat Commun; 15(1): 6170, 2024 Jul 23.
Article in En | MEDLINE | ID: mdl-39043654
ABSTRACT
Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle, a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance in accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time. These include Thermodynamic Permutations for data augmentation, structural amino acid embeddings that model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000× fewer proteins and has 548× fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers for virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
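The abstract names Thermodynamic Permutations as a data-augmentation step without spelling out the mechanics. The sketch below is a minimal illustration, assuming the augmentation rests on Gibbs free energy being a state function. Suppose ΔΔG is measured for wild type → X and wild type → Y at the same site. Then ΔΔG(X → Y) = ΔΔG(wt → Y) − ΔΔG(wt → X), and every reverse mutation receives the negated value. The function name `thermodynamic_permutations` and its dictionary input format are hypothetical and are not taken from the authors' code.

```python
from itertools import permutations


def thermodynamic_permutations(wild_type: str, ddg: dict[str, float]) -> dict[tuple[str, str], float]:
    """Hypothetical sketch of state-function-based augmentation at one site.

    wild_type: one-letter code of the wild-type residue at this position.
    ddg: measured ddG for each mutation wild_type -> residue at this position.
    Returns ddG for every ordered pair (from_residue, to_residue) of residues
    whose value at this site is known, including the wild type itself.
    """
    # Express each residue's free energy relative to the wild type (reference = 0).
    relative = {wild_type: 0.0, **ddg}
    # Because ddG is a difference of a state function, any ordered pair follows by subtraction.
    return {(a, b): relative[b] - relative[a] for a, b in permutations(relative, 2)}


if __name__ == "__main__":
    augmented = thermodynamic_permutations("A", {"C": 1.0, "D": -0.5})
    for (src, dst), value in sorted(augmented.items()):
        print(f"{src}->{dst}: {value:+.1f}")
```

Under this assumption, k known residues at a site yield k(k − 1) labeled mutations. The count therefore grows quadratically from only k − 1 measurements. The measurements are treated as exact here, so any experimental noise propagates into the derived labels.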
Subject(s)

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Thermodynamics / Proteins / Protein Stability / Mutation Language: En Journal: Nat Commun Journal subject: Biology / Science Year: 2024 Type: Article Affiliation country: United States