Search | VHL Regional Portal

Protein design using structure-based residue preferences.

Ding, David; Shaw, Ada Y; Sinai, Sam; Rollins, Nathan; Prywes, Noam; Savage, David F; Laub, Michael T; Marks, Debora S.

Nat Commun ; 15(1): 1639, 2024 Feb 22.

Article in English | MEDLINE | ID: mdl-38388493

ABSTRACT

Recent developments in protein design rely on large neural networks with up to hundreds of millions of parameters, yet it is unclear which residue dependencies are critical for determining protein function. Here, we show that amino acid preferences at individual residues, without accounting for mutation interactions, explain much and sometimes virtually all of the combinatorial mutation effects across 8 datasets (R² ~ 78-98%). Hence, few observations (~100 times the number of mutated residues) enable accurate prediction of held-out variant effects (Pearson r > 0.80). We hypothesized that the local structural contexts around a residue could be sufficient to predict mutation preferences, and developed an unsupervised approach termed CoVES (Combinatorial Variant Effects from Structure). Our results suggest that CoVES not only outperforms model-free methods but also performs similarly to complex models for creating functional and diverse protein variants. CoVES offers an effective alternative to complicated models for identifying functional protein mutations.
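
The site-independent idea in the abstract above (per-residue preferences summed without interaction terms) can be illustrated with a minimal sketch. This is not CoVES and not the authors' fitting procedure; the wild-type sequence, the per-site effect values, and the observed measurements are all made-up toy numbers, and the function names `additive_prediction` and `r_squared` are hypothetical.

```python
from statistics import mean

def additive_prediction(variant, wild_type, site_effects):
    """Sum per-site effects at every position where the variant differs
    from the wild type; no interaction (epistasis) terms are included."""
    return sum(
        site_effects.get((pos, aa), 0.0)
        for pos, (aa, wt) in enumerate(zip(variant, wild_type))
        if aa != wt
    )

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy, invented data (assumption): a 3-residue "protein" and one
# preference value per (position, amino acid) substitution.
wild_type = "ACD"
site_effects = {(0, "G"): -1.0, (1, "W"): -0.5, (2, "E"): 0.25}

# Invented measurements for single and combinatorial variants.
observed = {"GCD": -1.0, "AWD": -0.5, "GWD": -1.6, "GWE": -1.2, "ACE": 0.25}

predicted = [additive_prediction(v, wild_type, site_effects) for v in observed]
r2 = r_squared(list(observed.values()), predicted)
print(f"R^2 of the additive model on toy data: {r2:.3f}")
```

In this toy setting, the combinatorial variants deviate only slightly from the sum of their single-site effects, so the additive model accounts for most of the variance; real datasets would require fitting the per-site terms from data rather than assuming them.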

Subject(s)

Neural Networks, Computer; Proteins; Proteins/metabolism; Amino Acids/chemistry; Mutation

ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction.

Notin, Pascal; Kollasch, Aaron W; Ritter, Daniel; van Niekerk, Lood; Paul, Steffanie; Spinner, Hansen; Rollins, Nathan; Shaw, Ada; Weitzman, Ruben; Frazer, Jonathan; Dias, Mafalda; Franceschi, Dinko; Orenbuch, Rose; Gal, Yarin; Marks, Debora S.

bioRxiv ; 2023 Dec 08.

Article in English | MEDLINE | ID: mdl-38106144

ABSTRACT

Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, and curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) within a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, and model predictions, and develop a user-friendly website that facilitates data access and analysis.

An in silico method to assess antibody fragment polyreactivity.

Harvey, Edward P; Shin, Jung-Eun; Skiba, Meredith A; Nemeth, Genevieve R; Hurley, Joseph D; Wellner, Alon; Shaw, Ada Y; Miranda, Victor G; Min, Joseph K; Liu, Chang C; Marks, Debora S; Kruse, Andrew C.

Nat Commun ; 13(1): 7554, 2022 Dec 07.

Article in English | MEDLINE | ID: mdl-36477674

ABSTRACT

Antibodies are essential biological research tools and important therapeutic agents, but some exhibit non-specific binding to off-target proteins and other biomolecules. Such polyreactive antibodies compromise screening pipelines, lead to incorrect and irreproducible experimental results, and are generally intractable for clinical development. Here, we design a set of experiments using a diverse naïve synthetic camelid antibody fragment (nanobody) library to enable machine learning models to accurately assess polyreactivity from protein sequence (AUC > 0.8). Moreover, our models provide quantitative scoring metrics that predict the effect of amino acid substitutions on polyreactivity. We experimentally test our models' performance on three independent nanobody scaffolds, where over 90% of predicted substitutions successfully reduced polyreactivity. Importantly, the models allow us to diminish the polyreactivity of an angiotensin II type I receptor antagonist nanobody, without compromising its functional properties. We provide a companion web-server that offers a straightforward means of predicting polyreactivity and polyreactivity-reducing mutations for any given nanobody sequence.

Subject(s)

Immunoglobulin Fragments
