ABSTRACT
The design of artificial enzymes is an emerging field of research. Although progress has been made, the catalytic proficiency of many designed enzymes remains low compared to that of natural enzymes. Nevertheless, Hilvert et al. (Nat. Chem. 2017, 9, 50-56) recently created a series of five artificial retro-aldolase enzymes via directed evolution, with the final variant exhibiting a rate comparable to that of the naturally occurring enzyme fructose-1,6-bisphosphate aldolase. We present an atomistic study of this system that elucidates the effects of mutational changes on the chemical step. Transition path sampling is used to create ensembles of reactive trajectories, and committor analysis is used to identify the stochastic separatrix of each ensemble. Applying committor distribution analysis to constrained trajectories allows us to identify changes in important protein motions coupled to the reaction across the generated series of artificial retro-aldolases. We observed two different reaction mechanisms and analyzed the role of the residues participating in the reaction coordinate of each enzyme. However, only in the most evolved variant did we identify a fast motion that promotes catalysis, suggesting that this rate-promoting vibration was introduced during directed evolution. This study provides further evidence that protein dynamics must be taken into account when designing efficient artificial enzymes.
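Committor analysis, as used above, estimates for a given configuration the probability p_B that trajectories launched from it with random momenta reach the product basin before the reactant basin; configurations with p_B close to 0.5 make up the stochastic separatrix. The following is a minimal sketch of that bookkeeping only, assuming the shooting outcomes have already been recorded as basin labels. The function names and the 0.1 tolerance are hypothetical choices, not the authors' protocol.

```python
import math
from typing import Sequence, Tuple


def committor_estimate(outcomes: Sequence[str]) -> Tuple[float, float]:
    """Estimate the committor p_B of one configuration and its standard error.

    Each entry of `outcomes` is the basin ("A" = reactant, "B" = product)
    first reached by a trajectory launched from the same configuration
    with freshly drawn random momenta.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one shooting outcome")
    n_b = sum(1 for basin in outcomes if basin == "B")
    p_b = n_b / n
    # Binomial standard error of the estimated proportion.
    stderr = math.sqrt(p_b * (1.0 - p_b) / n)
    return p_b, stderr


def on_separatrix(outcomes: Sequence[str], tol: float = 0.1) -> bool:
    """Hypothetical criterion: the configuration belongs to the stochastic
    separatrix if its estimated committor is within `tol` of 0.5."""
    p_b, _ = committor_estimate(outcomes)
    return abs(p_b - 0.5) <= tol
```

For example, `committor_estimate(["A", "B", "B", "A"])` returns `(0.5, 0.25)`. Repeating this over many configurations gives the committor distribution; a distribution sharply peaked at 0.5 indicates that the constrained coordinate captures the reaction well.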
Subject(s)
Biomimetic Materials/chemistry; Thermodynamics; Catalysis; Models, Molecular
ABSTRACT
Though typically associated with a single folded state, some globular proteins remodel their secondary and/or tertiary structures in response to cellular stimuli. AlphaFold2 (AF2) 1 readily generates one dominant protein structure for these fold-switching (a.k.a. metamorphic) proteins 2, but it often fails to predict their alternative experimentally observed structures 3,4. Wayment-Steele et al. steered AF2 to predict alternative structures of a few metamorphic proteins using a method they call AF-cluster 5. However, their Paper lacks some essential controls needed to assess AF-cluster's reliability. We find that these controls show AF-cluster to be a poor predictor of metamorphic proteins. First, closer examination of the Paper's results reveals that random sequence sampling outperforms sequence clustering, challenging the claim that AF-cluster works by "deconvolving conflicting sets of couplings." Further, we observe that AF-cluster mistakes some single-folding KaiB homologs for fold switchers, a critical flaw bound to mislead users. Finally, proper error analysis reveals that AF-cluster predicts many correct structures with low confidence and some experimentally unobserved conformations with confidences similar to those of experimentally observed ones. For these reasons, we suggest using random sequence sampling 7 with ColabFold 6, augmented by other predictive approaches, as a more accurate and less computationally intensive alternative to AF-cluster.
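Random sequence sampling, which the abstract contrasts with sequence clustering, amounts to drawing random subsets of the multiple sequence alignment (MSA), always keeping the query, and running a separate prediction on each subset. The sketch below covers only the subsampling step, assuming the MSA is a list of aligned strings with the query first. `random_msa_subsets`, the depth, and the number of subsets are hypothetical illustrations, not ColabFold's interface.

```python
import random
from typing import List


def random_msa_subsets(msa: List[str], depth: int, n_subsets: int,
                       seed: int = 0) -> List[List[str]]:
    """Draw `n_subsets` random sub-alignments of at most `depth` sequences.

    msa[0] is assumed to be the query; it is kept at the top of every
    subset because structure predictors expect the query first. The other
    sequences are sampled uniformly without replacement (no clustering).
    """
    if not msa:
        raise ValueError("MSA is empty")
    if depth < 1:
        raise ValueError("depth must be at least 1 (the query)")
    query, others = msa[0], msa[1:]
    k = min(depth - 1, len(others))
    rng = random.Random(seed)  # seeded for reproducible subsets
    return [[query] + rng.sample(others, k) for _ in range(n_subsets)]
```

Each subset would then be written out as an MSA and passed to the predictor; clustering-based schemes differ only in how the subsets are chosen, which is the comparison the abstract draws.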
ABSTRACT
Recent work suggests that AlphaFold (AF), a deep learning-based model that can accurately infer protein structure from sequence, may discern important features of folded protein energy landscapes, defined by the diversity and frequency of different conformations in the folded state. Here, we test the limits of its predictive power on fold-switching proteins, which assume two structures with regions of distinct secondary and/or tertiary structure. We find that (1) AF is a weak predictor of fold switching and (2) some of its successes result from memorization of training-set structures rather than learned protein energetics. Combining >280,000 models from several implementations of AF2 and AF3, we achieved a 35% success rate for fold switchers likely in AF's training sets. AF2's confidence metrics selected against models consistent with experimentally determined fold-switching structures and failed to discriminate between low- and high-energy conformations. Further, AF captured only one of seven experimentally confirmed fold switchers outside of its training sets despite extensive sampling of an additional ~280,000 models. Several observations indicate that AF2 has memorized structural information during training and that AF3 misassigns coevolutionary restraints. These limitations constrain the scope of successful predictions, highlighting the need for physically based methods that readily predict multiple protein conformations.
Subject(s)
Protein Conformation; Protein Folding; Proteins; Proteins/chemistry; Models, Molecular; Deep Learning; Protein Structure, Secondary
ABSTRACT
Although most globular proteins fold into a single stable structure, an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli. State-of-the-art algorithms predict that these fold-switching proteins adopt only one stable structure, missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that single-fold variants could be masking these signatures, we developed an approach, called Alternative Contact Enhancement (ACE), to search both highly diverse protein superfamilies (composed of single-fold and fold-switching variants) and protein subfamilies with more fold-switching variants. ACE successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/56 fold-switching proteins from distinct families. We then used ACE-derived contacts to (1) predict two experimentally consistent conformations of a candidate protein with unsolved structure and (2) develop a blind prediction pipeline for fold-switching proteins. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide an evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
Subject(s)
Amino Acids; Biological Evolution; Humans; Algorithms
ABSTRACT
Although most globular proteins fold into a single stable structure 1, an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli 2. State-of-the-art algorithms 3-5 predict that these fold-switching proteins assume only one stable structure 6,7, missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that over-represented single-fold sequences may be masking these signatures, we developed an approach to search both highly diverse protein superfamilies (composed of single-fold and fold-switching variants) and protein subfamilies with more fold-switching variants. This approach successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/58 fold-switching proteins from distinct families. Then, using a set of coevolved amino acid pairs predicted by our approach, we successfully biased AlphaFold2 5 to predict two experimentally consistent conformations of a candidate protein with unsolved structure. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide an evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
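Coevolution of an amino acid pair shows up as correlated substitutions between two columns of a multiple sequence alignment. The methods discussed above use more sophisticated statistical models, so the sketch below substitutes plain mutual information (MI), a simpler and older coevolution score, purely to illustrate the underlying idea. It assumes a list of equal-length aligned sequences and treats the gap character as just another symbol; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j.

    MI = sum over residue pairs (a, b) of
         f_ij(a, b) * log(f_ij(a, b) / (f_i(a) * f_j(b))),
    with frequencies taken directly from the sequences. Real coevolution
    methods add pseudocounts, sequence reweighting, and corrections for
    indirect couplings; none of that is done here.
    """
    n = len(msa)
    if n == 0:
        raise ValueError("MSA is empty")
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    f_i = Counter(col_i)
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi
```

Two columns that always change together, as in `["AK", "AK", "DE", "DE"]`, score log 2 (about 0.693), while independently varying columns such as `["AK", "AE", "DK", "DE"]` score 0.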
ABSTRACT
Though many folded proteins assume one stable structure that performs one function, a small-but-increasing number remodel their secondary and tertiary structures and change their functions in response to cellular stimuli. These fold-switching proteins regulate biological processes and are associated with autoimmune dysfunction, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, and more. Despite their biological importance, fold switching is difficult to predict computationally. With the aim of advancing computational prediction and experimental characterization of fold switchers, this review discusses several features that distinguish fold-switching proteins from their single-fold and intrinsically disordered counterparts. First, the isolated structures of fold switchers are less stable and more heterogeneous than those of single folders but more stable and less heterogeneous than those of intrinsically disordered proteins (IDPs). Second, the sequences of single-fold, fold-switching, and intrinsically disordered proteins can evolve at distinct rates. Third, proteins from these three classes are best predicted using different computational techniques. Finally, late-breaking results suggest that single folders, fold switchers, and IDPs have distinct patterns of residue-residue coevolution. The review closes by discussing high- and medium-throughput experimental approaches that might be used to identify new fold-switching proteins.
Subject(s)
COVID-19; Intrinsically Disordered Proteins; Humans; Intrinsically Disordered Proteins/chemistry; Protein Folding; Models, Molecular
ABSTRACT
Though typically associated with a single folded state, globular proteins are dynamic and often assume alternative or transient structures important for their functions 1,2. Wayment-Steele et al. steered ColabFold 3 to predict alternative structures of several proteins using a method they call AF-cluster 4. They propose that AF-cluster "enables ColabFold to sample alternate states of known metamorphic proteins with high confidence" by first clustering multiple sequence alignments (MSAs) in a way that "deconvolves" coevolutionary information specific to different conformations and then using these clusters as input for ColabFold. Contrary to this Coevolution Assumption, clustered MSAs are not needed to make these predictions. Rather, these alternative structures can be predicted from single sequences and/or sequence similarity, indicating that coevolutionary information is unnecessary for predictive success and may not be used at all. These results suggest that AF-cluster's predictive scope is likely limited to sequences with distinct-yet-homologous structures within ColabFold's training set.
ABSTRACT
Recent work suggests that AlphaFold2 (AF2), a deep learning-based model that can accurately infer protein structure from sequence, may discern important features of folded protein energy landscapes, defined by the diversity and frequency of different conformations in the folded state. Here, we test the limits of its predictive power on fold-switching proteins, which assume two structures with regions of distinct secondary and/or tertiary structure. Using several implementations of AF2, including two published enhanced sampling approaches, we generated >280,000 models of 93 fold-switching proteins whose experimentally determined conformations were likely in AF2's training set. Combining all models, AF2 predicted fold switching with a modest success rate of ~25%, indicating that it does not readily sample both experimentally characterized conformations of most fold switchers. Further, AF2's confidence metrics selected against models consistent with experimentally determined fold-switching conformations in favor of inconsistent models. Accordingly, these confidence metrics, though suggested to evaluate protein energetics reliably, did not discriminate between low- and high-energy states of fold-switching proteins. We then evaluated AF2's performance on seven fold-switching proteins outside of its training set, generating >159,000 models in total. Fold switching was accurately predicted in one of seven targets with moderate confidence. Further, AF2 showed no ability to predict alternative conformations of two newly discovered targets without homologs in the set of 93 fold switchers. These results indicate that AF2 has more to learn about the underlying energetics of protein ensembles and highlight the need for further development of methods that readily predict multiple protein conformations.
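The success rate above reflects whether the pooled models sample both experimentally characterized conformations of each fold switcher. The abstract does not state the exact matching criterion, so the sketch below assumes that every model has already been scored against the two reference structures (for example by TM-score, between 0 and 1) and uses a hypothetical cutoff of 0.6; the cutoff and function names are illustrative, not the authors' values.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoff: a model "matches" a reference conformation when its
# similarity score to that reference (e.g. TM-score) reaches this value.
MATCH_CUTOFF = 0.6


def predicts_both_folds(scores: List[Tuple[float, float]],
                        cutoff: float = MATCH_CUTOFF) -> bool:
    """`scores` holds (similarity to fold 1, similarity to fold 2) per model.

    The target counts as a success only if at least one model matches
    fold 1 and at least one model matches fold 2.
    """
    hits_fold_1 = any(s1 >= cutoff for s1, _ in scores)
    hits_fold_2 = any(s2 >= cutoff for _, s2 in scores)
    return hits_fold_1 and hits_fold_2


def success_rate(per_target: Dict[str, List[Tuple[float, float]]],
                 cutoff: float = MATCH_CUTOFF) -> float:
    """Fraction of targets for which both conformations were sampled."""
    if not per_target:
        raise ValueError("no targets supplied")
    n_success = sum(predicts_both_folds(s, cutoff) for s in per_target.values())
    return n_success / len(per_target)
```

Because the criterion looks at all pooled models regardless of their confidence scores, it measures sampling alone; whether the confidence metrics rank the matching models highly is a separate question, which the abstract answers in the negative.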
ABSTRACT
Creating efficient and stable enzymes for catalysis in pharmaceutical and industrial laboratories is an important research goal. Arnold et al. used directed evolution to engineer a natural tryptophan synthase into a variant that is operable under laboratory conditions without the need for a natural allosteric effector. Directed evolution allows researchers to improve enzymes without understanding the structure-activity relationship. Here, we present a transition path sampling study of a key chemical transformation in the tryptophan synthase catalytic cycle. We observed that while directed evolution does mimic the natural allosteric effect from a stability perspective, the fast protein dynamics associated with chemistry have been dramatically altered. This work provides further evidence of the role of protein dynamics in catalysis and clearly demonstrates the multifaceted complexity of mutations associated with protein engineering. This study also reveals a fascinating contrast between allosteric and stand-alone function at the femtosecond time scale.
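Transition path sampling builds an ensemble of reactive trajectories by repeatedly perturbing an existing reactive path at a randomly chosen frame and keeping the new path only if it still connects reactant to product. The toy sketch below shows two-way shooting with fixed-length paths on a one-dimensional double-well potential under overdamped Langevin dynamics, standing in for the enzyme simulations. The potential, basin boundaries, temperature, time step, and all function names are assumptions chosen for illustration.

```python
import math
import random
from typing import List

# Toy model (assumption): overdamped Langevin dynamics on V(x) = (x^2 - 1)^2,
# reactant basin A at x < -0.8, product basin B at x > 0.8.
KT = 0.3
DT = 0.01


def force(x: float) -> float:
    """F = -dV/dx for V(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x * x - 1.0)


def in_a(x: float) -> bool:
    return x < -0.8


def in_b(x: float) -> bool:
    return x > 0.8


def propagate(x: float, n_steps: int, rng: random.Random) -> List[float]:
    """Euler-Maruyama integration of n_steps from x; returns the new frames
    (the starting point itself is not included)."""
    frames = []
    noise = math.sqrt(2.0 * KT * DT)
    for _ in range(n_steps):
        x = x + force(x) * DT + noise * rng.gauss(0.0, 1.0)
        frames.append(x)
    return frames


def shooting_move(path: List[float], rng: random.Random) -> List[float]:
    """One two-way shooting move that preserves the path length.

    A shooting frame k is chosen uniformly. Fresh noise generates a backward
    segment of k steps (integrated forward, then reversed, relying on the
    microscopic reversibility of equilibrium Langevin dynamics; only
    approximately true for this simple integrator) and a forward segment of
    len(path) - 1 - k steps. The trial path is kept only if it still
    starts in A and ends in B; otherwise the old path is retained.
    """
    n = len(path)
    k = rng.randrange(n)
    x_k = path[k]
    backward = propagate(x_k, k, rng)[::-1]
    forward = propagate(x_k, n - 1 - k, rng)
    trial = backward + [x_k] + forward
    if in_a(trial[0]) and in_b(trial[-1]):
        return trial
    return path


def sample_paths(n_frames: int = 200, n_moves: int = 500,
                 seed: int = 0) -> List[List[float]]:
    """Run a short TPS chain seeded with a straight line from A to B."""
    rng = random.Random(seed)
    path = [-1.0 + 2.0 * i / (n_frames - 1) for i in range(n_frames)]
    ensemble = []
    for _ in range(n_moves):
        path = shooting_move(path, rng)
        ensemble.append(path)
    return ensemble
```

The artificial straight-line seed is not a dynamical trajectory, but accepted shooting moves progressively replace it with genuine reactive paths; every path in the returned ensemble starts in A and ends in B by construction.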
ABSTRACT
Protein engineering is a growing field with a variety of experimental techniques available for altering protein function. However, de novo enzyme design is still in its infancy, so far yielding enzymes of modest catalytic efficiency. In this study, we examined a system of artificial retro-aldolase enzymes whose chemistry has been found to be coupled to protein dynamics. The original design was created computationally, and this protein was then subjected to directed evolution to improve its initially low catalytic efficiency. We found that this re-engineering reshaped rapid density fluctuations throughout the enzyme via alterations in the hydrogen-bonding network. This work also led to the discovery of a second important motion that aids in the release of an intermediate product. These results provide compelling evidence that fast protein dynamics need to be considered when designing efficient protein catalysts.