ABSTRACT
Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges are unsolved.
Subjects
Artificial Intelligence , Proteins , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Protein Engineering , Deep Learning
ABSTRACT
Molecular switch proteins whose cycling between states is controlled by opposing regulators1,2 are central to biological signal transduction. As switch proteins function within highly connected interaction networks3, the fundamental question arises of how functional specificity is achieved when different processes share common regulators. Here we show that functional specificity of the small GTPase switch protein Gsp1 in Saccharomyces cerevisiae (the homologue of the human protein RAN)4 is linked to differential sensitivity of biological processes to different kinetics of the Gsp1 (RAN) switch cycle. We make 55 targeted point mutations to individual protein interaction interfaces of Gsp1 (RAN) and show through quantitative genetic5 and physical interaction mapping that Gsp1 (RAN) interface perturbations have widespread cellular consequences. Contrary to expectation, the cellular effects of the interface mutations group by their biophysical effects on kinetic parameters of the GTPase switch cycle and not by the targeted interfaces. Instead, we show that interface mutations allosterically tune the GTPase cycle kinetics. These results suggest a model in which protein partner binding, or post-translational modifications at distal sites, could act as allosteric regulators of GTPase switching. Similar mechanisms may underlie regulation by other GTPases, and other biological switches. Furthermore, our integrative platform to determine the quantitative consequences of molecular perturbations may help to explain the effects of disease mutations that target central molecular switches.
Subjects
Allosteric Regulation/genetics , Monomeric GTP-Binding Proteins/genetics , Monomeric GTP-Binding Proteins/metabolism , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Point Mutation , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae , Binding Sites/genetics , Catalytic Domain/genetics , GTPase-Activating Proteins/metabolism , Guanine Nucleotide Exchange Factors/metabolism , Guanosine Triphosphate/metabolism , Kinetics , Protein Binding/genetics , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics
ABSTRACT
Calcium/calmodulin-dependent kinase II (CaMKII) forms a highly conserved dodecameric assembly that is sensitive to the frequency of calcium pulse trains. Neither the structure of the dodecameric assembly nor how it regulates CaMKII is known. We present the crystal structure of an autoinhibited full-length human CaMKII holoenzyme, revealing an unexpected compact arrangement of kinase domains docked against a central hub, with the calmodulin-binding sites completely inaccessible. We show that this compact docking is important for the autoinhibition of the kinase domains and for setting the calcium response of the holoenzyme. Comparison of CaMKII isoforms, which differ in the length of the linker between the kinase domain and the hub, demonstrates that these interactions can be strengthened or weakened by changes in linker length. This equilibrium between autoinhibited states provides a simple mechanism for tuning the calcium response without changes in either the hub or the kinase domains.
Subjects
Calcium-Calmodulin-Dependent Protein Kinase Type 2/chemistry , Calcium-Calmodulin-Dependent Protein Kinase Type 2/metabolism , Amino Acid Sequence , Animals , Crystallography, X-Ray , Holoenzymes/chemistry , Holoenzymes/metabolism , Humans , Models, Molecular , Protein Conformation , Protein Structure, Tertiary , Sequence Alignment
ABSTRACT
With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the two-state design problem of the fold-switching protein RfaH as an in-depth case study, and PapD and calmodulin as examples of higher-dimensional design problems, we show that the evolutionary multiobjective optimization approach leads to a significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
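To make the notion of a Pareto front concrete, the sketch below extracts the non-dominated candidates from a set of scored designs. It illustrates only the first non-dominated front that NSGA-II builds on, not the full algorithm (no crowding distance, selection, or mutation). The two objectives and all score values are hypothetical placeholders and are not taken from the study; both objectives are assumed to be minimized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` in every objective and
    strictly better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (objective_1, objective_2) scores for five design candidates.
candidates = [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5), (0.3, 0.95)]
front = pareto_front(candidates)
print(front)
```

Candidates on the front represent different tradeoffs between the objectives; none of them can be improved in one objective without getting worse in another.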
Subjects
Algorithms , Computational Biology , Proteins , Computational Biology/methods , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Protein Engineering/methods , Sequence Analysis, Protein/methods
ABSTRACT
Significance: Computational protein design promises to advance applications in medicine and biotechnology by creating proteins with many new and useful functions. However, new functions require the design of specific and often irregular atom-level geometries, which remains a major challenge. Here, we develop computational methods that design and predict local protein geometries with greater accuracy than existing methods. Then, as a proof of concept, we leverage these methods to design new protein conformations in the enzyme ketosteroid isomerase that change the protein's preference for a key functional residue. Our computational methods are openly accessible and can be applied to the design of other intricate geometries customized for new user-defined protein functions.
Subjects
Amino Acids/chemistry , Computer-Aided Design , Protein Engineering/methods , Proteins/chemistry , Robotics , Algorithms , Computational Biology/methods , Isomerases/chemistry , Models, Molecular , Protein Conformation , Proteins/genetics , Reproducibility of Results , Structure-Activity Relationship
ABSTRACT
An essential mechanism for severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection begins with the viral spike protein binding to the human receptor protein angiotensin-converting enzyme II (ACE2). Here, we describe a stepwise engineering approach to generate a set of affinity optimized, enzymatically inactivated ACE2 variants that potently block SARS-CoV-2 infection of cells. These optimized receptor traps tightly bind the receptor binding domain (RBD) of the viral spike protein and prevent entry into host cells. We first computationally designed the ACE2-RBD interface using a two-stage flexible protein backbone design process that improved affinity for the RBD by up to 12-fold. These designed receptor variants were affinity matured an additional 14-fold by random mutagenesis and selection using yeast surface display. The highest-affinity variant contained seven amino acid changes and bound to the RBD 170-fold more tightly than wild-type ACE2. With the addition of the natural ACE2 collectrin domain and fusion to a human immunoglobulin crystallizable fragment (Fc) domain for increased stabilization and avidity, the most optimal ACE2 receptor traps neutralized SARS-CoV-2-pseudotyped lentivirus and authentic SARS-CoV-2 virus with half-maximal inhibitory concentrations (IC50s) in the 10- to 100-ng/mL range. Engineered ACE2 receptor traps offer a promising route to fighting infections by SARS-CoV-2 and other ACE2-using coronaviruses, with the key advantage that viral resistance would also likely impair viral entry. Moreover, such traps can be predesigned for viruses with known entry receptors for faster therapeutic response without the need for neutralizing antibodies isolated from convalescent patients.
Subjects
Angiotensin-Converting Enzyme 2/metabolism , Antiviral Agents/chemistry , Drug Design , Protein Engineering/methods , Spike Glycoprotein, Coronavirus/metabolism , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Antiviral Agents/metabolism , Binding Sites , HEK293 Cells , Humans , Molecular Docking Simulation , Mutation , Peptide Library , Protein Binding , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Saccharomyces cerevisiae , Spike Glycoprotein, Coronavirus/chemistry
ABSTRACT
Computational de novo protein design is increasingly applied to address a number of key challenges in biomedicine and biological engineering. Successes in expanding applications are driven by advances in design principles and methods over several decades. Here, we review recent innovations in major aspects of de novo protein design and include how these advances were informed by principles of protein architecture and interactions derived from the wealth of structures in the Protein Data Bank. We describe developments in de novo generation of designable backbone structures, optimization of sequences, design scoring functions, and the design of function. The advances not only highlight design goals reachable now but also point to the challenges and opportunities for the future of the field.
Subjects
Proteins/chemistry , Databases, Protein , Protein Conformation
ABSTRACT
A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place ("match") these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. Each matching step involves engineering new binding site residues into each protein "scaffold", which is distinct from the problem of comparing already existing binding pockets. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites that cannot be matched to naturally existing structures with the same topologies, respectively. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The numbers of matched binding sites are power-law functions of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.
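A power-law relationship of the form y = a * n^b between the number of matched binding sites y and the number of family members n becomes a straight line in log-log space, so the exponent can be estimated by ordinary least squares on the logarithms. The sketch below shows that calculation on synthetic data; the prefactor, exponent, and family sizes are invented for illustration and are not values reported in the study.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns the pair (a, b). All inputs must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b


# Synthetic example: hypothetical family sizes and exactly power-law counts.
family_sizes = [10, 100, 1000, 10000]
matched_sites = [3.0 * size ** 0.5 for size in family_sizes]
prefactor, exponent = fit_power_law(family_sizes, matched_sites)
print(prefactor, exponent)
```

An exponent below 1 would mean that each additional family member contributes fewer newly matched sites than the previous one.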
Subjects
Protein Folding , Binding Sites , Ligands , Protein Conformation
ABSTRACT
Synthetic biology approaches living systems with an engineering perspective and promises to deliver solutions to global challenges in healthcare and sustainability. A critical component is the design of biomolecular circuits with programmable input-output behaviors. Such circuits typically rely on a sensor module that recognizes molecular inputs, which is coupled to a functional output via protein-level circuits or regulating the expression of a target gene. While gene expression outputs can be customized relatively easily by exchanging the target genes, sensing new inputs is a major limitation. There is a limited repertoire of sensors found in nature, and there are often difficulties with interfacing them with engineered circuits. Computational protein design could be a key enabling technology to address these challenges, as it allows for the engineering of modular and tunable sensors that can be tailored to the circuit's application. In this article, we review recent computational approaches to design protein-based sensors for small-molecule inputs with particular focus on those based on the widely used Rosetta software suite. Furthermore, we review mechanisms that have been harnessed to couple ligand inputs to functional outputs. Based on recent literature, we illustrate how the combination of protein design and synthetic biology enables new sensors for diverse applications ranging from biomedicine to metabolic engineering. We conclude with a perspective on how strategies to address frontiers in protein design and cellular circuit design may enable the next generation of sense-response networks, which may increasingly be assembled from de novo components to display diverse and engineerable input-output behaviors.
ABSTRACT
Protein binding to small molecules is fundamental to many biological processes, yet it remains challenging to predictively design this functionality de novo. Current state-of-the-art computational design methods typically rely on existing small molecule binding sites or protein scaffolds with existing shape complementarity for a target ligand. Here we introduce new methods that utilize pools of discrete contacts between protein side chains and defined small molecule ligand substructures (ligand fragments) observed in the Protein Data Bank. We use the Rosetta Molecular Modeling Suite to recombine protein side chains in these contact pools to generate hundreds of thousands of energetically favorable binding sites for a target ligand. These composite binding sites are built into existing scaffold proteins matching the intended binding site geometry with high accuracy. In addition, we apply pools of side chain rotamers interacting with the target ligand to augment Rosetta's conventional design machinery and improve key metrics known to be predictive of design success. We demonstrate that our method reliably builds diverse binding sites into different scaffold proteins for a variety of target molecules. Our generalizable de novo ligand binding site design method provides a foundation for the versatile design of proteins that interface with previously unattainable molecules for applications in medical diagnostics and synthetic biology.
Subjects
Binding Sites , Protein Engineering/methods , Proteins , Algorithms , Binding Sites/genetics , Binding Sites/physiology , Computational Biology , Models, Molecular , Protein Binding , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Software
ABSTRACT
Many scientific disciplines rely on computational methods for data analysis, model generation, and prediction. Implementing these methods is often accomplished by researchers with domain expertise but without formal training in software engineering or computer science. This arrangement has led to underappreciation of sustainability and maintainability of scientific software tools developed in academic environments. Some software tools have avoided this fate, including the scientific library Rosetta. We use this software and its community as a case study to show how modern software development can be accomplished successfully, irrespective of subject area. Rosetta is one of the largest software suites for macromolecular modeling, with 3.1 million lines of code and many state-of-the-art applications. Since the mid 1990s, the software has been developed collaboratively by the RosettaCommons, a community of academics from over 60 institutions worldwide with diverse backgrounds including chemistry, biology, physiology, physics, engineering, mathematics, and computer science. Developing this software suite has provided us with more than two decades of experience in how to effectively develop advanced scientific software in a global community with hundreds of contributors. Here we illustrate the functioning of this development community by addressing technical aspects (like version control, testing, and maintenance), community-building strategies, diversity efforts, software dissemination, and user support. We demonstrate how modern computational research can thrive in a distributed collaborative community. The practices described here are independent of subject area and can be readily adopted by other software development communities.
Subjects
Computational Biology/methods , Research/trends , Software/trends , Cooperative Behavior , Data Analysis , Engineering , Gene Library , Humans , Models, Molecular , Research Personnel , Social Behavior , User-Computer Interface
ABSTRACT
Computational design of binding sites in proteins remains difficult, in part due to limitations in our current ability to sample backbone conformations that enable precise and accurate geometric positioning of side chains during sequence design. Here we present a benchmark framework for comparison between flexible-backbone design methods applied to binding interactions. We quantify the ability of different flexible backbone design methods in the widely used protein design software Rosetta to recapitulate observed protein sequence profiles assumed to represent functional protein/protein and protein/small molecule binding interactions. The CoupledMoves method, which combines backbone flexibility and sequence exploration into a single acceptance step during the sampling trajectory, better recapitulates observed sequence profiles than the BackrubEnsemble and FastDesign methods, which separate backbone flexibility and sequence design into separate acceptance steps during the sampling trajectory. Flexible-backbone design with the CoupledMoves method is a powerful strategy for reducing sequence space to generate targeted libraries for experimental screening and selection.
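The difference between a single acceptance step and separate acceptance steps can be illustrated with a toy Monte Carlo move. In the sketch below, a backbone perturbation and a residue change are proposed together and judged once by the Metropolis criterion. This is not Rosetta code and does not reproduce the CoupledMoves protocol; the one-dimensional "backbone" coordinate, the two-letter residue alphabet, and the energy function are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
import random
from typing import Tuple

State = Tuple[float, str]  # (toy backbone coordinate, residue identity)

# Hypothetical preference: each residue type favors a different backbone value.
PREFERRED_BACKBONE = {"A": 0.0, "L": 1.0}


def toy_energy(backbone: float, residue: str) -> float:
    """Quadratic penalty for deviating from the residue's preferred backbone."""
    return (backbone - PREFERRED_BACKBONE[residue]) ** 2


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Standard Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-delta_e / kT)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kT)


def coupled_step(state: State, kT: float, rng: random.Random) -> State:
    """Propose a backbone shift AND a residue change, then accept or
    reject the combined move in one step."""
    backbone, residue = state
    new_backbone = backbone + rng.uniform(-0.5, 0.5)
    new_residue = rng.choice(sorted(PREFERRED_BACKBONE))
    delta_e = toy_energy(new_backbone, new_residue) - toy_energy(backbone, residue)
    if metropolis_accept(delta_e, kT, rng):
        return (new_backbone, new_residue)
    return state


rng = random.Random(0)
state: State = (0.0, "A")
for _ in range(200):
    state = coupled_step(state, kT=0.1, rng=rng)
print(state)
```

Because the residue change is evaluated on the already-shifted backbone, a substitution that would clash on the starting backbone can still be accepted when the accompanying backbone shift accommodates it, which is the intuition behind coupling the two kinds of move.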
Subjects
Computational Biology , Protein Conformation , Protein Interaction Mapping , Proteins/ultrastructure , Algorithms , Amino Acid Sequence/genetics , Binding Sites/genetics , Biophysical Phenomena/genetics , Humans , Models, Molecular , Protein Binding/genetics , Protein Engineering/trends , Proteins/chemistry , Software
ABSTRACT
The circuitry of the brain is characterized by cell heterogeneity, sprawling cellular anatomy, and astonishingly complex patterns of connectivity. Determining how complex neural circuits control behavior is a major challenge that is often approached using surgical, chemical, or transgenic approaches to ablate neurons. However, all these approaches suffer from a lack of precise spatial and temporal control. This drawback would be overcome if cellular ablation could be controlled with light. Cells are naturally and cleanly ablated through apoptosis due to the terminal activation of caspases. Here, we describe the engineering of a light-activated human caspase-3 (Caspase-LOV) by exploiting its natural spring-loaded activation mechanism through rational insertion of the light-sensitive LOV2 domain that expands upon illumination. We apply the light-activated caspase (Caspase-LOV) to study neurodegeneration in larval and adult Drosophila. Using the tissue-specific expression system (UAS)-GAL4, we express Caspase-LOV specifically in three neuronal cell types: retinal, sensory, and motor neurons. Illumination of whole flies or specific tissues containing Caspase-LOV induced cell death and allowed us to follow the time course and sequence of neurodegenerative events. For example, we find that global synchronous activation of caspase-3 drives degeneration with a different time-course and extent in sensory versus motor neurons. We believe the Caspase-LOV tool we engineered will have many other uses for neurobiologists and others for specific temporal and spatial ablation of cells in complex organisms.
Subjects
Apoptosis/physiology , Caspase 3/genetics , Drosophila melanogaster/metabolism , Enzyme Activation/genetics , Light , Motor Neurons/metabolism , Sensory Receptor Cells/metabolism , Ablation Techniques , Animals , Animals, Genetically Modified , Brain/physiology , Caspase 3/metabolism , Caspases/genetics , DNA-Binding Proteins/genetics , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Neural Conduction/physiology , RNA Interference , RNA, Small Interfering/genetics , Viral Proteins/metabolism
ABSTRACT
The ability to engineer the precise geometries, fine-tuned energetics and subtle dynamics that are characteristic of functional proteins is a major unsolved challenge in the field of computational protein design. In natural proteins, functional sites exhibiting these properties often feature structured loops. However, unlike the elements of secondary structures that comprise idealized protein folds, structured loops have been difficult to design computationally. Addressing this shortcoming in a general way is a necessary first step towards the routine design of protein function. In this perspective, we will describe the progress that has been made on this problem and discuss how recent advances in the field of loop structure prediction can be harnessed and applied to the inverse problem of computational loop design.
Subjects
Computational Biology , Proteins , Models, Molecular , Protein Conformation , Proteins/chemistry , Proteins/metabolism
ABSTRACT
Human immunodeficiency virus (HIV) has a small genome and therefore relies heavily on the host cellular machinery to replicate. Identifying which host proteins and complexes come into physical contact with the viral proteins is crucial for a comprehensive understanding of how HIV rewires the host's cellular machinery during the course of infection. Here we report the use of affinity tagging and purification mass spectrometry to determine systematically the physical interactions of all 18 HIV-1 proteins and polyproteins with host proteins in two different human cell lines (HEK293 and Jurkat). Using a quantitative scoring system that we call MiST, we identified with high confidence 497 HIV-human protein-protein interactions involving 435 individual human proteins, with ~40% of the interactions being identified in both cell types. We found that the host proteins hijacked by HIV, especially those found interacting in both cell types, are highly conserved across primates. We uncovered a number of host complexes targeted by viral proteins, including the finding that HIV protease cleaves eIF3d, a subunit of eukaryotic translation initiation factor 3. This host protein is one of eleven identified in this analysis that act to inhibit HIV replication. This data set facilitates a more comprehensive and detailed understanding of how the host machinery is manipulated during the course of HIV infection.
Subjects
HIV-1/chemistry , HIV-1/metabolism , Host-Pathogen Interactions , Human Immunodeficiency Virus Proteins/metabolism , Protein Interaction Mapping/methods , Protein Interaction Maps/physiology , Affinity Labels , Amino Acid Sequence , Conserved Sequence , Eukaryotic Initiation Factor-3/chemistry , Eukaryotic Initiation Factor-3/metabolism , HEK293 Cells , HIV Infections/metabolism , HIV Infections/virology , HIV Protease/metabolism , HIV-1/physiology , Human Immunodeficiency Virus Proteins/analysis , Human Immunodeficiency Virus Proteins/chemistry , Human Immunodeficiency Virus Proteins/isolation & purification , Humans , Immunoprecipitation , Jurkat Cells , Mass Spectrometry , Protein Binding , Reproducibility of Results , Virus Replication
ABSTRACT
Reengineering protein-protein recognition is an important route to dissecting and controlling complex interaction networks. Experimental approaches have used the strategy of "second-site suppressors," where a functional interaction is inferred between two proteins if a mutation in one protein can be compensated by a mutation in the second. Mimicking this strategy, computational design has been applied successfully to change protein recognition specificity by predicting such sets of compensatory mutations in protein-protein interfaces. To extend this approach, it would be advantageous to be able to "transplant" existing engineered and experimentally validated specificity changes to other homologous protein-protein complexes. Here, we test this strategy by designing a pair of mutations that modulates peptide recognition specificity in the Syntrophin PDZ domain, confirming the designed interaction biochemically and structurally, and then transplanting the mutations into the context of five related PDZ domain-peptide complexes. We find a wide range of energetic effects of identical mutations in structurally similar positions, revealing a dramatic context dependence (epistasis) of designed mutations in homologous protein-protein interactions. To better understand the structural basis of this context dependence, we apply a structure-based computational model that recapitulates these energetic effects and we use this model to make and validate forward predictions. Although the context dependence of these mutations is captured by computational predictions, our results both highlight the considerable difficulties in designing protein-protein interactions and provide challenging benchmark cases for the development of improved protein modeling and design methods that accurately account for the context.
Subjects
Dystrophin-Associated Proteins/chemistry , Dystrophin-Associated Proteins/genetics , Protein Engineering , Epistasis, Genetic , Models, Molecular , Mutation/genetics , Nitric Oxide Synthase Type I/chemistry , Nitric Oxide Synthase Type I/metabolism , PDZ Domains , Thermodynamics
ABSTRACT
Interactions between small molecules and proteins play critical roles in regulating and facilitating diverse biological functions, yet our ability to accurately re-engineer the specificity of these interactions using computational approaches has been limited. One main difficulty, in addition to inaccuracies in energy functions, is the exquisite sensitivity of protein-ligand interactions to subtle conformational changes, coupled with the computational problem of sampling the large conformational search space of degrees of freedom of ligands, amino acid side chains, and the protein backbone. Here, we describe two benchmarks for evaluating the accuracy of computational approaches for re-engineering protein-ligand interactions: (i) prediction of enzyme specificity altering mutations and (ii) prediction of sequence tolerance in ligand binding sites. After finding that current state-of-the-art "fixed backbone" design methods perform poorly on these tests, we develop a new "coupled moves" design method in the program Rosetta that couples changes to protein sequence with alterations in both protein side-chain and protein backbone conformations, and allows for changes in ligand rigid-body and torsion degrees of freedom. We show significantly increased accuracy in both predicting ligand specificity altering mutations and binding site sequences. These methodological improvements should be useful for many applications of protein-ligand design. The approach also provides insights into the role of subtle conformational adjustments that enable functional changes not only in engineering applications but also in natural protein evolution.
Subjects
Binding Sites , Computational Biology/methods , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Ligands , Models, Molecular , Molecular Sequence Data , Pliability , Substrate Specificity
ABSTRACT
Signaling pathways depend on regulatory protein-protein interactions; controlling these interactions in cells has important applications for reengineering biological functions. As many regulatory proteins are modular, considerable progress in engineering signaling circuits has been made by recombining commonly occurring domains. Our ability to predictably engineer cellular functions, however, is constrained by complex crosstalk observed in naturally occurring domains. Here we demonstrate a strategy for improving and simplifying protein network engineering: using computational design to create orthogonal (non-crossreacting) protein-protein interfaces. We validated the design of the interface between a key signaling protein, the GTPase Cdc42, and its activator, Intersectin, biochemically and by solving the crystal structure of the engineered complex. The designed GTPase (orthoCdc42) is activated exclusively by its engineered cognate partner (orthoIntersectin), but maintains the ability to interface with other GTPase signaling circuit components in vitro. In mammalian cells, orthoCdc42 activity can be regulated by orthoIntersectin, but not wild-type Intersectin, showing that the designed interaction can trigger complex processes. Computational design of protein interfaces thus promises to provide specific components that facilitate the predictable engineering of cellular functions.
Subjects
GTP Phosphohydrolases/metabolism , Guanine Nucleotide Exchange Factors/metabolism , Proteins/metabolism , Signal Transduction , Animals , Crystallography , GTP Phosphohydrolases/chemistry , Guanine Nucleotide Exchange Factors/chemistry , Mice , Models, Molecular , NIH 3T3 Cells
ABSTRACT
Amino acid covariation, where the identities of amino acids at different sequence positions are correlated, is a hallmark of naturally occurring proteins. This covariation can arise from multiple factors, including selective pressures for maintaining protein structure, requirements imposed by a specific function, or from phylogenetic sampling bias. Here we employed flexible backbone computational protein design to quantify the extent to which protein structure has constrained amino acid covariation for 40 diverse protein domains. We find significant similarities between the amino acid covariation in alignments of natural protein sequences and sequences optimized for their structures by computational protein design methods. These results indicate that the structural constraints imposed by protein architecture play a dominant role in shaping amino acid covariation and that computational protein design methods can capture these effects. We also find that the similarity between natural and designed covariation is sensitive to the magnitude and mechanism of backbone flexibility used in computational protein design. Our results thus highlight the necessity of including backbone flexibility to correctly model precise details of correlated amino acid changes and give insights into the pressures underlying these correlations.
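One common way to quantify covariation between two alignment columns is their mutual information. The abstract does not state which covariation measure was used, so the choice of mutual information here is an assumption made for illustration, and the alignment columns are invented toy data.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(col_i: Sequence[str], col_j: Sequence[str]) -> float:
    """Mutual information (in bits) between two alignment columns of
    equal length, estimated from observed residue frequencies."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    counts_i = Counter(col_i)
    counts_j = Counter(col_j)
    counts_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in counts_ij.items():
        p_ab = c / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy columns (hypothetical): position j tracks position i perfectly...
covarying = mutual_information("AALL", "KKEE")
# ...versus a column whose residues are independent of position i.
independent = mutual_information("AALL", "KEKE")
print(covarying, independent)
```

Comparing such pairwise scores between natural alignments and computationally designed sequences is one way to ask how much of the observed covariation is explained by structural constraints alone.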