ABSTRACT
Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale [1]. However, the energetics driving folding are invisible in these structures and remain largely unknown [2]. The hidden thermodynamics of folding can drive disease [3,4], shape protein evolution [5-7] and guide protein engineering [8-10], and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40-72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability. We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability.
Subjects
Biology, Protein Engineering, Protein Folding, Proteins, Amino Acids/genetics, Amino Acids/metabolism, Biology/methods, Complementary DNA/genetics, Protein Stability, Proteins/chemistry, Proteins/genetics, Proteins/metabolism, Thermodynamics, Proteolysis, Protein Engineering/methods, Protein Domains/genetics, Mutation
ABSTRACT
Controlling the biodistribution of protein- and nanoparticle-based therapeutic formulations remains challenging. In vivo library selection is an effective method for identifying constructs that exhibit desired distribution behavior; library variants can be selected based on their ability to localize to the tissue or compartment of interest despite complex physiological challenges. Here, we describe further development of an in vivo library selection platform based on self-assembling protein nanoparticles encapsulating their own mRNA genomes (synthetic nucleocapsids or synNCs). We tested two distinct libraries: a low-diversity library composed of synNC surface mutations (45 variants) and a high-diversity library composed of synNCs displaying miniproteins with binder-like properties (6.2 million variants). While we did not identify any variants from the low-diversity surface library that yielded therapeutically relevant changes in biodistribution, the high-diversity miniprotein display library yielded variants that shifted accumulation toward lungs or muscles in just two rounds of in vivo selection. Our approach should contribute to achieving specific tissue homing patterns and identifying targeting ligands for diseases of interest.
Subjects
Peptide Library, Proteins, Tissue Distribution, Nucleocapsid, Mutation
ABSTRACT
Designing entirely new protein structures remains challenging because we do not fully understand the biophysical determinants of folding stability. Yet, some protein folds are easier to design than others. Previous work identified the 43-residue αββα fold as especially challenging: The best designs had only a 2% success rate, compared to 39 to 87% success for other simple folds [G. J. Rocklin et al., Science 357, 168-175 (2017)]. This suggested the αββα fold would be a useful model system for gaining a deeper understanding of folding stability determinants and for testing new protein design methods. Here, we designed over 10,000 new αββα proteins and found over 3,000 of them to fold into stable structures using a high-throughput protease-based assay. NMR, hydrogen-deuterium exchange, circular dichroism, deep mutational scanning, and scrambled sequence control experiments indicated that our stable designs fold into their designed αββα structures with exceptional stability for their small size. Our large dataset enabled us to quantify the influence of universal stability determinants including nonpolar burial, helix capping, and buried unsatisfied polar atoms, as well as stability determinants unique to the αββα topology. Our work demonstrates how large-scale design and test cycles can solve challenging design problems while illuminating the biophysical determinants of folding.
Subjects
Protein Folding, Proteins, Amino Acid Sequence, Circular Dichroism, Deuterium, Peptide Hydrolases, Protein Stability, Protein Secondary Structure, Proteins/chemistry, Proteins/genetics
ABSTRACT
Programmed cell death protein-1 (PD-1) expressed on activated T cells inhibits T cell function and proliferation to prevent an excessive immune response, and disease can result if this delicate balance is shifted in either direction. Tumor cells often take advantage of this pathway by overexpressing the PD-1 ligand PD-L1 to evade destruction by the immune system. Alternatively, if there is a decrease in function of the PD-1 pathway, unchecked activation of the immune system and autoimmunity can result. Using a combination of computation and experiment, we designed a hyperstable 40-residue miniprotein, PD-MP1, that specifically binds murine and human PD-1 at the PD-L1 interface with a Kd of ~100 nM. The apo crystal structure shows that the binder folds as designed with a backbone RMSD of 1.3 Å to the design model. Trimerization of PD-MP1 resulted in a PD-1 agonist that strongly inhibits murine T cell activation. This small, hyperstable PD-1 binding protein was computationally designed with an all-beta interface, and the trimeric agonist could contribute to treatments for autoimmune and inflammatory diseases.
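To put the reported affinity in energetic terms, a dissociation constant can be converted to a standard binding free energy with ΔG° = RT ln(Kd / 1 M). The sketch below is illustrative only: the temperature of 298.15 K and the function name `binding_free_energy` are assumptions, not details taken from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol, relative to a 1 M standard state.

    The default temperature is an assumption; the abstract does not state one.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


# Kd of roughly 100 nM, as reported for PD-MP1 binding to PD-1
dg = binding_free_energy(100e-9)
print(f"{dg:.1f} kcal/mol")
```

Under these assumptions a 100 nM binder corresponds to roughly -9.5 kcal/mol; each tenfold change in Kd shifts this by about 1.4 kcal/mol at room temperature.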
Subjects
B7-H1 Antigen/chemistry, Programmed Cell Death 1 Receptor/agonists, Animals, Autoimmune Diseases/drug therapy, Autoimmune Diseases/genetics, Autoimmune Diseases/immunology, B7-H1 Antigen/chemical synthesis, B7-H1 Antigen/immunology, B7-H1 Antigen/pharmacology, Computational Biology, Drug Design, Humans, Lymphocyte Activation, Mice, Inbred C57BL Mice, Programmed Cell Death 1 Receptor/chemistry, Programmed Cell Death 1 Receptor/immunology, T-Lymphocytes/chemistry, T-Lymphocytes/drug effects, T-Lymphocytes/immunology
ABSTRACT
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Subjects
Drug Design, Human Influenza/drug therapy, Human Influenza/prevention & control, Molecular Targeted Therapy/methods, Protein Engineering/methods, Proteins/chemistry, Proteins/therapeutic use, Botulinum Toxins/classification, Botulinum Toxins/metabolism, Computer Simulation, Influenza Virus Hemagglutinin Glycoproteins/metabolism, Hot Temperature, Humans, Human Influenza/metabolism, Molecular Dynamics Simulation, Protein Binding, Protein Stability, Proteins/immunology, Proteins/metabolism, Temperature
ABSTRACT
Naturally occurring, pharmacologically active peptides constrained with covalent crosslinks generally have shapes that have evolved to fit precisely into binding pockets on their targets. Such peptides can have excellent pharmaceutical properties, combining the stability and tissue penetration of small-molecule drugs with the specificity of much larger protein therapeutics. The ability to design constrained peptides with precisely specified tertiary structures would enable the design of shape-complementary inhibitors of arbitrary targets. Here we describe the development of computational methods for accurate de novo design of conformationally restricted peptides, and the use of these methods to design 18-47 residue, disulfide-crosslinked peptides, a subset of which are heterochiral and/or N-C backbone-cyclized. Both genetically encodable and non-canonical peptides are exceptionally stable to thermal and chemical denaturation, and 12 experimentally determined X-ray and NMR structures are nearly identical to the computational design models. The computational design methods and stable scaffolds presented here provide the basis for development of a new generation of peptide-based drugs.
Subjects
Computer-Aided Design, Drug Design, Peptides/chemistry, Peptides/chemical synthesis, Protein Stability, Amino Acid Motifs, X-Ray Crystallography, Cyclization, Disulfides/chemistry, Hot Temperature, Molecular Models, Biomolecular Nuclear Magnetic Resonance, Peptides/genetics, Cyclic Peptides/chemistry, Cyclic Peptides/genetics, Protein Denaturation, Protein Secondary Structure, Protein Tertiary Structure, Stereoisomerism
ABSTRACT
The de novo design of miniprotein inhibitors has recently emerged as a new technology to create proteins that bind with high affinity to specific therapeutic targets. Their size, ease of expression, and apparent high stability make them excellent candidates for a new class of protein drugs. However, beyond circular dichroism melts and hydrogen/deuterium exchange experiments, little is known about their dynamics, especially at the elevated temperatures they seemingly tolerate quite well. To address that and gain insight for future designs, we have focused on identifying unintended and previously overlooked heat-induced structural and chemical changes in a particularly stable model miniprotein, EHEE_rd2_0005. Nuclear magnetic resonance (NMR) studies suggest the presence of dynamics on multiple time and temperature scales. Transiently elevating the temperature results in spontaneous chemical deamidation visible in the NMR spectra, which we validate using both capillary electrophoresis and mass spectrometry (MS) experiments. High temperatures also result in greatly accelerated intrinsic rates of hydrogen exchange and signal loss in NMR heteronuclear single quantum coherence spectra from local unfolding. These losses are in excellent agreement with both room temperature hydrogen exchange experiments and hydrogen bond disruption in replica exchange molecular dynamics simulations. Our analysis reveals important principles for future miniprotein designs and the potential for high stability to result in long-lived alternate conformational states.
Subjects
Hot Temperature, Biomolecular Nuclear Magnetic Resonance, Molecular Dynamics Simulation, Protein Conformation, Proteins/chemistry, Protein Stability
ABSTRACT
Synthetic biology allows us to reuse, repurpose, and reconfigure biological systems to address society's most pressing challenges. Developing biotechnologies in this way requires integrating concepts across disciplines, posing challenges to educating students with diverse expertise. We created a framework for synthetic biology training that deconstructs biotechnologies across scales (molecular, circuit/network, cell/cell-free systems, biological communities, and societal), giving students a holistic toolkit to integrate cross-disciplinary concepts towards responsible innovation of successful biotechnologies. We present this framework, lessons learned, and inclusive teaching materials to allow its adaptation to train the next generation of synthetic biologists.
Subjects
Synthetic Biology, Synthetic Biology/education, Synthetic Biology/methods, Humans, Biotechnology/education, Students/psychology
ABSTRACT
Orientational restraints can improve the efficiency of alchemical free energy calculations, but they are not typically applied in relative binding calculations, which compute the affinity difference between two ligands. Here, we describe a new "separated topologies" method, which computes relative binding free energies using orientational restraints and which has several advantages over existing methods. While standard approaches maintain the initial and final ligand in a shared orientation, the separated topologies approach allows the initial and final ligands to have distinct orientations. This avoids a slowly converging reorientation step in the calculation. The separated topologies approach can also be applied to determine the relative free energies of multiple orientations of the same ligand. We illustrate the approach by calculating the relative binding free energies of two compounds to an engineered site in Cytochrome C Peroxidase.
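Once the relative free energies of several orientations of one ligand are known, they can be turned into equilibrium populations with Boltzmann weights. The sketch below illustrates only that post-processing step, not the separated topologies protocol itself. The function name, the temperature of 298.15 K, and the example energies are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def orientation_populations(relative_dg_kcal, temperature_k=298.15):
    """Boltzmann populations for orientations with the given relative free energies."""
    rt = R_KCAL * temperature_k
    weights = [math.exp(-dg / rt) for dg in relative_dg_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: a second orientation 1.0 kcal/mol less favorable than the first
populations = orientation_populations([0.0, 1.0])
print([round(p, 3) for p in populations])
```

With these made-up inputs, a 1 kcal/mol gap leaves the less favorable orientation populated about 16% of the time, which is why slowly interconverting orientations can matter for the overall affinity.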
Subjects
Thermodynamics, Benzimidazoles/chemistry, Binding Sites, Cytochrome-c Peroxidase/chemistry, Indoles/chemistry, Molecular Models, Molecular Conformation
ABSTRACT
The calculation of a protein-ligand binding free energy based on molecular dynamics (MD) simulations generally relies on a thermodynamic cycle in which the ligand is alchemically inserted into the system, both in the solvated protein and free in solution. The corresponding ligand-insertion free energies are typically calculated in nanoscale computational boxes simulated under periodic boundary conditions and considering electrostatic interactions defined by a periodic lattice-sum. This is distinct from the ideal bulk situation of a system of macroscopic size simulated under non-periodic boundary conditions with Coulombic electrostatic interactions. This discrepancy results in finite-size effects, which affect primarily the charging component of the insertion free energy, are dependent on the box size, and can be large when the ligand bears a net charge, especially if the protein is charged as well. This article investigates finite-size effects on calculated charging free energies using as a test case the binding of the ligand 2-amino-5-methylthiazole (net charge +1 e) to a mutant form of yeast cytochrome c peroxidase in water. Considering different charge isoforms of the protein (net charges -5, 0, +3, or +9 e), either in the absence or the presence of neutralizing counter-ions, and sizes of the cubic computational box (edges ranging from 7.42 to 11.02 nm), the potentially large magnitude of finite-size effects on the raw charging free energies (up to 17.1 kJ mol⁻¹) is demonstrated. Two correction schemes are then proposed to eliminate these effects, a numerical and an analytical one. Both schemes are based on a continuum-electrostatics analysis and require performing Poisson-Boltzmann (PB) calculations on the protein-ligand system. While the numerical scheme requires PB calculations under both non-periodic and periodic boundary conditions, the latter at the box size considered in the MD simulations, the analytical scheme only requires three non-periodic PB calculations for a given system, its dependence on the box size being analytical. The latter scheme also provides insight into the physical origin of the finite-size effects. These two schemes also encompass a correction for discrete solvent effects that persists even in the limit of infinite box sizes. Application of either scheme essentially eliminates the size dependence of the corrected charging free energies (maximal deviation of 1.5 kJ mol⁻¹). Because it is simple to apply, the analytical correction scheme offers a general solution to the problem of finite-size effects in free-energy calculations involving charged solutes, as encountered in calculations concerning, e.g., protein-ligand binding, biomolecular association, residue mutation, pKa and redox potential estimation, substrate transformation, solvation, and solvent-solvent partitioning.
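The thermodynamic cycle described above can be sketched as the difference between two insertion free energies, each with its own finite-size correction added. This is a minimal illustration of the bookkeeping only: the function name and all numbers are hypothetical, and the corrections would in practice come from the Poisson-Boltzmann analysis, which is not reproduced here.

```python
def corrected_binding_free_energy(
    dg_insert_protein: float,
    dg_insert_water: float,
    corr_protein: float = 0.0,
    corr_water: float = 0.0,
) -> float:
    """Binding free energy (kJ/mol) from the two legs of the insertion cycle.

    Each raw insertion free energy is shifted by its finite-size correction
    before the two legs are subtracted.
    """
    return (dg_insert_protein + corr_protein) - (dg_insert_water + corr_water)


# Hypothetical raw values and corrections in kJ/mol (not from the article)
raw = corrected_binding_free_energy(-250.0, -230.0)
corrected = corrected_binding_free_energy(-250.0, -230.0, corr_protein=5.0, corr_water=-3.0)
print(raw, corrected)
```

Because the corrections for the two legs generally differ, they do not cancel in the subtraction, which is why a charged ligand needs them even though only a difference is reported.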
Subjects
Cytochrome-c Peroxidase/chemistry, Molecular Dynamics Simulation, Thermodynamics, Thiazoles/chemistry, Cytochrome-c Peroxidase/genetics, Cytochrome-c Peroxidase/metabolism, Ligands, Mutation, Saccharomyces cerevisiae/enzymology, Solvents/chemistry, Static Electricity, Water/chemistry
ABSTRACT
Fragment screens for new ligands have had wide success, notwithstanding their constraint to libraries of 1,000-10,000 molecules. Larger libraries would be addressable were molecular docking reliable for fragment screens, but this has not been widely accepted. To investigate docking's ability to prioritize fragments, a library of >137,000 such molecules was docked against the structure of beta-lactamase. Forty-eight fragments highly ranked by docking were acquired and tested; 23 had Ki values ranging from 0.7 to 9.2 mM. X-ray crystal structures of the enzyme-bound complexes were determined for 8 of the fragments. For 4, the correspondence between the predicted and experimental structures was high (RMSD between 1.2 and 1.4 Å), whereas for another 2, the fidelity was lower but retained most key interactions (RMSD 2.4-2.6 Å). Two of the 8 fragments adopted very different poses in the active site owing to enzyme conformational changes. The 48% hit rate of the fragment docking compares very favorably with "lead-like" docking and high-throughput screening against the same enzyme. To understand this, we investigated the occurrence of the fragment scaffolds among larger, lead-like molecules. Approximately 1% of commercially available fragments contain these inhibitors whereas only 10⁻⁷% of lead-like molecules do. This suggests that many more chemotypes and combinations of chemotypes are present among fragments than are available among lead-like molecules, contributing to the higher hit rates. The ability of docking to prioritize these fragments suggests that the technique can be used to exploit the better chemotype coverage that exists at the fragment level.
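The headline numbers in this abstract can be checked with a few lines of arithmetic. The sketch below uses only counts stated above; the variable names and dictionary labels are illustrative.

```python
# Counts taken from the abstract
tested = 48   # fragments acquired and tested
hits = 23     # fragments with measurable Ki values

hit_rate = hits / tested

# Crystallographic outcomes for the 8 fragments with solved structures
pose_outcomes = {
    "close match (RMSD 1.2-1.4 Å)": 4,
    "lower fidelity (RMSD 2.4-2.6 Å)": 2,
    "very different pose": 2,
}
structures_solved = sum(pose_outcomes.values())

print(f"hit rate: {hit_rate:.0%}")
print(f"structures solved: {structures_solved}")
```

The ratio 23/48 is about 47.9%, which rounds to the 48% hit rate quoted in the text, and the three pose categories account for all 8 structures.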
Subjects
Bacterial Proteins/antagonists & inhibitors, Bacterial Proteins/chemistry, Drug Discovery, Enzyme Inhibitors/chemistry, beta-Lactamase Inhibitors, beta-Lactamases/chemistry, Combinatorial Chemistry Techniques, Enzyme Inhibitors/isolation & purification, Enzyme Inhibitors/pharmacology, Ligands
ABSTRACT
The denaturant dependence of hydrogen-deuterium exchange (HDX) is a powerful measurement to identify the breaking of individual H-bonds and map the free energy surface (FES) of a protein including the very rare states. Molecular dynamics (MD) can identify each partial unfolding event with atomic-level resolution. Hence, their combination provides a great opportunity to test the accuracy of simulations and to verify the interpretation of HDX data. For this comparison, we use Upside, our new and extremely fast MD package that is capable of folding proteins with an accuracy comparable to that of all-atom methods. The FESs of two naturally occurring and two designed proteins are generated in this way and compared to our NMR/HDX data. We find that Upside's accuracy is considerably improved upon modifying the energy function using a new machine-learning procedure that trains for proper protein behavior including realistic denatured states in addition to stable native states. The resulting increase in cooperativity is critical for replicating the HDX data and protein stability, indicating that we have properly encoded the underlying physicochemical interactions into an MD package. We did observe some mismatch, however, underscoring the ongoing challenges faced by simulations in calculating accurate FESs. Nevertheless, our ensembles can identify the properties of the fluctuations that lead to HDX, whether they be small-, medium-, or large-scale openings, and can speak to the breadth of the native ensemble that has been a matter of debate.
Subjects
Deuterium Exchange Measurement, Hydrogen, Deuterium Exchange Measurement/methods, Entropy, Hydrogen/chemistry, Protein Conformation, Proteins/chemistry
ABSTRACT
Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins.
Subjects
Computer Neural Networks, Proteins, Amino Acid Sequence, Amino Acids, Protein Stability, Proteins/chemistry
ABSTRACT
Monoclonal antibody (mAb) 10E8 recognizes a highly conserved epitope on HIV and is capable of neutralizing >95% of circulating viral isolates, making it one of the most promising Abs against HIV. Solution instability and biochemical heterogeneity of 10E8 have hampered its development for clinical use. We identify the source of 10E8 heterogeneity as cis/trans isomerization at two prolines within the YPP motif in the CRD3 loop, which exists as two predominant conformers that interconvert on a slow timescale. The YtransP conformer can bind the HIV gp41 epitope, while the YcisP conformer is not binding competent and shows a higher aggregation propensity. The high barrier of isomerization and the propensity to adopt non-binding-competent proline conformers provide novel insight into the slow binding kinetics, low potency, and poor solubility of 10E8. This study highlights how proline isomerization should be considered a critical quality attribute for biotherapeutics with paratopes containing potential cis proline amide bonds.
Subjects
Monoclonal Antibodies/chemistry, Isomerism, Proline/chemistry
ABSTRACT
Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Although these forces are "encoded" in the thousands of known protein structures, "decoding" them is challenging because of the complexity of natural proteins that have evolved for function, not stability. We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences. This analysis identified more than 2500 stable designed proteins in four basic folds, a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and experiment and has the potential to transform computational protein design into a data-driven science.
Subjects
Protein Folding, DNA/chemical synthesis, DNA/genetics, DNA Mutational Analysis, Mutation, Protein Conformation, Protein Engineering, Protein Stability, Proteins/chemistry, Proteins/genetics, Proteolysis
ABSTRACT
Binding free energy calculations offer a thermodynamically rigorous method to compute protein-ligand binding, and they depend on empirical force fields with hundreds of parameters. We examined the sensitivity of computed binding free energies to the ligand's electrostatic and van der Waals parameters. Dielectric screening and cancellation of effects between ligand-protein and ligand-solvent interactions reduce the parameter sensitivity of binding affinity by 65%, compared with interaction strengths computed in the gas-phase. However, multiple changes to parameters combine additively on average, which can lead to large changes in overall affinity from many small changes to parameters. Using these results, we estimate that random, uncorrelated errors in force field nonbonded parameters must be smaller than 0.02 e per charge, 0.06 Å per radius, and 0.01 kcal/mol per well depth in order to obtain 68% (one standard deviation) confidence that a computed affinity for a moderately-sized lead compound will fall within 1 kcal/mol of the true affinity, if these are the only sources of error considered.
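The statement about random, uncorrelated errors rests on a standard result: when effects add linearly and errors are independent, the individual contributions add in quadrature rather than linearly. The sketch below illustrates that rule only. The sensitivities and error sizes are hypothetical placeholders chosen to give round numbers, not values from the study, and the function name is an assumption.

```python
import math


def combined_sigma(sensitivities, parameter_sigmas):
    """Standard deviation of the total affinity error for independent parameter errors.

    sensitivities: change in affinity per unit change of each parameter
    parameter_sigmas: standard deviation of the error in each parameter
    """
    return math.sqrt(
        sum((s * e) ** 2 for s, e in zip(sensitivities, parameter_sigmas))
    )


# Hypothetical case: 16 parameters, each contributing 5.0 * 0.05 = 0.25 kcal/mol
n_params = 16
sigma_total = combined_sigma([5.0] * n_params, [0.05] * n_params)
worst_case = sum(5.0 * 0.05 for _ in range(n_params))

print(f"quadrature total: {sigma_total:.2f} kcal/mol")
print(f"linear worst case: {worst_case:.2f} kcal/mol")
```

In this toy case sixteen contributions of 0.25 kcal/mol combine to a 1.0 kcal/mol standard deviation, well below the 4.0 kcal/mol linear worst case, yet still large enough to show how many small parameter errors can add up to a meaningful error in affinity.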
ABSTRACT
Predicting absolute protein-ligand binding affinities remains a frontier challenge in ligand discovery and design. This becomes more difficult when ionic interactions are involved because of the large opposing solvation and electrostatic attraction energies. In a blind test, we examined whether alchemical free-energy calculations could predict binding affinities of 14 charged and 5 neutral compounds previously untested as ligands for a cavity binding site in cytochrome c peroxidase. In this simplified site, polar and cationic ligands compete with solvent to interact with a buried aspartate. Predictions were tested by calorimetry, spectroscopy, and crystallography. Of the 15 compounds predicted to bind, 13 were experimentally confirmed, while 4 compounds were false negative predictions. Predictions had a root-mean-square error of 1.95 kcal/mol to the experimental affinities, and predicted poses had an average RMSD of 1.7 Å to the crystallographic poses. This test serves as a benchmark for these thermodynamically rigorous calculations at predicting binding affinities for charged compounds and gives insights into the existing sources of error, which are primarily electrostatic interactions inside proteins. Our experiments also provide a useful set of ionic binding affinities in a simplified system for testing new affinity prediction methods.
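Two of the summary statistics above are simple to reproduce: the fraction of predicted binders that were confirmed, and a root-mean-square error between predicted and measured affinities. The sketch below takes the counts from the abstract; the `rmse` helper and its toy inputs are hypothetical and are not the study's data.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between paired predicted and measured values."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Counts stated in the abstract
predicted_binders = 15
confirmed_binders = 13
precision = confirmed_binders / predicted_binders

# Toy affinities in kcal/mol, for illustration only
toy_error = rmse([-5.0, -6.0], [-5.0, -8.0])

print(f"confirmed fraction of predicted binders: {precision:.1%}")
print(f"toy RMSE: {toy_error:.2f} kcal/mol")
```

With 13 of 15 predicted binders confirmed, the confirmed fraction is about 86.7%.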
Subjects
Molecular Models, Proteins/chemistry, Benzimidazoles/chemistry, Benzimidazoles/metabolism, Binding Sites, X-Ray Crystallography, Cytochrome-c Peroxidase/chemistry, Cytochrome-c Peroxidase/metabolism, Kinetics, Ligands, Molecular Docking Simulation, Molecular Dynamics Simulation, Protein Binding, Protein Conformation, Proteins/metabolism
ABSTRACT
We present a combined experimental and modeling study of organic ligand molecules binding to a slightly polar engineered cavity site in T4 lysozyme (L99A/M102Q). For modeling, we computed alchemical absolute binding free energies. These were blind tests performed prospectively on 13 diverse, previously untested candidate ligand molecules. We predicted that eight compounds would bind to the cavity and five would not; 11 of 13 predictions were correct at this level. The RMS error to the measurable absolute binding energies was 1.8 kcal/mol. In addition, we computed "relative" binding free energies for six phenol derivatives starting from two known ligands: phenol and catechol. The average RMS error in the relative free energy prediction was 2.5 kcal/mol (phenol) and 1.1 kcal/mol (catechol). To understand these results at atomic resolution, we obtained x-ray co-complex structures for nine of the diverse ligands and for all six phenol analogs. The average RMSD of the predicted pose to the experiment was 2.0 Å (diverse set), 1.8 Å (phenol-derived predictions), and 1.2 Å (catechol-derived predictions). We found that predicting accurate affinities and rank-orderings required near-native starting orientations of the ligand in the binding site. Unanticipated binding modes, multiple ligand binding, and protein conformational change all proved challenging for the free energy methods. We believe that these results can help guide future improvements in physics-based absolute binding free energy methods.