ABSTRACT
Despite recent success in computational design of structured cyclic peptides, de novo design of cyclic peptides that bind to any protein functional site remains difficult. To address this challenge, we develop a computational "anchor extension" methodology for targeting protein interfaces by extending a peptide chain around a non-canonical amino acid residue anchor. To test our approach using a well characterized model system, we design cyclic peptides that inhibit histone deacetylases 2 and 6 (HDAC2 and HDAC6) with enhanced potency compared to the original anchor (IC50 values of 9.1 and 4.4 nM for the best binders compared to 5.4 and 0.6 µM for the anchor, respectively). The HDAC6 inhibitor is among the most potent reported so far. These results highlight the potential for de novo design of high-affinity protein-peptide interfaces, as well as the challenges that remain.
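The potency gain quoted above can be checked with a short calculation. This is a minimal sketch that assumes the "respectively" in the abstract pairs the first value of each pair with HDAC2 and the second with HDAC6; the dictionary layout and function name are illustrative, not taken from the paper.

```python
# IC50 values quoted in the abstract, converted to nanomolar (1 µM = 1000 nM).
# Assumption: the first value of each pair refers to HDAC2, the second to HDAC6.
ic50_nm = {
    "HDAC2": {"anchor": 5.4e3, "best_peptide": 9.1},
    "HDAC6": {"anchor": 0.6e3, "best_peptide": 4.4},
}


def fold_improvement(anchor_nm, peptide_nm):
    """Return how many times lower the peptide IC50 is than the anchor IC50."""
    if peptide_nm <= 0:
        raise ValueError("IC50 must be positive")
    return anchor_nm / peptide_nm


for target, values in ic50_nm.items():
    fold = fold_improvement(values["anchor"], values["best_peptide"])
    print(f"{target}: roughly {fold:.0f}-fold lower IC50 than the anchor")
```

Under that pairing, the designed peptides are a few hundred-fold (HDAC2) and over a hundred-fold (HDAC6) more potent than the anchor alone.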
Subjects
Drug Design , Histone Deacetylase Inhibitors/pharmacology , Cyclic Peptides/pharmacology , Structure-Activity Relationship , Catalytic Domain/drug effects , X-Ray Crystallography , Enzyme Assays , Histone Deacetylase 2/antagonists & inhibitors , Histone Deacetylase 2/isolation & purification , Histone Deacetylase 2/metabolism , Histone Deacetylase 2/ultrastructure , Histone Deacetylase 6/antagonists & inhibitors , Histone Deacetylase 6/genetics , Histone Deacetylase 6/isolation & purification , Histone Deacetylase 6/ultrastructure , Histone Deacetylase Inhibitors/chemistry , Inhibitory Concentration 50 , Molecular Docking Simulation , Biomolecular Nuclear Magnetic Resonance , Peptide Library , Cyclic Peptides/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/isolation & purification , Recombinant Proteins/metabolism , Recombinant Proteins/ultrastructure , Zebrafish Proteins/genetics , Zebrafish Proteins/ultrastructure
ABSTRACT
Monoclonal antibody (mAb) 10E8 recognizes a highly conserved epitope on HIV and is capable of neutralizing >95% of circulating viral isolates, making it one of the most promising antibodies against HIV. Solution instability and biochemical heterogeneity of 10E8 have hampered its development for clinical use. We identify the source of 10E8 heterogeneity as cis/trans isomerization at two prolines within the YPP motif in the CDR3 loop, which exists as two predominant conformers that interconvert on a slow timescale. The YtransP conformer can bind the HIV gp41 epitope, while the YcisP conformer is not binding competent and shows a higher aggregation propensity. The high barrier to isomerization and the propensity to adopt non-binding-competent proline conformers provide novel insight into the slow binding kinetics, low potency, and poor solubility of 10E8. This study highlights how proline isomerization should be considered a critical quality attribute for biotherapeutics with paratopes containing potential cis proline amide bonds.
Subjects
Monoclonal Antibodies/chemistry , Isomerism , Proline/chemistry
ABSTRACT
The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.
Subjects
Macromolecular Substances/chemistry , Molecular Models , Proteins/chemistry , Software , Molecular Docking Simulation , Peptidomimetics/chemistry , Protein Conformation
ABSTRACT
Many scientific disciplines rely on computational methods for data analysis, model generation, and prediction. Implementing these methods is often accomplished by researchers with domain expertise but without formal training in software engineering or computer science. This arrangement has led to underappreciation of sustainability and maintainability of scientific software tools developed in academic environments. Some software tools have avoided this fate, including the scientific library Rosetta. We use this software and its community as a case study to show how modern software development can be accomplished successfully, irrespective of subject area. Rosetta is one of the largest software suites for macromolecular modeling, with 3.1 million lines of code and many state-of-the-art applications. Since the mid 1990s, the software has been developed collaboratively by the RosettaCommons, a community of academics from over 60 institutions worldwide with diverse backgrounds including chemistry, biology, physiology, physics, engineering, mathematics, and computer science. Developing this software suite has provided us with more than two decades of experience in how to effectively develop advanced scientific software in a global community with hundreds of contributors. Here we illustrate the functioning of this development community by addressing technical aspects (like version control, testing, and maintenance), community-building strategies, diversity efforts, software dissemination, and user support. We demonstrate how modern computational research can thrive in a distributed collaborative community. The practices described here are independent of subject area and can be readily adopted by other software development communities.
Subjects
Computational Biology/methods , Research/trends , Software/trends , Cooperative Behavior , Data Analysis , Engineering , Gene Library , Humans , Molecular Models , Research Personnel , Social Behavior , User-Computer Interface
ABSTRACT
The Rosetta software suite for macromolecular modeling is a powerful computational toolbox for protein design, structure prediction, and protein structure analysis. The development of novel Rosetta-based scientific tools requires two orthogonal skill sets: deep domain-specific expertise in protein biochemistry and technical expertise in development, deployment, and analysis of molecular simulations. Furthermore, the computational demands of molecular simulation necessitate large-scale cluster-based or distributed solutions for nearly all scientifically relevant tasks. To reduce the technical barriers to entry for new development, we integrated Rosetta with modern, widely adopted computational infrastructure. This allows simplified deployment in large-scale cluster and cloud computing environments, and effective reuse of common libraries for simulation execution and data analysis. To achieve this, we integrated Rosetta with the Conda package manager; this simplifies installation into existing computational environments and packaging as Docker images for cloud deployment. Then, we developed programming interfaces to integrate Rosetta with the PyData stack for analysis and distributed computing, including the popular tools Jupyter, Pandas, and Dask. We demonstrate the utility of these components by generating a library of a thousand de novo disulfide-rich miniproteins in a hybrid simulation that included cluster-based design and interactive notebook-based analyses. Our new tools enable users who would otherwise not have access to the necessary computational infrastructure to perform state-of-the-art molecular simulation and design with Rosetta.
Subjects
Computational Biology/methods , Proteins/chemistry , Cloud Computing , Molecular Models , Software , User-Computer Interface
ABSTRACT
Computational design of new active sites has generally proceeded by geometrically defining interactions between the reaction transition state(s) and surrounding side-chain functional groups that maximize transition-state stabilization, and then searching for sites in protein scaffolds where the specified side-chain-transition-state interactions can be realized. A limitation of this approach is that the interactions between the side chains themselves are not constrained. An extensive connected hydrogen bond network involving the catalytic residues was observed in a designed retroaldolase following directed evolution. Such connected networks could increase catalytic activity by preorganizing active site residues in catalytically competent orientations, and by enabling concerted interactions between side chains during catalysis, for example, proton shuffling. We developed a method for designing active sites in which the catalytic side chains, in addition to making interactions with the transition state, are also involved in extensive hydrogen bond networks. Because of the added constraint of hydrogen-bond connectivity between the catalytic side chains, a wider range of interactions between these side chains and the transition state must be considered in order to find solutions. Our new method starts from a ChemDraw-like two-dimensional representation of the transition state with hydrogen-bond donors, acceptors, and covalent interaction sites indicated. It then systematically enumerates all placements of side-chain functional groups that make the indicated interactions with the transition state and are fully connected in a single hydrogen-bond network. The RosettaMatch method can then be used to identify realizations of these fully connected active sites in protein scaffolds. For a set of model reactions, the method generates many fully connected active site solutions that are promising starting points for the design of fully preorganized enzyme catalysts.
Subjects
Computer Neural Networks , Proteins/metabolism , Binding Sites , Biocatalysis , Protein Databases , Hydrogen Bonding , Molecular Models , Proteins/chemistry
ABSTRACT
We describe a de novo computational approach for designing proteins that recapitulate the binding sites of natural cytokines, but are otherwise unrelated in topology or amino acid sequence. We use this strategy to design mimics of the central immune cytokine interleukin-2 (IL-2) that bind to the IL-2 receptor βγc heterodimer (IL-2Rβγc) but have no binding site for IL-2Rα (also called CD25) or IL-15Rα (also known as CD215). The designs are hyper-stable, bind human and mouse IL-2Rβγc with higher affinity than the natural cytokines, and elicit downstream cell signalling independently of IL-2Rα and IL-15Rα. Crystal structures of the optimized design neoleukin-2/15 (Neo-2/15), both alone and in complex with IL-2Rβγc, are very similar to the designed model. Neo-2/15 has superior therapeutic activity to IL-2 in mouse models of melanoma and colon cancer, with reduced toxicity and undetectable immunogenicity. Our strategy for building hyper-stable de novo mimetics could be applied generally to signalling proteins, enabling the creation of superior therapeutic candidates.
Subjects
Drug Design , Interleukin-15/immunology , Interleukin-2/immunology , Molecular Mimicry , Interleukin-2 Receptors/agonists , Interleukin-2 Receptors/immunology , Amino Acid Sequence , Animals , Binding Sites , Colonic Neoplasms/drug therapy , Colonic Neoplasms/immunology , Computer Simulation , X-Ray Crystallography , Animal Disease Models , Humans , Interleukin-15/therapeutic use , Interleukin-2/therapeutic use , Interleukin-2 Receptor alpha Subunit/immunology , Interleukin-2 Receptor alpha Subunit/metabolism , Melanoma/drug therapy , Melanoma/immunology , Mice , Molecular Models , Protein Stability , Interleukin-2 Receptors/metabolism , Signal Transduction/immunology
ABSTRACT
A structural-bioinformatics-based computational methodology and framework have been developed for the design of antibodies to targets of interest. RosettaAntibodyDesign (RAbD) samples the diverse sequence, structure, and binding space of an antibody to an antigen in highly customizable protocols for the design of antibodies in a broad range of applications. The program samples antibody sequences and structures by grafting structures from a widely accepted set of the canonical clusters of CDRs (North et al., J. Mol. Biol., 406:228-256, 2011). It then performs sequence design according to amino acid sequence profiles of each cluster, and samples CDR backbones using a flexible-backbone design protocol incorporating cluster-based CDR constraints. Starting from an existing experimental or computationally modeled antigen-antibody structure, RAbD can be used to redesign a single CDR or multiple CDRs with loops of different length, conformation, and sequence. We rigorously benchmarked RAbD on a set of 60 diverse antibody-antigen complexes, using two design strategies: optimizing total Rosetta energy and optimizing interface energy alone. We utilized two novel metrics for measuring success in computational protein design. The design risk ratio (DRR) is equal to the frequency of recovery of native CDR lengths and clusters divided by the frequency of sampling of those features during the Monte Carlo design procedure. Ratios greater than 1.0 indicate that the design process is picking out the native features more frequently than expected from their sampled rate. We achieved DRRs for the non-H3 CDRs of between 2.4 and 4.0. The antigen risk ratio (ARR) is the ratio of frequencies of the native amino acid types, CDR lengths, and clusters in the output decoys for simulations performed in the presence and absence of the antigen. For CDRs, we achieved cluster ARRs as high as 2.5 for L1 and 1.5 for H2.
For sequence design simulations without CDR grafting, the overall recovery for the native amino acid types for residues that contact the antigen in the native structures was 72% in simulations performed in the presence of the antigen and 48% in simulations performed without the antigen, for an ARR of 1.5. For the non-contacting residues, the ARR was 1.08. This shows that the sequence profiles are able to maintain the amino acid types of these conserved, buried sites, while recovery of the exposed, contacting residues requires the presence of the antigen-antibody interface. We tested RAbD experimentally on both a lambda and kappa antibody-antigen complex, successfully improving their affinities 10 to 50 fold by replacing individual CDRs of the native antibody with new CDR lengths and clusters.
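Both risk ratios defined above are plain ratios of frequencies, so they can be reproduced in a few lines. The following is a minimal sketch, not code from RAbD: the function name is illustrative, the ARR inputs are the 72% and 48% recoveries quoted in the abstract, and the DRR inputs are hypothetical values chosen only to show the calculation.

```python
def risk_ratio(observed_frequency, reference_frequency):
    """Ratio of two frequencies; values above 1.0 indicate enrichment."""
    if reference_frequency <= 0:
        raise ValueError("reference frequency must be positive")
    return observed_frequency / reference_frequency


# Antigen risk ratio (ARR) for antigen-contacting residues, using the
# recoveries quoted in the abstract: 72% with antigen, 48% without.
arr_contacts = risk_ratio(0.72, 0.48)
print(f"ARR (contacting residues): {arr_contacts:.2f}")

# Design risk ratio (DRR): recovery frequency of a native feature divided by
# how often that feature was sampled. These two inputs are HYPOTHETICAL.
hypothetical_recovery = 0.30  # fraction of final designs with the native cluster
hypothetical_sampling = 0.10  # fraction of Monte Carlo moves that sampled it
drr_example = risk_ratio(hypothetical_recovery, hypothetical_sampling)
print(f"DRR (hypothetical inputs): {drr_example:.2f}")
```

The first ratio reproduces the ARR of 1.5 reported for the contacting residues.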
Subjects
Antibodies/chemistry , Software , Amino Acid Sequence , Animals , Antibodies/genetics , Antibodies/immunology , Antigen-Antibody Complex/chemistry , Antigen-Antibody Complex/genetics , Antigen-Antibody Complex/immunology , Complementarity Determining Regions , Computational Biology , Computer Simulation , Directed Molecular Evolution , Drug Design , Humans , Molecular Models , Monte Carlo Method , Protein Conformation , Protein Engineering/methods , Protein Engineering/statistics & numerical data
ABSTRACT
We describe Rosetta-based computational protocols for predicting the 3D structure of an antibody from sequence (RosettaAntibody) and then docking the antibody to protein antigens (SnugDock). Antibody modeling leverages canonical loop conformations to graft large segments from experimentally determined structures, as well as offering (i) energetic calculations to minimize loops, (ii) docking methodology to refine the VL-VH relative orientation and (iii) de novo prediction of the elusive complementarity determining region (CDR) H3 loop. To alleviate model uncertainty, antibody-antigen docking resamples CDR loop conformations and can use multiple models to represent an ensemble of conformations for the antibody, the antigen or both. These protocols can be run fully automated via the ROSIE web server (http://rosie.rosettacommons.org/) or manually on a computer with user control of individual steps. For best results, the protocol requires roughly 1,000 CPU-hours for antibody modeling and 250 CPU-hours for antibody-antigen docking. Tasks can be completed in under a day by using public supercomputers.
Subjects
Immunoglobulin Variable Region/immunology , Molecular Docking Simulation/methods , Amino Acid Sequence , Antigens/immunology , Complementarity Determining Regions/chemistry , Complementarity Determining Regions/immunology , Immunoglobulin Variable Region/chemistry , Internet , Protein Domains , Amino Acid Sequence Homology , Thermodynamics
ABSTRACT
Ab structure prediction has made great strides, but accurately modeling CDR H3 loops remains elusive. Unlike the other five CDR loops, CDR H3 does not adopt canonical conformations and must be modeled de novo. During Antibody Modeling Assessment II, we found that biasing simulations toward kinked conformations enables generating low-root mean square deviation models (Weitzner et al. 2014. Proteins 82: 1611-1623), and since then, we have presented new geometric parameters defining the kink conformation (Weitzner et al. 2015. Structure 23: 302-311). In this study, we use these parameters to develop a new biasing constraint. When applied to a benchmark set of high-quality CDR H3 loops, the average minimum root mean square deviation sampled is 0.93 Å, compared with 1.34 Å without the constraint. We then test the performance of the constrained de novo method for homology modeling and rigid-body docking and present the results for 1) the Antibody Modeling Assessment II targets, 2) the 2009 RosettaAntibody benchmark set, and 3) the high-quality set.
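The "average minimum root mean square deviation" metric can be read as: for each benchmark loop, take the lowest RMSD among all sampled models, then average those minima over the benchmark. The sketch below illustrates that reading under simplifying assumptions: coordinates are taken to be already superimposed, and the function names and numbers are hypothetical rather than taken from RosettaAntibody.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) points.

    Assumes the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def average_minimum_rmsd(rmsds_per_target):
    """Mean over targets of the lowest RMSD sampled for each target."""
    return sum(min(values) for values in rmsds_per_target) / len(rmsds_per_target)


# Hypothetical RMSD values (in Å) for the models sampled for two targets.
sampled = [[2.0, 0.5], [1.5, 3.0]]
print(average_minimum_rmsd(sampled))
```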
Subjects
Complementarity Determining Regions/chemistry , Immunological Models , Molecular Models , Animals , Humans , Mice , Protein Conformation
ABSTRACT
A core task in computational structural biology is the search of conformational space for low energy configurations of a biological macromolecule. Because conformational space has a very high dimensionality, the most successful search methods integrate some form of prior knowledge into a general sampling algorithm to reduce the effective dimensionality. However, integrating multiple types of constraints can be challenging. To streamline the incorporation of diverse constraints, we developed the Broker: an extension of the Rosetta macromolecular modeling suite that can express a wide range of protocols using constraints by combining small, independent modules, each of which implements a different set of constraints. We demonstrate expressiveness of the Broker through several code vignettes. The framework enables rapid protocol development in both biomolecular design and structural modeling tasks and thus is an important step towards exposing the rich functionality of Rosetta's core libraries to a growing community of users addressing a diverse set of tasks in computational biology.
Subjects
Computational Biology/methods , Protein Folding , Tertiary Protein Structure , Software , Algorithms , Macromolecular Substances/chemistry , Macromolecular Substances/metabolism , Molecular Models , Molecular Docking Simulation , Protein Binding , Proteins/chemistry , Proteins/metabolism
ABSTRACT
Membrane proteins are critical functional molecules in the human body, constituting more than 30% of open reading frames in the human genome. Unfortunately, a myriad of difficulties in overexpression and reconstitution into membrane mimetics severely limit our ability to determine their structures. Computational tools are therefore instrumental to membrane protein structure prediction, consequently increasing our understanding of membrane protein function and their role in disease. Here, we describe a general framework facilitating membrane protein modeling and design that combines the scientific principles for membrane protein modeling with the flexible software architecture of Rosetta3. This new framework, called RosettaMP, provides a general membrane representation that interfaces with scoring, conformational sampling, and mutation routines that can be easily combined to create new protocols. To demonstrate the capabilities of this implementation, we developed four proof-of-concept applications for (1) prediction of free energy changes upon mutation; (2) high-resolution structural refinement; (3) protein-protein docking; and (4) assembly of symmetric protein complexes, all in the membrane environment. Preliminary data show that these algorithms can produce meaningful scores and structures. The data also suggest needed improvements to both sampling routines and score functions. Importantly, the applications collectively demonstrate the potential of combining the flexible nature of RosettaMP with the power of Rosetta algorithms to facilitate membrane protein modeling and design.
Subjects
Computational Biology/methods , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Molecular Models , Protein Engineering/methods , Membrane Proteins/genetics , Protein Conformation
ABSTRACT
Antibody complementarity determining region (CDR) H3 loops are critical for adaptive immunological functions. Although the other five CDR loops adopt predictable canonical structures, H3 conformations have proven unclassifiable, other than an unusual C-terminal "kink" present in most antibodies. To determine why the majority of H3 loops are kinked and to learn whether non-antibody proteins have loop structures similar to those of H3, we searched a set of 15,679 high-quality non-antibody structures for regions geometrically similar to the residues immediately surrounding the loop. By incorporating the kink into our search, we identified 1,030 H3-like loops from 632 protein families. Some protein families, including PDZ domains, appear to use the identified region for recognition and binding. Our results suggest that the kink is conserved in the immunoglobulin heavy chain fold because it disrupts the β-strand pairing at the base of the loop. Thus, the kink is a critical driver of the observed structural diversity in CDR H3.
Subjects
Complementarity Determining Regions/chemistry , Molecular Evolution , Genetic Variation/genetics , Molecular Models , Proteomics/methods , Complementarity Determining Regions/genetics , Conserved Sequence/genetics , Hydrogen Bonding , Protein Conformation
ABSTRACT
Antibody Modeling Assessment II (AMA-II) provided an opportunity to benchmark RosettaAntibody on a set of 11 unpublished antibody structures. RosettaAntibody produced accurate, physically realistic models, with all framework regions and 42 of the 55 non-H3 CDR loops predicted to under an Ångström. The performance is notable when modeling H3 on a homology framework, where RosettaAntibody produced the best model among all participants for four of the 11 targets, two of which were predicted with sub-Ångström accuracy. To improve RosettaAntibody, we pursued the causes of model errors. The most common limitation was template unavailability, underscoring the need for more antibody structures and/or better de novo loop methods. In some cases, better templates could have been found by considering residues outside of the CDRs. De novo CDR H3 modeling remains challenging at long loop lengths, but constraining the C-terminal end of H3 to a kinked conformation allows near-native conformations to be sampled more frequently. We also found that incorrect VL-VH orientations caused models with low H3 RMSDs to score poorly, suggesting that correct VL-VH orientations will improve discrimination between near-native and incorrect conformations. These observations will guide the future development of RosettaAntibody.
Subjects
Complementarity Determining Regions/chemistry , Immunoglobulins/chemistry , Software , Algorithms , Animals , Biomechanical Phenomena , Humans , Immunoglobulin Heavy Chains/chemistry , Immunoglobulin Light Chains/chemistry , Molecular Models , Protein Conformation
ABSTRACT
The Rosetta molecular modeling software package provides experimentally tested and rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers. Despite its free availability to academic users and improving documentation, use of Rosetta has largely remained confined to developers and their immediate collaborators due to the code's difficulty of use, the requirement for large computational resources, and the unavailability of servers for most of the Rosetta applications. Here, we present a unified web framework for Rosetta applications called ROSIE (Rosetta Online Server that Includes Everyone). ROSIE provides (a) a common user interface for Rosetta protocols, (b) a stable application programming interface for developers to add additional protocols, (c) a flexible back-end to allow leveraging of computer cluster resources shared by RosettaCommons member institutions, and (d) centralized administration by the RosettaCommons to ensure continuous maintenance. This paper describes the ROSIE server infrastructure, a step-by-step 'serverification' protocol for use by Rosetta developers, and the deployment of the first nine ROSIE applications by six separate developer teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance, Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated by the number and diversity of these applications, ROSIE offers a general and speedy paradigm for serverification of Rosetta applications that incurs negligible cost to developers and lowers barriers to Rosetta use for the broader biological community. ROSIE is available at http://rosie.rosettacommons.org.
Subjects
Internet , Molecular Models , Software , User-Computer Interface , Molecular Dynamics Simulation
ABSTRACT
RosettaDock has been increasingly used in protein docking and design strategies in order to predict the structure of protein-protein interfaces. Here we test capabilities of RosettaDock 3.2, part of the newly developed Rosetta v3.2 modeling suite, against Docking Benchmark 3.0, and compare it with RosettaDock v2.3, the latest version of the previous Rosetta software package. The benchmark contains a diverse set of 116 docking targets including 22 antibody-antigen complexes, 33 enzyme-inhibitor complexes, and 60 'other' complexes. These targets were further classified by expected docking difficulty into 84 rigid-body targets, 17 medium targets, and 14 difficult targets. We carried out local docking perturbations for each target, using the unbound structures when available, in both RosettaDock v2.3 and v3.2. Overall the performances of RosettaDock v2.3 and v3.2 were similar. RosettaDock v3.2 achieved 56 docking funnels, compared to 49 in v2.3. A breakdown of docking performance by protein complex type shows that RosettaDock v3.2 achieved docking funnels for 63% of antibody-antigen targets, 62% of enzyme-inhibitor targets, and 35% of 'other' targets. In terms of docking difficulty, RosettaDock v3.2 achieved funnels for 58% of rigid-body targets, 30% of medium targets, and 14% of difficult targets. For targets that failed, we carried out additional analyses to identify the cause of failure, which showed that binding-induced backbone conformation changes account for a majority of failures. We also present a bootstrap statistical analysis that quantifies the reliability of the stochastic docking results. Finally, we demonstrate the additional functionality available in RosettaDock v3.2 by incorporating small molecules and non-protein co-factors in docking of a smaller target set. This study marks the most extensive benchmarking of the RosettaDock module to date and establishes a baseline for future research in protein interface modeling and structure prediction.
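The abstract does not describe how its bootstrap analysis was set up. As a generic illustration of the idea only, the sketch below resamples per-target outcomes with replacement to put an interval around an overall success rate. The 56 funnels out of 116 targets come from the abstract; the function name, resample count, seed, and percentile interval are assumptions, not the authors' procedure.

```python
import random


def bootstrap_success_rate(outcomes, n_resamples=1000, seed=0):
    """Percentile bootstrap for a success rate.

    outcomes: list of 1 (docking funnel found) or 0 (no funnel), one per target.
    Returns (observed_rate, lower_bound, upper_bound) for a ~95% interval.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        # Draw n targets with replacement and record the resampled rate.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(resample) / n)
    rates.sort()
    lower = rates[int(0.025 * n_resamples)]
    upper = rates[int(0.975 * n_resamples) - 1]
    return sum(outcomes) / n, lower, upper


# 56 funnels among 116 targets, as reported for RosettaDock v3.2.
outcomes = [1] * 56 + [0] * 60
observed, lower, upper = bootstrap_success_rate(outcomes)
print(f"success rate {observed:.2f}, ~95% interval [{lower:.2f}, {upper:.2f}]")
```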
Subjects
Benchmarking , Proteins/metabolism , Software/standards , Algorithms , Protein Binding , Reproducibility of Results
ABSTRACT
Computational structure prediction and design of proteins and protein-protein complexes have long been inaccessible to those not directly involved in the field. A key missing component has been the ability to visualize the progress of calculations to better understand them. Rosetta is one simulation suite that would benefit from a robust real-time visualization solution. Several tools exist for the sole purpose of visualizing biomolecules; one of the most popular tools, PyMOL (Schrödinger), is a powerful, highly extensible, user friendly, and attractive package. Integrating Rosetta and PyMOL directly has many technical and logistical obstacles inhibiting usage. To circumvent these issues, we developed a novel solution based on transmitting biomolecular structure and energy information via UDP sockets. Rosetta and PyMOL run as separate processes, thereby avoiding many technical obstacles while visualizing information on-demand in real-time. When Rosetta detects changes in the structure of a protein, new coordinates are sent over a UDP network socket to a PyMOL instance running a UDP socket listener. PyMOL then interprets and displays the molecule. This implementation also allows remote execution of Rosetta. When combined with PyRosetta, this visualization solution provides an interactive environment for protein structure prediction and design.
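The two-process design described above can be illustrated with Python's standard `socket` module. This is a minimal sketch of the general pattern only: the JSON payload, the function names, and the use of a single script for both ends are assumptions made for illustration, and they do not reproduce the actual message format or code used by Rosetta or PyMOL.

```python
import json
import socket


def start_listener(host="127.0.0.1", port=0):
    """Open a UDP socket that waits for coordinate updates (the viewer side)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    sock.settimeout(2.0)     # avoid blocking forever if nothing arrives
    return sock


def send_coordinates(coords, address):
    """Send one coordinate update as a single datagram (the simulation side)."""
    payload = json.dumps({"coords": coords}).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.sendto(payload, address)


def receive_coordinates(listener, bufsize=65535):
    """Read one datagram and decode the coordinates it carries."""
    data, _sender_address = listener.recvfrom(bufsize)
    return json.loads(data.decode("utf-8"))["coords"]


# Both ends run in one script here; in practice they would be separate processes.
listener = start_listener()
send_coordinates([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]], listener.getsockname())
received = receive_coordinates(listener)
listener.close()
print(received)
```

Because UDP is connectionless, the sender does not need to know whether a viewer is running, which fits the description of the two programs as independent processes; the trade-off is that datagrams can be dropped without notice.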