ABSTRACT
We describe a process for rapid antibody affinity optimization by repertoire mining to identify clones across B cell clonal lineages based on convergent immune responses where antigen-specific clones with the same heavy (VH) and light chain germline segment pairs, or parallel lineages, bind a single epitope on the antigen. We use this convergence framework to mine unique and distinct VH lineages from rat anti-triggering receptor on myeloid cells 2 (TREM2) antibody repertoire datasets with high diversity in the third complementarity-determining loop region (CDR H3) to further affinity-optimize a high-affinity agonistic anti-TREM2 antibody while retaining critical functional properties. Structural analyses confirm a nearly identical binding mode of anti-TREM2 variants with subtle but significant structural differences in the binding interface. Parallel lineage repertoire mining is uniquely tailored to rationally explore the large CDR H3 sequence space in antibody repertoires and can be easily and generally applied to antibodies discovered in vivo.
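The parallel-lineage idea described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the field names `vh_germline`, `vl_germline`, and `cdr_h3`, the germline labels, and the sequences are all hypothetical placeholders. It also treats any differing CDR H3 string as a distinct lineage, which is a simplification of real lineage assignment.

```python
from collections import defaultdict


def group_by_germline_pair(clones):
    """Map each (VH germline, VL germline) pair to its distinct CDR H3 sequences."""
    groups = defaultdict(set)
    for clone in clones:
        key = (clone["vh_germline"], clone["vl_germline"])
        groups[key].add(clone["cdr_h3"])
    return {key: sorted(h3s) for key, h3s in groups.items()}


def parallel_lineage_candidates(clones, reference):
    """CDR H3s that share the reference's germline pair but differ from its own CDR H3."""
    key = (reference["vh_germline"], reference["vl_germline"])
    shared = group_by_germline_pair(clones).get(key, [])
    return [h3 for h3 in shared if h3 != reference["cdr_h3"]]


# Toy, made-up repertoire: germline labels and sequences are placeholders.
clones = [
    {"vh_germline": "VH-A", "vl_germline": "VK-1", "cdr_h3": "ARDGYSSYFDY"},
    {"vh_germline": "VH-A", "vl_germline": "VK-1", "cdr_h3": "ARGGTWYFDV"},
    {"vh_germline": "VH-A", "vl_germline": "VK-1", "cdr_h3": "ARDGYSSYFDY"},
    {"vh_germline": "VH-B", "vl_germline": "VK-1", "cdr_h3": "ASLRYYAMDY"},
]
reference = clones[0]
candidates = parallel_lineage_candidates(clones, reference)
print(candidates)
```

Grouping on the germline pair first keeps the search restricted to clones that plausibly share a binding mode, so the CDR H3 variants returned are candidates for grafting rather than arbitrary repertoire members.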
Subjects
Antibody Affinity, Complementarity Determining Regions, Immunologic Receptors, Animals, Complementarity Determining Regions/immunology, Antibody Affinity/immunology, Humans, Rats, Immunologic Receptors/immunology, Immunologic Receptors/genetics, Membrane Glycoproteins/immunology, B-Lymphocytes/immunology, Immunoglobulin Heavy Chains/immunology, Immunoglobulin Heavy Chains/genetics, Epitopes/immunology, Monoclonal Antibodies/immunology, Antibodies/immunology
ABSTRACT
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set is deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.
Subjects
Molecular Models, Protein Folding, Proteins, Proteins/chemistry, Computational Biology/methods, Software, Protein Conformation, Algorithms, Protein Secondary Structure
ABSTRACT
Many naturally occurring protein assemblies have dynamic structures that allow them to perform specialized functions. For example, clathrin coats adopt a wide variety of architectures to adapt to vesicular cargos of various sizes. Although computational methods for designing novel self-assembling proteins have advanced substantially over the past decade, most existing methods focus on designing static structures with high accuracy. Here we characterize the structures of three distinct computationally designed protein assemblies that each form multiple unanticipated architectures, and identify flexibility in specific regions of the subunits of each assembly as the source of structural diversity. Cryo-EM single-particle reconstructions and native mass spectrometry showed that only two distinct architectures were observed in two of the three cases, while we obtained six cryo-EM reconstructions that likely represent a subset of the architectures present in solution in the third case. Structural modeling and molecular dynamics simulations indicated that the surprising observation of a defined range of architectures, instead of non-specific aggregation, can be explained by constrained flexibility within the building blocks. Our results suggest that deliberate use of structural flexibility as a design principle will allow exploration of previously inaccessible structural and functional space in designed protein assemblies.
ABSTRACT
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.
ABSTRACT
Mimics of protein secondary and tertiary structure offer rationally designed inhibitors of biomolecular interactions. β-Sheet mimics have a storied history in bioorganic chemistry and are typically designed with synthetic or natural turn segments. We hypothesized that replacement of terminal inter-β-strand hydrogen bonds with hydrogen bond surrogates (HBS) may lead to conformationally defined macrocyclic β-sheets without the requirement for natural or synthetic β-turns, thereby providing a minimal mimic of a protein β-sheet. To access turn-less antiparallel β-sheet mimics, we developed a facile solid phase synthesis protocol. We surveyed a dataset of protein β-sheets for naturally observed interstrand side chain interactions. This bioinformatics survey highlighted an over-abundance of aromatic-aromatic, cation-π and ionic interactions in β-sheets. In correspondence with natural β-sheets, we find that minimal HBS mimics show robust β-sheet formation when specific amino acid residue pairings are incorporated. In isolated β-sheets, aromatic interactions endow superior conformational stability over ionic or cation-π interactions. Circular dichroism and NMR spectroscopies, along with high-resolution X-ray crystallography, support our design principles.
Subjects
Proteins, Beta-Strand Protein Conformation, Hydrogen Bonding, Molecular Models, Protein Secondary Structure, Proteins/chemistry
ABSTRACT
Riboswitches are non-coding RNA elements that play vital roles in regulating gene expression. Their specific ligand-dependent structural reorganization facilitates their use as templates for design of engineered RNA switches for therapeutics, nanotechnology and synthetic biology. T-box riboswitches bind tRNAs to sense aminoacylation and control gene expression via transcription attenuation or translation inhibition. Here we determine the cryo-EM structure of the wild-type Mycobacterium smegmatis ileS T-box in complex with its cognate tRNA-Ile. This structure shows a very flexible antisequestrator region that tolerates both 3'-OH and 2',3'-cyclic phosphate modification at the 3' end of tRNA-Ile. Elongation of one helical turn (11 base pairs) in both the tRNA acceptor arm and T-box Stem III maintains T-box-tRNA complex formation and increases the selectivity for tRNA 3' end modification. Moreover, elongation of Stem III results in ~6-fold tighter binding to tRNA, which leads to increased sensitivity of downstream translational regulation indicated by precedent translation. Our results demonstrate that cryo-EM can guide RNA engineering to design improved riboswitch modules for translational regulation, and potentially a variety of additional functions.
ABSTRACT
Functional design of ribosomes with mutant ribosomal RNA (rRNA) can expand opportunities for understanding molecular translation, building cells from the bottom-up, and engineering ribosomes with altered capabilities. However, such efforts are hampered by cell viability constraints, an enormous combinatorial sequence space, and limitations on large-scale, 3D design of RNA structures and functions. To address these challenges, we develop an integrated community science and experimental screening approach for rational design of ribosomes. This approach couples Eterna, an online video game that crowdsources RNA sequence design to community scientists in the form of puzzles, with in vitro ribosome synthesis, assembly, and translation in multiple design-build-test-learn cycles. We apply our framework to discover mutant rRNA sequences that improve protein synthesis in vitro and cell growth in vivo, relative to wild type ribosomes, under diverse environmental conditions. This work provides insights into rRNA sequence-function relationships and has implications for synthetic biology.
Subjects
Ribosomal RNA, Ribosomes, Ribosomes/metabolism, Ribosomal RNA/metabolism, Synthetic Biology, Phenotype, Ribosomal Proteins/metabolism
ABSTRACT
Understanding the three-dimensional structure of an RNA molecule is often essential to understanding its function. Sampling algorithms and energy functions for RNA structure prediction are improving, due to the increasing diversity of structural data available for training statistical potentials and testing structural data, along with a steady supply of blind challenges through the RNA-Puzzles initiative. The recent FARFAR2 algorithm enables near-native structure predictions on fairly complex RNA structures, including automated selection of final candidate models and estimation of model accuracy. Here, we describe the use of a publicly available webserver for RNA modeling for realistic scenarios using FARFAR2, available at https://rosie.rosettacommons.org/farfar2. We walk through two cases in some detail: a simple model pseudoknot from the frameshifting element of beet western yellows virus modeled using the "basic interface" to the webserver and a replication of RNA-Puzzle 20, a metagenomic twister sister ribozyme, using the "advanced interface." We also describe example runs of FARFAR2 modeling including two kinds of experimental data: a c-di-GMP riboswitch modeled with low-resolution restraints from MOHCA-seq experiments and a tandem GA motif modeled with 1H NMR chemical shifts.
Subjects
Catalytic RNA, RNA, RNA/chemistry, Nucleic Acid Conformation, Molecular Models, Catalytic RNA/chemistry, Algorithms
ABSTRACT
Medicines based on messenger RNA (mRNA) hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ('Stanford OpenVaccine') on Kaggle, involving single-nucleotide resolution measurements on 6,043 diverse 102-130-nucleotide RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1,588 nucleotides) with improved accuracy compared with previously published models. These results indicate that such models can represent in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for dataset creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.
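The "within experimental error" statistic quoted above can be made concrete with a short Python sketch. The criterion used here, counting a prediction as a hit when its absolute deviation from the measurement is no larger than that measurement's reported error, is an assumption about how such a fraction might be computed, not a description of the competition's scoring code. The function name and the toy numbers are hypothetical.

```python
def fraction_within_error(predicted, measured, errors):
    """Fraction of positions where |predicted - measured| <= experimental error."""
    if not (len(predicted) == len(measured) == len(errors)):
        raise ValueError("all three sequences must have the same length")
    if not predicted:
        raise ValueError("at least one position is required")
    hits = sum(
        1
        for p, m, e in zip(predicted, measured, errors)
        if abs(p - m) <= e
    )
    return hits / len(predicted)


# Toy per-nucleotide degradation values (made up for illustration).
predicted = [0.50, 0.20, 0.90, 0.10]
measured = [0.45, 0.40, 0.85, 0.10]
errors = [0.10, 0.05, 0.02, 0.01]

score = fraction_within_error(predicted, measured, errors)
print(f"{score:.0%} of predictions fall within experimental error")
```

A per-position criterion like this rewards models for being right where the data are precise and does not penalize them for disagreeing with noisy measurements by less than the noise.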
ABSTRACT
Understanding how modifications to the ribosome affect function has implications for studying ribosome biogenesis, building minimal cells, and repurposing ribosomes for synthetic biology. However, efforts to design sequence-modified ribosomes have been limited because point mutations in the ribosomal RNA (rRNA), especially in the catalytic active site (peptidyl transferase center; PTC), are often functionally detrimental. Moreover, methods for directed evolution of rRNA are constrained by practical considerations (e.g. library size). Here, to address these limitations, we developed a computational rRNA design approach for screening guided libraries of mutant ribosomes. Our method includes in silico library design and selection using a Rosetta stepwise Monte Carlo method (SWM), library construction and in vitro testing of combined ribosomal assembly and translation activity, and functional characterization in vivo. As a model, we apply our method to making modified ribosomes with mutant PTCs. We engineer ribosomes with as many as 30 mutations in their PTCs, highlighting previously unidentified epistatic interactions, and show that SWM helps identify sequences with beneficial phenotypes as compared to random library sequences. We further demonstrate that some variants improve cell growth in vivo, relative to wild type ribosomes. We anticipate that SWM design and selection may serve as a powerful tool for rRNA engineering.
Subjects
Peptidyl Transferases, Ribosomes, Catalytic Domain, Ribosomes/metabolism, Ribosomal RNA/metabolism, Peptidyl Transferases/metabolism, Mutation, Ribosomal Proteins/genetics, 23S Ribosomal RNA/metabolism
ABSTRACT
Minimal protein mimics have yielded novel classes of protein-protein interaction inhibitors; however, this success has not been extended to targeting intrinsically disordered proteins, which represent a significant proportion of important therapeutic targets. We sought to determine the requirements for binding an intrinsically disordered region (IDR) by its native binding partner as a prelude to developing minimal protein mimics that regulate IDR interactions. Our analysis reinforces the hypothesis that IDRs reside on a fulcrum between unfolded and folded states and that a handful of key binding residues on partner protein surfaces dictate their folding. Our studies also suggest that minimal mimics of protein surfaces may not offer specific ligands for IDRs and that it would be more judicious to target the globular protein partners of IDRs.
Subjects
Intrinsically Disordered Proteins, Intrinsically Disordered Proteins/chemistry, Membrane Proteins
ABSTRACT
Therapeutic mRNAs and vaccines are being developed for a broad range of human diseases, including COVID-19. However, their optimization is hindered by mRNA instability and inefficient protein expression. Here, we describe design principles that overcome these barriers. We develop an RNA sequencing-based platform called PERSIST-seq to systematically delineate in-cell mRNA stability, ribosome load, as well as in-solution stability of a library of diverse mRNAs. We find that, surprisingly, in-cell stability is a greater driver of protein output than high ribosome load. We further introduce a method called In-line-seq, applied to thousands of diverse RNAs, that reveals sequence and structure-based rules for mitigating hydrolytic degradation. Our findings show that highly structured "superfolder" mRNAs can be designed to improve both stability and expression with further enhancement through pseudouridine nucleoside modification. Together, our study demonstrates simultaneous improvement of mRNA stability and protein expression and provides a computational-experimental platform for the enhancement of mRNA medicines.
Subjects
COVID-19, RNA, COVID-19/therapy, Humans, Pseudouridine/metabolism, RNA Stability/genetics, Messenger RNA/metabolism
ABSTRACT
Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on 6043 102-130-nucleotide diverse RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1588 nucleotides) with improved accuracy compared to previously published models. Top teams integrated natural language processing architectures and data augmentation techniques with predictions from previous dynamic programming models for RNA secondary structure. These results indicate that such models are capable of representing in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for data set creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.
ABSTRACT
Publishing, discussing, envisioning, modeling, designing and experimentally determining RNA three-dimensional (3D) structures involve preparation of two-dimensional (2D) drawings that depict critical functional features of the subject molecules, such as noncanonical base pairs and protein contacts. Here, we describe RiboDraw, new software for crafting these drawings. We illustrate the features of RiboDraw by applying it to several RNAs, including the Escherichia coli tRNA-Phe, the P4-P6 domain of Tetrahymena ribozyme, a -1 ribosomal frameshift stimulation element from beet western yellows virus and the 5' untranslated region of SARS-CoV-2. We show secondary structure diagrams of the 23S and 16S subunits of the E. coli ribosome that reflect noncanonical base pairs, ribosomal proteins and structural motifs, and that convey the relative positions of these critical features in 3D space. This software is a MATLAB package freely available at https://github.com/DasLab/RiboDraw.
ABSTRACT
RNA hydrolysis presents problems in manufacturing, long-term storage, world-wide delivery and in vivo stability of messenger RNA (mRNA)-based vaccines and therapeutics. A largely unexplored strategy to reduce mRNA hydrolysis is to redesign RNAs to form double-stranded regions, which are protected from in-line cleavage and enzymatic degradation, while coding for the same proteins. The amount of stabilization that this strategy can deliver and the most effective algorithmic approach to achieve stabilization remain poorly understood. Here, we present simple calculations for estimating RNA stability against hydrolysis, and a model that links the average unpaired probability of an mRNA, or AUP, to its overall hydrolysis rate. To characterize the stabilization achievable through structure design, we compare AUP optimization by conventional mRNA design methods to results from more computationally sophisticated algorithms and crowdsourcing through the OpenVaccine challenge on the Eterna platform. We find that rational design on Eterna and the more sophisticated algorithms lead to constructs with low AUP, which we term 'superfolder' mRNAs. These designs exhibit a wide diversity of sequence and structure features that may be desirable for translation, biophysical size, and immunogenicity. Furthermore, their folding is robust to temperature, computer modeling method, choice of flanking untranslated regions, and changes in target protein sequence, as illustrated by rapid redesign of superfolder mRNAs for B.1.351, P.1 and B.1.1.7 variants of the prefusion-stabilized SARS-CoV-2 spike protein. Increases in in vitro mRNA half-life by at least two-fold appear immediately achievable.
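The AUP quantity mentioned above can be illustrated with a small Python sketch. It assumes a symmetric base-pair probability matrix is already available (for example from a secondary structure package, which is not called here) and that the overall hydrolysis rate scales with the summed unpaired probability, that is, with length times AUP. That proportionality, the function names, and the toy matrices are assumptions made for illustration.

```python
def unpaired_probabilities(bpp):
    """Per-nucleotide unpaired probability: 1 minus the total pairing probability of each row."""
    return [1.0 - sum(row) for row in bpp]


def average_unpaired_probability(bpp):
    """AUP: mean of the per-nucleotide unpaired probabilities."""
    p_unp = unpaired_probabilities(bpp)
    return sum(p_unp) / len(p_unp)


def relative_half_life(bpp_design, bpp_baseline):
    """Half-life of a design relative to a baseline.

    Assumes rate is proportional to the summed unpaired probability (N * AUP),
    so half-life is inversely proportional to that sum.
    """
    rate_design = sum(unpaired_probabilities(bpp_design))
    rate_baseline = sum(unpaired_probabilities(bpp_baseline))
    return rate_baseline / rate_design


# Toy 4-nucleotide examples; bpp[i][j] is the probability that i pairs with j.
loosely_paired = [
    [0.0, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0],
]
tightly_paired = [
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0],
]

print(average_unpaired_probability(loosely_paired))
print(average_unpaired_probability(tightly_paired))
print(relative_half_life(tightly_paired, loosely_paired))
```

Under this assumption, lowering AUP at fixed length lengthens the predicted half-life in direct proportion, which is why structure design targets AUP as the quantity to minimize.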
Subjects
Algorithms, Double-Stranded RNA/chemistry, Messenger RNA/chemistry, Viral RNA/chemistry, SARS-CoV-2/genetics, Coronavirus Spike Glycoprotein/genetics, Base Pairing, Base Sequence, COVID-19/prevention & control, Humans, Hydrolysis, RNA Stability, Double-Stranded RNA/genetics, Double-Stranded RNA/immunology, Messenger RNA/genetics, Messenger RNA/immunology, Viral RNA/genetics, Viral RNA/immunology, SARS-CoV-2/immunology, Coronavirus Spike Glycoprotein/immunology, Thermodynamics
ABSTRACT
RNA molecules adopt three-dimensional structures that are critical to their function and of interest in drug discovery. Few RNA structures are known, however, and predicting them computationally has proven challenging. We introduce a machine learning approach that enables identification of accurate structural models without assumptions about their defining characteristics, despite being trained with only 18 known RNA structures. The resulting scoring function, the Atomic Rotationally Equivariant Scorer (ARES), substantially outperforms previous methods and consistently produces the best results in community-wide blind RNA structure prediction challenges. By learning effectively even from a small amount of data, our approach overcomes a major limitation of standard deep neural networks. Because it uses only atomic coordinates as inputs and incorporates no RNA-specific information, this approach is applicable to diverse problems in structural biology, chemistry, materials science, and beyond.
Subjects
Deep Learning, Nucleic Acid Conformation, RNA/chemistry, RNA/ultrastructure, Molecular Models, Computer Neural Networks
ABSTRACT
Therapeutic mRNAs and vaccines are being developed for a broad range of human diseases, including COVID-19. However, their optimization is hindered by mRNA instability and inefficient protein expression. Here, we describe design principles that overcome these barriers. We develop a new RNA sequencing-based platform called PERSIST-seq to systematically delineate in-cell mRNA stability, ribosome load, as well as in-solution stability of a library of diverse mRNAs. We find that, surprisingly, in-cell stability is a greater driver of protein output than high ribosome load. We further introduce a method called In-line-seq, applied to thousands of diverse RNAs, that reveals sequence and structure-based rules for mitigating hydrolytic degradation. Our findings show that "superfolder" mRNAs can be designed to improve both stability and expression that are further enhanced through pseudouridine nucleoside modification. Together, our study demonstrates simultaneous improvement of mRNA stability and protein expression and provides a computational-experimental platform for the enhancement of mRNA medicines.
ABSTRACT
The rapid spread of COVID-19 is motivating development of antivirals targeting conserved SARS-CoV-2 molecular machinery. The SARS-CoV-2 genome includes conserved RNA elements that offer potential small-molecule drug targets, but most of their 3D structures have not been experimentally characterized. Here, we provide a compilation of chemical mapping data from our and other labs, secondary structure models, and 3D model ensembles based on Rosetta's FARFAR2 algorithm for SARS-CoV-2 RNA regions including the individual stems SL1-8 in the extended 5' UTR; the reverse complement of the 5' UTR SL1-4; the frameshift stimulating element (FSE); and the extended pseudoknot, hypervariable region, and s2m of the 3' UTR. For eleven of these elements (the stems in SL1-8, reverse complement of SL1-4, FSE, s2m and 3' UTR pseudoknot), modeling convergence supports the accuracy of predicted low energy states; subsequent cryo-EM characterization of the FSE confirms modeling accuracy. To aid efforts to discover small molecule RNA binders guided by computational models, we provide a second set of similarly prepared models for RNA riboswitches that bind small molecules. Both datasets ('FARFAR2-SARS-CoV-2', https://github.com/DasLab/FARFAR2-SARS-CoV-2; and 'FARFAR2-Apo-Riboswitch', https://github.com/DasLab/FARFAR2-Apo-Riboswitch) include up to 400 models for each RNA element, which may facilitate drug discovery approaches targeting dynamic ensembles of RNA molecules.
Subjects
Consensus, Molecular Models, Nucleic Acid Conformation, Viral RNA/chemistry, SARS-CoV-2/genetics, 3' Untranslated Regions/genetics, 5' Untranslated Regions/genetics, Algorithms, Nucleotide Aptamers/genetics, Base Sequence, Binding Sites, Cryoelectron Microscopy, Datasets as Topic, Preclinical Drug Evaluation/methods, Ribosomal Frameshifting/genetics, Viral Genome/genetics, RNA Stability, Viral RNA/genetics, Reproducibility of Results, Riboswitch/genetics, Small Molecule Libraries/chemistry
ABSTRACT
The rise of antibiotic resistance calls for new therapeutics targeting resistance factors such as the New Delhi metallo-β-lactamase 1 (NDM-1), a bacterial enzyme that degrades β-lactam antibiotics. We present structure-guided computational methods for designing peptide macrocycles built from mixtures of L- and D-amino acids that are able to bind to and inhibit targets of therapeutic interest. Our methods explicitly consider the propensity of a peptide to favor a binding-competent conformation, which we found to predict rank order of experimentally observed IC50 values across seven designed NDM-1-inhibiting peptides. We were able to determine X-ray crystal structures of three of the designed inhibitors in complex with NDM-1, and in all three the conformation of the peptide is very close to the computationally designed model. In two of the three structures, the binding mode with NDM-1 is also very similar to the design model, while in the third, we observed an alternative binding mode likely arising from internal symmetry in the shape of the design combined with flexibility of the target. Although challenges remain in robustly predicting target backbone changes, binding mode, and the effects of mutations on binding affinity, our methods for designing ordered, binding-competent macrocycles should have broad applicability to a wide range of therapeutic targets.
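One way to quantify the propensity of a peptide to favor a binding-competent conformation is a Boltzmann-weighted "fraction near the design" score computed over sampled conformations. The sketch below uses a form often written as P_near. Whether this matches the metric used by the authors is an assumption, and the function name, the default width `lam`, the default `kT`, and the toy energies are illustrative values only.

```python
import math


def p_near(energies, rmsds, lam=1.5, kT=0.62):
    """Boltzmann-weighted closeness of a conformational ensemble to a target state.

    energies: energy of each sampled conformation (same units as kT)
    rmsds:    RMSD of each conformation to the designed conformation
    lam:      RMSD scale below which a conformation counts as "near" (assumed value)
    kT:       Boltzmann temperature factor (assumed value)
    """
    if len(energies) != len(rmsds) or not energies:
        raise ValueError("energies and rmsds must be non-empty and equal in length")
    e_min = min(energies)  # shift energies to avoid overflow in exp()
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    nearness = [math.exp(-((r / lam) ** 2)) for r in rmsds]
    return sum(w * n for w, n in zip(weights, nearness)) / sum(weights)


# Toy ensembles (made-up numbers): same geometries, different energy landscapes.
rmsds = [0.2, 3.0, 4.0]
funnel_energies = [0.0, 5.0, 6.0]  # designed state is clearly lowest in energy
flat_energies = [0.0, 0.0, 0.0]    # all states equally populated

print(p_near(funnel_energies, rmsds))
print(p_near(flat_energies, rmsds))
```

A score close to 1 indicates that the low-energy states cluster around the designed conformation, while a score close to 0 indicates that the peptide spends most of its time in alternative conformations.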