ABSTRACT
Splice-modulating antisense oligonucleotides (ASOs) are precision RNA-based drugs that are becoming an established modality to treat human disease. Previously, we reported the discovery of ASOs that target a novel, putative intronic RNA structure to rescue splicing of multiple pathogenic variants of F8 exon 16 that cause hemophilia A. However, the conventional approach to discovering splice-modulating ASOs is both laborious and expensive. Here, we describe an alternative paradigm that integrates data-driven RNA structure prediction and community science to discover splice-modulating ASOs. Using a splicing-deficient pathogenic variant of F8 exon 16 as a model, we show that 25% of the top-scoring molecules designed in the Eterna OpenASO challenge produce a statistically significant enhancement of exon 16 splicing. Additionally, we show that a distinct combination of ASOs designed by Eterna players can additively enhance the inclusion of the splicing-deficient exon 16 variant. Together, our data suggest that crowdsourcing designs from a community of citizen scientists may accelerate the discovery of splice-modulating ASOs with potential to treat human disease.
ABSTRACT
Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine-tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.
ABSTRACT
Coronavirus genomes sequester their start codons within stem-loop 5 (SL5), a structured, 5' genomic RNA element. In most alpha- and betacoronaviruses, the secondary structure of SL5 is predicted to contain a four-way junction of helical stems, some of which are capped with UUYYGU hexaloops. Here, using cryogenic electron microscopy (cryo-EM) and computational modeling with biochemically determined secondary structures, we present three-dimensional structures of SL5 from six coronaviruses. The SL5 domain of betacoronavirus severe-acute-respiratory-syndrome-related coronavirus 2 (SARS-CoV-2), resolved at 4.7 Å resolution, exhibits a T-shaped structure, with its UUYYGU hexaloops at opposing ends of a coaxial stack, the T's "arms." Further analysis of SL5 domains from SARS-CoV-1 and MERS (7.1 and 6.4 to 6.9 Å resolution, respectively) indicates that the junction geometry and inter-hexaloop distances are conserved features across these human-infecting betacoronaviruses. The MERS SL5 domain displays an additional tertiary interaction, which is also observed in the non-human-infecting betacoronavirus BtCoV-HKU5 (5.9 to 8.0 Å resolution). SL5s from human-infecting alphacoronaviruses, HCoV-229E and HCoV-NL63 (6.5 and 8.4 to 9.0 Å resolution, respectively), exhibit the same coaxial stacks, including the UUYYGU-capped arms, but with a phylogenetically distinct crossing angle, an X-shape. As such, all SL5 domains studied herein fold into stable tertiary structures with cross-genus similarities and notable differences, with implications for potential protein-binding modes and therapeutic targets.
Subjects
Alphacoronavirus, COVID-19, Human Coronavirus 229E, Humans, SARS-CoV-2/genetics, RNA
ABSTRACT
Designing single molecules that compute general functions of input molecular partners represents a major unsolved challenge in molecular design. Here, we demonstrate that high-throughput, iterative experimental testing of diverse RNA designs crowdsourced from Eterna yields sensors of increasingly complex functions of input oligonucleotide concentrations. After designing single-input RNA sensors with activation ratios beyond our detection limits, we created logic gates, including challenging XOR and XNOR gates, and sensors that respond to the ratio of two inputs. Finally, we describe the OpenTB challenge, which elicited 85-nucleotide sensors that compute a score for diagnosing active tuberculosis, based on the ratio of products of three gene segments. Building on OpenTB design strategies, we created an algorithm, Nucleologic, that produces similarly compact sensors for the three-gene score based on RNA and DNA. These results open new avenues for diverse applications of compact, single-molecule sensors previously limited by design complexity.
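As a rough illustration of the target behaviors named in this abstract (XOR and XNOR gates, and a sensor that responds to the ratio of two inputs), the sketch below writes the idealized input-output functions in Python. It is not the authors' design method or the Nucleologic algorithm; the function names, the concentration threshold, and the ratio cutoff are all hypothetical values chosen only for illustration.

```python
# Idealized target functions for sensors of two input oligonucleotide
# concentrations. All names and numeric defaults are hypothetical.

def is_present(conc_nM: float, threshold_nM: float = 100.0) -> bool:
    """Treat an input as 'on' when its concentration reaches the threshold."""
    return conc_nM >= threshold_nM


def xor_gate(a_nM: float, b_nM: float, threshold_nM: float = 100.0) -> bool:
    """Ideal XOR: active when exactly one of the two inputs is present."""
    return is_present(a_nM, threshold_nM) != is_present(b_nM, threshold_nM)


def xnor_gate(a_nM: float, b_nM: float, threshold_nM: float = 100.0) -> bool:
    """Ideal XNOR: active when both inputs are present or both are absent."""
    return not xor_gate(a_nM, b_nM, threshold_nM)


def ratio_sensor(a_nM: float, b_nM: float, ratio_cutoff: float = 1.0) -> bool:
    """Ideal ratio sensor: active when [A]/[B] reaches the cutoff."""
    if b_nM <= 0:
        raise ValueError("b_nM must be positive to form a ratio")
    return a_nM / b_nM >= ratio_cutoff
```

A real RNA sensor responds in a graded rather than binary way, so these Boolean functions describe only the ideal response that a design would be scored against.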
ABSTRACT
The ribosome is a ribonucleoprotein complex found in all domains of life. Its role is to catalyze protein synthesis, the messenger RNA (mRNA)-templated formation of amide bonds between α-amino acid monomers. Amide bond formation occurs within a highly conserved region of the large ribosomal subunit known as the peptidyl transferase center (PTC). Here we describe the step-wise design and characterization of mini-PTC 1.1, a 284-nucleotide RNA that recapitulates many essential features of the Escherichia coli PTC. Mini-PTC 1.1 folds into a PTC-like structure under physiological conditions, even in the absence of r-proteins, and engages small-molecule analogs of A- and P-site tRNAs. The sequence of mini-PTC 1.1 differs from the wild-type E. coli ribosome at 12 nucleotides that were installed by a cohort of citizen scientists using the online video game Eterna. These base changes improve both the secondary structure and tertiary folding of mini-PTC 1.1 as well as its ability to bind small-molecule substrate analogs. Here, the combined input from Eterna citizen scientists and RNA structural analysis provides a robust workflow for the design of a minimal PTC that recapitulates many features of an intact ribosome.
Subjects
Escherichia coli, Ribosomes, Humans, Amides, Escherichia coli/genetics, Escherichia coli/metabolism, Peptidyl Transferases/genetics, Peptidyl Transferases/chemistry, Ribosomes/metabolism, Transfer RNA/metabolism
ABSTRACT
Coronavirus genomes sequester their start codons within stem-loop 5 (SL5), a structured, 5' genomic RNA element. In most alpha- and betacoronaviruses, the secondary structure of SL5 is predicted to contain a four-way junction of helical stems, some of which are capped with UUYYGU hexaloops. Here, using cryogenic electron microscopy (cryo-EM) and computational modeling with biochemically determined secondary structures, we present three-dimensional structures of SL5 from six coronaviruses. The SL5 domain of betacoronavirus SARS-CoV-2, resolved at 4.7 Å resolution, exhibits a T-shaped structure, with its UUYYGU hexaloops at opposing ends of a coaxial stack, the T's "arms." Further analysis of SL5 domains from SARS-CoV-1 and MERS (7.1 and 6.4-6.9 Å resolution, respectively) indicates that the junction geometry and inter-hexaloop distances are conserved features across the studied human-infecting betacoronaviruses. The MERS SL5 domain displays an additional tertiary interaction, which is also observed in the non-human-infecting betacoronavirus BtCoV-HKU5 (5.9-8.0 Å resolution). SL5s from human-infecting alphacoronaviruses, HCoV-229E and HCoV-NL63 (6.5 and 8.4-9.0 Å resolution, respectively), exhibit the same coaxial stacks, including the UUYYGU-capped arms, but with a phylogenetically distinct crossing angle, an X-shape. As such, all SL5 domains studied herein fold into stable tertiary structures with cross-genus similarities, with implications for potential protein-binding modes and therapeutic targets.
ABSTRACT
CASP assessments primarily rely on comparing predicted coordinates with experimental reference structures. However, experimental structures by their nature are only models themselves: their construction involves a certain degree of subjectivity in interpreting density maps and translating them to atomic coordinates. Here, we directly utilized density maps to evaluate the predictions by employing a method for ranking the quality of protein chain predictions based on their fit into the experimental density. The fit-based ranking was found to correlate well with the CASP assessment scores. Overall, the evaluation against the density map indicated that the models are of high accuracy, and occasionally even better than the reference structure in some regions of the model. Local assessment of predicted side chains in a 1.52 Å resolution map showed that side chains are sometimes poorly positioned. Additionally, the top 118 predictions associated with 9 protein target reference structures were selected for automated refinement, in addition to the top 40 predictions for 11 RNA targets. For both proteins and RNA, the refinement of CASP15 predictions resulted in structures that are close to the reference target structure. This refinement was successful despite large conformational changes often being required, showing that predictions from CASP-assessed methods could serve as a good starting point for building atomic models in cryo-EM maps for both proteins and RNA. Loop modeling continued to pose a challenge for predictors, and this, together with the lack of consensus amongst models in these regions, suggests that modeling, in combination with model-fit to the density, holds the potential for identifying more flexible regions within the structure.
Subjects
Proteins, Cryoelectron Microscopy/methods, Molecular Models, Proteins/chemistry, Protein Conformation
ABSTRACT
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and x-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as noncanonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
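The Z-score rankings mentioned in this abstract can be sketched as follows. This is a minimal illustration of the general idea (standardize each metric across groups per target, then sum per group), not the assessors' actual pipeline; the function names, the choice to clip negative Z-scores to zero, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All groups scored identically on this target: no one stands out.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def rank_groups(scores_by_target):
    """Rank groups by summed per-target Z-scores (higher score is better).

    scores_by_target: {target_id: {group_name: score}}
    Negative Z-scores are clipped to zero here (an assumption), so a group
    is not penalized more for a poor model than for a missing one.
    """
    totals = {}
    for group_scores in scores_by_target.values():
        groups = list(group_scores)
        zs = z_scores([group_scores[g] for g in groups])
        for group, z in zip(groups, zs):
            totals[group] = totals.get(group, 0.0) + max(z, 0.0)
    return sorted(totals, key=totals.get, reverse=True)
```

For example, with hypothetical scores `{"R1": {"group_A": 0.9, "group_B": 0.5, "group_C": 0.1}, "R2": {"group_A": 0.8, "group_B": 0.6, "group_C": 0.7}}`, `rank_groups` places `group_A` first because it is above the mean on both targets.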
Subjects
Algorithms, RNA, Computational Biology/methods, Proteins/chemistry
ABSTRACT
CASP assessments primarily rely on comparing predicted coordinates with experimental reference structures. However, errors in the reference structures can potentially reduce the accuracy of the assessment. This issue is particularly prominent in cryoEM-determined structures, and therefore, in the assessment of CASP15 cryoEM targets, we directly utilized density maps to evaluate the predictions. A method for ranking the quality of protein chain predictions based on rigid fitting to experimental density was found to correlate well with the CASP assessment scores. Overall, the evaluation against the density map indicated that the models are of high accuracy, although local assessment of predicted side chains in a 1.52 Å resolution map showed that side chains are sometimes poorly positioned. The top 136 predictions associated with 9 protein target reference structures were selected for refinement, in addition to the top 40 predictions for 11 RNA targets. To this end, we have developed an automated hierarchical refinement pipeline in cryoEM maps. For both proteins and RNA, the refinement of CASP15 predictions resulted in structures that are close to the reference target structure, including some regions with better fit to the density. This refinement was successful despite large conformational changes and secondary structure element movements often being required, suggesting that predictions from CASP-assessed methods could serve as a good starting point for building atomic models in cryoEM maps for both proteins and RNA. Loop modeling continued to pose a challenge for predictors, with even short loops failing to be accurately modeled or refined at times. The lack of consensus amongst models suggests that modeling holds the potential for identifying more flexible regions within the structure.
ABSTRACT
The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their x-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include enhancing the accuracy of local interaction predictions and giving greater consideration to experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.
Subjects
Computational Biology, Proteins, Protein Conformation, Proteins/chemistry, Molecular Models, Computational Biology/methods, X-Ray Diffraction
ABSTRACT
Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.
Subjects
Computational Biology, Proteins, Protein Conformation, Proteins/chemistry, Molecular Models, Ligands
ABSTRACT
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top-ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and X-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as non-canonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
ABSTRACT
Functional design of ribosomes with mutant ribosomal RNA (rRNA) can expand opportunities for understanding molecular translation, building cells from the bottom up, and engineering ribosomes with altered capabilities. However, such efforts are hampered by cell viability constraints, an enormous combinatorial sequence space, and limitations on large-scale, 3D design of RNA structures and functions. To address these challenges, we develop an integrated community science and experimental screening approach for rational design of ribosomes. This approach couples Eterna, an online video game that crowdsources RNA sequence design to community scientists in the form of puzzles, with in vitro ribosome synthesis, assembly, and translation in multiple design-build-test-learn cycles. We apply our framework to discover mutant rRNA sequences that improve protein synthesis in vitro and cell growth in vivo, relative to wild-type ribosomes, under diverse environmental conditions. This work provides insights into rRNA sequence-function relationships and has implications for synthetic biology.
Subjects
Ribosomal RNA, Ribosomes, Ribosomes/metabolism, Ribosomal RNA/metabolism, Synthetic Biology, Phenotype, Ribosomal Proteins/metabolism
ABSTRACT
Understanding the three-dimensional structure of an RNA molecule is often essential to understanding its function. Sampling algorithms and energy functions for RNA structure prediction are improving, due to the increasing diversity of structural data available for training statistical potentials and testing structural data, along with a steady supply of blind challenges through the RNA-Puzzles initiative. The recent FARFAR2 algorithm enables near-native structure predictions on fairly complex RNA structures, including automated selection of final candidate models and estimation of model accuracy. Here, we describe the use of a publicly available webserver for RNA modeling for realistic scenarios using FARFAR2, available at https://rosie.rosettacommons.org/farfar2. We walk through two cases in some detail: a simple model pseudoknot from the frameshifting element of beet western yellows virus modeled using the "basic interface" to the webserver and a replication of RNA-Puzzle 20, a metagenomic twister sister ribozyme, using the "advanced interface." We also describe example runs of FARFAR2 modeling including two kinds of experimental data: a c-di-GMP riboswitch modeled with low-resolution restraints from MOHCA-seq experiments and a tandem GA motif modeled with 1H NMR chemical shifts.
Subjects
Catalytic RNA, RNA, RNA/chemistry, Nucleic Acid Conformation, Molecular Models, Catalytic RNA/chemistry, Algorithms
ABSTRACT
RNA three-dimensional structures provide rich and vital information for understanding their functions. Recent advances in cryogenic electron microscopy (cryo-EM) allow structure determination of RNAs and ribonucleoprotein (RNP) complexes. However, limited global and local resolutions of RNA cryo-EM maps pose great challenges in tracing RNA coordinates. The Rosetta-based "auto-DRRAFTER" method builds RNA models into moderate-resolution RNA cryo-EM density as part of the Ribosolve pipeline. Here, we describe a step-by-step protocol for auto-DRRAFTER using a glycine riboswitch from Fusobacterium nucleatum as an example. Successful implementation of this protocol allows automated RNA modeling into RNA cryo-EM density, accelerating our understanding of RNA structure-function relationships. Input and output files are being made available at https://github.com/auto-DRRAFTER/springer-chapter.
Subjects
RNA, Riboswitch, Cryoelectron Microscopy/methods, Glycine, Molecular Models, Protein Conformation, Ribonucleoproteins
ABSTRACT
Medicines based on messenger RNA (mRNA) hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ('Stanford OpenVaccine') on Kaggle, involving single-nucleotide resolution measurements on 6,043 diverse 102-130-nucleotide RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1,588 nucleotides) with improved accuracy compared with previously published models. These results indicate that such models can represent in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for dataset creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.
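The statistic quoted in this abstract, the share of nucleotide-level predictions within experimental error of the measurement, can be illustrated with a short sketch. The function name is hypothetical, and it assumes that "within experimental error" means the absolute difference between prediction and measurement is no larger than the reported per-nucleotide error; the competition's exact criterion may differ.

```python
def fraction_within_error(predicted, measured, errors):
    """Fraction of positions where |predicted - measured| <= reported error.

    All three arguments are equal-length sequences of per-nucleotide values.
    The one-error-bar criterion is an assumption made for illustration.
    """
    if not (len(predicted) == len(measured) == len(errors)):
        raise ValueError("inputs must have the same length")
    if not predicted:
        raise ValueError("inputs must be non-empty")
    hits = sum(
        1
        for p, m, e in zip(predicted, measured, errors)
        if abs(p - m) <= e
    )
    return hits / len(predicted)
```

For instance, with hypothetical values `predicted=[0.5, 1.0, 0.2, 0.9]`, `measured=[0.55, 0.7, 0.2, 1.0]`, and `errors=[0.1, 0.1, 0.05, 0.05]`, two of the four positions fall within their error bars, so the function returns 0.5.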
ABSTRACT
Understanding how modifications to the ribosome affect function has implications for studying ribosome biogenesis, building minimal cells, and repurposing ribosomes for synthetic biology. However, efforts to design sequence-modified ribosomes have been limited because point mutations in the ribosomal RNA (rRNA), especially in the catalytic active site (peptidyl transferase center; PTC), are often functionally detrimental. Moreover, methods for directed evolution of rRNA are constrained by practical considerations (e.g., library size). Here, to address these limitations, we developed a computational rRNA design approach for screening guided libraries of mutant ribosomes. Our method includes in silico library design and selection using a Rosetta stepwise Monte Carlo method (SWM), library construction and in vitro testing of combined ribosomal assembly and translation activity, and functional characterization in vivo. As a model, we apply our method to making modified ribosomes with mutant PTCs. We engineer ribosomes with as many as 30 mutations in their PTCs, highlighting previously unidentified epistatic interactions, and show that SWM helps identify sequences with beneficial phenotypes as compared to random library sequences. We further demonstrate that some variants improve cell growth in vivo, relative to wild-type ribosomes. We anticipate that SWM design and selection may serve as a powerful tool for rRNA engineering.
Subjects
Peptidyl Transferases, Ribosomes, Catalytic Domain, Ribosomes/metabolism, Ribosomal RNA/metabolism, Peptidyl Transferases/metabolism, Mutation, Ribosomal Proteins/genetics, 23S Ribosomal RNA/metabolism
ABSTRACT
Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments. We hypothesized that training a multitask model with the varied data types in EternaBench might improve inference on ensemble-based prediction tasks. Indeed, the resulting model, named EternaFold, demonstrated improved performance that generalizes to diverse external datasets including complete messenger RNAs, viral genomes probed in human cells and synthetic designs modeling mRNA vaccines.
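One example of an ensemble property of the kind this abstract refers to is the probability that a nucleotide is unpaired, which follows from a base-pair probability matrix as one minus the sum of that nucleotide's pairing probabilities. The sketch below shows only this arithmetic; it is not EternaFold or any package's API, the function name is hypothetical, and it assumes a symmetric square matrix has already been produced by some structure prediction package.

```python
def unpaired_probabilities(bpp):
    """Per-nucleotide unpaired probability from a base-pair probability matrix.

    bpp[i][j] is the ensemble probability that nucleotides i and j are paired.
    The matrix is assumed to be square and symmetric, so summing row i gives
    the total probability that nucleotide i is paired with any partner.
    """
    n = len(bpp)
    for row in bpp:
        if len(row) != n:
            raise ValueError("base-pair probability matrix must be square")
    return [1.0 - sum(row) for row in bpp]
```

For a toy three-nucleotide matrix in which positions 0 and 2 pair with probability 0.8, the result is approximately `[0.2, 1.0, 0.2]`. Values like these are what would be compared against chemical mapping reactivities in an ensemble-oriented benchmark.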
Subjects
Algorithms, RNA, Humans, Nucleic Acid Conformation, Protein Secondary Structure, RNA/genetics, Thermodynamics
ABSTRACT
The Tetrahymena group I intron has been a key system in the understanding of RNA folding and misfolding. The molecule folds into a long-lived misfolded intermediate (M) in vitro, which has been known to form extensive native-like secondary and tertiary structures but is separated by an unknown kinetic barrier from the native state (N). Here, we used cryogenic electron microscopy (cryo-EM) to resolve misfolded structures of the Tetrahymena L-21 ScaI ribozyme. Maps of three M substates (M1, M2, M3) and one N state were achieved from a single specimen with overall resolutions of 3.5 Å, 3.8 Å, 4.0 Å, and 3.0 Å, respectively. Comparisons of the structures reveal that all the M substates are highly similar to N, except for rotation of a core helix P7 that harbors the ribozyme's guanosine binding site and the crossing of the strands J7/3 and J8/7 that connect P7 to the other elements in the ribozyme core. This topological difference between the M substates and N state explains the failure of 5'-splice site substrate docking in M, supports a topological isomer model for the slow refolding of M to N due to a trapped strand crossing, and suggests pathways for M-to-N refolding.
Subjects
RNA Folding, Catalytic RNA, Tetrahymena, Cryoelectron Microscopy, Kinetics, Catalytic RNA/chemistry, Tetrahymena/genetics
ABSTRACT
Influenza A virus's (IAV's) frequent genetic changes challenge vaccine strategies and engender resistance to current drugs. We sought to identify conserved and essential RNA secondary structures within IAV's genome that are predicted to have greater constraints on mutation in response to therapeutic targeting. We identified and genetically validated an RNA structure (packaging stem-loop 2 (PSL2)) that mediates in vitro packaging and in vivo disease and is conserved across all known IAV isolates. A PSL2-targeting locked nucleic acid (LNA), administered 3 d after, or 14 d before, a lethal IAV inoculum provided 100% survival in mice, led to the development of strong immunity to rechallenge with a tenfold lethal inoculum, evaded attempts to select for resistance and retained full potency against neuraminidase inhibitor-resistant virus. Using an analogous approach to target SARS-CoV-2, prophylactic administration of LNAs specific for highly conserved RNA structures in the viral genome protected hamsters from efficient transmission of the SARS-CoV-2 USA_WA1/2020 variant. These findings highlight the potential applicability of this approach to any virus of interest via a process we term 'programmable antivirals', with implications for antiviral prophylaxis and post-exposure therapy.