ABSTRACT
Recent advancements in RNA three-dimensional (3D) structure prediction have provided significant insights into RNA biology, highlighting the essential role of RNA in cellular functions and its therapeutic potential. This review summarizes the latest developments in computational methods, particularly the incorporation of artificial intelligence and machine learning, which have improved the efficiency and accuracy of RNA structure predictions. We also discuss the integration of new experimental data types, including cryoelectron microscopy (cryo-EM) techniques and high-throughput sequencing, which have transformed RNA structure modeling. The combination of experimental advances with computational methods represents a significant leap in RNA structure determination. We review the outcomes of RNA-Puzzles and critical assessment of structure prediction (CASP) challenges, which assess the state of the field and limitations of existing methods. Future perspectives are discussed, focusing on the impact of RNA 3D structure prediction on understanding RNA mechanisms and its implications for drug discovery and RNA-targeted therapies, opening new avenues in molecular biology.
ABSTRACT
Betacoronaviruses are a genus within the Coronaviridae family of RNA viruses. They are capable of infecting vertebrates and causing epidemics as well as global pandemics in humans. Mitigating the threat posed by Betacoronaviruses requires an understanding of their molecular diversity. The development of novel antivirals hinges on understanding the key regulatory elements within the viral RNA genomes, in particular the 5'-proximal region, which is pivotal for viral protein synthesis. Using a combination of cryo-electron microscopy, atomic force microscopy, chemical probing, and computational modeling, we determined the structures of 5'-proximal regions in RNA genomes of Betacoronaviruses from four subgenera: OC43-CoV, SARS-CoV-2, MERS-CoV, and Rousettus bat-CoV. We obtained cryo-electron microscopy maps and determined atomic-resolution models for the stem-loop-5 (SL5) region at the translation start site and found that despite low sequence similarity and variable length of the helical elements it exhibits a remarkable structural conservation. Atomic force microscopy imaging revealed a common domain organization and a dynamic arrangement of structural elements connected with flexible linkers across all four Betacoronavirus subgenera. Together, these results reveal common features of a critical regulatory region shared between different Betacoronavirus RNA genomes, which may allow targeting of these RNAs by broad-spectrum antiviral therapeutics.
Subject(s)
Betacoronavirus , Viral RNA , Betacoronavirus/genetics , Cryoelectron Microscopy , Viral Genome/genetics , Viral RNA/chemistry , Viral RNA/genetics , Viral RNA/ultrastructure , SARS-CoV-2/genetics
ABSTRACT
Knots are very common in polymers, including DNA and protein molecules. Yet, no genuine knot has been identified in natural RNA molecules to date. Upon re-examining experimentally determined RNA 3D structures, we discovered a trefoil knot 3₁, the most basic non-trivial knot, in the RydC RNA. This knotted RNA is a member of a small family of short bacterial RNAs, whose secondary structure is characterized by an H-type pseudoknot. Molecular dynamics simulations suggest a folding pathway of the RydC RNA that starts with a native twisted loop. Based on sequence analyses and computational RNA 3D structure predictions, we postulate that this trefoil knot is a conserved feature of all RydC-related RNAs. The first discovery of a knot in a natural RNA molecule introduces a novel perspective on RNA 3D structure formation and on fundamental research on the relationship between function and spatial structure of biopolymers.
Subject(s)
RNA Folding , RNA , Molecular Dynamics Simulation , RNA/chemistry , RNA/genetics
ABSTRACT
The MODOMICS database was updated with recent data and now includes new data types related to RNA modifications. Changes to the database include an expanded modification catalog, encompassing both natural and synthetic residues identified in RNA structures. This addition aids in representing RNA sequences from the RCSB PDB database more effectively. To manage the increased number of modifications, adjustments to the nomenclature system were made. Updates in the RNA sequences section include the addition of new sequences and the reintroduction of sequence alignments for tRNAs and rRNAs. The protein section was updated and connected to structures from the RCSB PDB database and predictions by AlphaFold. MODOMICS now includes a data annotation system, with 'Evidence' and 'Estimated Reliability' features, offering clarity on data support and accuracy. This system is open to all MODOMICS entries, enhancing the accuracy of RNA modification data representation. MODOMICS is available at https://iimcb.genesilico.pl/modomics/.
Subject(s)
Nucleic Acid Databases , RNA , Protein Databases , RNA/chemistry , RNA/genetics , Internet , RNA Sequence Analysis , User-Computer Interface
ABSTRACT
Ribonucleic acid (RNA) molecules serve as master regulators of cells by encoding their biological function in the ribonucleotide sequence, particularly their ability to interact with other molecules. To understand how RNA molecules perform their biological tasks and to design new sequences with specific functions, it is of great benefit to be able to computationally predict how RNA folds and interacts in the cellular environment. Our workflow for computational modeling of the 3D structures of RNA and its interactions with other molecules uses a set of methods developed in our laboratory, including MeSSPredRNA for predicting canonical and non-canonical base pairs, PARNASSUS for detecting remote homology based on comparisons of sequences and secondary structures, ModeRNA for comparative modeling, the SimRNA family of programs for modeling RNA 3D structure and its complexes with other molecules, and QRNAS for model refinement. In this study, we present the results of testing this workflow in predicting RNA 3D structures in the CASP15 experiment. The overall high score of the computational models predicted by our group demonstrates the robustness of our workflow and its individual components in terms of predicting RNA 3D structures of acceptable quality that are close to the target structures. However, the variance in prediction quality is still quite high, and the results are still too far from the level of protein 3D structure predictions. This exercise led us to consider several improvements, especially to better predict and enforce stacking interactions and non-canonical base pairs.
Subject(s)
RNA , RNA/chemistry , Nucleic Acid Conformation , Molecular Models , Base Pairing , Computer Simulation
ABSTRACT
SUMMARY: Structure determination is a key step in the functional characterization of many non-coding RNA molecules. High-resolution RNA 3D structure determination efforts, however, are not keeping up with the pace of discovery of new non-coding RNA sequences. This increases the importance of computational approaches and low-resolution experimental data, such as those from small-angle X-ray scattering (SAXS) experiments. We present RNA Masonry, a computer program and a web service for fully automated modeling of RNA 3D structures. It assembles RNA fragments into geometrically plausible models that meet user-provided secondary structure constraints, restraints on tertiary contacts, and SAXS data. We illustrate the method description with detailed benchmarks and its application to structural studies of viral RNAs with SAXS restraints. AVAILABILITY AND IMPLEMENTATION: The program web server is available at http://iimcb.genesilico.pl/rnamasonry. The source code is available at https://gitlab.com/gchojnowski/rnamasonry.
Subject(s)
Untranslated RNA , Viral RNA , Small Angle Scattering , X-Rays , X-Ray Diffraction
ABSTRACT
The Nucleic Acid Circular Dichroism Database (NACDDB) is a public repository that archives and freely distributes circular dichroism (CD) and synchrotron radiation CD (SRCD) spectral data about nucleic acids, and the associated experimental metadata, structural models, and links to literature. NACDDB covers CD data for various nucleic acid molecules, including DNA, RNA, DNA/RNA hybrids, and various nucleic acid derivatives. The entries are linked to primary sequence and experimental structural data, as well as to the literature. Additionally, for all entries, 3D structure models are provided. All entries undergo expert validation and curation procedures to ensure completeness, consistency, and quality of the data included. The NACDDB is open for submission of the CD data for nucleic acids. NACDDB is available at: https://genesilico.pl/nacddb/.
Subject(s)
Nucleic Acid Databases , Nucleic Acids , Circular Dichroism , Synchrotrons , Nucleic Acids/chemistry
ABSTRACT
RNA is a unique biomolecule that is involved in a variety of fundamental biological functions, all of which depend solely on its structure and dynamics. Since the experimental determination of crystal RNA structures is laborious, computational 3D structure prediction methods are experiencing an ongoing and thriving development. Such methods can lead to many models; thus, it is necessary to build comparisons and extract common structural motifs for further medical or biological studies. Here, we introduce a computational pipeline dedicated to reference-free high-throughput comparative analysis of 3D RNA structures. We show its application in the RNA-Puzzles challenge, in which five participating groups attempted to predict the three-dimensional structures of 5'- and 3'-untranslated regions (UTRs) of the SARS-CoV-2 genome. We report the results of this puzzle and discuss the structural motifs obtained from the analysis. All simulated models and tools incorporated into the pipeline are open to scientific and academic use.
Subject(s)
COVID-19 , RNA , 3' Untranslated Regions , Humans , Nucleic Acid Conformation , RNA/chemistry , SARS-CoV-2
ABSTRACT
The MODOMICS database has been, since 2006, a manually curated and centralized resource, storing and distributing comprehensive information about modified ribonucleosides. Originally, it only contained data on the chemical structures of modified ribonucleosides, their biosynthetic pathways, the location of modified residues in RNA sequences, and RNA-modifying enzymes. Over the years, prompted by the accumulation of new knowledge and new types of data, it has been updated with new information and functionalities. In this new release, we have created a catalog of RNA modifications linked to human diseases, e.g., due to mutations in genes encoding modification enzymes. MODOMICS has been linked extensively to RCSB Protein Data Bank, and sequences of experimentally determined RNA structures with modified residues have been added. This expansion was accompanied by including nucleotide 5'-monophosphate residues. We redesigned the web interface and upgraded the database backend. In addition, a search engine for chemically similar modified residues has been included that can be queried by SMILES codes or by drawing chemical molecules. Finally, previously available datasets of modified residues, biosynthetic pathways, and RNA-modifying enzymes have been updated. Overall, we provide users with a new, enhanced, and restyled tool for research on RNA modification. MODOMICS is available at https://iimcb.genesilico.pl/modomics/.
Subject(s)
Nucleic Acid Databases , Enzymes/genetics , RNA/genetics , Ribonucleosides/genetics , User-Computer Interface , Base Sequence , Cardiovascular Diseases/genetics , Cardiovascular Diseases/metabolism , Cardiovascular Diseases/pathology , Computer Graphics , Protein Databases , Datasets as Topic , Enzymes/metabolism , Gastrointestinal Diseases/genetics , Gastrointestinal Diseases/metabolism , Gastrointestinal Diseases/pathology , Hematologic Diseases/genetics , Hematologic Diseases/metabolism , Hematologic Diseases/pathology , Humans , Internet , Mental Disorders/genetics , Mental Disorders/metabolism , Mental Disorders/pathology , Musculoskeletal Diseases/genetics , Musculoskeletal Diseases/metabolism , Musculoskeletal Diseases/pathology , Mutation , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/pathology , Neurodegenerative Diseases/genetics , Neurodegenerative Diseases/metabolism , Neurodegenerative Diseases/pathology , RNA/metabolism , Post-Transcriptional RNA Processing , Ribonucleosides/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
ABSTRACT
Non-coding RNAs (ncRNAs) are major players in the regulation of gene expression. This study analyses seven classes of ncRNAs in plants using sequence and secondary structure-based RNA folding measures. We observe distinct regions in the distribution of AU content along with overlapping regions for different ncRNA classes. Additionally, we find similar averages for minimum folding energy index across the ncRNA classes except for pre-miRNAs and lncRNAs. Various RNA folding measures show similar trends among the different ncRNA classes except for pre-miRNAs and lncRNAs. We observe different k-mer repeat signatures of length three among the ncRNA classes. However, in pre-miRNAs and lncRNAs, a diffuse pattern of k-mers is observed. Using these attributes, we train eight different classifiers to discriminate the ncRNA classes in plants. Support vector machines employing a radial basis function show the highest accuracy (average F1 of ~96%) in discriminating ncRNAs, and the classifier is implemented as a web server, NCodR.
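As a rough illustration of two of the sequence-level attributes named in this abstract (AU content and k-mers of length three), a minimal Python sketch follows. The function names and the example sequence are hypothetical and are not taken from NCodR; the actual feature definitions used by the authors may differ.

```python
from collections import Counter


def au_content(seq: str) -> float:
    """Fraction of A and U residues in an RNA sequence."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        return 0.0
    return sum(seq.count(base) for base in "AU") / len(seq)


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers (k = 3 matches the length named in the abstract)."""
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


example = "AUGGCAUAUAUA"  # made-up sequence, for illustration only
print(au_content(example))
print(kmer_counts(example).most_common(2))
```

Feature vectors built this way could then be passed to any standard classifier; the choice of classifier and its parameters is outside this sketch.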
ABSTRACT
The design of high-affinity, RNA-binding ligands has proven very challenging. This is due to the unique structural properties of RNA, often characterized by polar surfaces and high flexibility. In addition, the frequent lack of well-defined binding pockets complicates the development of small molecule binders. This has triggered the search for alternative scaffolds of intermediate size. Among these, peptide-derived molecules represent appealing entities as they can mimic structural features also present in RNA-binding proteins. However, the application of peptidic RNA-targeting ligands is hampered by a lack of design principles and their inherently low bio-stability. Here, the structure-based design of constrained α-helical peptides derived from the viral suppressor of RNA silencing, TAV2b, is described. We observe that the introduction of two inter-side chain crosslinks provides peptides with increased α-helicity and protease stability. One of these modified peptides (B3) shows high affinity for double-stranded RNA structures including a palindromic siRNA as well as microRNA-21 and its precursor pre-miR-21. Notably, B3 binding to pre-miR-21 inhibits Dicer processing in a biochemical assay. As a further characteristic this peptide also exhibits cellular entry. Our findings show that constrained peptides can efficiently mimic RNA-binding proteins rendering them potentially useful for the design of bioactive RNA-targeting ligands.
Subject(s)
Peptides/chemistry , RNA Interference , Double-Stranded RNA/chemistry , RNA-Binding Proteins/chemistry , Viral Proteins/chemistry , Cell Membrane Permeability , Cucumovirus , Endopeptidase K , Humans , K562 Cells , MicroRNAs/chemistry , MicroRNAs/metabolism , Molecular Mimicry , Peptides/metabolism , RNA Precursors/chemistry , RNA Precursors/metabolism , Double-Stranded RNA/metabolism , Small Interfering RNA/chemistry , Small Interfering RNA/metabolism
ABSTRACT
RECQ1 is the shortest among the five human RecQ helicases, comprising two RecA-like domains, a zinc-binding domain and a RecQ C-terminal domain containing the winged-helix (WH). Mutations or deletions on the tip of a β-hairpin located in the WH domain are known to abolish the unwinding activity. Interestingly, the same mutations on the β-hairpin of the annealing-incompetent RECQ1 mutant (RECQ1T1) have been reported to restore its annealing activity. In an attempt to unravel the strand annealing mechanism, we have crystallized a fragment of RECQ1 encompassing D2-Zn-WH domains harbouring mutations on the β-hairpin. From our crystal structure data and interface analysis, we have demonstrated that an α-helix located in the zinc-binding domain potentially interacts with residues of the WH domain, which plays a significant role in strand annealing activity. We have shown that deletion of the α-helix or mutation of specific residues on it restores strand annealing activity of annealing-deficient constructs of RECQ1. Our results also demonstrate that mutations on the α-helix induce conformational changes and affect DNA-stimulated ATP hydrolysis and unwinding activity of RECQ1. Our study, for the first time, provides insight into the conformational requirements of the WH domain for efficient strand annealing by human RECQ1.
Subject(s)
Single-Stranded DNA/chemistry , RecQ Helicases/chemistry , Binding Sites , Single-Stranded DNA/metabolism , Humans , Molecular Dynamics Simulation , Mutation , Protein Binding , alpha-Helical Protein Conformation , RecQ Helicases/genetics , RecQ Helicases/metabolism , Zinc/metabolism
ABSTRACT
Ribonucleic acid (RNA) molecules perform a variety of vital roles in all living cells. Their biological function depends on their structure and dynamics, both of which are difficult to determine experimentally but can be theoretically inferred from the RNA sequence. SimRNA is one of the computational methods for molecular simulations of RNA 3D structure formation. The method is based on a simplified (coarse-grained) representation of nucleotide chains, a statistically derived model of interactions (statistical potential), and the Monte Carlo method as a conformational sampling scheme. The current version of SimRNA (3.22) is able to predict basic topologies of RNA molecules with sizes up to about 50-70 nucleotides, based on their sequences only, and larger molecules if supplied with appropriate distance restraints. The user can specify various types of restraints, including secondary structure, pairwise atom-atom distances, and positions of atoms. SimRNA can also be used for studying systems composed of several RNA chains. Because SimRNA is a folding simulation method, it allows folding pathways to be examined and provides an approximate view of the energy landscape.
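The sampling idea named in this abstract (Monte Carlo moves accepted or rejected according to an energy function) can be sketched generically as a Metropolis loop. This is only an illustration of the general technique, not SimRNA's implementation: the one-dimensional "conformation", the quadratic energy, and all function names below are toy assumptions standing in for a coarse-grained chain and a statistical potential.

```python
import math
import random


def metropolis(energy, propose, state, steps, temperature, rng):
    """Generic Metropolis Monte Carlo: returns the lowest-energy state visited."""
    current_e = energy(state)
    best_state, best_e = state, current_e
    for _ in range(steps):
        candidate = propose(state, rng)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state, current_e = candidate, candidate_e
            if current_e < best_e:
                best_state, best_e = state, current_e
    return best_state, best_e


# Toy stand-ins (assumptions): a 1D "conformation" and a quadratic energy.
def toy_energy(x):
    return (x - 2.0) ** 2


def toy_move(x, rng):
    return x + rng.uniform(-0.5, 0.5)


rng = random.Random(0)
best_x, best_e = metropolis(toy_energy, toy_move, 10.0, 5000, 0.1, rng)
print(best_x, best_e)
```

In a restraint-driven setting, one would typically add penalty terms for violated restraints to the energy function; how SimRNA weights such terms is not stated in the abstract.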
Subject(s)
Molecular Dynamics Simulation , RNA Folding , RNA/chemistry , Monte Carlo Method
ABSTRACT
Protein-RNA recognition is highly affinity-driven and regulates a wide array of cellular functions. In this study, we have curated a binding affinity data set of 40 protein-RNA complexes, for which at least one unbound partner is available in the docking benchmark. The data set covers a wide affinity range of eight orders of magnitude as well as four different structural classes. On average, we find the complexes with single-stranded RNA have the highest affinity, whereas the complexes with duplex RNA have the lowest. Nevertheless, free energy gain upon binding is the highest for the complexes with ribosomal proteins and the lowest for the complexes with tRNA, with an average of -5.7 cal/mol/Å² in the entire data set. We train regression models to predict the binding affinity from the structural and physicochemical parameters of protein-RNA interfaces. The best-fit model with the lowest maximum error is provided with three interface parameters: relative hydrophobicity, conformational change upon binding and relative hydration pattern. This model has been used for predicting the binding affinity on a test data set, generated using mutated structures of yeast aspartyl-tRNA synthetase, for which experimentally determined ΔG values of 40 mutations are available. The predicted ΔG_empirical values correlate highly with the experimental observations. The data set provided in this study should be useful for further development of binding affinity prediction methods. Moreover, the model developed in this study enhances our understanding of the structural basis of protein-RNA binding affinity and provides a platform to engineer protein-RNA interfaces with desired affinity.
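To put the "eight orders of magnitude" affinity range in free-energy terms, the standard relation ΔG = RT ln(Kd) can be applied. The short Python sketch below is an illustration only: the temperature (298.15 K) and the two Kd values are assumptions chosen to span eight decades, not values from the data set.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


# Hypothetical Kd values spanning eight orders of magnitude.
weak, tight = 1e-4, 1e-12
span = delta_g_from_kd(weak) - delta_g_from_kd(tight)
print(round(span, 1))
```

Under these assumptions, each tenfold change in Kd corresponds to roughly 1.36 kcal/mol, so eight decades span about 11 kcal/mol.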
Subject(s)
Molecular Models , Nucleic Acid Conformation , Protein Conformation , RNA-Binding Proteins/chemistry , RNA/chemistry , Algorithms , Binding Sites , Theoretical Models , Mutation , Protein Binding , RNA/metabolism , Transfer RNA/chemistry , Transfer RNA/genetics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Reproducibility of Results , Structure-Activity Relationship
ABSTRACT
BACKGROUND: Computational models of RNA 3D structure often present various inaccuracies caused by simplifications used in structure prediction methods, such as template-based modeling or coarse-grained simulations. To obtain a high-quality model, the preliminary RNA structural model needs to be refined, taking into account atomic interactions. The goal of the refinement is not only to improve the local quality of the model but to bring it globally closer to the true structure. RESULTS: We present QRNAS, a software tool for fine-grained refinement of nucleic acid structures, which is an extension of the AMBER simulation method with additional restraints. QRNAS is capable of handling RNA, DNA, chimeras, and hybrids thereof, and enables modeling of nucleic acids containing modified residues. CONCLUSIONS: We demonstrate the ability of QRNAS to improve the quality of models generated with different methods. QRNAS was able to improve MolProbity scores of NMR structures, as well as of computational models generated in the course of the RNA-Puzzles experiment. The overall geometry improvement may be associated with increased model accuracy, especially on the level of correctly modeled base-pairs, but the systematic improvement of root mean square deviation to the reference structure should not be expected. The method has been integrated into a computational modeling workflow, enabling improved RNA 3D structure prediction.
Subject(s)
Computational Biology/methods , DNA/chemistry , RNA/chemistry , Hydrogen Bonding , Molecular Models , Nucleic Acid Conformation , Software
ABSTRACT
RNA molecules are master regulators of cells. They are involved in a variety of molecular processes: they transmit genetic information, sense cellular signals and communicate responses, and even catalyze chemical reactions. As in the case of proteins, RNA function is dictated by its structure and by its ability to adopt different conformations, which in turn is encoded in the sequence. Experimental determination of high-resolution RNA structures is both laborious and difficult, and therefore the majority of known RNAs remain structurally uncharacterized. To address this problem, predictive computational methods were developed based on the accumulated knowledge of RNA structures determined so far, the physical basis of RNA folding, and evolutionary considerations, such as conservation of functionally important motifs. However, all theoretical methods suffer from various limitations, and they are generally unable to accurately predict structures for RNA sequences longer than 100 nucleotides unless aided by additional experimental data. In this article, we review experimental methods that can generate data usable by computational methods, as well as computational approaches for RNA structure prediction that can utilize data from experimental analyses. We outline methods and data types that can be potentially useful for RNA 3D structure modeling but are not commonly used by the existing software, suggesting directions for future development.
Subject(s)
Molecular Models , Molecular Biology/methods , RNA/chemistry , Computational Biology/methods , X-Ray Crystallography , Electron Spin Resonance Spectroscopy , Fluorescence Resonance Energy Transfer , Magnetic Resonance Spectroscopy , Atomic Force Microscopy , Electron Microscopy , Nucleic Acid Conformation , Small Angle Scattering , X-Ray Diffraction
ABSTRACT
We dissect protein-protein interfaces into water preservation (WP), water hydration (WH) and water dehydration (WD) sites by comparing the water-mediated hydrogen bonds (H-bonds) in the bound and unbound states of the interacting subunits. Upon subunit complexation, if an H-bond between an interface water and a protein polar group is retained, we assign it as a WP site; if it is lost, we assign it as a WD site; and if a new H-bond is created, we assign it as a WH site. We find that the density of WD sites is highest, followed by WH and WP sites, except in antigen and/or antibody complexes, where the density of WH sites is highest, followed by WD and WP sites. Furthermore, we find that WP sites are the most conserved, followed by WD and WH sites, in all classes of complexes except antigen and/or antibody complexes, where WD sites are the most conserved, followed by WH and WP sites. A significant number of WP and WH sites are involved in water bridges that stabilize the subunit interactions. At WH sites, the residues involved in water bridges are significantly better conserved than the other residues. However, no such difference is observed at WP sites. Interestingly, WD sites are generally replaced with direct H-bonds upon subunit complexation. Significantly, we observe that many water-mediated H-bonds remain preserved in spite of large conformational changes upon subunit complexation. These findings have implications for predicting and engineering water binding sites at protein-protein interfaces.
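The three-way site assignment described in this abstract amounts to comparing two sets of water-mediated H-bonds. A minimal Python sketch of that comparison is given below; it assumes each polar group has already been reduced to a hashable identifier (the `"A:ASP45:OD1"`-style labels are hypothetical), and it leaves out the geometric H-bond criteria and the matching of equivalent water molecules that a real analysis would need.

```python
def classify_water_sites(unbound_hbonds: set, bound_hbonds: set) -> dict:
    """Split polar groups by the fate of their water-mediated H-bond upon complexation."""
    return {
        "WP": unbound_hbonds & bound_hbonds,  # present before and after: preserved
        "WD": unbound_hbonds - bound_hbonds,  # present only before: dehydrated
        "WH": bound_hbonds - unbound_hbonds,  # present only after: newly hydrated
    }


# Hypothetical polar-group identifiers, for illustration only.
unbound = {"A:ASP45:OD1", "A:SER12:OG"}
bound = {"A:SER12:OG", "A:LYS80:NZ"}
sites = classify_water_sites(unbound, bound)
print(sites)
```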
Subject(s)
Binding Sites , Proteins/chemistry , Water/chemistry , Protein Databases , Hydrogen Bonding , Molecular Models , Molecular Conformation , Molecular Structure , Multiprotein Complexes/chemistry , Protein Binding , Structure-Activity Relationship
ABSTRACT
Protein-RNA recognition often induces conformational changes in binding partners. Consequently, the solvent accessible surface area (SASA) buried in contact estimated from the co-crystal structures may differ from that calculated using their unbound forms. To evaluate the change in accessibility upon binding, we compare SASA of 126 protein-RNA complexes between bound and unbound forms. We observe that, in the majority of cases, the interfaces of both binding partners gain accessibility upon binding, which is often associated with either large domain movements or secondary structural transitions in RNA-binding proteins (RBPs), and binding-induced conformational changes in RNAs. In the non-interface region, the majority of RNAs lose accessibility upon binding; however, no such preference is observed for RBPs. Side chains of RBPs make the major contribution to the change in accessibility. In the case of flexible binding, we find a moderate correlation between the binding free energy and the change in accessibility at the interface. Finally, we introduce a parameter, the ratio of gain to loss of accessibility upon binding, which can be used to identify the native solution among flexible docking models. Our findings provide fundamental insights into the relationship between flexibility and solvent accessibility, and advance our understanding of binding-induced folding in protein-RNA recognition.
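The abstract does not spell out how the gain-to-loss ratio is computed. One plausible reading, sketched below in Python, sums the per-residue SASA increases and decreases between the unbound and bound conformations and divides the former by the latter. The function name, the per-residue granularity, and the example values are all assumptions made for illustration.

```python
import math


def gain_to_loss_ratio(sasa_unbound, sasa_bound):
    """Ratio of summed SASA gains to summed SASA losses (bound minus unbound)."""
    if len(sasa_unbound) != len(sasa_bound):
        raise ValueError("both lists must describe the same residues")
    gain = loss = 0.0
    for unbound, bound in zip(sasa_unbound, sasa_bound):
        change = bound - unbound
        if change > 0:
            gain += change
        else:
            loss -= change  # store the loss as a positive magnitude
    if loss == 0.0:
        return math.inf if gain > 0 else float("nan")
    return gain / loss


# Made-up per-residue SASA values in Å², for illustration only.
print(gain_to_loss_ratio([10.0, 20.0, 30.0], [15.0, 10.0, 30.0]))
```

Under this reading, a value above 1 would indicate a net gain in accessibility upon binding and a value below 1 a net loss.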
Subject(s)
Protein Folding , RNA-Binding Proteins/chemistry , RNA/chemistry , Datasets as Topic , Entropy , Hydrogen Bonding , Molecular Docking Simulation , Protein Binding , Protein Conformation , Protein Interaction Domains and Motifs , Protein Secondary Structure , RNA/metabolism , RNA-Binding Proteins/metabolism , Solvents/chemistry
ABSTRACT
We present an updated version of the protein-RNA docking benchmark, which we first published four years ago. The non-redundant protein-RNA docking benchmark version 2.0 consists of 126 test cases, a threefold increase in number compared to its previous version. The present version consists of 21 unbound-unbound cases, of which, in 12 cases, the unbound RNAs are taken from another complex. It also consists of 95 unbound-bound cases where only the protein is available in the unbound state. Besides, we introduce 10 new bound-unbound cases where only the RNA is found in the unbound state. Based on the degree of conformational change of the interface residues upon complex formation, the benchmark is classified into 72 rigid-body cases, 25 semiflexible cases and 19 full flexible cases. It also covers a wide range of conformational flexibility, from small side-chain movements to large domain swapping in protein structures, as well as flipping and restacking of RNA bases. This benchmark should provide the docking community with more test cases for evaluating rigid-body as well as flexible docking algorithms. Besides, it will also facilitate the development of new algorithms that require a large training set. The protein-RNA docking benchmark version 2.0 can be freely downloaded from http://www.csb.iitkgp.ernet.in/applications/PRDBv2. Proteins 2017; 85:256-267. © 2016 Wiley Periodicals, Inc.