ABSTRACT
The Nucleic Acid InfraRed Data Bank (NAIRDB) is a comprehensive public repository dedicated to the archival and free distribution of Fourier transform infrared (FTIR) spectral data specific to nucleic acids. The database holds FTIR spectra of diverse nucleic acid molecules, including DNA, RNA, DNA/RNA hybrids and their various derivatives. NAIRDB also provides details of the experimental conditions for FTIR measurements, literature links, primary sequence data, and information about experimentally determined structures of related nucleic acid molecules and/or computationally modeled 3D structures. All entries undergo expert validation and curation to maintain the completeness, consistency and quality of the data. NAIRDB can be searched by similarity of nucleic acid sequences or by direct comparison of spectra. The database is open for submission of FTIR data for nucleic acids. NAIRDB is available at https://nairdb.genesilico.pl.
ABSTRACT
N6-Methyladenosine (m6A) is the most abundant internal modification in mRNAs. Despite accumulating evidence for the profound impact of m6A on cancer biology, there are conflicting reports that alterations in genes encoding the m6A machinery proteins can either promote or suppress cancer, even in the same tumor type. Using data from The Cancer Genome Atlas, we performed a pan-cancer investigation of 15 m6A core factors in nearly 10,000 samples from 31 tumor types to reveal underlying cross-tumor patterns. Altered expression, largely driven by copy number variations at the chromosome arm level, is the most common mode of dysregulation of these factors. YTHDF1, YTHDF2, YTHDF3 and VIRMA are the most frequently altered factors and the only ones to be uniquely altered when tumors are grouped according to the expression pattern of the m6A factors. These genes are also the only ones with coherent, pan-cancer predictive power for progression-free survival. In contrast, METTL3, the most intensively studied m6A factor as a cancer target, shows much lower levels of alteration and no predictive power for patient survival. Therefore, we propose the non-enzymatic YTHDF and VIRMA genes as preferred subjects to dissect the role of m6A in cancer and as priority cancer targets.
ABSTRACT
Advancements in genome-wide sequence analysis have led to the discovery of numerous novel bacterial non-coding RNAs (ncRNAs). These ncRNAs have been categorized into various RNA families and classes based on their size, structure, function, and evolutionary relationships. One such ncRNA family, raiA, is notably abundant in the bacterial phyla Firmicutes and Actinobacteria and is remarkably well-conserved across many Gram-positive bacteria. In this study, we integrated cryo-electron microscopy single-particle analysis with computational modeling and biochemical techniques to elucidate the structural characteristics of raiA from Clostridium sp. CAG 138. Our findings reveal the globular 3D fold of raiA, providing valuable structural insights. This analysis paves the way for future investigations into the functional properties of raiA, potentially uncovering new regulatory mechanisms in bacterial ncRNAs.
ABSTRACT
Recent advancements in RNA three-dimensional (3D) structure prediction have provided significant insights into RNA biology, highlighting the essential role of RNA in cellular functions and its therapeutic potential. This review summarizes the latest developments in computational methods, particularly the incorporation of artificial intelligence and machine learning, which have improved the efficiency and accuracy of RNA structure predictions. We also discuss the integration of new experimental data types, including cryoelectron microscopy (cryo-EM) techniques and high-throughput sequencing, which have transformed RNA structure modeling. The combination of experimental advances with computational methods represents a significant leap in RNA structure determination. We review the outcomes of RNA-Puzzles and critical assessment of structure prediction (CASP) challenges, which assess the state of the field and limitations of existing methods. Future perspectives are discussed, focusing on the impact of RNA 3D structure prediction on understanding RNA mechanisms and its implications for drug discovery and RNA-targeted therapies, opening new avenues in molecular biology.
ABSTRACT
Non-coding RNAs play a major role in diverse processes in living cells, with their sequence and spatial structure serving as the principal determinants of their function. Superposition of RNA 3D structures is the most accurate method for comparative analysis of RNA molecules and for inferring structure-based sequence alignments. Topology-independent superposition is particularly relevant, as evidenced by structurally similar RNAs with sequence permutations such as tRNA and Y RNA. To date, state-of-the-art methods for RNA 3D structure superposition rely on intricate heuristics, and the potential for topology-independent superposition has not been exhausted. We recently introduced the ARTEM method for unrestrained pairwise superposition of RNA 3D modules and have now developed it further to solve the global RNA 3D structure alignment problem. Our new tool ARTEMIS significantly outperforms state-of-the-art tools in both sequentially ordered and topology-independent RNA 3D structure superposition. Using ARTEMIS, we discovered a helical packing motif that is preserved within different backbone topology contexts across various non-coding RNAs, including multiple ribozymes and riboswitches. We anticipate that ARTEMIS will be essential for elucidating the landscape of RNA 3D folds and motifs featuring sequence permutations, which has so far remained unexplored due to limitations of previous computational approaches.
Subject(s)
Nucleic Acid Conformation , Sequence Alignment , Software , Sequence Alignment/methods , RNA/chemistry , Molecular Models , Untranslated RNA/chemistry , Untranslated RNA/genetics , Catalytic RNA/chemistry , Algorithms , Riboswitch , Transfer RNA/chemistry , Transfer RNA/genetics , RNA Sequence Analysis/methods
ABSTRACT
Research on ribonucleic acid (RNA) structures and functions benefits from easy-to-use tools for computational prediction and analyses of RNA three-dimensional (3D) structure. The SimRNAweb server version 2.0 offers an enhanced, user-friendly platform for RNA 3D structure prediction and analysis of RNA folding trajectories based on the SimRNA method. SimRNA employs a coarse-grained model, Monte Carlo sampling and statistical potentials to explore RNA conformational space, optionally guided by spatial restraints. Recognized for its accuracy in RNA 3D structure prediction in RNA-Puzzles and CASP competitions, SimRNA is particularly useful for incorporating restraints based on experimental data. The new server version introduces performance optimizations and extends user control over simulations and the processing of results. It allows the application of various hard and soft restraints, accommodating alternative structures involving canonical and noncanonical base pairs and unpaired residues, while also integrating data from chemical probing methods. Enhanced features include an improved analysis of folding trajectories, offering advanced clustering options and multiple analyses of the generated trajectories. These updates provide comprehensive tools for detailed RNA structure analysis. SimRNAweb v2.0 significantly broadens the scope of RNA modeling, emphasizing flexibility and user-defined parameter control. The web server is available at https://genesilico.pl/SimRNAweb.
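The Monte Carlo sampling mentioned above can be illustrated with a generic Metropolis scheme. The following is a minimal sketch only: it is not SimRNA's implementation, and the one-dimensional `toy_energy` function, the `metropolis` helper and all parameter values are hypothetical stand-ins for a coarse-grained RNA model scored with statistical potentials.

```python
import math
import random


def metropolis(energy, x0, steps=5000, step_size=0.5, temperature=1.0, seed=0):
    """Generic Metropolis sampler on a 1D coordinate (illustrative only)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        # Propose a small random move around the current state.
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x):
    # Hypothetical energy landscape with a single minimum at x = 2.
    return (x - 2.0) ** 2


best_x, best_e = metropolis(toy_energy, x0=-5.0)
print(f"lowest-energy state found: x = {best_x:.3f}, E = {best_e:.5f}")
```

In a restraint-guided simulation, a penalty term for violated restraints would typically be added to the energy function; that extension is left out of this sketch.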
Subject(s)
Internet , Molecular Models , Nucleic Acid Conformation , RNA Folding , RNA , Software , RNA/chemistry , Monte Carlo Method
ABSTRACT
E3 ubiquitin ligases recognize substrates through their short linear motifs termed degrons. While degron-signaling has been a subject of extensive study, resources for its systematic screening are limited. To bridge this gap, we developed DEGRONOPEDIA, a web server that searches for degrons and maps them to nearby residues that can undergo ubiquitination and disordered regions, which may act as protein unfolding seeds. Along with an evolutionary assessment of degron conservation, the server also reports on post-translational modifications and mutations that may modulate degron availability. Acknowledging the prevalence of degrons at protein termini, DEGRONOPEDIA incorporates machine learning to assess N-/C-terminal stability, supplemented by simulations of proteolysis to identify degrons in newly formed termini. An experimental validation of a predicted C-terminal destabilizing motif, coupled with the confirmation of a post-proteolytic degron in another case, exemplifies its practical application. DEGRONOPEDIA can be freely accessed at degronopedia.com.
Subject(s)
Internet , Post-Translational Protein Processing , Proteolysis , Proteome , Software , Ubiquitin-Protein Ligases , Ubiquitination , Proteome/chemistry , Ubiquitin-Protein Ligases/metabolism , Ubiquitin-Protein Ligases/chemistry , Ubiquitin-Protein Ligases/genetics , Humans , Machine Learning , Amino Acid Motifs , Degrons
ABSTRACT
Betacoronaviruses are a genus within the Coronaviridae family of RNA viruses. They are capable of infecting vertebrates and causing epidemics as well as global pandemics in humans. Mitigating the threat posed by Betacoronaviruses requires an understanding of their molecular diversity. The development of novel antivirals hinges on understanding the key regulatory elements within the viral RNA genomes, in particular the 5'-proximal region, which is pivotal for viral protein synthesis. Using a combination of cryo-electron microscopy, atomic force microscopy, chemical probing, and computational modeling, we determined the structures of 5'-proximal regions in RNA genomes of Betacoronaviruses from four subgenera: OC43-CoV, SARS-CoV-2, MERS-CoV, and Rousettus bat-CoV. We obtained cryo-electron microscopy maps and determined atomic-resolution models for the stem-loop-5 (SL5) region at the translation start site and found that despite low sequence similarity and variable length of the helical elements it exhibits a remarkable structural conservation. Atomic force microscopy imaging revealed a common domain organization and a dynamic arrangement of structural elements connected with flexible linkers across all four Betacoronavirus subgenera. Together, these results reveal common features of a critical regulatory region shared between different Betacoronavirus RNA genomes, which may allow targeting of these RNAs by broad-spectrum antiviral therapeutics.
Subject(s)
Betacoronavirus , Viral RNA , Betacoronavirus/genetics , Cryoelectron Microscopy , Viral Genome/genetics , Viral RNA/chemistry , Viral RNA/genetics , Viral RNA/ultrastructure , SARS-CoV-2/genetics
ABSTRACT
Knots are very common in polymers, including DNA and protein molecules. Yet, no genuine knot has been identified in natural RNA molecules to date. Upon re-examining experimentally determined RNA 3D structures, we discovered a trefoil knot (3_1), the most basic non-trivial knot, in the RydC RNA. This knotted RNA is a member of a small family of short bacterial RNAs, whose secondary structure is characterized by an H-type pseudoknot. Molecular dynamics simulations suggest a folding pathway of the RydC RNA that starts with a native twisted loop. Based on sequence analyses and computational RNA 3D structure predictions, we postulate that this trefoil knot is a conserved feature of all RydC-related RNAs. The first discovery of a knot in a natural RNA molecule introduces a novel perspective on RNA 3D structure formation and on fundamental research on the relationship between function and spatial structure of biopolymers.
Subject(s)
RNA Folding , RNA , Molecular Dynamics Simulation , RNA/chemistry , RNA/genetics
ABSTRACT
The MODOMICS database was updated with recent data and now includes new data types related to RNA modifications. Changes to the database include an expanded modification catalog, encompassing both natural and synthetic residues identified in RNA structures. This addition aids in representing RNA sequences from the RCSB PDB database more effectively. To manage the increased number of modifications, adjustments to the nomenclature system were made. Updates in the RNA sequences section include the addition of new sequences and the reintroduction of sequence alignments for tRNAs and rRNAs. The protein section was updated and connected to structures from the RCSB PDB database and predictions by AlphaFold. MODOMICS now includes a data annotation system, with 'Evidence' and 'Estimated Reliability' features, offering clarity on data support and accuracy. This system is open to all MODOMICS entries, enhancing the accuracy of RNA modification data representation. MODOMICS is available at https://iimcb.genesilico.pl/modomics/.
Subject(s)
Nucleic Acid Databases , RNA , Protein Databases , RNA/chemistry , RNA/genetics , Internet , RNA Sequence Analysis , User-Computer Interface
ABSTRACT
Ribonucleic acid (RNA) molecules serve as master regulators of cells by encoding their biological function in the ribonucleotide sequence, particularly their ability to interact with other molecules. To understand how RNA molecules perform their biological tasks and to design new sequences with specific functions, it is of great benefit to be able to computationally predict how RNA folds and interacts in the cellular environment. Our workflow for computational modeling of the 3D structures of RNA and its interactions with other molecules uses a set of methods developed in our laboratory, including MeSSPredRNA for predicting canonical and non-canonical base pairs, PARNASSUS for detecting remote homology based on comparisons of sequences and secondary structures, ModeRNA for comparative modeling, the SimRNA family of programs for modeling RNA 3D structure and its complexes with other molecules, and QRNAS for model refinement. In this study, we present the results of testing this workflow in predicting RNA 3D structures in the CASP15 experiment. The overall high score of the computational models predicted by our group demonstrates the robustness of our workflow and its individual components in terms of predicting RNA 3D structures of acceptable quality that are close to the target structures. However, the variance in prediction quality is still quite high, and the results are still too far from the level of protein 3D structure predictions. This exercise led us to consider several improvements, especially to better predict and enforce stacking interactions and non-canonical base pairs.
Subject(s)
RNA , RNA/chemistry , Nucleic Acid Conformation , Molecular Models , Base Pairing , Computer Simulation
ABSTRACT
SUMMARY: Structure determination is a key step in the functional characterization of many non-coding RNA molecules. High-resolution RNA 3D structure determination efforts, however, are not keeping up with the pace of discovery of new non-coding RNA sequences. This increases the importance of computational approaches and low-resolution experimental data, such as data from small-angle X-ray scattering (SAXS) experiments. We present RNA Masonry, a computer program and a web service for fully automated modeling of RNA 3D structures. It assembles RNA fragments into geometrically plausible models that meet user-provided secondary structure constraints, restraints on tertiary contacts, and SAXS data. We illustrate the method description with detailed benchmarks and its application to structural studies of viral RNAs with SAXS restraints. AVAILABILITY AND IMPLEMENTATION: The web server is available at http://iimcb.genesilico.pl/rnamasonry. The source code is available at https://gitlab.com/gchojnowski/rnamasonry.
Subject(s)
Untranslated RNA , Viral RNA , Small Angle Scattering , X-Rays , X-Ray Diffraction
ABSTRACT
The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible by the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their X-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include enhancing the accuracy of local interaction predictions and giving greater consideration to experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.
Subject(s)
Computational Biology , Proteins , Protein Conformation , Proteins/chemistry , Molecular Models , Computational Biology/methods , X-Ray Diffraction
ABSTRACT
Understanding the 3D structure of RNA is key to understanding RNA function. RNA 3D structure is modular and can be seen as a composition of building blocks of various sizes called tertiary motifs. Currently, long-range motifs formed between distant loops and helical regions are far less studied than the local motifs determined by the RNA secondary structure. We surveyed long-range tertiary interactions and motifs in a non-redundant set of non-coding RNA 3D structures. A new dataset of annotated LOng-RAnge RNA 3D modules (LORA) was built using an approach that does not rely on the automatic annotations of non-canonical interactions. An original algorithm, ARTEM, was developed for annotation-, sequence- and topology-independent superposition of two arbitrary RNA 3D modules. The proposed methods allowed us to identify and describe the most common long-range RNA tertiary motifs. Along with the prevalent canonical A-minor interactions, a large number of previously undescribed staple interactions were observed. The most frequent long-range motifs were found to belong to three main motif families: planar staples, tilted staples, and helical packing motifs.
Subject(s)
Nucleic Acid Conformation , Untranslated RNA , Base Pairing , Nucleotide Motifs , Untranslated RNA/chemistry
ABSTRACT
Ribonucleic acids (RNAs) play crucial roles in living organisms. Some of them, such as bacterial ribosomes and precursor messenger RNA, are targets of small molecule drugs, whereas others, e.g. bacterial riboswitches or viral RNA motifs, are considered potential therapeutic targets. Thus, the continuous discovery of new functional RNAs increases the demand for compounds targeting them and for methods for analyzing RNA-small molecule interactions. We recently developed fingeRNAt, a software tool for detecting non-covalent bonds formed within complexes of nucleic acids with different types of ligands. The program detects several non-covalent interactions and encodes them as a Structural Interaction Fingerprint (SIFt). Here, we present the application of SIFts accompanied by machine learning methods for binding prediction of small molecules to RNA. We show that SIFt-based models outperform the classic, general-purpose scoring functions in virtual screening. We also employed Explainable Artificial Intelligence (XAI), namely SHapley Additive exPlanations, Local Interpretable Model-agnostic Explanations and other methods, to help understand the decision-making process behind the predictive models. We conducted a case study in which we applied XAI to a predictive model of ligand binding to human immunodeficiency virus type 1 trans-activation response element RNA to distinguish between residues and interaction types important for binding. We also used XAI to indicate whether an interaction has a positive or negative effect on binding prediction and to quantify its impact. Our results obtained using all XAI methods were consistent with the literature data, demonstrating the utility and importance of XAI in medicinal chemistry and bioinformatics.
Subject(s)
Artificial Intelligence , RNA , Humans , Ligands , Machine Learning , RNA Precursors , Messenger RNA
ABSTRACT
The biologically relevant structures of proteins and nucleic acids and their complexes are dynamic. They include a combination of regions ranging from rigid structural segments to structural switches to regions that are almost always disordered, which interact with each other in various ways. Comparing conformational changes and variation in contacts between different conformational states is essential to understand the biological functions of proteins, nucleic acids, and their complexes. Here, we describe a new computational tool, 1D2DSimScore, for comparing contacts and contact interfaces in all kinds of macromolecules and macromolecular complexes, including proteins, nucleic acids, and other molecules. 1D2DSimScore can be used to compare structural features of macromolecular models between alternative structures obtained in a particular experiment or to score various predictions against a defined "ideal" reference structure. Comparisons at the level of contacts are particularly useful for flexible molecules, for which comparisons in 3D that require rigid-body superpositions are difficult, and in biological systems where the formation of specific inter-residue contacts is more relevant for the biological function than the maintenance of a specific global 3D structure. Similarity/dissimilarity scores calculated by 1D2DSimScore can be used to complement scores describing 3D structural similarity measures calculated by the existing tools.
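To make the idea of contact-level comparison concrete, the sketch below scores two conformations by the overlap of their residue-residue contact sets. It is an illustration under stated assumptions, not the 1D2DSimScore algorithm: the 8.0 distance cutoff, the one-point-per-residue representation, the use of the Jaccard index and the function names `contact_set` and `jaccard` are all choices made here for the example.

```python
from itertools import combinations
from math import dist


def contact_set(coords, cutoff=8.0):
    """Return the set of residue index pairs (i, j), i < j, within `cutoff`."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) <= cutoff
    }


def jaccard(contacts_a, contacts_b):
    """Jaccard index of two contact sets (1.0 when both are empty)."""
    union = contacts_a | contacts_b
    if not union:
        return 1.0
    return len(contacts_a & contacts_b) / len(union)


# Two hypothetical conformations of the same 4-residue chain,
# one representative point per residue.
extended = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0), (18.0, 0.0, 0.0)]
hairpin = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (6.0, 6.0, 0.0), (0.0, 6.0, 0.0)]

score = jaccard(contact_set(extended), contact_set(hairpin))
print(score)  # 0.75: three shared contacts out of four distinct ones
```

Because the score depends only on which contacts are present, no rigid-body superposition of the two conformations is needed.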
Subject(s)
Nucleic Acids , Proteins , Molecular Models , Proteins/chemistry
ABSTRACT
We have been aware of the existence of knotted proteins for over 30 years, but it is hard to predict the most complicated knot that can be formed in proteins. Here, we show new and the most complex knotted topologies recorded to date: double trefoil knots (3_1 # 3_1). We found five domain arrangements (architectures) that result in a doubly knotted structure in almost a thousand proteins. The double knot topology is found in knotted membrane proteins from the CaCA family, which function as ion transporters, in the group of carbonic anhydrases that catalyze the hydration of carbon dioxide, and in the proteins from the SPOUT superfamily, which gathers 3_1 knotted methyltransferases with the active site-forming knot. For each family, we predict the presence of a double knot using AlphaFold and RoseTTaFold structure prediction. In the case of the TrmD-Tm1570 protein, which is a member of the SPOUT superfamily, we show that it folds in vitro and is biologically active. Our results show that this protein forms a homodimeric structure and retains the ability to modify tRNA, which is the function of the single-domain TrmD protein. However, how the protein folds and is degraded remains unknown.
ABSTRACT
RNA is a unique biomolecule that is involved in a variety of fundamental biological functions, all of which depend solely on its structure and dynamics. Since the experimental determination of crystal RNA structures is laborious, computational 3D structure prediction methods are experiencing an ongoing and thriving development. Such methods can lead to many models; thus, it is necessary to build comparisons and extract common structural motifs for further medical or biological studies. Here, we introduce a computational pipeline dedicated to reference-free high-throughput comparative analysis of 3D RNA structures. We show its application in the RNA-Puzzles challenge, in which five participating groups attempted to predict the three-dimensional structures of 5'- and 3'-untranslated regions (UTRs) of the SARS-CoV-2 genome. We report the results of this puzzle and discuss the structural motifs obtained from the analysis. All simulated models and tools incorporated into the pipeline are open to scientific and academic use.
Subject(s)
COVID-19 , RNA , 3' Untranslated Regions , Humans , Nucleic Acid Conformation , RNA/chemistry , SARS-CoV-2
ABSTRACT
Computational methods play a pivotal role in drug discovery and are widely applied in virtual screening, structure optimization, and compound activity profiling. Over the last decades, almost all the attention in medicinal chemistry has been directed to protein-ligand binding, and computational tools have been created with this target in mind. With novel discoveries of functional RNAs and their possible applications, RNAs have gained considerable attention as potential drug targets. However, the availability of bioinformatics tools for nucleic acids is limited. Here, we introduce fingeRNAt, a software tool for detecting non-covalent interactions formed in complexes of nucleic acids with ligands. The program detects nine types of interactions: (i) hydrogen and (ii) halogen bonds, (iii) cation-anion, (iv) pi-cation, (v) pi-anion, (vi) pi-stacking, (vii) inorganic ion-mediated, (viii) water-mediated, and (ix) lipophilic interactions. The scope of detected interactions can be easily expanded using a simple plugin system. In addition, detected interactions can be visualized using the associated PyMOL plugin, which facilitates the analysis of medium-throughput molecular complexes. Interactions are also encoded and stored as a bioinformatics-friendly Structural Interaction Fingerprint (SIFt): a binary string in which the respective bit is set to 1 if a particular interaction is present and to 0 otherwise. This output format, in turn, enables high-throughput analysis of interaction data using data analysis techniques. We present applications of fingeRNAt-generated interaction fingerprints for visual and computational analysis of RNA-ligand complexes, including analysis of interactions formed in experimentally determined RNA-small molecule ligand complexes deposited in the Protein Data Bank. We propose interaction fingerprint-based similarity as an alternative measure to RMSD to recapitulate complexes with similar interactions but different folding. We present an application of interaction fingerprints for the clustering of molecular complexes. This approach can be used to group ligands that form similar binding networks and thus have similar biological properties. The fingeRNAt software is freely available at https://github.com/n-szulc/fingeRNAt.
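As an illustration of fingerprint-based similarity, the sketch below compares two binary interaction fingerprints with the Tanimoto coefficient. The abstract does not say which similarity measure is used, so the Tanimoto coefficient is an assumption made here; the 9-bit strings are invented toy fingerprints (one bit per interaction type for a single residue) and do not reproduce the actual fingeRNAt output layout.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto coefficient of two equal-length binary strings of '0'/'1'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both fingerprints vs. bits set in at least one of them.
    both = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    either = sum(a == "1" or b == "1" for a, b in zip(fp_a, fp_b))
    return both / either if either else 1.0


# Hypothetical fingerprints of two ligands bound to the same RNA residue.
sift_ligand_1 = "110100100"
sift_ligand_2 = "110000101"

print(tanimoto(sift_ligand_1, sift_ligand_2))  # 0.6: 3 shared bits / 5 set bits
```

A pairwise matrix of such scores could then serve as input to a standard clustering routine for grouping complexes with similar interaction patterns.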