Abstract
With increasing interest in RNA as a therapeutic and a potential target, the role of RNA structures has become more important. Even slight changes in nucleobases, such as modifications or protomeric and tautomeric states, can have a large impact on RNA structure and function, while local environments in turn affect protonation and tautomerization. In this work, the application of empirical tools for pKa and tautomer prediction for RNA modifications was examined, compared with ab initio quantum mechanics (QM) methods, and expanded toward macromolecular RNA structures, where QM is no longer feasible. In this regard, the Protonate3D functionality within the Molecular Operating Environment (MOE) was expanded for nucleobase protomer and tautomer predictions and applied to reported examples of altered protonation states depending on the local environment. Overall, observations of nonstandard protomers and tautomers were well reproduced, including structural C+G:C(A) and A+GG motifs, several mismatches, and protonation of adenosine or cytidine as the general acid in nucleolytic ribozymes. Special cases, such as cobalt hexamine-soaked complexes or the deprotonation of guanosine as the general base in nucleolytic ribozymes, proved to be challenging. The collected set of examples is intended to serve as a starting point for the development of further RNA protonation prediction tools, while the presented Protonate3D implementation already delivers reasonable protonation predictions for RNA and DNA macromolecules. For cases where higher accuracy is needed, such as following the catalytic pathways of ribozymes, incorporation of QM-based methods can build upon the Protonate3D-generated starting structures. Likewise, this protonation prediction can be used for structure-based RNA-ligand design approaches.
Subjects
Nucleic Acid Conformation, Quantum Theory, RNA, Ligands, RNA/chemistry, Molecular Models, Protons, Drug Design
Abstract
Molecular dynamics simulations play a pivotal role in elucidating the dynamic behavior of RNA structures, offering a valuable complement to traditional methods such as nuclear magnetic resonance or X-ray crystallography. Despite this, the current precision of RNA force fields lags behind that of protein force fields. In this work, we systematically compared the performance of four RNA force fields (ff99bsc0χOL3, AMBERDES, ff99OL3_CMAP1, AMBERMaxEnt) across diverse RNA structures. Our findings highlight significant challenges in maintaining stability, particularly with regard to cross-strand and cross-loop hydrogen bonds. Furthermore, we observed that the tested RNA force fields are limited in how accurately they describe the conformations of nonhelical structural motifs and terminal nucleotides, as well as base pairing and base stacking interactions. The identified deficiencies in existing RNA force fields provide valuable insights for subsequent force field development. These findings also offer recommendations for selecting appropriate force fields in RNA simulations.
Subjects
Molecular Dynamics Simulation, RNA, Nucleic Acid Conformation, RNA/chemistry, Base Pairing, Magnetic Resonance Spectroscopy
Abstract
We describe the modeling method for RNA tertiary structures employed by team AIchemy_RNA2 in the 15th Critical Assessment of Structure Prediction (CASP15). The method consists of the following steps. First, secondary structure information was derived from various manually verified sources. With this information, the full-length RNA was fragmented into structural modules. The structure of each module was predicted, and the modules were then assembled into the full structure. To reduce the conformational search space, an RNA structure was organized into an optimal base folding tree. To further improve the sampling efficiency, the energy surface was smoothed at high temperatures during the Monte Carlo sampling to make it easier to move across energy barriers. The statistical potential energy function BRiQ was employed during Monte Carlo energy optimization.
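The effect of temperature on barrier crossing in Monte Carlo sampling can be illustrated with a minimal sketch. The abstract does not state the acceptance rule or how the energy surface was smoothed, so the standard Metropolis criterion is assumed here; the function name and the numerical values are hypothetical, and nothing below reproduces the BRiQ energy function.

```python
import math


def metropolis_accept_prob(delta_e: float, temperature: float) -> float:
    """Probability of accepting a move that changes the energy by delta_e.

    Assumes the standard Metropolis criterion with energies and temperature
    expressed in the same arbitrary units (k_B = 1).
    """
    if delta_e <= 0.0:
        return 1.0  # downhill or neutral moves are always accepted
    return math.exp(-delta_e / temperature)


# The same uphill move evaluated at a low and a high temperature.
low_t = metropolis_accept_prob(2.0, temperature=0.5)
high_t = metropolis_accept_prob(2.0, temperature=5.0)
print(f"low T: {low_t:.3f}, high T: {high_t:.3f}")
```

Under this assumption, the uphill move is accepted far more often at the higher temperature, which is the general reason high-temperature stages help a sampler move across energy barriers.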
Subjects
Algorithms, RNA, RNA/chemistry, Protein Conformation, Monte Carlo Method
Abstract
RNA (ribonucleic acid) structure prediction finds many applications in health science and drug discovery because of the importance of RNA in several regulatory processes of life. Despite significant advances in the closely related field of protein structure prediction, RNA 3D structure remains tremendously challenging to predict, especially for long sequences. In this regard, the approach taken by Rosetta FARFAR2 (Fragment Assembly of RNA with Full-Atom Refinement, version 2) has shown promising results, but the algorithm is non-deterministic by nature. In this paper, we develop P-FARFAR2: a parallel enhancement of FARFAR2 that increases its ability to assemble low-energy structures via multithreaded exploration of random configurations in a greedy manner. This strategy, appearing in the literature under the term "parallel mechanism", is made viable through two measures: first, the synchronization window is coarsened to several Monte Carlo cycles; second, all but one of the threads are designated as auxiliary and set to solve a weakened version of the problem. Following empirical analysis on a diverse range of RNA structures, we report a statistically significant lowering of the energy levels of the resulting samples. Consequently, despite the moderate-to-weak correlation between energy levels and prediction accuracy, this improvement carries over to the accuracy measurements.
Subjects
RNA, Software, RNA/chemistry, Algorithms, Proteins/chemistry, Monte Carlo Method
Abstract
Many engineered Cas9 variants have been developed to minimize unintended cleavage of off-target DNAs, but the detailed mechanism by which they regulate target specificity through DNA:RNA heteroduplexation remains poorly understood. We used a single-molecule FRET assay to follow the dynamics of DNA:RNA heteroduplexation for various engineered Cas9 variants with respect to on-target and off-target DNAs. Just like wild-type Cas9, these engineered Cas9 variants exhibit a strong correlation between their conformational structure and nuclease activity. Compared with wild-type Cas9, the fraction of the cleavage-competent state dropped more rapidly with increasing base-pair mismatch, which gives rise to their enhanced target specificity. We proposed a reaction model to quantitatively analyze the degree of off-target discrimination during the successive process of R-loop expansion. We found that the critical specificity enhancement step is activated during DNA:RNA heteroduplexation for evoCas9 and HypaCas9, while it occurs in the post-heteroduplexation stage for Cas9-HF1, eCas9, and Sniper-Cas9. This study sheds new light on the conformational dynamics behind the target specificity of Cas9, which will help strengthen its rational design principles in the future.
Subjects
CRISPR-Associated Protein 9/genetics, DNA/genetics, RNA/genetics, Single Molecule Imaging/methods, Base Pairing, CRISPR-Associated Protein 9/chemistry, CRISPR-Associated Protein 9/metabolism, Molecular Cloning, DNA/chemistry, DNA/metabolism, Escherichia coli/genetics, Escherichia coli/metabolism, Fluorescence Resonance Energy Transfer, Gene Expression, Genetic Vectors/chemistry, Genetic Vectors/metabolism, HEK293 Cells, Homeodomain Proteins/genetics, Homeodomain Proteins/metabolism, Humans, Molecular Models, Mutation, Nucleic Acid Hybridization, Protein Conformation, Protein Engineering/methods, RNA/chemistry, RNA/metabolism, Recombinant Proteins/chemistry, Recombinant Proteins/genetics, Recombinant Proteins/metabolism, Substrate Specificity, Transcription Factors/genetics, Transcription Factors/metabolism
Abstract
RNA molecules participate in many important biological processes, and they need to fold into well-defined secondary and tertiary structures to realize their functions. Like the well-known protein folding problem, there is also an RNA folding problem. The folding problem includes two aspects: structure prediction and folding mechanism. Although the former has been widely studied, the latter is still not well understood. Here we present a deep reinforcement learning algorithm, 2dRNA-Fold, to study the fastest folding paths of RNA secondary structure. 2dRNA-Fold uses a neural network combined with Monte Carlo tree search to select residue pairings step by step for a given RNA sequence until the final secondary structure is formed. We apply 2dRNA-Fold to several short RNA molecules and one longer RNA, 1Y26, and find that their fastest folding paths show some interesting features. 2dRNA-Fold is further trained using a set of RNA molecules from the bpRNA dataset and is used to predict RNA secondary structure. Since the scoring that 2dRNA-Fold uses to determine the next step is based on possible base pairings, the learned or predicted fastest folding path may not agree with the actual folding paths determined by free energy according to physical laws.
Subjects
Machine Learning, Molecular Models, RNA Folding, RNA, Software, RNA/chemistry, RNA/genetics
Abstract
Advances in RNA structural probing techniques based on next-generation sequencing have generated demand for complementary computational tools that robustly extract RNA structural information amidst sampling noise and variability. We present diffBUM-HMM, a noise-aware model that enables accurate detection of RNA flexibility and conformational changes from high-throughput RNA structure-probing data. diffBUM-HMM is widely compatible, accounts for sampling variation and sequence coverage biases, and displays higher sensitivity than existing methods while remaining robust against false positives. Our analyses of datasets generated with a variety of RNA probing chemistries demonstrate the value of diffBUM-HMM for quantitatively detecting RNA structural changes and RNA-binding protein binding sites.
Subjects
Algorithms, High-Throughput Nucleotide Sequencing, Markov Chains, Statistical Models, RNA/chemistry, RNA/genetics, Base Sequence, Binding Sites, Genetic Databases, Theoretical Models, Mutation/genetics, Nucleotides/genetics, Protein Binding, RNA Precursors/genetics, Long Noncoding RNA/genetics, Ribosomes/metabolism
Abstract
Some of the amazing contributions brought to the scientific community by the Protein Data Bank (PDB) are described. The focus is on nucleic acid structures with a bias toward RNA. The evolution and key roles in science of the PDB and other structural databases for nucleic acids illustrate how small initial ideas can become huge and indispensable resources with the unflinching willingness of scientists to cooperate globally. Progress in understanding the molecular interactions that drive RNA architectures followed the rapid increase in the number of RNA structures in the PDB. That increase resulted from improvements in the chemical synthesis and purification of RNA molecules, as well as in biophysical methods for structure determination and in computer technology. The RNA modeling efforts from the early beginnings are also described together with their links to the state of structural knowledge and technological development. Structures of RNA and of its assemblies are physical objects, which, together with genomic data, allow us to integrate present-day biological functions and the historical evolution in all living species on Earth.
Subjects
Protein Databases, RNA/chemistry, Computational Biology/methods
Abstract
Motivated by the fine compositional control observed in membraneless droplet organelles in cells, we investigate how a sharp binding-unbinding transition can occur between multivalent client molecules and receptors embedded in a porous three-dimensional structure. In contrast to similar superselective binding previously observed at surfaces, we have identified that a key effect in a three-dimensional environment is that the presence of inert crowding agents can significantly enhance or even introduce superselectivity. In essence, molecular crowding initially suppresses binding via an entropic penalty, but the clients can then more easily form many bonds simultaneously. We demonstrate the robustness of the superselective behavior with respect to client valency, linker length, and binding interactions in Monte Carlo simulations of an archetypal lattice polymer model.
Subjects
Biological Models, Proteins/chemistry, RNA/chemistry, Monte Carlo Method, Organelles/chemistry, Organelles/metabolism, Protein Binding, Proteins/metabolism, RNA/metabolism, Cell Surface Receptors/chemistry, Cell Surface Receptors/metabolism
Abstract
We demonstrate a loop-mediated isothermal amplification (LAMP) method to detect and amplify SARS-CoV-2 genetic sequences using a set of in-house designed initiators that target regions encoding the N protein. We were able to detect and amplify SARS-CoV-2 nucleic acids in the range of 62 to 2 × 10⁵ DNA copies by this straightforward method. Using synthetic SARS-CoV-2 samples and RNA extracts from patients, we demonstrate that colorimetric LAMP is a quantitative method comparable in diagnostic performance to RT-qPCR (i.e., sensitivity of 92.85% and specificity of 81.25% in a set of 44 RNA extracts from patients analyzed in a hospital setting).
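As a worked illustration of how sensitivity and specificity figures of this kind are computed, the sketch below uses hypothetical confusion-matrix counts. The abstract reports only the percentages and the total of 44 extracts, so the split into 28 positive and 16 negative reference results is an assumption chosen because it is numerically consistent with those figures; it is not taken from the paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of reference-positive samples that the assay also calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of reference-negative samples that the assay also calls negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumption): 28 reference-positive and 16 reference-negative extracts.
tp, fn = 26, 2
tn, fp = 13, 3
total = tp + fn + tn + fp  # 44 samples in all

sens_pct = 100 * sensitivity(tp, fn)
spec_pct = 100 * specificity(tn, fp)
print(f"n={total}, sensitivity={sens_pct:.3f}%, specificity={spec_pct:.2f}%")
```

With these assumed counts, 13/16 gives exactly 81.25%, and 26/28 gives 92.857...%, which agrees with the reported 92.85% if that figure was truncated rather than rounded.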
Subjects
COVID-19 Nucleic Acid Testing/methods, Molecular Diagnostic Techniques/methods, Nucleic Acid Amplification Techniques/methods, RNA/analysis, SARS-CoV-2/chemistry, Viral Load/methods, COVID-19/diagnosis, Colorimetry/methods, Coronavirus Nucleocapsid Proteins, DNA/analysis, DNA/chemistry, Fluorescent Dyes/chemistry, Humans, Intercalating Agents/chemistry, Phenolsulfonphthalein/chemistry, Phosphoproteins, RNA/chemistry
Abstract
Nucleic acid detection by electrophoresis is still a quick and accessible technique for many diagnostic methods, primarily in research laboratories or point-of-care units. Standard protocols detect DNA/RNA molecules through specifically bound chemical dyes using a UV transilluminator or UV photo-documentation system. However, the acquisition costs and limited availability of these devices, mainly the ones with photography and internet connection capabilities, can be prohibitive, especially for public health units in developing countries. Also, ultraviolet radiation is a common additional risk factor for professionals who use electrophoresis-based nucleic acid detection. With that in mind, this work describes the development of a low-cost smart DNA/RNA detection system capable of obtaining qualitative and semi-quantitative data from gel analysis. The proposed device exploits the visible light absorption range of commonly used DNA/RNA dyes using readily available parts, such as light-emitting diodes (LEDs), and simple manufacturing processes, such as 3D printing. By applying IoT techniques, our system covers a wide range of the color spectrum in order to detect bands from various commercially used dyes, using Bluetooth communication and a smartphone for hardware control, image capture, and sharing. The project also enables process scalability and has low manufacturing and maintenance costs. The use of LEDs in the visible spectrum can achieve very reproducible images, providing a high potential for rapid and point-of-care diagnostics as well as applications in several fields such as healthcare, agriculture, and aquaculture.
Subjects
DNA/isolation & purification, Point-of-Care Systems/economics, RNA/isolation & purification, Costs and Cost Analysis, DNA/chemistry, Agar Gel Electrophoresis/economics, Agar Gel Electrophoresis/instrumentation, Equipment Design, Fluorescent Dyes/chemistry, Light, RNA/chemistry, Smartphone, Software
Abstract
With close to 30 sequence-based predictors of RNA-binding residues (RBRs) available, this comparative survey aims to help with understanding and selecting the appropriate tools. We discuss past reviews on this topic, survey a comprehensive collection of predictors, and comparatively assess six representative methods. We provide a novel and well-designed benchmark dataset, and we are the first to report and compare protein-level and dataset-level results and to contextualize performance to specific types of RNAs. The methods considered here are well cited and rely on machine learning algorithms, occasionally combined with homology-based prediction. Empirical tests reveal that they provide relatively accurate predictions. Virtually all methods perform well for the proteins that interact with rRNAs, some generate accurate predictions for mRNAs, snRNA, SRP and IRES, while proteins that bind tRNAs are predicted poorly. Moreover, except for DRNApred, they confuse DNA-binding and RNA-binding residues. None of the six methods consistently outperforms the others when tested on individual proteins. This variable and complementary protein-level performance suggests that users should not rely on applying just the single best dataset-level predictor. We recommend that future work focus on the development of approaches that facilitate protein-level selection of accurate predictors and the consensus-based prediction of RBRs.
Subjects
RNA-Binding Proteins, RNA, Protein Sequence Analysis, DNA/chemistry, DNA/genetics, DNA/metabolism, Protein Binding, RNA/chemistry, RNA/genetics, RNA/metabolism, RNA-Binding Proteins/chemistry, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism
Abstract
BACKGROUND: As the barriers to incorporating RNA sequencing (RNA-Seq) into biomedical studies continue to decrease, the complexity and size of RNA-Seq experiments are rapidly growing. Paired, longitudinal, and other correlated designs are becoming commonplace, and these studies offer immense potential for understanding how transcriptional changes within an individual over time differ depending on treatment or environmental conditions. While several methods have been proposed for dealing with repeated measures within RNA-Seq analyses, they are restricted to handling only paired measurements, can only test for differences between two groups, and/or have issues with maintaining nominal false positive and false discovery rates. In this work, we propose a Bayesian hierarchical negative binomial generalized linear mixed model framework that can flexibly model RNA-Seq counts from studies with arbitrarily many repeated observations, can include covariates, and also maintains nominal false positive and false discovery rates in its posterior inference. RESULTS: In simulation studies, we showed that our proposed method (MCMSeq) best combines high statistical power (i.e. sensitivity or recall) with maintenance of nominal false positive and false discovery rates compared with the other available strategies, especially at the smaller sample sizes investigated. This behavior was then replicated in an application to real RNA-Seq data where MCMSeq was able to find previously reported genes associated with tuberculosis infection in a cohort with longitudinal measurements. CONCLUSIONS: Failing to account for repeated measurements when analyzing RNA-Seq experiments can result in significantly inflated false positive and false discovery rates. Of the methods we investigated, whether they modeled RNA-Seq counts directly or worked on transformed values, the Bayesian hierarchical model implemented in the mcmseq R package (available at https://github.com/stop-pre16/mcmseq) best combined sensitivity and nominal error rate control.
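To illustrate why a negative binomial model is a common choice for count data of this kind, the sketch below compares its variance with that of a Poisson model. The abstract does not give the parameterization used by MCMSeq, so the widely used mean-dispersion (NB2) form, Var = mean + dispersion × mean², is assumed here; the function name and values are hypothetical.

```python
def nb_variance(mean: float, dispersion: float) -> float:
    """Variance of a negative binomial count under the assumed NB2 form.

    A dispersion of 0 recovers the Poisson case, where variance equals the mean.
    """
    return mean + dispersion * mean ** 2


# Compare Poisson-like (dispersion 0) and overdispersed (dispersion 0.1) variances.
for mu in (10.0, 100.0, 1000.0):
    poisson_var = nb_variance(mu, 0.0)
    overdispersed_var = nb_variance(mu, 0.1)
    print(f"mean={mu:g}: Poisson var={poisson_var:g}, NB var={overdispersed_var:g}")
```

Under this assumption the extra variance grows with the square of the mean, so highly expressed genes depart most strongly from Poisson behavior.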
Subjects
RNA/chemistry, RNA Sequence Analysis/methods, User-Computer Interface, Bayes Theorem, Humans, Monte Carlo Method, RNA/genetics, RNA/metabolism, Tuberculosis/genetics, Tuberculosis/pathology
Abstract
Ribonucleic acid (RNA) molecules perform a variety of vital roles in all living cells. Their biological function depends on their structure and dynamics, both of which are difficult to determine experimentally but can be inferred theoretically from the RNA sequence. SimRNA is one of the computational methods for molecular simulations of RNA 3D structure formation. The method is based on a simplified (coarse-grained) representation of nucleotide chains, a statistically derived model of interactions (statistical potential), and the Monte Carlo method as a conformational sampling scheme. The current version of SimRNA (3.22) is able to predict basic topologies of RNA molecules with sizes up to about 50-70 nucleotides based on their sequences alone, and of larger molecules if supplied with appropriate distance restraints. The user can specify various types of restraints, including secondary structure, pairwise atom-atom distances, and positions of atoms. SimRNA can also be used for studying systems composed of several RNA chains. Because SimRNA is a folding simulation method, it allows folding pathways to be examined and gives an approximate view of the energy landscape.
Subjects
Molecular Dynamics Simulation, RNA Folding, RNA/chemistry, Monte Carlo Method
Abstract
BACKGROUND: Mammalian hair plays an important role in mammals' ability to adapt to changing climatic environments. The seasonal cycling of yak hair helps the animals adapt to high altitude, but the mechanisms regulating the proliferation and differentiation of hair follicle (HF) cells during development are still unknown. Here, using time series data for transcriptome and hormone contents, we systematically analyzed the mechanism regulating the periodic expression of hair development in the yak and reviewed how different combinations of genetic pathways regulate HF development and cycling. RESULTS: This study used high-throughput RNA sequencing to provide a detailed description of global gene expression in 15 samples from five developmental time points during the yak hair cycle. According to clustering analysis, we found that these 15 samples could be significantly grouped into three phases, which represent different developmental periods in the hair cycle. A total of 2316 genes were identified in these three consecutive developmental periods, and their expression patterns could be divided into 9 clusters. In anagen, genes involved in activating hair follicle growth are highly expressed, such as those of the WNT pathway and the FGF pathway, and some genes related to hair follicle differentiation. In catagen, genes that inhibit differentiation and promote hair follicle cell apoptosis are highly expressed, such as BMP4 and Wise. In telogen, genes that inhibit hair follicle activity are highly expressed, such as DKK1 and BMP1. Through co-expression analysis, we revealed a number of modular hub genes highly associated with hormones, such as SLF2, BOP1 and DPP8. They may play unique roles in hormonal regulation of events associated with the hair cycle. CONCLUSIONS: Our results revealed the expression pattern and molecular mechanisms of the seasonal hair cycle in the yak. The findings will be valuable in further understanding the alpine adaptation mechanism in the yak, which is important for making full use of yak hair resources and promoting the economic development of pastoral plateau areas.
Subjects
Hair/metabolism, Transcriptome, Animals, Bone Morphogenetic Protein 1/genetics, Bone Morphogenetic Protein 1/metabolism, Cattle, Cluster Analysis, Gene Regulatory Networks/genetics, Hair Follicle/growth & development, Hair Follicle/metabolism, High-Throughput Nucleotide Sequencing, Intercellular Signaling Peptides and Proteins/genetics, Intercellular Signaling Peptides and Proteins/metabolism, Principal Component Analysis, RNA/chemistry, RNA/metabolism, Seasons, RNA Sequence Analysis, Signal Transduction/genetics
Abstract
RNA-protein complexes (RNPs) are essential components in a variety of cellular processes, and they often exhibit complex structures and mechanisms that are highly dynamic in conformation and structure. However, biochemical and structural biology approaches are mostly unable to fully elucidate the structurally and, especially, conformationally dynamic and heterogeneous nature of these RNPs; single-molecule Förster resonance energy transfer (smFRET) spectroscopy can be harnessed to fill this gap. Here we summarize the advantages of strategic smFRET studies to investigate RNP dynamics, complemented by structural and biochemical data. Focusing on recent smFRET studies of three essential biological systems, we demonstrate that investigation of RNPs on a single-molecule level can answer important functional questions that remained elusive with structural or biochemical approaches alone: the complex structural rearrangements throughout the splicing cycle, the unwinding dynamics of the G-quadruplex (G4) helicase RHAU, and aspects of telomere maintenance regulation and synthesis.
Subjects
Fluorescence Resonance Energy Transfer, G-Quadruplexes, RNA/chemistry, Single Molecule Imaging, Animals, Cattle, Cluster Analysis, X-Ray Crystallography, Humans, Markov Chains, Nucleic Acid Conformation, Protein Binding, Protein Denaturation, Protein Folding, Protein Secondary Structure, RNA Splicing, Ribonucleoproteins, Spliceosomes/chemistry, Telomerase/chemistry, Telomere/chemistry, Telomere/ultrastructure
Abstract
For proteome analyses, tissue samples are mostly preserved either snap-frozen or in formalin-fixed, paraffin-embedded form. The use of RNAlater, a non-toxic solution primarily used to stabilize the RNA content of samples, for tissue preservation in proteome analysis was recently described as being as reliable as snap-frozen preservation in human tissues. Even though RNAlater storage has great potential in the preservation of peripheral blood mononuclear cells (PBMCs), its impact on the results of proteome analysis is poorly described in qualitative and quantitative terms. The present study investigated protein profiles of RNAlater-preserved and fresh PBMCs using three extraction buffers, viz. Triton X-100, RIPA and SDS. Proteins were separated by SDS-PAGE and quantified using densitometry. On average, 19.3 bands from fresh cells and 15.6 bands from RNAlater-stored cells were obtained, with molecular weights ranging from 25 to > 250 kDa. RNAlater storage yielded fewer and less abundant low molecular weight proteins while yielding a similar or higher quantity of high molecular weight protein fractions. Principal component analysis showed that Triton X-100 is inferior to SDS and RIPA with respect to the number of protein bands and the quantity yielded. While RNAlater is effective in preserving PBMCs for proteome analysis, our findings warrant caution in its use in proteomics experiments, especially if the targets are low molecular weight proteins.
Subjects
Mononuclear Leukocytes/chemistry, Proteome/isolation & purification, RNA/chemistry, Tissue Preservation/methods, Animals, Cattle, Complex Mixtures/chemistry, Polyacrylamide Gel Electrophoresis, Liquid Phase Microextraction/methods, Molecular Weight, Octoxynol/chemistry, Pharmaceutical Preservatives/chemistry, Primary Cell Culture, Principal Component Analysis, Proteome/chemistry, Proteome/classification, RNA/isolation & purification, Sodium Dodecyl Sulfate/chemistry
Abstract
Colocalization single-molecule spectroscopy (CoSMoS) allows studying RNA-protein complexes in the full complexity of their cellular environment at single-molecule resolution. Conventionally, the interaction between a single RNA species and multiple proteins is monitored in real time. However, comparing interactions of the same proteins with different RNA species in the same cell extract promises unique insights into RNA biology. Here, we describe an approach to monitor multiple RNA species simultaneously to enable direct comparison. This approach represents a technological development to avoid conventional inter-experiment comparisons.
Subjects
RNA-Binding Proteins/metabolism, RNA/chemistry, Single Molecule Imaging/methods, Cell Extracts/chemistry, Fluorescent Dyes/chemistry, Fluorescence Microscopy, RNA/metabolism, RNA-Binding Proteins/chemistry, Staining and Labeling
Abstract
We have previously described (Geffroy et al. Methods Mol Biol 1665:25-40, 2018) how to unfold (or fold) a single RNA molecule under force using a dual-beam optical trap setup. In this chapter, we complementarily describe how to analyze the corresponding data and how to interpret it in terms of RNA three-dimensional structure. As with all single-molecule methods, single RNA molecule force data often exhibit several discrete states where state-to-state transitions are blurred in a noisy signal. In order to cope with this limitation, we have implemented a novel strategy to analyze the data, which uses a hidden Markov modeling procedure. A representative example of such an analysis is presented.
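How a hidden Markov model can recover discrete states from a noisy trace can be sketched with generic Viterbi decoding. The chapter's actual procedure is not described in this abstract, so everything below is an assumption for illustration: Gaussian emissions with a shared width, a single self-transition probability, a uniform starting distribution, and made-up signal values.

```python
import math


def viterbi_gaussian(signal, means, sigma, stay_prob):
    """Most likely state sequence for a Gaussian-emission HMM (uniform start).

    means     -- assumed mean signal level of each hidden state
    sigma     -- assumed noise standard deviation, shared by all states
    stay_prob -- assumed probability of remaining in the same state per step
    """
    n_states = len(means)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)

    def log_emit(x, k):
        # Log Gaussian density, dropping a constant shared by all states.
        return -0.5 * ((x - means[k]) / sigma) ** 2

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    score = [log_emit(signal[0], k) for k in range(n_states)]
    backptr = []
    for x in signal[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for ending in state j at this time point.
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans(i, j))
            ptr.append(best_i)
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(x, j))
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical trace: a noisy excursion (12.8) occurs before the real transition.
trace = [10.2, 9.7, 10.4, 12.8, 9.9, 15.1, 14.6, 15.3]
states = viterbi_gaussian(trace, means=[10.0, 15.0], sigma=1.5, stay_prob=0.95)
print(states)
```

In this toy trace the value 12.8 lies closer to the upper level, so a simple midpoint threshold would call a brief transition there; the transition penalty in the HMM keeps that point in the lower state and places the switch where the signal stays high.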
Subjects
RNA/chemistry, Single Molecule Imaging/methods, Markov Chains, Molecular Models, Nucleic Acid Conformation, Optical Tweezers, RNA Folding, Software
Abstract
Despite the large number of noncoding RNAs in the human genome and their roles in many diseases, including cancer, we know very little about them because of a lack of structural clues. The centerpiece of the structural clues is the full RNA base-pairing structure of secondary and tertiary contacts, which can be precisely obtained only from costly and time-consuming 3D structure determination. Here, we performed deep mutational scanning of the self-cleaving CPEB3 ribozyme by error-prone PCR and showed that a library of <5 × 10⁴ single-to-triple mutants is sufficient to infer 25 of 26 base pairs, including non-nested, nonhelical, and noncanonical base pairs, with both sensitivity and precision at 96%. Such accurate inference was further confirmed with a twister ribozyme at 100% precision, with only noncanonical base pairs as false negatives. The performance resulted from analyzing covariation-induced deviation of activity by utilizing both functional and nonfunctional variants for unsupervised classification, followed by Monte Carlo (MC) simulated annealing with mutation-derived scores. Highly accurate inference can also be obtained by combining MC with evolution/direct coupling analysis, R-scape or epistasis analysis. The results highlight the usefulness of deep mutational scanning for high-accuracy structural inference of self-cleaving ribozymes, with implications for other structured RNAs that permit high-throughput functional selections.