1.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-37079725

ABSTRACT

The DynaSig-ML ('Dynamical Signatures-Machine Learning') Python package allows the efficient, user-friendly exploration of 3D dynamics-function relationships in biomolecules, using datasets of experimental measurements from large numbers of sequence variants. It does so by predicting 3D structural dynamics for every variant using the Elastic Network Contact Model (ENCoM), a sequence-sensitive coarse-grained normal mode analysis model. Dynamical Signatures represent the fluctuation at every position in the biomolecule and are used as features fed into machine learning models of the user's choice. Once trained, these models can be used to predict experimental outcomes for theoretical variants. The whole pipeline can be run with just a few lines of Python and modest computational resources. The compute-intensive steps are easily parallelized in the case of either large biomolecules or vast numbers of sequence variants. As an example application, we use the DynaSig-ML package to predict the maturation efficiency of human microRNA miR-125a variants from high-throughput enzymatic assays. AVAILABILITY AND IMPLEMENTATION: DynaSig-ML is open-source software available at https://github.com/gregorpatof/dynasigml_package.


Subject(s)
Machine Learning , Software , Humans
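The DynaSig-ML workflow summarized above uses each variant's Dynamical Signature (one predicted fluctuation value per position) as the feature vector of a regression model. The sketch below shows only that final learning step, under stated assumptions: the signatures are hypothetical toy numbers rather than ENCoM output, plain ordinary least squares stands in for whichever model a user would choose, and the function names are illustrative, not part of the DynaSig-ML API.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(signatures: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    Each signature holds one fluctuation value per position; a column of
    ones is prepended, so beta[0] is the intercept.
    """
    x = [[1.0] + list(s) for s in signatures]
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(x, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta: Sequence[float], signature: Sequence[float]) -> float:
    """Predict the experimental outcome for a (possibly theoretical) variant."""
    return beta[0] + sum(b * s for b, s in zip(beta[1:], signature))


# Hypothetical toy signatures for five variants of a three-position molecule.
signatures = [
    [0.1, 0.2, 0.3],
    [0.4, 0.1, 0.2],
    [0.3, 0.5, 0.1],
    [0.2, 0.3, 0.6],
    [0.5, 0.4, 0.4],
]
# Toy targets generated from a known linear rule so the fit can be checked.
targets = [0.5 + 2.0 * s[0] - 1.0 * s[1] for s in signatures]
beta = fit_ols(signatures, targets)
new_variant_prediction = predict(beta, [0.3, 0.3, 0.3])
```

The normal equations become singular when there are more positions than variants, which is one reason a regularized estimator is usually preferable on real datasets with long molecules.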
2.
PLoS Comput Biol ; 18(12): e1010777, 2022 12.
Article in English | MEDLINE | ID: mdl-36516216

ABSTRACT

The Elastic Network Contact Model (ENCoM) is a coarse-grained normal mode analysis (NMA) model unique in its all-atom sensitivity to the sequence of the studied macromolecule and thus to the effect of mutations. We adapted ENCoM to simulate the dynamics of ribonucleic acid (RNA) molecules, benchmarked its performance against other popular NMA models and used it to study the 3D structural dynamics of human microRNA miR-125a, leveraging high-throughput experimental maturation efficiency data of over 26 000 sequence variants. We also introduce a novel way of using dynamical information from NMA to train multivariate linear regression models, with the purpose of highlighting the most salient contributions of dynamics to function. ENCoM has a similar performance profile on RNA as on proteins when compared to the Anisotropic Network Model (ANM), the most widely used coarse-grained NMA model: it has the advantage in predicting large-scale motions, while ANM performs better at B-factor prediction. A stringent benchmark from the miR-125a maturation dataset, in which the training set contains no sequence information in common with the testing set, reveals that ENCoM is the only tested model able to capture signal beyond the sequence. This ability translates to better predictive power on a second benchmark in which sequence features are shared between the train and test sets. When training the linear regression model using all available data, the dynamical features identified as necessary for miR-125a maturation point to known patterns but also offer new insights into the biogenesis of microRNAs. Our novel approach combining NMA with multivariate linear regression is generalizable to any macromolecule for which relatively high-throughput mutational data is available.


Subject(s)
MicroRNAs , Humans , MicroRNAs/chemistry , Motion , Protein Conformation , Proteins/chemistry , Linear Models
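ANM, the reference model in the benchmark above, builds a 3N × 3N Hessian from a network of identical springs joining nodes closer than a distance cutoff; its eigenvectors are the normal modes. The pure-Python sketch below constructs that Hessian. The 15 Å cutoff and unit spring constant are common illustrative defaults assumed here, and ENCoM itself, which adds sequence-dependent interaction terms, is not reproduced.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def anm_hessian(coords: Sequence[Coord], cutoff: float = 15.0,
                gamma: float = 1.0) -> List[List[float]]:
    """Build the 3N x 3N Anisotropic Network Model Hessian.

    Every pair of nodes closer than `cutoff` is joined by a spring of
    constant `gamma`. Off-diagonal 3x3 blocks are
    H_ij = -gamma * (d d^T) / |d|^2 with d = r_j - r_i, and each diagonal
    block is minus the sum of the off-diagonal blocks in its row.
    """
    n = len(coords)
    h = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            d2 = sum(c * c for c in d)
            if d2 == 0.0:
                raise ValueError(f"nodes {i} and {j} overlap")
            if d2 > cutoff * cutoff:
                continue  # nodes beyond the cutoff are not connected
            for a in range(3):
                for b in range(3):
                    val = -gamma * d[a] * d[b] / d2
                    h[3 * i + a][3 * j + b] = val
                    h[3 * j + a][3 * i + b] = val  # d d^T is symmetric, so H_ji = H_ij
                    h[3 * i + a][3 * i + b] -= val
                    h[3 * j + a][3 * j + b] -= val
    return h


# Hypothetical coordinates (in Å); the last node is beyond every cutoff.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (100.0, 0.0, 0.0)]
h = anm_hessian(coords)
```

In practice the matrix would then be diagonalized with a linear algebra library (for instance `numpy.linalg.eigh`); for a well-connected 3D network, six near-zero eigenvalues should correspond to rigid-body translations and rotations, and the remaining modes describe internal motions.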
3.
RNA ; 26(8): 982-995, 2020 08.
Article in English | MEDLINE | ID: mdl-32371455

ABSTRACT

RNA-Puzzles is a collective endeavor dedicated to the advancement and improvement of RNA 3D structure prediction. With agreement from crystallographers, the RNA structures are predicted by various groups before the publication of the crystal structures. We now report the prediction of 3D structures for six RNA sequences: four nucleolytic ribozymes and two riboswitches. Systematic protocols for comparing models and crystal structures are described and analyzed. In these six puzzles, we discuss (i) the comparison between the automated web servers and human experts; (ii) the prediction of coaxial stacking; (iii) the prediction of structural details and ligand binding; (iv) the development of novel prediction methods; and (v) the potential improvements to be made. We show that correct prediction of coaxial stacking and tertiary contacts is essential for the prediction of RNA architecture, while ligand binding modes can only be predicted with low resolution and simultaneous prediction of RNA structure with accurate ligand binding still remains out of reach. All the predicted models are available for the future development of force field parameters and the improvement of comparison and assessment tools.


Subject(s)
Aptamers, Nucleotide/chemistry , RNA, Catalytic/chemistry , RNA/chemistry , Base Sequence , Ligands , Nucleic Acid Conformation , Riboswitch/genetics
4.
PLoS Comput Biol ; 17(10): e1009482, 2021 10.
Article in English | MEDLINE | ID: mdl-34679099

ABSTRACT

MHC-I associated peptides (MAPs) play a central role in the elimination of virus-infected and neoplastic cells by CD8 T cells. However, accurately predicting the MAP repertoire remains difficult, because only a fraction of the transcriptome generates MAPs. In this study, we investigated whether codon arrangement (usage and placement) regulates MAP biogenesis. We developed an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), predicting MAP presentation solely from mRNA sequences flanking the MAP-coding codons (MCCs), while excluding the MCC per se. CAMAP predictions were significantly more accurate when using original codon sequences than shuffled codon sequences which reflect amino acid usage. Furthermore, predictions were independent of mRNA expression and MAP binding affinity to MHC-I molecules and applied to several cell types and species. Combining MAP ligand scores, transcript expression level and CAMAP scores was particularly useful to increase MAP prediction accuracy. Using an in vitro assay, we showed that varying the synonymous codons in the regions flanking the MCCs (without changing the amino acid sequence) resulted in significant modulation of MAP presentation at the cell surface. Taken together, our results demonstrate the role of codon arrangement in the regulation of MAP presentation and support integration of both translational and post-translational events in predictive algorithms to improve modeling of the immunopeptidome.


Subject(s)
Codon , Computational Biology/methods , Histocompatibility Antigens Class I , Neural Networks, Computer , Algorithms , Amino Acid Sequence , Codon/chemistry , Codon/genetics , Codon/metabolism , Histocompatibility Antigens Class I/chemistry , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class I/metabolism , Humans
5.
Nucleic Acids Res ; 46(16): 8181-8196, 2018 09 19.
Article in English | MEDLINE | ID: mdl-30239883

ABSTRACT

MicroRNAs (miRNAs) are ribonucleic acids (RNAs) of ∼21 nucleotides that interfere with the translation of messenger RNAs (mRNAs) and play significant roles in development and diseases. In bilaterian animals, the specificity of miRNA targeting is determined by sequence complementarity involving the seed. However, the role of the remaining nucleotides (non-seed) is only vaguely defined, impacting negatively on our ability to efficiently use miRNAs exogenously to control gene expression. Here, using reporter assays, we deciphered the role of the base pairs formed between the non-seed region and target mRNA. We used molecular modeling to reveal that this mechanism corresponds to the formation of base pairs mediated by ordered motions of the miRNA-induced silencing complex. Subsequently, we developed an algorithm based on this distinctive recognition to predict from sequence the levels of mRNA downregulation with high accuracy (r² > 0.5, P-value < 10⁻¹²). Overall, our discovery improves the design of miRNA-guide sequences used to simultaneously downregulate the expression of multiple predetermined target genes.


Subject(s)
Argonaute Proteins/genetics , MicroRNAs/genetics , Nucleotides/genetics , RNA, Messenger/genetics , Gene Expression Regulation/genetics , Gene Silencing , Humans , Models, Molecular , Nucleotides/chemistry , Protein Conformation
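The abstract above contrasts seed pairing, which determines targeting specificity, with the less well-defined non-seed region. As a minimal sketch of conventional seed matching only (not the authors' non-seed algorithm), the code below takes miRNA nucleotides 2-8 as the seed, an assumed convention, and reports every perfect Watson-Crick site for it in a target sequence. Both sequences are toy examples.

```python
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna.upper()))


def seed_sites(mirna: str, mrna: str, first: int = 2, last: int = 8) -> List[int]:
    """0-based start positions in `mrna` of perfect matches to the miRNA seed.

    The seed is miRNA nucleotides `first`..`last` (1-based, inclusive); the
    target site is its reverse complement, read 5'->3' on the mRNA.
    """
    site = reverse_complement(mirna[first - 1:last])
    hits, i = [], mrna.find(site)
    while i != -1:
        hits.append(i)
        i = mrna.find(site, i + 1)  # also report overlapping matches
    return hits


toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_mrna = "AAACUACCUCAAAGGCUACCUCUU"
sites = seed_sites(toy_mirna, toy_mrna)
```

Here the seed `GAGGUAG` yields the site `CUACCUC`, found twice in the toy mRNA. Pairing beyond the seed, the subject of the study, is deliberately ignored.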
6.
RNA ; 23(5): 655-672, 2017 05.
Article in English | MEDLINE | ID: mdl-28138060

ABSTRACT

RNA-Puzzles is a collective experiment in blind 3D RNA structure prediction. We report here a third round of RNA-Puzzles. Six puzzles are included in this round of the experiment: five riboswitch aptamer structures (Puzzles 4, 8, 12, 13 and 14) and one ribozyme structure (Puzzle 7). The riboswitch structures include biological binding sites for small molecules (S-adenosyl methionine, cyclic diadenosine monophosphate, 5-amino 4-imidazole carboxamide riboside 5'-triphosphate, glutamine) and proteins (YbxF), and one set describes large conformational changes between ligand-free and ligand-bound states. The Varkud satellite ribozyme is the most recently solved structure of a known large ribozyme. All puzzles have established biological functions and require structural understanding to appreciate their molecular mechanisms. Through the use of fast-track experimental data, including multidimensional chemical mapping, and accurate prediction of RNA secondary structure, a large portion of the contacts in 3D have been predicted correctly, leading to similar topologies for the top ranking predictions. Template-based and homology-derived predictions could predict structures to particularly high accuracies. However, achieving biological insights from de novo prediction of RNA 3D structures still depends on the size and complexity of the RNA. Blind computational predictions of RNA structures already appear to provide useful structural information in many cases. Similar to the previous RNA-Puzzles Round II experiment, the prediction of non-Watson-Crick interactions and the observed high atomic clash scores reveal a notable need for algorithmic improvement. All prediction models and assessment results are available at http://ahsoka.u-strasbg.fr/rnapuzzles/.


Subject(s)
RNA, Catalytic/chemistry , Riboswitch , Aminoimidazole Carboxamide/chemistry , Aminoimidazole Carboxamide/metabolism , Aptamers, Nucleotide/chemistry , Aptamers, Nucleotide/metabolism , Dinucleoside Phosphates/metabolism , Endoribonucleases/chemistry , Endoribonucleases/metabolism , Glutamine/chemistry , Glutamine/metabolism , Ligands , Models, Molecular , Nucleic Acid Conformation , RNA, Catalytic/metabolism , Ribonucleotides/chemistry , Ribonucleotides/metabolism , S-Adenosylmethionine/chemistry , S-Adenosylmethionine/metabolism
7.
Nature ; 493(7432): 371-7, 2013 Jan 17.
Article in English | MEDLINE | ID: mdl-23172145

ABSTRACT

Hyperconnectivity of neuronal circuits due to increased synaptic protein synthesis is thought to cause autism spectrum disorders (ASDs). The mammalian target of rapamycin (mTOR) is strongly implicated in ASDs by means of upstream signalling; however, downstream regulatory mechanisms are ill-defined. Here we show that knockout of the eukaryotic translation initiation factor 4E-binding protein 2 (4E-BP2; an eIF4E repressor downstream of mTOR) or eIF4E overexpression leads to increased translation of neuroligins, which are postsynaptic proteins that are causally linked to ASDs. Mice that have the gene encoding 4E-BP2 (Eif4ebp2) knocked out exhibit an increased ratio of excitatory to inhibitory synaptic inputs and autistic-like behaviours (that is, social interaction deficits, altered communication and repetitive/stereotyped behaviours). Pharmacological inhibition of eIF4E activity or normalization of neuroligin 1, but not neuroligin 2, protein levels restores the normal excitation/inhibition ratio and rectifies the social behaviour deficits. Thus, translational control by eIF4E regulates the synthesis of neuroligins, maintaining the excitation-to-inhibition balance, and its dysregulation engenders ASD-like phenotypes.


Subject(s)
Autistic Disorder/genetics , Autistic Disorder/physiopathology , Eukaryotic Initiation Factor-4E/metabolism , Protein Biosynthesis , Animals , Cell Adhesion Molecules, Neuronal/genetics , Cell Adhesion Molecules, Neuronal/metabolism , Eukaryotic Initiation Factor-4E/antagonists & inhibitors , Eukaryotic Initiation Factors/deficiency , Eukaryotic Initiation Factors/genetics , Eukaryotic Initiation Factors/metabolism , Male , Mice , Mice, Knockout , Phenotype , Synapses/metabolism
8.
Nucleic Acids Res ; 45(W1): W440-W444, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28525607

ABSTRACT

RNA structures are hierarchically organized. The secondary structure is articulated around sophisticated local three-dimensional (3D) motifs shaping the full 3D architecture of the molecule. Recent contributions have identified and organized recurrent local 3D motifs, but applications of this knowledge for predictive purposes are still in their infancy. We recently developed a computational framework, named RNA-MoIP, to reconcile RNA secondary structure and local 3D motif information available in databases. In this paper, we introduce a web service using our software for predicting RNA hybrid 2D-3D structures from sequence data only. Optionally, it can be used for (i) local 3D motif prediction or (ii) the refinement of user-defined secondary structures. Importantly, our web server automatically generates a script for the MC-Sym software, which can be immediately used to quickly predict all-atom RNA 3D models. The web server is available at http://rnamoip.cs.mcgill.ca.


Subject(s)
Nucleotide Motifs , RNA/chemistry , Software , Base Sequence , Internet , Models, Molecular , Nucleic Acid Conformation
9.
Nucleic Acids Res ; 44(20): 9956-9964, 2016 Nov 16.
Article in English | MEDLINE | ID: mdl-27651454

ABSTRACT

MicroRNAs (miRNAs) are crucial gene expression regulators and first-order suspects in the development and progression of many diseases. Comparative analysis of cancer cell expression data highlights many deregulated miRNAs. Low expression of miR-125a was related to poor breast cancer prognosis. Interestingly, a single nucleotide polymorphism (SNP) in miR-125a was located within a minor allele expressed by breast cancer patients. The SNP is not predicted to affect the ground state structure of the primary transcript or precursor, but neither the precursor nor the mature product is detected by RT-qPCR. How this SNP modulates the maturation of miR-125a is poorly understood. Here, building upon a model of RNA dynamics derived from nuclear magnetic resonance studies, we developed a quantitative model enabling the visualization and comparison of networks of transient structures. We observed a high correlation between the distance separating each variant's network from that of its wild type and the variant's degree of maturation relative to the wild type, suggesting an important role of transient structures in miRNA homeostasis. We classified the human miRNAs according to pairwise distances between their networks of transient structures.


Subject(s)
MicroRNAs/chemistry , MicroRNAs/genetics , Nucleic Acid Conformation , RNA Processing, Post-Transcriptional , Transcription, Genetic , Base Pairing , Cell Line , Humans , Magnetic Resonance Spectroscopy , MicroRNAs/metabolism , Polymorphism, Single Nucleotide , Structure-Activity Relationship
10.
RNA ; 21(6): 1066-84, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25883046

ABSTRACT

This paper is a report of a second round of RNA-Puzzles, a collective and blind experiment in three-dimensional (3D) RNA structure prediction. Three puzzles, Puzzles 5, 6, and 10, represented sequences of three large RNA structures with limited or no homology with previously solved RNA molecules. A lariat-capping ribozyme, as well as riboswitches complexed to adenosylcobalamin and tRNA, were predicted by seven groups using RNAComposer, ModeRNA/SimRNA, Vfold, Rosetta, DMD, MC-Fold, 3dRNA, and AMBER refinement. Some groups derived models using data from state-of-the-art chemical-mapping methods (SHAPE, DMS, CMCT, and mutate-and-map). The comparisons between the predictions and the three subsequently released crystallographic structures, solved at diffraction resolutions of 2.5-3.2 Å, were carried out automatically using various sets of quality indicators. The comparisons clearly demonstrate the state of present-day de novo prediction abilities as well as the limitations of these state-of-the-art methods. All of the best prediction models have similar topologies to the native structures, which suggests that computational methods for RNA structure prediction can already provide useful structural information for biological problems. However, the prediction accuracy for non-Watson-Crick interactions, key to proper folding of RNAs, is low and some predicted models had high Clash Scores. These two difficulties point to some of the continuing bottlenecks in RNA structure prediction. All submitted models are available for download at http://ahsoka.u-strasbg.fr/rnapuzzles/.


Subject(s)
Computational Biology/methods , RNA/chemistry , Crystallography, X-Ray , Models, Molecular , Nucleic Acid Conformation , RNA, Messenger/chemistry , RNA, Transfer/chemistry , Software
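The automated comparisons mentioned above rely on quality indicators computed between each model and the crystal structure. One indicator used in RNA-Puzzles assessments is the Interaction Network Fidelity (INF), the geometric mean of the positive predictive value (PPV) and the sensitivity (STY) over the sets of base-pair interactions. The sketch below assumes interactions have already been annotated and are given as hypothetical (i, j) residue-index pairs; it does not perform the annotation itself.

```python
import math
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Store each interaction as (smaller index, larger index)."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def interaction_network_fidelity(reference: Iterable[Pair],
                                 predicted: Iterable[Pair]) -> float:
    """INF = sqrt(PPV * STY), with PPV = TP/(TP+FP) and STY = TP/(TP+FN)."""
    ref, pred = normalize(reference), normalize(predicted)
    tp = len(ref & pred)
    if tp == 0:
        return 0.0  # also covers an empty prediction
    ppv = tp / len(pred)
    sty = tp / len(ref)
    return math.sqrt(ppv * sty)


reference = [(1, 20), (2, 19), (3, 18), (4, 17)]
model = [(20, 1), (2, 19), (3, 18), (5, 16)]  # three of four pairs recovered
inf = interaction_network_fidelity(reference, model)
```

The same function can be applied to subsets of interactions, for example only non-Watson-Crick pairs, which is where the abstract reports the lowest accuracy.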
11.
Nucleic Acids Res ; 43(14): 6730-8, 2015 Aug 18.
Article in English | MEDLINE | ID: mdl-26089388

ABSTRACT

In eukaryotes, gene expression is regulated by microRNAs (miRNAs), which bind to messenger RNAs (mRNAs) and interfere with their translation into proteins, either by promoting their degradation or by inducing their repression. We study the effect of miRNA interference on each gene using experimental methods, such as microarrays and RNA-seq at the mRNA level, or luciferase reporter assays and variations of SILAC at the protein level. Computational predictions would provide clear benefits as an alternative; however, no algorithm for this task has ever been proposed. Here, we introduce a new algorithm to predict genome-wide expression data from initial transcriptome abundance. The algorithm simulates the miRNA and mRNA hybridization competition that occurs in given cellular conditions, and derives the whole set of miRNA::mRNA interactions at equilibrium (microtargetome). Interestingly, solving the competition improves the accuracy of miRNA target predictions. Furthermore, this model implements a previously reported and fundamental property of the microtargetome: the binding between a miRNA and an mRNA depends on their sequence complementarity, but also on the abundance of all RNAs expressed in the cell, i.e. the stoichiometry of all the miRNA sites and all the miRNAs given their respective abundance. This model generalizes the miRNA-induced synchronistic silencing previously observed, and described as sponges and competitive endogenous RNAs.


Subject(s)
Algorithms , Gene Silencing , MicroRNAs/metabolism , Cell Line , Humans , MicroRNAs/chemistry , RNA, Messenger/chemistry , RNA, Messenger/metabolism , Transcriptome
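The central idea of the microtargetome model above, that a miRNA's binding to one mRNA depends on the abundance of every other site competing for it, can be illustrated with a much simpler mass-action sketch. It assumes a single miRNA species and independent 1:1 binding sites with hypothetical total concentrations and dissociation constants (Kd > 0, arbitrary units), and it solves the conservation equation for the free miRNA by bisection. It is not the authors' algorithm, which handles many miRNAs and mRNAs simultaneously.

```python
from typing import List, Tuple

# (total site concentration, dissociation constant Kd); arbitrary units, Kd > 0.
Site = Tuple[float, float]


def free_mirna(total_mirna: float, sites: List[Site], iterations: int = 200) -> float:
    """Solve M + sum_i T_i * M / (Kd_i + M) = M_total for the free miRNA M.

    The left-hand side increases monotonically with M, equals 0 at M = 0 and
    is at least M_total at M = M_total, so bisection on [0, M_total] converges.
    """
    def excess(m: float) -> float:
        bound = sum(t * m / (kd + m) for t, kd in sites)
        return m + bound - total_mirna

    lo, hi = 0.0, total_mirna
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def fraction_bound(total_mirna: float, sites: List[Site]) -> List[float]:
    """Equilibrium occupancy M / (Kd_i + M) of each site."""
    m = free_mirna(total_mirna, sites)
    return [m / (kd + m) for _, kd in sites]


target_a = (5.0, 1.0)      # hypothetical target of interest
competitor = (50.0, 1.0)   # hypothetical abundant competing transcript
alone = fraction_bound(10.0, [target_a])[0]
with_competitor = fraction_bound(10.0, [target_a, competitor])[0]
```

With these toy numbers, the fraction of target A that is bound falls from about 0.85 to about 0.18 once the abundant competitor is added, although neither the miRNA nor target A has changed: the competitor acts as a sponge.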
12.
Nucleic Acids Res ; 42(17): 11261-71, 2014.
Article in English | MEDLINE | ID: mdl-25200082

ABSTRACT

Anti-infection drugs target vital functions of infectious agents, including their ribosome and other essential non-coding RNAs. One reason infectious agents become resistant to drugs is mutations that eliminate drug-binding affinity while maintaining vital elements. Identifying these elements is based on the determination of viable and lethal mutants and associated structures. However, determining the structure of enough mutants at high resolution is not always possible. Here, we introduce a new computational method, MC-3DQSAR, to determine the vital elements of target RNA structure from mutagenesis and available high-resolution data. We applied the method to further characterize the structural determinants of the bacterial 23S ribosomal RNA sarcin-ricin loop (SRL), as well as those of the lead-activated and hammerhead ribozymes. The method was accurate in confirming experimentally determined essential structural elements and predicting the viability of new SRL variants, which were either observed in bacteria or validated in bacterial growth assays. Our results indicate that MC-3DQSAR could be used systematically to evaluate the drug-target potentials of any RNA sites using current high-resolution structural data.


Subject(s)
Quantitative Structure-Activity Relationship , RNA/chemistry , Computational Biology/methods , Models, Molecular , RNA, Bacterial/chemistry , RNA, Bacterial/metabolism , RNA, Catalytic/chemistry , RNA, Catalytic/metabolism , RNA, Ribosomal, 23S/chemistry , RNA, Ribosomal, 23S/metabolism
13.
RNA Biol ; 12(2): 162-74, 2015.
Article in English | MEDLINE | ID: mdl-25826568

ABSTRACT

ADARs (Adenosine deaminases that act on RNA) "edit" RNA by converting adenosines to inosines within double-stranded regions. The primary targets of ADARs are long duplexes present within noncoding regions of mRNAs, such as introns and 3' untranslated regions (UTRs). Because adenosine and inosine have different base-pairing properties, editing within these regions can alter splicing and recognition by small RNAs. However, despite numerous studies identifying multiple editing sites in these genomic regions, little is known about the extent to which editing sites co-occur on individual transcripts or the functional output of these combinatorial editing events. To begin to address these questions, we performed an ultra-deep sequencing analysis of 4 Caenorhabditis elegans 3' UTRs that are known ADAR targets. Synchronous editing events were determined for the long duplexes in vivo. Furthermore, the validity of each editing event was confirmed by sequencing the same regions of mRNA from worms that lack A-to-I editing. This analysis identified a large number of editing sites that can occur within each 3' UTR, but interestingly, each individual transcript contained only a small fraction of these A-to-I editing events. In addition, editing patterns were not random, indicating that an editing event can affect the efficiency of editing at subsequent adenosines. Furthermore, we identified specific sites that can be both positively and negatively correlated with additional sites leading to mutually exclusive editing patterns. These results suggest that editing in noncoding regions is selective and hyper-editing of cellular RNAs is rare.


Subject(s)
Adenosine Deaminase/metabolism , Adenosine/metabolism , Caenorhabditis elegans Proteins/metabolism , Caenorhabditis elegans/metabolism , Inosine/metabolism , RNA Editing , RNA, Helminth/metabolism , 3' Untranslated Regions , Adenosine Deaminase/genetics , Animals , Base Pairing , Base Sequence , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics , Deamination , Exons , High-Throughput Nucleotide Sequencing , Introns , Molecular Sequence Data , Nucleic Acid Conformation , Open Reading Frames , RNA, Helminth/genetics
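The abstract above reports editing sites that are positively or negatively correlated on individual transcripts. One simple way to quantify such co-occurrence, shown here as an illustrative sketch rather than the authors' analysis, is the phi coefficient of a 2 × 2 table of edited/unedited calls at two sites across reads. Because inosine pairs like guanosine during reverse transcription, an edited adenosine appears as G in the sequencing data. The read layout and the reads themselves are hypothetical.

```python
import math
from typing import List, Sequence


def edit_calls(reads: Sequence[str], site: int) -> List[int]:
    """1 if the read shows G (inosine is read as G) at the queried site, else 0.

    Each read is assumed to be pre-extracted as the string of bases observed
    at the known adenosine positions, in the same order for every read.
    """
    return [1 if read[site] == "G" else 0 for read in reads]


def phi_coefficient(x: Sequence[int], y: Sequence[int]) -> float:
    """Phi (Matthews) correlation between two binary editing profiles."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(x, y):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Hypothetical reads covering three adenosine sites.
reads = ["GAG", "GAG", "AGA", "AGA", "GAA", "AGG"]
phi_01 = phi_coefficient(edit_calls(reads, 0), edit_calls(reads, 1))
phi_02 = phi_coefficient(edit_calls(reads, 0), edit_calls(reads, 2))
```

A value of -1 (sites 0 and 1 here) corresponds to the mutually exclusive patterns described in the abstract, while positive values indicate that two sites tend to be edited on the same transcript.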
14.
RNA ; 18(4): 610-25, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22361291

ABSTRACT

We report the results of a first, collective, blind experiment in RNA three-dimensional (3D) structure prediction, encompassing three prediction puzzles. The goals are to assess the leading edge of RNA structure prediction techniques; compare existing methods and tools; and evaluate their relative strengths, weaknesses, and limitations in terms of sequence length and structural complexity. The results should give potential users insight into the suitability of available methods for different applications and support the ongoing efforts of the RNA structure prediction community to improve prediction tools. We also report the creation of an automated evaluation pipeline to facilitate the analysis of future RNA structure prediction exercises.


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Base Sequence , Dimerization , Models, Molecular , Molecular Sequence Data
15.
Nature ; 452(7183): 51-5, 2008 Mar 06.
Article in English | MEDLINE | ID: mdl-18322526

ABSTRACT

The classical RNA secondary structure model considers A·U and G·C Watson-Crick as well as G·U wobble base pairs. Here we replace it with a new one, in which sets of nucleotide cyclic motifs define RNA structures. This model allows us to unify all base pairing energetic contributions in an effective scoring function to tackle the problem of RNA folding. We show how pipelining two computer algorithms based on nucleotide cyclic motifs, MC-Fold and MC-Sym, reproduces a series of experimentally determined RNA three-dimensional structures from the sequence. This demonstrates how crucial the consideration of all base-pairing interactions is in filling the gap between sequence and structure. We use the pipeline to define rules of precursor microRNA folding in double helices, despite the presence of a number of presumed mismatches and bulges, and to propose a new model of the human immunodeficiency virus type 1 (HIV-1) −1 frameshifting element.


Subject(s)
Computational Biology , Nucleic Acid Conformation , RNA/chemistry , RNA/genetics , Software , Algorithms , Base Pairing , Base Sequence , Frameshifting, Ribosomal , Genes, gag/genetics , Genes, pol/genetics , HIV-1/genetics , Humans , MicroRNAs/chemistry , MicroRNAs/metabolism , Models, Genetic , Models, Molecular , Molecular Sequence Data , RNA Precursors/chemistry , RNA Precursors/metabolism , RNA, Viral/chemistry , RNA, Viral/genetics , RNA, Viral/metabolism , Thermodynamics
17.
Cancer Gene Ther ; 31(8): 1237-1250, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38977895

ABSTRACT

The majority of cancer deaths are caused by solid tumors, where the four most prevalent cancers (breast, lung, colorectal and prostate) account for more than 60% of all cases (1). Tumor cell heterogeneity driven by variable cancer microenvironments, such as hypoxia, is a key determinant of therapeutic outcome. We developed a novel culture protocol, termed the Long-Term Hypoxia (LTHY) time course, to recapitulate the gradual development of severe hypoxia seen in vivo and thereby mimic conditions observed in primary tumors. Cells subjected to LTHY underwent a non-canonical epithelial to mesenchymal transition (EMT) based on miRNA and mRNA signatures and displayed EMT-like morphological changes. Concomitant to this, we report production of a novel truncated isoform of WT1 transcription factor (tWt1), a non-canonical EMT driver, with expression driven by a previously undescribed intronic promoter through hypoxia-responsive elements (HREs). We further demonstrated that tWt1 initiates translation from an intron-derived start codon and retains proper subcellular localization and DNA binding. A similar tWt1 is also expressed in LTHY-cultured human cancer cell lines as well as primary cancers and predicts long-term patient survival. Our study not only demonstrates the importance of culture conditions that better mimic those observed in primary cancers, especially with regard to hypoxia, but also identifies a novel isoform of WT1 which correlates with poor long-term survival in ovarian cancer.


Subject(s)
Epithelial-Mesenchymal Transition , Protein Isoforms , WT1 Proteins , Humans , Epithelial-Mesenchymal Transition/genetics , WT1 Proteins/metabolism , WT1 Proteins/genetics , Protein Isoforms/genetics , Protein Isoforms/metabolism , Cell Line, Tumor , Neoplasms/metabolism , Neoplasms/genetics , Neoplasms/pathology , Gene Expression Regulation, Neoplastic
18.
RNA ; 17(9): 1664-77, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21778280

ABSTRACT

The NMR solution structure of a duplex, 5'GUGAAGCCCGU/3'UCACAGGAGGC, containing a 4 × 4 nucleotide internal loop from an R2 retrotransposon RNA is reported. The loop contains three sheared purine-purine pairs and reveals a structural element found in other RNAs, which we refer to as the 3RRs motif. Optical melting measurements of the thermodynamics of the duplex indicate that the internal loop is 1.6 kcal/mol more stable at 37°C than predicted. The results identify the 3RRs motif as a common structural element that can facilitate prediction of 3D structure. Known examples include internal loops having the pairings: 5'GAA/3'AGG, 5'GAG/3'AGG, 5'GAA/3'AAG, and 5'AAG/3'AGG. The structural information is compared with predictions made with the MC-Sym program.


Subject(s)
Nuclear Magnetic Resonance, Biomolecular/methods , Nucleic Acid Conformation , Purine Nucleotides/chemistry , RNA/chemistry , Retroelements , Adenine/chemistry , Amino Acid Motifs , Base Pairing , Protein Interaction Domains and Motifs , RNA/genetics , Sequence Analysis, RNA , Thermodynamics
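To put the 1.6 kcal/mol figure in perspective, the standard relation ΔG° = −RT ln K converts a free-energy difference into a ratio of equilibrium constants. The arithmetic below is an illustrative back-of-the-envelope estimate at 37 °C (T = 310.15 K, R ≈ 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹), not a figure reported by the authors, and it assumes the discrepancy applies directly to the duplex-formation equilibrium, with ΔG°(obs) − ΔG°(pred) = −1.6 kcal/mol.

```latex
\Delta G^\circ = -RT\ln K
\quad\Longrightarrow\quad
\frac{K_{\mathrm{obs}}}{K_{\mathrm{pred}}}
  = \exp\!\left(-\frac{\Delta G^\circ_{\mathrm{obs}} - \Delta G^\circ_{\mathrm{pred}}}{RT}\right)
  = \exp\!\left(\frac{1.6}{(1.987\times 10^{-3})(310.15)}\right)
  \approx e^{2.60} \approx 13
```

Under these assumptions, the measured duplex would form with an equilibrium constant roughly 13 times larger than the nearest-neighbor prediction.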
19.
Bioinformatics ; 28(12): i207-14, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689763

ABSTRACT

MOTIVATION: The prediction of RNA 3D structures from sequence alone is a milestone for RNA function analysis and prediction. In recent years, many methods have addressed this challenge, ranging from cycle decomposition and fragment assembly to molecular dynamics simulations. However, their predictions remain fragile and limited to small RNAs. To expand the range and accuracy of these techniques, we need to develop algorithms that enable us to use all the structural information available. In particular, the energetic contribution of secondary structure interactions is now well documented, but the quantification of non-canonical interactions, those shaping the tertiary structure, is poorly understood. Nonetheless, even if a complete RNA tertiary structure energy model is currently unavailable, we now have catalogues of local 3D structural motifs including non-canonical base pairings. A practical objective is thus to develop techniques enabling us to use this knowledge for robust RNA tertiary structure predictors. RESULTS: In this work, we introduce RNA-MoIP, a program that builds on the progress made over the last 30 years in the field of RNA secondary structure prediction and expands these methods to incorporate the novel local motif information available in databases. Using an integer programming framework, our method refines predicted secondary structures (i.e. removes incorrect canonical base pairs) to accommodate the insertion of RNA 3D motifs (i.e. hairpins, internal loops and k-way junctions). Then, we use predictions as templates to generate complete 3D structures with the MC-Sym program. We benchmarked RNA-MoIP on a set of 9 RNAs with sizes varying from 53 to 128 nucleotides. We show that our approach (i) improves the accuracy of canonical base pair predictions; (ii) identifies the best secondary structures in a pool of suboptimal structures; and (iii) predicts accurate 3D structures of large RNA molecules.
AVAILABILITY: RNA-MoIP is publicly available at: http://csb.cs.mcgill.ca/RNAMoIP.


Subject(s)
Algorithms , Nucleic Acid Conformation , Nucleotide Motifs , RNA/chemistry , Software , Base Pairing , Databases, Nucleic Acid , Models, Theoretical , RNA/genetics
20.
Front Genet ; 14: 1254226, 2023.
Article in English | MEDLINE | ID: mdl-37732325

ABSTRACT

Introduction: Prediction of RNA secondary structure from single sequences still needs substantial improvements. The application of machine learning (ML) to this problem has become increasingly popular. However, ML algorithms are prone to overfitting, limiting the ability to learn more about the inherent mechanisms governing RNA folding. It is natural to use high-capacity models when solving such a difficult task, but poor generalization is expected when too few examples are available. Methods: Here, we report the relation between capacity and performance on a fundamental related problem: determining whether two sequences are fully complementary. Our analysis focused on the impact of model architecture and capacity as well as dataset size and nature on classification accuracy. Results: We observed that low-capacity models are better suited for learning with mislabelled training examples, while large capacities improve the ability to generalize to structurally dissimilar data. It turns out that neural networks struggle to grasp the fundamental concept of base complementarity, especially in a lengthwise extrapolation context. Discussion: Given a more complex task like RNA folding, it comes as no surprise that the scarcity of usable examples hinders the application of machine learning techniques to this field.
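The benchmark task studied above, deciding whether two sequences are fully complementary, has a short exact rule that a learned classifier must approximate. The sketch assumes antiparallel Watson-Crick pairing only (no G·U wobble) and equal lengths; the paper's precise labelling convention may differ.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def fully_complementary(seq1: str, seq2: str) -> bool:
    """True if seq2 (5'->3') pairs with every base of seq1 (5'->3') antiparallel."""
    if len(seq1) != len(seq2):
        return False
    # Antiparallel pairing: seq1[i] pairs with seq2[len - 1 - i].
    return all((a, b) in WATSON_CRICK
               for a, b in zip(seq1.upper(), reversed(seq2.upper())))


example_true = fully_complementary("GGUUCC", "GGAACC")
example_false = fully_complementary("AUGC", "GCAA")
```

Because the rule is length-independent, it labels long sequence pairs exactly as well as short ones; the abstract's finding is that the tested neural networks do not reliably recover this behaviour in the lengthwise extrapolation setting.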
