Results 1 - 20 of 824
1.
Nature ; 630(8018): 994-1002, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38926616

ABSTRACT

Insertion sequence (IS) elements are the simplest autonomous transposable elements found in prokaryotic genomes [1]. We recently discovered that IS110 family elements encode a recombinase and a non-coding bridge RNA (bRNA) that confers modular specificity for target DNA and donor DNA through two programmable loops [2]. Here we report the cryo-electron microscopy structures of the IS110 recombinase in complex with its bRNA, target DNA and donor DNA in three different stages of the recombination reaction cycle. The IS110 synaptic complex comprises two recombinase dimers, one of which houses the target-binding loop of the bRNA and binds to target DNA, whereas the other coordinates the bRNA donor-binding loop and donor DNA. We uncovered the formation of a composite RuvC-Tnp active site that spans the two dimers, positioning the catalytic serine residues adjacent to the recombination sites in both target and donor DNA. A comparison of the three structures revealed that (1) the top strands of target and donor DNA are cleaved at the composite active sites to form covalent 5'-phosphoserine intermediates, (2) the cleaved DNA strands are exchanged and religated to create a Holliday junction intermediate, and (3) this intermediate is subsequently resolved by cleavage of the bottom strands. Overall, this study reveals the mechanism by which a bispecific RNA confers target and donor DNA specificity to IS110 recombinases for programmable DNA recombination.


Subject(s)
DNA , RNA, Untranslated , Recombination, Genetic , Catalytic Domain , Cryoelectron Microscopy , DNA/chemistry , DNA/metabolism , DNA/ultrastructure , DNA Transposable Elements/genetics , Models, Molecular , Nucleic Acid Conformation , Protein Multimerization , Recombinases/chemistry , Recombinases/genetics , Recombinases/metabolism , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , RNA, Untranslated/ultrastructure , Substrate Specificity
2.
Bioinformatics ; 40(Supplement_1): i237-i246, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940169

ABSTRACT

MOTIVATION: Noncoding RNAs (ncRNAs) express their functions by adopting molecular structures. Specifically, RNA secondary structures serve as a relatively stable intermediate step before tertiary structures, offering a reliable signature of molecular function. Consequently, within an RNA functional family, secondary structures are generally more evolutionarily conserved than sequences. Conversely, homologous RNA families grouped within an RNA clan share ancestors but typically exhibit structural differences. Inferring the evolution of RNA structures within RNA families and clans is crucial for gaining insights into functional adaptations over time and providing clues about the Ancient RNA World Hypothesis. RESULTS: We introduce the median problem and the small parsimony problem for ncRNA families, where secondary structures are represented as leaf-labeled trees. We utilize the Robinson-Foulds (RF) tree distance, which corresponds to a specific edit distance between RNA trees, and a new metric called the Internal-Leafset (IL) distance. While the RF tree distance compares sets of leaves descending from internal nodes of two RNA trees, the IL distance compares the collection of leaf-children of internal nodes. The latter is better at capturing differences in structural elements of RNAs than the RF distance, which is more focused on base pairs. We also consider a more general tree edit distance that allows the mapping of base pairs that are not perfectly aligned. We study the theoretical complexity of the median problem and the small parsimony problem under the three distance metrics and various biologically relevant constraints, and we present polynomial-time maximum parsimony algorithms for solving some versions of the problems. Our algorithms are applied to ncRNA families from the RFAM database, illustrating their practical utility. AVAILABILITY AND IMPLEMENTATION: https://github.com/bmarchand/rna_small_parsimony.
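
The two tree comparisons described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nested-list tree encoding, the function names, and the choice to treat the IL "collection" as a plain set (rather than a multiset) are all assumptions made for illustration. Leaves are strings and internal nodes are lists of children.

```python
def clades(tree):
    """Return (leaves under this node, leafsets of every internal node below it)."""
    if isinstance(tree, str):          # a leaf contributes no internal node
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found |= child_found
    found.add(leaves)                  # leafset of this internal node
    return leaves, found


def leaf_children(tree):
    """Return the set of 'leaf-children' sets, one per internal node."""
    if isinstance(tree, str):
        return set()
    found = {frozenset(c for c in tree if isinstance(c, str))}
    for child in tree:
        found |= leaf_children(child)
    return found


def rf_distance(t1, t2):
    """RF-style distance: internal-node leafsets present in exactly one tree."""
    return len(clades(t1)[1] ^ clades(t2)[1])


def il_distance(t1, t2):
    """IL-style distance (assumed set semantics): leaf-children sets in exactly one tree."""
    return len(leaf_children(t1) ^ leaf_children(t2))


# Hypothetical toy trees: moving leaf "c" from the root into the first subtree.
t1 = [["a", "b"], "c", ["d", "e"]]
t2 = [["a", "b", "c"], ["d", "e"]]
print(rf_distance(t1, t2))  # 2
print(il_distance(t1, t2))  # 4
```

In this toy case the IL-style count reacts more strongly than the RF-style count to an unpaired leaf changing its parent, which matches the abstract's remark that IL is more sensitive to structural elements.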


Subject(s)
Nucleic Acid Conformation , RNA, Untranslated , RNA, Untranslated/genetics , RNA, Untranslated/chemistry , Algorithms , Evolution, Molecular , Sequence Analysis, RNA/methods , Computational Biology/methods
3.
Article in English | MEDLINE | ID: mdl-38872612

ABSTRACT

Recent success of AlphaFold2 in protein structure prediction relied heavily on co-evolutionary information derived from homologous protein sequences found in the huge, integrated database of protein sequences (Big Fantastic Database). In contrast, the existing nucleotide databases were not consolidated to facilitate wider and deeper homology search. Here, we built a comprehensive database by incorporating the non-coding RNA (ncRNA) sequences from RNAcentral, the transcriptome assembly and metagenome assembly from metagenomics RAST (MG-RAST), the genomic sequences from Genome Warehouse (GWH), and the genomic sequences from MGnify, in addition to the nucleotide (nt) database and its subsets in National Center for Biotechnology Information (NCBI). The resulting Master database of All possible RNA sequences (MARS) is 20-fold larger than NCBI's nt database or 60-fold larger than RNAcentral. The new dataset along with a new split-search strategy allows a substantial improvement in homology search over existing state-of-the-art techniques. It also yields more accurate and more sensitive multiple sequence alignments (MSAs) than manually curated MSAs from Rfam for the majority of structured RNAs mapped to Rfam. The results indicate that MARS coupled with the fully automatic homology search tool RNAcmap will be useful for improved structural and functional inference of ncRNAs and RNA language models based on MSAs. MARS is accessible at https://ngdc.cncb.ac.cn/omix/release/OMIX003037, and RNAcmap3 is accessible at http://zhouyq-lab.szbl.ac.cn/download/.


Subject(s)
Databases, Nucleic Acid , Sequence Alignment , RNA, Untranslated/genetics , RNA, Untranslated/chemistry , Sequence Analysis, RNA/methods , RNA/genetics , RNA/chemistry , Software , Databases, Genetic
4.
Comput Biol Med ; 177: 108660, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38820774

ABSTRACT

Omics-based technologies have revolutionized our comprehension of microproteins encoded by ncRNAs, revealing their abundant presence and pivotal roles within complex functional landscapes. Here, we developed MicroProteinDB (http://bio-bigdata.hrbmu.edu.cn/MicroProteinDB), which offers and visualizes the extensive knowledge to aid retrieval and analysis of computationally predicted and experimentally validated microproteins originating from various ncRNA types. Employing prediction algorithms grounded in diverse deep learning approaches, MicroProteinDB comprehensively documents the fundamental physicochemical properties, secondary and tertiary structures, interactions with functional proteins, family domains, and inter-species conservation of microproteins. With five major analytical modules, it will serve as a valuable knowledge resource for investigating ncRNA-derived microproteins.


Subject(s)
Databases, Protein , RNA, Untranslated , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Humans , Proteins/chemistry , Animals , Micropeptides
5.
Nucleic Acids Res ; 52(9): 5152-5165, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38647067

ABSTRACT

Structured noncoding RNAs (ncRNAs) contribute to many important cellular processes involving chemical catalysis, molecular recognition and gene regulation. Few ncRNA classes are broadly distributed among organisms from all three domains of life, but the list of rarer classes that exhibit surprisingly diverse functions is growing. We previously developed a computational pipeline that enables the near-comprehensive identification of structured ncRNAs expressed from individual bacterial genomes. The regions between protein coding genes are first sorted based on length and the fraction of guanosine and cytidine nucleotides. Long, GC-rich intergenic regions are then examined for sequence and structural similarity to other bacterial genomes. Herein, we describe the implementation of this pipeline on 50 bacterial genomes from varied phyla. More than 4700 candidate intergenic regions with the desired characteristics were identified, which yielded 44 novel riboswitch candidates and numerous other putative ncRNA motifs. Although experimental validation studies have yet to be conducted, this rate of riboswitch candidate discovery is consistent with predictions that many hundreds of novel riboswitch classes remain to be discovered among the bacterial species whose genomes have already been sequenced. Thus, many thousands of additional novel ncRNA classes likely remain to be discovered in the bacterial domain of life.
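
The first step of the pipeline described above (sorting intergenic regions by length and GC fraction, then keeping long, GC-rich ones) can be sketched as follows. This is only an illustration: the function names, the input dictionary, and both thresholds are hypothetical, since the abstract gives no cutoff values.

```python
def gc_fraction(seq):
    """Fraction of G and C nucleotides in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def select_candidates(igrs, min_length=100, min_gc=0.5):
    """Keep long, GC-rich intergenic regions (IGRs).

    igrs maps a region name to its sequence. min_length and min_gc are
    placeholder thresholds chosen for this sketch, not values from the paper.
    Returns (name, length, gc) tuples, longest and most GC-rich first.
    """
    hits = []
    for name, seq in igrs.items():
        gc = gc_fraction(seq)
        if len(seq) >= min_length and gc >= min_gc:
            hits.append((name, len(seq), gc))
    return sorted(hits, key=lambda h: (h[1], h[2]), reverse=True)


# Hypothetical toy input: only the long, GC-rich region survives.
regions = {"igr1": "GC" * 60, "igr2": "AT" * 60, "igr3": "GCGC"}
print(select_candidates(regions))  # [('igr1', 120, 1.0)]
```

Regions passing such a filter would then go on to the comparative sequence and structure search mentioned in the abstract, which is not sketched here.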


Subject(s)
Genome, Bacterial , RNA, Bacterial , RNA, Untranslated , DNA, Intergenic/genetics , Genome, Bacterial/genetics , Genomics/methods , Riboswitch/genetics , RNA, Bacterial/genetics , RNA, Bacterial/chemistry , RNA, Untranslated/genetics , RNA, Untranslated/classification , RNA, Untranslated/chemistry
6.
Chembiochem ; 25(11): e202400029, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38595046

ABSTRACT

Peptide nucleic acid (PNA) based antisense strategy is a promising therapeutic approach to specifically inhibit target gene expression. However, unlike protein coding genes, identification of an ideal PNA binding site for non-coding RNA is not straightforward. Here, we compare the inhibitory activities of PNA molecules that bind a non-coding 4.5S RNA called SRP RNA, a key component of the bacterial signal recognition particle (SRP). A 9-mer PNA (PNA9) complementary to the tetraloop region of the RNA was more potent in inhibiting its interaction with the SRP protein, compared to an 8-mer PNA (PNA8) targeting a stem-loop. PNA9, which contained a homo-pyrimidine sequence could form a triplex with the complementary stretch of RNA in vitro as confirmed using a fluorescent derivative of PNA9 (F-PNA13). The RNA-PNA complex formation resulted in inhibition of SRP function with PNA9 and F-PNA13, but not PNA8 highlighting the importance of target site selection. Surprisingly, F-PNA13 which was more potent in inhibiting SRP function in vitro, showed weaker antibacterial activity compared to PNA9 likely due to poor cell penetration of the longer PNA. Our results underscore the importance of suitable target site selection and optimum PNA length to develop better antisense molecules against non-coding RNA.


Subject(s)
Peptide Nucleic Acids , Peptide Nucleic Acids/chemistry , Peptide Nucleic Acids/pharmacology , Peptide Nucleic Acids/metabolism , Escherichia coli/drug effects , Escherichia coli/genetics , Binding Sites , RNA, Untranslated/genetics , RNA, Untranslated/chemistry , RNA, Untranslated/metabolism , Signal Recognition Particle/metabolism , Signal Recognition Particle/chemistry , Signal Recognition Particle/genetics , RNA, Bacterial/chemistry , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , Base Sequence , Nucleic Acid Conformation
7.
J Chem Inf Model ; 64(7): 2221-2235, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37158609

ABSTRACT

Noncoding RNAs (ncRNAs) play crucial roles in many cellular life activities by interacting with proteins. Identification of ncRNA-protein interactions (ncRPIs) is key to understanding the function of ncRNAs. Although a number of computational methods for predicting ncRPIs have been developed, the problem of predicting ncRPIs remains challenging. It has always been the focus of ncRPIs research to select suitable feature extraction methods and develop a deep learning architecture with better recognition performance. In this work, we proposed an ensemble deep learning framework, RPI-EDLCN, based on a capsule network (CapsuleNet) to predict ncRPIs. In terms of feature input, we extracted the sequence features, secondary structure sequence features, motif information, and physicochemical properties of ncRNA/protein. The sequence and secondary structure sequence features of ncRNA/protein are encoded by the conjoint k-mer method and then input into an ensemble deep learning model based on CapsuleNet by combining the motif information and physicochemical properties. In this model, the encoding features are processed by convolution neural network (CNN), deep neural network (DNN), and stacked autoencoder (SAE). Then the advanced features obtained from the processing are input into the CapsuleNet for further feature learning. Compared with other state-of-the-art methods under 5-fold cross-validation, the performance of RPI-EDLCN is the best, and the accuracy of RPI-EDLCN on RPI1807, RPI2241, and NPInter v2.0 data sets was 93.8%, 88.2%, and 91.9%, respectively. The results of the independent test indicated that RPI-EDLCN can effectively predict potential ncRPIs in different organisms. In addition, RPI-EDLCN successfully predicted hub ncRNAs and proteins in Mus musculus ncRNA-protein networks. Overall, our model can be used as an effective tool to predict ncRPIs and provides some useful guidance for future biological studies.
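
As a rough illustration of k-mer based sequence encoding, the sketch below turns an RNA sequence into a normalized k-mer frequency vector. It is a simplified stand-in, not the conjoint k-mer scheme used by RPI-EDLCN, whose exact definition is not given in the abstract; the function name, the default k, and the handling of ambiguous bases are assumptions.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Encode an RNA sequence as a vector of normalized k-mer frequencies.

    The vector has len(alphabet) ** k entries, ordered as itertools.product
    enumerates the k-mers. Windows containing other symbols (for example N)
    are skipped, which is a choice made for this sketch.
    """
    seq = seq.upper().replace("T", "U")       # accept DNA-style input as well
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


vector = kmer_frequencies("ACGUACGU", k=2)
print(len(vector))  # 16
```

A fixed-length vector of this kind is what makes sequences of different lengths usable as input to the CNN, DNN, and SAE components the abstract mentions.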


Subject(s)
Deep Learning , Animals , Mice , RNA, Untranslated/chemistry , RNA, Untranslated/metabolism , Proteins , Neural Networks, Computer
8.
Nucleic Acids Res ; 52(1): 274-287, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38000384

ABSTRACT

Most of the transcribed eukaryotic genomes are composed of non-coding transcripts. Among these transcripts, some are newly transcribed when compared to outgroups and are referred to as de novo transcripts. De novo transcripts have been shown to play a major role in genomic innovations. However, little is known about the rates at which de novo transcripts are gained and lost in individuals of the same species. Here, we address this gap and estimate the de novo transcript turnover rate with an evolutionary model. We use DNA long reads and RNA short reads from seven geographically remote samples of inbred individuals of Drosophila melanogaster to detect de novo transcripts that are gained on a short evolutionary time scale. Overall, each sampled individual contains around 2500 unspliced de novo transcripts, with most of them being sample specific. We estimate that around 0.15 transcripts are gained per year, and that each gained transcript is lost at a rate around 5 × 10⁻⁵ per year. This high turnover of transcripts suggests frequent exploration of new genomic sequences within species. These rate estimates are essential to comprehend the process and timescale of de novo gene birth.
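
To put the two rate estimates in this abstract side by side, the sketch below works through a back-of-the-envelope calculation. It assumes a simple constant-gain, proportional-loss model (dN/dt = g - l·N), which may differ from the evolutionary model actually fitted by the authors; the function names are hypothetical.

```python
import math


def equilibrium_count(gain_per_year, loss_rate_per_year):
    """Steady-state transcript count N* = g / l under dN/dt = g - l * N."""
    return gain_per_year / loss_rate_per_year


def half_life_years(loss_rate_per_year):
    """Time for half of the existing transcripts to be lost: ln(2) / l."""
    return math.log(2) / loss_rate_per_year


GAIN = 0.15   # transcripts gained per year (from the abstract)
LOSS = 5e-5   # per-transcript loss rate per year (from the abstract)

print(round(equilibrium_count(GAIN, LOSS)))  # 3000
print(round(half_life_years(LOSS)))          # 13863
```

Under this assumed model the steady-state count of about 3000 is of the same order as the roughly 2500 de novo transcripts reported per individual.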


Subject(s)
Drosophila melanogaster , Evolution, Molecular , RNA, Untranslated , Transcription, Genetic , Animals , Humans , Biological Evolution , Drosophila melanogaster/genetics , Genome , Genomics , RNA , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Geography
9.
Eur J Med Chem ; 261: 115850, 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-37839343

ABSTRACT

The growing information currently available on the central role of non-coding RNAs (ncRNAs) including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) for chronic and degenerative human diseases makes them attractive therapeutic targets. RNAs carry out different functional roles in human biology and are deeply deregulated in several diseases. So far, different attempts to therapeutically target the 3D RNA structures with small molecules have been reported. In this scenario, the development of computational tools suitable for describing RNA structures and their potential interactions with small molecules is gaining more and more interest. Here, we describe the most suitable strategies to study ncRNAs through computational tools. We focus on methods capable of predicting 2D and 3D ncRNA structures. Furthermore, we describe computational tools to identify, design and optimize small molecule ncRNA binders. This review aims to outline the state of the art and perspectives of computational methods for ncRNAs over the past decade.


Subject(s)
MicroRNAs , RNA, Long Noncoding , Humans , RNA, Untranslated/genetics , RNA, Untranslated/chemistry , MicroRNAs/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/therapeutic use
10.
Nucleic Acids Res ; 51(16): 8367-8382, 2023 09 08.
Article in English | MEDLINE | ID: mdl-37471030

ABSTRACT

Understanding the 3D structure of RNA is key to understanding RNA function. RNA 3D structure is modular and can be seen as a composition of building blocks of various sizes called tertiary motifs. Currently, long-range motifs formed between distant loops and helical regions are largely less studied than the local motifs determined by the RNA secondary structure. We surveyed long-range tertiary interactions and motifs in a non-redundant set of non-coding RNA 3D structures. A new dataset of annotated LOng-RAnge RNA 3D modules (LORA) was built using an approach that does not rely on the automatic annotations of non-canonical interactions. An original algorithm, ARTEM, was developed for annotation-, sequence- and topology-independent superposition of two arbitrary RNA 3D modules. The proposed methods allowed us to identify and describe the most common long-range RNA tertiary motifs. Along with the prevalent canonical A-minor interactions, a large number of previously undescribed staple interactions were observed. The most frequent long-range motifs were found to belong to three main motif families: planar staples, tilted staples, and helical packing motifs.


Subject(s)
Nucleic Acid Conformation , RNA, Untranslated , Base Pairing , Nucleotide Motifs , RNA, Untranslated/chemistry
11.
Int J Mol Sci ; 24(10)2023 May 17.
Article in English | MEDLINE | ID: mdl-37240230

ABSTRACT

Non-coding RNA (ncRNA) classes take over important housekeeping and regulatory functions and are quite heterogeneous in terms of length, sequence conservation and secondary structure. High-throughput sequencing reveals that the expressed novel ncRNAs and their classification are important to understand cell regulation and identify potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches of utilizing primary sequences and secondary structures as well as the late integration of both using machine learning models, including different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes, including lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. In comparison to the current best-performing tool ncRDense, we had a minimal increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.


Subject(s)
MicroRNAs , RNA, Long Noncoding , RNA, Untranslated/chemistry , RNA, Long Noncoding/genetics , Neural Networks, Computer , Machine Learning , RNA, Ribosomal
12.
Comput Biol Med ; 157: 106783, 2023 05.
Article in English | MEDLINE | ID: mdl-36958237

ABSTRACT

Noncoding RNA (ncRNA) is a functional RNA derived from DNA transcription, and most transcribed genes are transcribed into ncRNA. ncRNA is not directly involved in the translation of proteins, but it can participate in gene expression in cells and affect protein synthesis, thus playing an important role in biological processes such as growth, proliferation, metabolism, and information transmission. Therefore, understanding the interaction between ncRNA and protein is the basis for studying ncRNA regulation of protein-related biological activities. However, it is very expensive and time-consuming to verify ncRNA-protein interaction through biological experiments, and prediction methods based on machine learning have been developed rapidly. Recently, the graph neural network model (GNN) stands out for its excellent performance, but lacks a general framework for predicting ncRNA-protein interactions. We propose a GNN-based framework to predict ncRNA-protein interactions, which can utilize topological structure information to complete prediction tasks faster and more accurately. Meanwhile, for some smaller datasets, many ncRNA nodes lack neighbor information, resulting in lower prediction accuracy. For some larger datasets, the long-tail distribution causes the prediction of the tail nodes (sparse nodes linking few neighbors) to be affected. Therefore, we propose a new sampling method named HeadTailTransfer to mitigate these effects. Experimental results illustrate the effectiveness of this method. Especially for task-specific prediction on the RPI369 dataset in the Graphsage-based neural network framework, the AUC and ACC values increased from 56.8% and 52.2% to 80.2% and 71.8%, respectively. Our data and codes are available: https://github.com/kkkayle/HeadTailTransfer.


Subject(s)
Neural Networks, Computer , RNA, Untranslated , RNA, Untranslated/genetics , RNA, Untranslated/chemistry , RNA, Untranslated/metabolism , Machine Learning , Protein Binding , Proteins/metabolism
13.
Methods Mol Biol ; 2586: 121-146, 2023.
Article in English | MEDLINE | ID: mdl-36705902

ABSTRACT

Noncoding RNAs, ncRNAs, naturally fold into structures, which allow them to perform their functions in the cell. Evolutionarily close species share structures and functions. This occurs because of shared selective pressures, resulting in conserved groups. Previous efforts in finding functional RNAs have been made in detecting conserved structures in genomes or alignments. It may occur that, within a conserved group, species-specific structures arise after species split due to positive selection. Detecting positive selection in ncRNAs is a hard problem in biology as well as bioinformatics. To detect positive selection, one should find species-specific structures within a conserved set. This chapter provides protocols to detect and analyze positive selection in ncRNA structures with the SSS-test and other free software.


Subject(s)
RNA, Untranslated , RNA , RNA/genetics , RNA, Untranslated/genetics , RNA, Untranslated/chemistry , Software , Biological Evolution , Nucleic Acid Conformation
14.
J Phys Chem B ; 126(48): 10018-10033, 2022 12 08.
Article in English | MEDLINE | ID: mdl-36417896

ABSTRACT

Less than one in thirty of the RNA sequences transcribed in humans are translated into protein. The noncoding RNA (ncRNA) functions in catalysis, structure, regulation, and more. However, for the most part, these functions are poorly characterized. RNA is modular and described by motifs that include helical A-RNA with canonical Watson-Crick base-pairing as well as structures with only noncanonical base pairs. Understanding the structure and dynamics of motifs will aid in deciphering functions of specific ncRNAs. We present computational studies on a standard sarcin/ricin domain (SRD), citrus bark cracking viroid SRD, as well as A-RNA. We have applied enhanced molecular dynamics techniques that construct an inverse free-energy surface (iFES) determined by collective variables that monitor base-pairing and backbone conformation. Each SRD RNA is flanked on each side by A-RNA, allowing comparison of the behavior of these motifs in the same molecule. The RNA iFESs have single peaks, indicating that the combined motifs should denature as a single cohesive unit, rather than by regional melting. Local root-mean-square deviation (RMSD) analysis and communication propensity (CProp, variance in distances between residue pairs) reveal distinct motif properties. Our analysis indicates that the standard SRD is more stable than the viroid SRD, which is more stable than A-RNA. Base pairs at SRD to A-RNA transitions have limited flexibility. Application of CProp reveals extraordinary stiffness of the SRD, allowing residues on opposite sides of the motif to sense each other's motions.
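
The abstract defines communication propensity (CProp) as the variance in distances between residue pairs. A minimal sketch of that definition follows. The trajectory format, the function name, and the use of population variance are assumptions; the authors' analysis may use different atom selections or normalization.

```python
import math
from statistics import pvariance


def cprop(trajectory):
    """Variance of the inter-residue distance for every residue pair.

    trajectory is a list of frames; each frame is a list of (x, y, z)
    coordinates, one per residue (for example one representative atom each).
    Returns {(i, j): variance} for i < j. A low value means the two residues
    keep a nearly constant distance, i.e. they move together.
    """
    n_residues = len(trajectory[0])
    result = {}
    for i in range(n_residues):
        for j in range(i + 1, n_residues):
            distances = [math.dist(frame[i], frame[j]) for frame in trajectory]
            result[(i, j)] = pvariance(distances)
    return result


# Hypothetical two-frame toy trajectory: residues 0 and 1 stay rigid,
# residue 2 moves away along the y axis.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
]
values = cprop(frames)
print(values[(0, 1)], values[(0, 2)])
```

In these terms, the stiffness reported for the SRD would show up as low variance even for residue pairs on opposite sides of the motif.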


Subject(s)
Molecular Dynamics Simulation , Nucleotide Motifs , RNA, Untranslated , Humans , Ricin , RNA, Untranslated/chemistry , Base Pairing , Nucleic Acid Conformation
15.
Neural Netw ; 156: 170-178, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36274524

ABSTRACT

Non-coding RNAs (ncRNAs) play an important role in revealing the mechanism of human disease for anti-tumor and anti-virus substances. Detecting subcellular locations of ncRNAs is a necessary way to study ncRNA. Traditional biochemical methods are time-consuming and labor-intensive, and computational-based methods can help detect the location of ncRNAs on a large scale. However, many models did not consider the correlation information among multiple subcellular localizations of ncRNAs. This study proposes a radial basis function neural network based on shared subspace learning (RBFNN-SSL), which extracts shared structures in multi-labels. To evaluate performance, our classifier is tested on three ncRNA datasets. Our model achieves better performance in experimental results.


Subject(s)
Neural Networks, Computer , RNA, Untranslated , Humans , RNA, Untranslated/genetics , RNA, Untranslated/chemistry , Computational Biology/methods
16.
Nucleic Acids Res ; 50(19): 11229-11242, 2022 10 28.
Article in English | MEDLINE | ID: mdl-36259651

ABSTRACT

Non-coding RNAs (ncRNAs) ubiquitously exist in normal and cancer cells. Despite their prevalent distribution, the functions of most long ncRNAs remain uncharacterized. The fission yeast Schizosaccharomyces pombe expresses >1800 ncRNAs annotated to date, but most unconventional ncRNAs (excluding tRNA, rRNA, snRNA and snoRNA) remain uncharacterized. To discover the functional ncRNAs, here we performed a combinatory screening of computational and biological tests. First, all S. pombe ncRNAs were screened in silico for those showing conservation in sequence as well as in secondary structure with ncRNAs in closely related species. Almost a half of the 151 selected conserved ncRNA genes were uncharacterized. Twelve ncRNA genes that did not overlap with protein-coding sequences were next chosen for biological screening that examines defects in growth or sexual differentiation, as well as sensitivities to drugs and stresses. Finally, we highlighted an ncRNA transcribed from SPNCRNA.1669, which inhibited untimely initiation of sexual differentiation. A domain that was predicted as conserved secondary structure by the computational operations was essential for the ncRNA to function. Thus, this study demonstrates that in silico selection focusing on conservation of the secondary structure over species is a powerful method to pinpoint novel functional ncRNAs.


Subject(s)
Schizosaccharomyces , Schizosaccharomyces/genetics , Sex Differentiation , RNA, Untranslated/genetics , RNA, Untranslated/chemistry , RNA, Small Nucleolar/genetics , Open Reading Frames
17.
Nature ; 609(7926): 394-399, 2022 09.
Article in English | MEDLINE | ID: mdl-35978193

ABSTRACT

Cellular RNAs are heterogeneous with respect to their alternative processing and secondary structures, but the functional importance of this complexity is still poorly understood. A set of alternatively processed antisense non-coding transcripts, which are collectively called COOLAIR, are generated at the Arabidopsis floral-repressor locus FLOWERING LOCUS C (FLC) [1]. Different isoforms of COOLAIR influence FLC transcriptional output in warm and cold conditions [2-7]. Here, to further investigate the function of COOLAIR, we developed an RNA structure-profiling method to determine the in vivo structure of single RNA molecules rather than the RNA population average. This revealed that individual isoforms of the COOLAIR transcript adopt multiple structures with different conformational dynamics. The major distally polyadenylated COOLAIR isoform in warm conditions adopts three predominant structural conformations, the proportions and conformations of which change after cold exposure. An alternatively spliced, strongly cold-upregulated distal COOLAIR isoform [6] shows high structural diversity, in contrast to proximally polyadenylated COOLAIR. A hyper-variable COOLAIR structural element was identified that was complementary to the FLC transcription start site. Mutations altering the structure of this region changed FLC expression and flowering time, consistent with an important regulatory role of the COOLAIR structure in FLC transcription. Our work demonstrates that isoforms of non-coding RNA transcripts adopt multiple distinct and functionally relevant structural conformations, which change in abundance and shape in response to external conditions.


Subject(s)
Arabidopsis , Nucleic Acid Conformation , RNA, Antisense , RNA, Plant , RNA, Untranslated , Single Molecule Imaging , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Flowers/genetics , Flowers/growth & development , Gene Expression Regulation, Plant , MADS Domain Proteins/genetics , RNA, Antisense/chemistry , RNA, Antisense/genetics , RNA, Plant/chemistry , RNA, Plant/genetics , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Transcription Initiation Site , Transcription, Genetic
18.
PLoS Comput Biol ; 18(7): e1010240, 2022 07.
Article in English | MEDLINE | ID: mdl-35797361

ABSTRACT

It is well-established that neural networks can predict or identify structural motifs of non-coding RNAs (ncRNAs). Yet, the neural network based identification of RNA structural motifs is limited by the availability of training data that are often insufficient for learning features of specific ncRNA families or structural motifs. Aiming to reliably identify intrinsic transcription terminators in bacteria, we introduce a novel pre-training approach that uses inverse folding to generate training data for predicting or identifying a specific family or structural motif of ncRNA. We assess the ability of neural networks to identify secondary structure by systematic in silico mutagenesis experiments. In a study to identify intrinsic transcription terminators as functionally well-understood RNA structural motifs, our inverse folding based pre-training approach significantly boosts the performance of neural network topologies, which outperform previous approaches to identify intrinsic transcription terminators. Inverse-folding based pre-training provides a simple, yet highly effective way to integrate the well-established thermodynamic energy model into deep neural networks for identifying ncRNA families or motifs. The pre-training technique is broadly applicable to a range of network topologies as well as different types of ncRNA families and motifs.


Subject(s)
Neural Networks, Computer , RNA, Untranslated , Humans , Nucleotide Motifs , RNA, Untranslated/chemistry , RNA, Untranslated/genetics
19.
J Virol ; 96(8): e0194621, 2022 04 27.
Article in English | MEDLINE | ID: mdl-35353000

ABSTRACT

Hepatitis C virus (HCV) is a positive-strand RNA virus that remains one of the main contributors to chronic liver disease worldwide. Studies over the last 30 years have demonstrated that HCV contains a highly structured RNA genome and many of these structures play essential roles in the HCV life cycle. Despite the importance of riboregulation in this virus, most of the HCV RNA genome remains functionally unstudied. Here, we report a complete secondary structure map of the HCV RNA genome in vivo, which was studied in parallel with the secondary structure of the same RNA obtained in vitro. Our results show that HCV is folded extensively in the cellular context. By performing comprehensive structural analyses on both in vivo data and in vitro data, we identify compact and conserved secondary and tertiary structures throughout the genome. Genetic and evolutionary functional analyses demonstrate that many of these elements play important roles in the virus life cycle. In addition to providing a comprehensive map of RNA structures and riboregulatory elements in HCV, this work provides a resource for future studies aimed at identifying therapeutic targets and conducting further mechanistic studies on this important human pathogen. IMPORTANCE HCV has one of the most highly structured RNA genomes studied to date, and it is a valuable model system for studying the role of RNA structure in protein-coding genes. While previous studies have identified individual cases of regulatory RNA structures within the HCV genome, the full-length structure of the HCV genome has not been determined in vivo. Here, we present the complete secondary structure map of HCV determined both in cells and from corresponding transcripts generated in vitro. In addition to providing a comprehensive atlas of functional secondary structural elements throughout the genomic RNA, we identified a novel set of tertiary interactions and demonstrated their functional importance. 
In terms of broader implications, the pipeline developed in this study can be applied to other long RNAs, such as long noncoding RNAs. In addition, the RNA structural motifs characterized in this study broaden the repertoire of known riboregulatory elements.


Subject(s)
Genome, Viral , Hepacivirus , RNA, Viral , Genome, Viral/genetics , Hepacivirus/genetics , Hepatitis C/virology , Humans , RNA, Untranslated/chemistry , RNA, Viral/chemistry , RNA, Viral/genetics
20.
Genome Res ; 32(5): 968-985, 2022 05.
Article in English | MEDLINE | ID: mdl-35332099

ABSTRACT

The recent development and application of methods based on the general principle of "crosslinking and proximity ligation" (crosslink-ligation) are revolutionizing RNA structure studies in living cells. However, extracting structure information from such data presents unique challenges. Here, we introduce a set of computational tools for the systematic analysis of data from a wide variety of crosslink-ligation methods, specifically focusing on read mapping, alignment classification, and clustering. We design a new strategy to map short reads with irregular gaps at high sensitivity and specificity. Analysis of previously published data reveals distinct properties and bias caused by the crosslinking reactions. We perform rigorous and exhaustive classification of alignments and discover eight types of arrangements that provide distinct information on RNA structures and interactions. To deconvolve the dense and intertwined gapped alignments, we develop a network/graph-based tool Crosslinked RNA Secondary Structure Analysis using Network Techniques (CRSSANT), which enables clustering of gapped alignments and discovery of new alternative and dynamic conformations. We discover that multiple crosslinking and ligation events can occur on the same RNA, generating multisegment alignments to report complex high-level RNA structures and multi-RNA interactions. We find that alignments with overlapped segments are produced from potential homodimers and develop a new method for their de novo identification. Analysis of overlapping alignments revealed potential new homodimers in cellular noncoding RNAs and RNA virus genomes in the Picornaviridae family. Together, this suite of computational tools enables rapid and efficient analysis of RNA structure and interaction data in living cells.


Subject(s)
RNA, Untranslated , RNA , Algorithms , Cluster Analysis , RNA/chemistry , RNA/genetics , RNA, Untranslated/chemistry , Sequence Analysis, RNA/methods , Software