ABSTRACT
N6-methyladenosine (m6A) is the most abundant internal eukaryotic mRNA modification, and is involved in the regulation of various biological processes. Direct Nanopore sequencing of native RNA (dRNA-seq) has emerged as a leading approach for its identification. Several software tools have been published for m6A detection, and there is a strong need for independent studies benchmarking their performance on data from different species and against various reference datasets. Moreover, a computational workflow is needed to streamline the execution of tools whose installation and execution remain complicated. We developed NanOlympicsMod, a Nextflow pipeline exploiting containerized technology for comparing 14 tools for m6A detection on dRNA-seq data. NanOlympicsMod was tested on dRNA-seq data generated from in vitro (un)modified synthetic oligos. The m6A hits returned by each tool were compared to the m6A position known by design of the oligos. In addition, NanOlympicsMod was used on dRNA-seq datasets from wild-type and m6A-depleted yeast, mouse and human, and each tool's hits were compared to reference m6A sets generated by leading orthogonal methods. The performance of the tools markedly differed across datasets, and methods adopting different approaches showed different preferences in terms of precision and recall. Changing the stringency cut-offs allowed for tuning the precision-recall trade-off towards user preferences. Finally, we determined that precision and recall of tools are markedly influenced by sequencing depth, and that additional sequencing would likely reveal additional m6A sites. Thanks to the possibility of including novel tools, NanOlympicsMod will streamline the benchmarking of m6A detection tools on dRNA-seq data, improving future RNA modification characterization.
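The precision-recall trade-off described in this abstract can be illustrated with a minimal sketch. It is not taken from NanOlympicsMod: the function name `precision_recall`, the site labels, the scores, and the reference set are all hypothetical, and the only assumption is that a tool assigns each candidate site a score and that sites at or above a cut-off are called as m6A.

```python
def precision_recall(scores, reference, cutoff):
    """Return (precision, recall) for sites called at score >= cutoff.

    scores:    dict mapping a site label to a tool's score (hypothetical values)
    reference: set of site labels treated as true m6A sites
    """
    called = {site for site, score in scores.items() if score >= cutoff}
    true_positives = len(called & reference)
    # By convention here, precision is reported as 0.0 when nothing is called.
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy data: five candidate sites, three of which are in the reference set.
scores = {"A": 0.95, "B": 0.80, "C": 0.60, "D": 0.40, "E": 0.20}
reference = {"A", "B", "E"}

for cutoff in (0.9, 0.5, 0.1):
    p, r = precision_recall(scores, reference, cutoff)
    print(f"cutoff={cutoff:.1f}  precision={p:.2f}  recall={r:.2f}")
```

In this toy case, lowering the cut-off raises recall while precision falls, which is the kind of tuning towards user preferences that the abstract refers to.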
Subject(s)
Adenine/analogs & derivatives , Nanopore Sequencing , Nanopores , Humans , Animals , Mice , RNA/genetics , Benchmarking , Sequence Analysis, RNA/methods
ABSTRACT
Nanopore sequencing devices read individual RNA strands directly. This facilitates identification of exon linkages and nucleotide modifications; however, using conventional direct RNA nanopore sequencing, the 5' and 3' ends of poly(A) RNA cannot be identified unambiguously. This is due in part to RNA degradation in vivo and in vitro that can obscure transcription start and end sites. In this study, we aimed to identify individual full-length human RNA isoforms among ~4 million nanopore poly(A)-selected RNA reads. First, to identify RNA strands bearing 5' m7G caps, we exchanged the biological cap for a modified cap attached to a 45-nt oligomer. This oligomer adaptation method improved 5' end sequencing and ensured correct identification of the 5' m7G capped ends. Second, among these 5'-capped nanopore reads, we screened for features consistent with a 3' polyadenylation site. Combining these two steps, we identified 294,107 individual high-confidence full-length RNA scaffolds from human GM12878 cells, most of which (257,721) aligned to protein-coding genes. Of these, 4876 scaffolds indicated unannotated isoforms that were often internal to longer, previously identified RNA isoforms. Orthogonal data for m7G caps and open chromatin, such as CAGE and DNase-HS seq, confirmed the validity of these high-confidence RNA scaffolds.
Subject(s)
RNA Isoforms/chemistry , RNA, Messenger/chemistry , Cell Line, Tumor , Humans , Nanopore Sequencing/methods , RNA 3' Polyadenylation Signals , RNA Isoforms/genetics , RNA, Messenger/genetics , Transcriptome
ABSTRACT
The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5' cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.
Subject(s)
COVID-19 , Nanopores , RNA, Guide, Kinetoplastida/chemistry , COVID-19/genetics , Genome, Viral/genetics , Humans , RNA Caps , RNA, Viral/genetics , RNA, Viral/metabolism , SARS-CoV-2/genetics
ABSTRACT
Understanding transcriptomes requires documenting the structures, modifications, and abundances of RNAs as well as their proximity to other molecules. The methods that make this possible depend critically on enzymes (including mutant derivatives) that act on nucleic acids for capturing and sequencing RNA. We tested two 3' nucleotidyl transferases, Saccharomyces cerevisiae poly(A) polymerase and Schizosaccharomyces pombe Cid1, for the ability to add base and sugar modified rNTPs to free RNA 3' ends, eventually focusing on Cid1. Although unable to polymerize ΨTP or 1meΨTP, Cid1 can use 5meUTP and 4thioUTP. Surprisingly, Cid1 can use inosine triphosphate to add poly(I) to the 3' ends of a wide variety of RNA molecules. Most poly(A) mRNAs efficiently acquire a uniform tract of about 50 inosine residues from Cid1, whereas non-poly(A) RNAs acquire longer, more heterogeneous tails. Here we test these activities for use in direct RNA sequencing on nanopores, and find that Cid1-mediated poly(I)-tailing permits detection and quantification of both mRNAs and non-poly(A) RNAs simultaneously, as well as enabling the analysis of nascent RNAs associated with RNA polymerase II. Poly(I) produces a different current trace than poly(A), enabling recognition of native RNA 3' end sequence lost by in vitro poly(A) addition. Addition of poly(I) by Cid1 offers a broadly useful alternative to poly(A) capture for direct RNA sequencing on nanopores.
Subject(s)
Nanopores , Nucleotides/chemistry , Nucleotidyltransferases/metabolism , Polymers/chemistry , Polynucleotide Adenylyltransferase/metabolism , Saccharomyces cerevisiae/enzymology , Schizosaccharomyces pombe Proteins/metabolism , Schizosaccharomyces/enzymology , Sequence Analysis, RNA/methods , Nucleotidyltransferases/genetics , Polynucleotide Adenylyltransferase/genetics , Schizosaccharomyces pombe Proteins/genetics
ABSTRACT
The covalent modification of RNA molecules is a pervasive feature of all classes of RNAs and has fundamental roles in the regulation of several cellular processes. Mapping the location of RNA modifications transcriptome-wide is key to unveiling their role and dynamic behaviour, but technical limitations have often hampered these efforts. Nanopore direct RNA sequencing is a third-generation sequencing technology that allows the sequencing of native RNA molecules, thus providing a direct way to detect modifications at single-molecule resolution. Despite recent advances, the analysis of nanopore sequencing data for RNA modification detection is still a complex task that presents many challenges. Many works have addressed this task using different approaches, resulting in a large number of tools with different features and performances. Here we review the diverse approaches proposed so far and outline the principles underlying currently available algorithms.
Subject(s)
Algorithms , Computational Biology/methods , Nanopore Sequencing/methods , RNA Processing, Post-Transcriptional , RNA/chemistry , RNA/genetics , Transcriptome , Animals , Humans , Software
ABSTRACT
RNA modifications can alter the behavior of RNA molecules depending on where they are located on the strands. Traditionally, RNA modifications have been detected and characterized by biophysical assays, mass spectrometry, or specific next-generation sequencing techniques, but these approaches are limited to specific modifications or are low throughput. Nanopore sequencing is a platform capable of sequencing RNA strands directly, which permits transcriptome-wide detection of RNA modifications. RNA modifications alter the nanopore raw signal relative to the canonical form of the nucleotide, and several software tools detect these signal alterations. One such tool is Nanocompore, which compares the ionic current features between two different experimental conditions (i.e., with and without RNA modifications) to detect RNA modifications. Nanocompore is not limited to a single type of RNA modification, has a high specificity for detecting RNA modifications, and does not require model training. To use Nanocompore, the following steps are needed: (i) the data must be basecalled and aligned to the reference transcriptome, then the raw ionic current signals are aligned to the sequences and transformed into a Nanocompore-compatible format; (ii) statistical testing is then conducted on the transformed data and produces a table of p-value predictions for the positions of the RNA modifications. These steps can be executed with several different methods, and thus we have also included two alternative protocols for running Nanocompore. Once the positions of RNA modifications are determined by Nanocompore, users can investigate their function in various metabolic pathways. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol: RNA modification detection by Nanocompore. Alternate Protocol 1: RNA modification detection by Nanocompore with f5c. Alternate Protocol 2: RNA modification detection by Nanocompore using Nextflow.
Subject(s)
Nanopore Sequencing , Nanopores , Nanopore Sequencing/methods , RNA/chemistry , RNA/genetics , RNA/metabolism , Sequence Analysis, RNA , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a positive single-stranded RNA virus, engages in complex interactions with host cell proteins throughout its life cycle. While these interactions enable the host to recognize and inhibit viral replication, they also facilitate essential viral processes such as transcription, translation, and replication. Many aspects of these virus-host interactions remain poorly understood. Here, we employed the catRAPID algorithm and utilized the RNA-protein interaction detection coupled with mass spectrometry technology to predict and validate the host proteins that specifically bind to the highly structured 5' and 3' terminal regions of the SARS-CoV-2 RNA. Among the interactions identified, we prioritized pseudouridine synthase PUS7, which binds to both ends of the viral RNA. Using nanopore direct RNA sequencing, we discovered that the viral RNA undergoes extensive post-transcriptional modifications. Modified consensus regions for PUS7 were identified at both terminal regions of the SARS-CoV-2 RNA, including one in the viral transcription regulatory sequence leader. Collectively, our findings offer insights into host protein interactions with the SARS-CoV-2 UTRs and highlight the likely significance of pseudouridine synthases and other post-transcriptional modifications in the viral life cycle. This new knowledge enhances our understanding of virus-host dynamics and could inform the development of targeted therapeutic strategies.
ABSTRACT
The Limnospira genus is a recently established clade that is economically important due to its worldwide use in biotechnology and agriculture. This genus includes organisms that were reclassified from Arthrospira, which are commercially marketed as "Spirulina." Limnospira are photoautotrophic organisms that are widely used for research in nutrition, medicine, bioremediation, and biomanufacturing. Despite its widespread use, there is no closed genome for the Limnospira genus, and no reference genome for the type strain, Limnospira fusiformis. In this work, the L. fusiformis genome was sequenced using Oxford Nanopore Technologies MinION and assembled using only ultra-long reads (>35 kb). This assembly was polished with Illumina MiSeq reads sourced from an axenic L. fusiformis culture; axenicity was verified via microscopy and rDNA analysis. Ultra-long read sequencing resulted in a 6.42 Mb closed genome assembled as a single contig with no plasmid. Phylogenetic analysis placed L. fusiformis in the Limnospira clade; some Arthrospira were also placed in this clade, suggesting a misclassification of these strains. This work provides a fully closed and accurate reference genome for the economically important type strain, L. fusiformis. We also present a rapid axenicity method to isolate L. fusiformis. These contributions enable future biotechnological development of L. fusiformis by way of genetic engineering.
ABSTRACT
Proteins present a significant challenge for nanopore-based sequence analysis. This is partly due to their stable tertiary structures that must be unfolded for linear translocation, and the absence of regular charge density. To address these challenges, here we describe how ClpXP, an ATP-dependent protein unfoldase, can be harnessed to unfold and processively translocate multi-domain protein substrates through an alpha-hemolysin nanopore sensor. This process results in ionic current patterns that are diagnostic of protein sequence and structure at the single-molecule level.
Subject(s)
Endopeptidase Clp/metabolism , Hemolysin Proteins/chemistry , Hemolysin Proteins/metabolism , Lipid Bilayers/metabolism , Nanopores , Protein Unfolding , Protein Transport
ABSTRACT
The ribosome small subunit is expressed in all living cells. It performs numerous essential functions during translation, including formation of the initiation complex and proofreading of base-pairs between mRNA codons and tRNA anticodons. The core constituent of the small ribosomal subunit is a ~1.5 kb RNA strand in prokaryotes (16S rRNA) and a homologous ~1.8 kb RNA strand in eukaryotes (18S rRNA). Traditional sequencing-by-synthesis (SBS) of rRNA genes or rRNA cDNA copies has achieved wide use as a 'molecular chronometer' for phylogenetic studies, and as a tool for identifying infectious organisms in the clinic. However, epigenetic modifications on rRNA are erased by SBS methods. Here we describe direct MinION nanopore sequencing of individual, full-length 16S rRNA absent reverse transcription or amplification. As little as 5 picograms (~10 attomole) of purified E. coli 16S rRNA was detected in 4.5 micrograms of total human RNA. Nanopore ionic current traces that deviated from canonical patterns revealed conserved E. coli 16S rRNA 7-methylguanosine and pseudouridine modifications, and a 7-methylguanosine modification that confers aminoglycoside resistance to some pathological E. coli strains.
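The stated equivalence of 5 picograms to about 10 attomoles of 16S rRNA can be checked with a short calculation. The sketch below is not from the paper: the helper `rna_mass_to_attomoles` is hypothetical, and it assumes an average mass of roughly 340 g/mol per RNA nucleotide and the ~1.5 kb length quoted in the abstract.

```python
AVG_RNA_NT_MASS = 340.0  # g/mol per nucleotide; an approximate, assumed average


def rna_mass_to_attomoles(mass_pg, length_nt, nt_mass=AVG_RNA_NT_MASS):
    """Convert a mass of single-stranded RNA (in picograms) to attomoles."""
    molar_mass = length_nt * nt_mass      # g/mol for the whole strand
    moles = mass_pg * 1e-12 / molar_mass  # picograms -> grams -> moles
    return moles * 1e18                   # moles -> attomoles


# 5 pg of a ~1.5 kb rRNA, as quoted in the abstract
print(f"{rna_mass_to_attomoles(5, 1500):.1f} amol")
```

Under these assumptions the result is just under 10 attomoles, consistent with the approximate figure given in the abstract.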
Subject(s)
Nanopores , RNA, Ribosomal, 16S/genetics , Sequence Analysis, RNA/methods , Escherichia coli/genetics , RNA, Bacterial/genetics
ABSTRACT
Previously we showed that the protein unfoldase ClpX could facilitate translocation of individual proteins through the α-hemolysin nanopore. This results in ionic current fluctuations that correlate with unfolding and passage of intact protein strands through the pore lumen. It is plausible that this technology could be used to identify protein domains and structural modifications at the single-molecule level that arise from subtle changes in primary amino acid sequence (e.g., point mutations). As a test, we engineered proteins bearing well-characterized domains connected in series along an ~700 amino acid strand. Point mutations in a titin immunoglobulin domain (titin I27) and point mutations, proteolytic cleavage, and rearrangement of beta-strands in green fluorescent protein (GFP), caused ionic current pattern changes for single strands predicted by bulk phase and force spectroscopy experiments. Among these variants, individual proteins could be classified at 86-99% accuracy using standard machine learning tools. We conclude that a ClpXP-nanopore device can discriminate among distinct protein domains, and that sequence-dependent variations within those domains are detectable.