Results 1 - 20 of 4,230
1.
Nature ; 634(8035): 824-832, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39443776

ABSTRACT

DNA storage has shown potential to transcend current silicon-based data storage technologies in storage density, longevity and energy consumption [1-3]. However, writing large-scale data directly into DNA sequences by de novo synthesis remains uneconomical in time and cost [4]. We present an alternative, parallel strategy that enables the writing of arbitrary data on DNA using premade nucleic acids. Through self-assembly guided enzymatic methylation, epigenetic modifications, as information bits, can be introduced precisely onto universal DNA templates to enact molecular movable-type printing. By programming with a finite set of 700 DNA movable types and five templates, we achieved the synthesis-free writing of approximately 275,000 bits on an automated platform with 350 bits written per reaction. The data encoded in complex epigenetic patterns were retrieved at high throughput by nanopore sequencing, and algorithms were developed to finely resolve 240 modification patterns per sequencing reaction. With the epigenetic information bits framework, distributed and bespoke DNA storage was implemented by 60 volunteers lacking professional biolab experience. Our framework presents a new modality of DNA data storage that is parallel, programmable, stable and scalable. Such an unconventional modality opens up avenues towards practical data storage and dual-mode data functions in biomolecular systems.


Subject(s)
DNA Methylation , DNA , Epigenesis, Genetic , Information Storage and Retrieval , Algorithms , DNA/chemistry , DNA/genetics , High-Throughput Nucleotide Sequencing/methods , Information Storage and Retrieval/methods , Nanopores , Templates, Genetic
2.
Nature ; 633(8030): 662-669, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39261738

ABSTRACT

The ability to sequence single protein molecules in their native, full-length form would enable a more comprehensive understanding of proteomic diversity. Current technologies, however, are limited in achieving this goal [1,2]. Here, we establish a method for the long-range, single-molecule reading of intact protein strands on a commercial nanopore sensor array. By using the ClpX unfoldase to ratchet proteins through a CsgG nanopore [3,4], we provide single-molecule evidence that ClpX translocates substrates in two-residue steps. This mechanism achieves sensitivity to single amino acids on synthetic protein strands hundreds of amino acids in length, enabling the sequencing of combinations of single-amino-acid substitutions and the mapping of post-translational modifications, such as phosphorylation. To enhance classification accuracy further, we demonstrate the ability to reread individual protein molecules multiple times, and we explore the potential for highly accurate protein barcode sequencing. Furthermore, we develop a biophysical model that can simulate raw nanopore signals a priori on the basis of residue volume and charge, enhancing the interpretation of raw signal data. Finally, we apply these methods to examine full-length, folded protein domains for complete end-to-end analysis. These results provide proof of concept for a platform that has the potential to identify and characterize full-length proteoforms at single-molecule resolution.


Subject(s)
Nanopores , Proteins , Sequence Analysis, Protein , Single Molecule Imaging , Amino Acid Substitution , Endopeptidase Clp/chemistry , Endopeptidase Clp/metabolism , Phosphorylation , Protein Domains , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein/methods , Single Molecule Imaging/methods
3.
Mol Cell ; 82(2): 237-238, 2022 01 20.
Article in English | MEDLINE | ID: mdl-35063092

ABSTRACT

Novel techniques for single-protein molecule sequencing are rapidly becoming the focus of contemporary biomedical research. Here, Brinkerhoff et al. (2021) report significant progress in nanopore-based rereading of DNA-peptide conjugates.


Subject(s)
Nanopores , DNA , Nanotechnology , Proteomics , Sequence Analysis, DNA
4.
Mol Cell ; 77(5): 985-998.e8, 2020 03 05.
Article in English | MEDLINE | ID: mdl-31839405

ABSTRACT

Understanding how splicing events are coordinated across numerous introns in metazoan RNA transcripts requires quantitative analyses of transient RNA processing events in living cells. We developed nanopore analysis of co-transcriptional processing (nano-COP), in which nascent RNAs are directly sequenced through nanopores, exposing the dynamics and patterns of RNA splicing without biases introduced by amplification. Long nano-COP reads reveal that, in human and Drosophila cells, splicing occurs after RNA polymerase II transcribes several kilobases of pre-mRNA, suggesting that metazoan splicing transpires distally from the transcription machinery. Inhibition of the branch-site recognition complex SF3B rapidly diminished global co-transcriptional splicing. We found that splicing order does not strictly follow the order of transcription and is associated with cis-acting elements, alternative splicing, and RNA-binding factors. Further, neighboring introns in human cells tend to be spliced concurrently, implying that splicing of these introns occurs cooperatively. Thus, nano-COP unveils the organizational complexity of RNA processing.


Subject(s)
Nanopore Sequencing , Nanopores , RNA Precursors/metabolism , RNA Splicing , RNA, Messenger/metabolism , Sequence Analysis, RNA/methods , Transcriptome , Animals , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster , Humans , Introns , K562 Cells , Kinetics , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , RNA Precursors/genetics , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism , RNA, Messenger/genetics , Transcription, Genetic
5.
Genome Res ; 34(3): 454-468, 2024 04 25.
Article in English | MEDLINE | ID: mdl-38627094

ABSTRACT

Reference-free genome phasing is vital for understanding allele inheritance and the impact of single-molecule DNA variation on phenotypes. To achieve thorough phasing across homozygous or repetitive regions of the genome, long-read sequencing technologies are often used to perform phased de novo assembly. As a step toward reducing the cost and complexity of this type of analysis, we describe new methods for accurately phasing Oxford Nanopore Technologies (ONT) sequence data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse. We test using new variants of ONT PromethION sequencing, including those using proximity ligation, and show that newer, higher accuracy ONT reads substantially improve assembly quality.


Subject(s)
Nanopores , Humans , Sequence Analysis, DNA/methods , Nanopore Sequencing/methods , High-Throughput Nucleotide Sequencing/methods , Software , Genomics/methods
6.
Genome Res ; 34(5): 778-783, 2024 06 25.
Article in English | MEDLINE | ID: mdl-38692839

ABSTRACT

In silico simulation of high-throughput sequencing data is a technique used widely in the genomics field. However, there is currently a lack of effective tools for creating simulated data from nanopore sequencing devices, which measure DNA or RNA molecules in the form of time-series current signal data. Here, we introduce Squigulator, a fast and simple tool for simulation of realistic nanopore signal data. Squigulator takes a reference genome, a transcriptome, or read sequences, and generates corresponding raw nanopore signal data. This is compatible with basecalling software from Oxford Nanopore Technologies (ONT) and other third-party tools, thereby providing a useful substrate for development, testing, debugging, validation, and optimization at every stage of a nanopore analysis workflow. The user may generate data with preset parameters emulating specific ONT protocols or noise-free "ideal" data, or they may deterministically modify a range of experimental variables and/or noise parameters to shape the data to their needs. We present a brief example of Squigulator's use, creating simulated data to model the degree to which different parameters impact the accuracy of ONT basecalling and downstream variant detection. This analysis reveals new insights into the nature of ONT data and basecalling algorithms. We provide Squigulator as an open-source tool for the nanopore community.


Subject(s)
Nanopore Sequencing , Software , Nanopore Sequencing/methods , Computer Simulation , High-Throughput Nucleotide Sequencing/methods , Nanopores , Humans , Genomics/methods , Sequence Analysis, DNA/methods , Algorithms
7.
Nat Methods ; 21(1): 102-109, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37957431

ABSTRACT

Direct protein sequencing technologies with improved sensitivity and throughput are still needed. Here, we propose an alternative method for peptide sequencing based on enzymatic cleavage and host-guest interaction-assisted nanopore sensing. We serendipitously discovered that the identity of any proteinogenic amino acid in a particular position of a phenylalanine-containing peptide could be determined via current blockage during translocation of the peptide through α-hemolysin nanopores in the presence of cucurbit[7]uril. Building upon this, we further present a proof-of-concept demonstration of peptide sequencing by sequentially cleaving off amino acids from the C terminus of a peptide with carboxypeptidases, and then determining their identities and sequence with a peptide probe in a nanopore. With future optimization, our results point to a different way of nanopore-based protein sequencing.


Subject(s)
Nanopores , Peptides , Amino Acid Sequence , Hemolysin Proteins/chemistry
8.
Nat Methods ; 21(4): 609-618, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38443507

ABSTRACT

Precise identification and quantification of amino acids is crucial for many biological applications. Here we report a copper(II)-functionalized Mycobacterium smegmatis porin A (MspA) nanopore with the N91H substitution, which enables direct identification of all 20 proteinogenic amino acids when combined with a machine-learning algorithm. The validation accuracy reaches 99.1%, with 30.9% signal recovery. The feasibility of ultrasensitive quantification of amino acids was also demonstrated at the nanomolar range. Furthermore, the capability of this system for real-time analyses of two representative post-translational modifications (PTMs), one unnatural amino acid and ten synthetic peptides using exopeptidases, including clinically relevant peptides associated with Alzheimer's disease and cancer neoantigens, was demonstrated. Notably, our strategy successfully distinguishes peptides with only one amino acid difference from the hydrolysate and provides the possibility to infer the peptide sequence.


Subject(s)
Nanopores , Amino Acids/chemistry , Peptides/chemistry , Amino Acid Sequence , Porins/chemistry , Porins/metabolism
9.
Nat Methods ; 21(4): 574-583, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38459383

ABSTRACT

Draft genomes generated from Oxford Nanopore Technologies (ONT) long reads are known to have a higher error rate. Although existing genome polishers can enhance their quality, the error rate (including mismatches, indels and switching errors between paternal and maternal haplotypes) can be significant. Here, we develop two polishers, hypo-short and hypo-hybrid, to address this issue. Hypo-short utilizes Illumina short reads to polish an ONT-based draft assembly, resulting in a high-quality assembly with low error rates and switching errors. Expanding on this, hypo-hybrid incorporates ONT long reads to further refine the assembly into a diploid representation. Leveraging hypo-hybrid, we have created a diploid genome assembly pipeline called hypo-assembler. Hypo-assembler automates the generation of highly accurate, contiguous and nearly complete diploid assemblies using ONT long reads, Illumina short reads and optionally Hi-C reads. Notably, our solution even allows for the production of telomere-to-telomere diploid genomes with additional manual steps. As a proof of concept, we successfully assembled a fully phased telomere-to-telomere diploid genome of HG00733, achieving a quality value exceeding 50.
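To make the reported quality value concrete, the following is a minimal sketch that converts between a quality value and a per-base error rate. It assumes the quality value is Phred-scaled (QV = -10 log10 of the error probability), which is the usual convention for assembly quality but is not stated in the abstract. The function names are illustrative and do not come from hypo-assembler.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability.

    Assumption: QV = -10 * log10(error probability).
    """
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled quality value."""
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    return -10 * math.log10(error_rate)


# Under the Phred assumption, each 10-point increase in QV
# corresponds to a tenfold lower error rate.
for qv in (30, 40, 50):
    print(f"QV {qv}: about 1 error per {round(1 / qv_to_error_rate(qv)):,} bases")
```

Under this assumption, a quality value above 50 corresponds to fewer than one error per 100,000 bases.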


Subject(s)
Nanopores , Diploidy , Haploidy , High-Throughput Nucleotide Sequencing/methods , Telomere/genetics , Sequence Analysis, DNA/methods
10.
Nat Methods ; 21(1): 92-101, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37749214

ABSTRACT

Natural proteins are composed of 20 proteinogenic amino acids and their post-translational modifications (PTMs). However, due to the lack of a suitable nanopore sensor that can simultaneously discriminate between all 20 amino acids and their PTMs, direct sequencing of protein with nanopores has not yet been realized. Here, we present an engineered hetero-octameric Mycobacterium smegmatis porin A (MspA) nanopore containing a sole Ni2+ modification. It enables full discrimination of all 20 proteinogenic amino acids and 4 representative modified amino acids, Nω,N'ω-dimethyl-arginine (Me-R), O-acetyl-threonine (Ac-T), N4-(β-N-acetyl-D-glucosaminyl)-asparagine (GlcNAc-N) and O-phosphoserine (P-S). Assisted by machine learning, an accuracy of 98.6% was achieved. Amino acid supplement tablets and peptidase-digested amino acids from peptides were also analyzed using this strategy. This capacity for simultaneous discrimination of all 20 proteinogenic amino acids and their PTMs suggests the potential to achieve protein sequencing using this nanopore-based strategy.


Subject(s)
Nanopores , Amino Acids/chemistry , Proteins/metabolism , Porins/chemistry , Porins/metabolism , Peptides/chemistry
11.
Proc Natl Acad Sci U S A ; 121(38): e2405018121, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39264741

ABSTRACT

The transport of biopolymers across nanopores is an important biological process currently under investigation for the rapid analysis of DNA and proteins. While the transport of DNA is generally understood, methods to induce unfolded protein translocation have only recently been discovered (Yu et al., 2023, Sauciuc et al., 2023). Here, we found that during electroosmotically driven translocation of polypeptides, blob-like structures typically form inside nanopores, often obstructing their transport and preventing addressing individual amino acids. This is in contrast with the electrophoretic transport of DNA, where the formation of such structures has not been reported. Comparisons between different nanopore sizes and shapes and modifications by different surface chemistries allowed formulating a mechanism for blob formation. We also show that single-file transport can be achieved by using 1) nanopores that have an entry and an internal diameter smaller than the persistence length of the polymer, 2) nanopores with a nonsticky (i.e., nonaromatic) inner surface, and 3) moderate translocation velocities. These experiments provide a basis for understanding polypeptide transport under confinement and for improving the design and engineering of nanopores for protein analysis.


Subject(s)
Nanopores , Protein Transport , Proteins/chemistry , Proteins/metabolism , Peptides/chemistry , Peptides/metabolism , DNA/chemistry , DNA/metabolism , Electroosmosis
12.
Proc Natl Acad Sci U S A ; 121(29): e2321017121, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-38990947

ABSTRACT

RNA polymerases (RNAPs) carry out the first step in the central dogma of molecular biology by transcribing DNA into RNA. Despite their importance, much about how RNAPs work remains unclear, in part because the small (3.4 Angstrom) and fast (~40 ms/nt) steps during transcription were difficult to resolve. Here, we used high-resolution nanopore tweezers to observe the motion of single Escherichia coli RNAP molecules as they transcribe DNA with ~1,000 times improved temporal resolution, resolving single-nucleotide and fractional-nucleotide steps of individual RNAPs at saturating nucleoside triphosphate concentrations. We analyzed RNAP during processive transcription elongation and sequence-dependent pausing at the yrbL elemental pause sequence. Each time RNAP encounters the yrbL elemental pause sequence, it rapidly interconverts between five translocational states, residing predominantly in a half-translocated state. The kinetics and force-dependence of this half-translocated state indicate it is a functional intermediate between pre- and post-translocated states. Using structural and kinetics data, we show that, in the half-translocated and post-translocated states, sequence-specific protein-DNA interaction occurs between RNAP and a guanine base at the downstream end of the transcription bubble (core recognition element). Kinetic data show that this interaction stabilizes the half-translocated and post-translocated states relative to the pre-translocated state. We develop a kinetic model for RNAP at the yrbL pause and discuss this in the context of key structural features.


Subject(s)
DNA-Directed RNA Polymerases , Escherichia coli , Nanopores , DNA-Directed RNA Polymerases/metabolism , DNA-Directed RNA Polymerases/chemistry , DNA-Directed RNA Polymerases/genetics , Escherichia coli/metabolism , Escherichia coli/genetics , Transcription, Genetic , Escherichia coli Proteins/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/chemistry , Optical Tweezers , Kinetics , Nucleotides/metabolism
13.
Proc Natl Acad Sci U S A ; 121(16): e2400203121, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38598338

ABSTRACT

Viral outbreaks can cause widespread disruption, creating the need for diagnostic tools that provide high performance and sample versatility at the point of use with moderate complexity. Current gold standards such as PCR and rapid antigen tests fall short in one or more of these aspects. Here, we report a label-free and amplification-free nanopore sensor platform that overcomes these challenges via direct detection and quantification of viral RNA in clinical samples from a variety of biological fluids. The assay uses an optofluidic chip that combines optical waveguides with a fluidic channel and integrates a solid-state nanopore for sensing of individual biomolecules upon translocation through the pore. High specificity and low limit of detection are ensured by capturing RNA targets on microbeads and collecting them by optical trapping at the nanopore location where targets are released and rapidly detected. We use this device for longitudinal studies of the viral load progression for Zika and Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) infections in marmoset and baboon animal models, respectively. The up to million-fold trapping-based target concentration enhancement enables amplification-free RNA quantification across the clinically relevant concentration range down to the assay limit of RT-qPCR as well as cases in which PCR failed. The assay operates across all relevant biofluids, including semen, urine, and whole blood for Zika and nasopharyngeal and throat swab, rectal swab, and bronchoalveolar lavage for SARS-CoV-2. The versatility, performance, simplicity, and potential for full microfluidic integration of the amplification-free nanopore assay points toward a unique approach to molecular diagnostics for nucleic acids, proteins, and other targets.


Subject(s)
Nanopores , Zika Virus Infection , Zika Virus , Animals , RNA, Viral/genetics , RNA, Viral/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , Primates/genetics , Zika Virus/genetics , Sensitivity and Specificity , Nucleic Acid Amplification Techniques
14.
Genome Res ; 33(12): 2029-2040, 2023 12 27.
Article in English | MEDLINE | ID: mdl-38190646

ABSTRACT

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
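The precision, recall, and F1 figures mentioned in this abstract follow standard definitions, which can be sketched as below. This is a generic illustration, not the authors' evaluation pipeline. The function name and the example counts are hypothetical, and the sketch assumes variant calls have already been matched against a truth set to give true-positive, false-positive, and false-negative counts.

```python
def variant_calling_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from matched variant-call counts.

    tp: called variants that are in the truth set
    fp: called variants that are not in the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only for illustration.
precision, recall, f1 = variant_calling_metrics(tp=80, fp=20, fn=20)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is a common single-number summary when comparing platforms across coverage levels.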


Subject(s)
Genomics , Nanopores , INDEL Mutation , Whole Genome Sequencing
15.
Genome Res ; 33(4): 612-621, 2023 04.
Article in English | MEDLINE | ID: mdl-37041035

ABSTRACT

Rare species are vital members of a microbial community, but retrieving their genomes is difficult because of their low abundance. The ReadUntil (RU) approach allows nanopore devices to sequence specific DNA molecules selectively in real time, which provides an opportunity for enriching rare species. Despite the robustness of enriching rare species by reducing the sequencing depth of known host sequences, such as the human genome, there is still a gap in RU-based enriching of rare species in environmental samples whose community composition is unclear, and many rare species have poor or incomplete reference genomes in public databases. Therefore, here we present metaRUpore to overcome this challenge. When we applied metaRUpore to a thermophilic anaerobic digester (TAD) community and human gut microbial community, it reduced coverage of the high-abundance populations and modestly increased (∼2×) the genome coverage of the rare taxa, facilitating successful recovery of near-finished metagenome-assembled genomes (nf-MAGs) of rare species. The simplicity and robustness of the approach make it accessible for laboratories with moderate computational resources, and hold the potential to become the standard practice in future metagenomic sequencing of complicated microbiomes.


Subject(s)
Microbiota , Nanopores , Humans , Microbiota/genetics , Metagenome , Metagenomics
16.
Genome Res ; 33(6): 907-922, 2023 06.
Article in English | MEDLINE | ID: mdl-37433640

ABSTRACT

Approximately 13% of the human genome, at certain motifs, has the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.


Subject(s)
DNA, Z-Form , Nanopores , Humans , Nucleotide Motifs , Sequence Analysis, DNA , DNA/genetics , Base Composition , High-Throughput Nucleotide Sequencing
17.
Nat Methods ; 20(6): 849-859, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37106231

ABSTRACT

Genome-wide measurements of RNA structure can be obtained using reagents that react with unpaired bases, leading to adducts that can be identified by mutational profiling on next-generation sequencing machines. One drawback of these experiments is that short sequencing reads can rarely be mapped to specific transcript isoforms. Consequently, information is acquired as a population average in regions that are shared between transcripts, thus blurring the underlying structural landscape. Here, we present nanopore dimethylsulfate mutational profiling (Nano-DMS-MaP)-a method that exploits long-read sequencing to provide isoform-resolved structural information of highly similar RNA molecules. We demonstrate the value of Nano-DMS-MaP by resolving the complex structural landscape of human immunodeficiency virus-1 transcripts in infected cells. We show that unspliced and spliced transcripts have distinct structures at the packaging site within the common 5' untranslated region, likely explaining why spliced viral RNAs are excluded from viral particles. Thus, Nano-DMS-MaP is a straightforward method to resolve biologically important transcript-specific RNA structures that were previously hidden in short-read ensemble analyses.


Subject(s)
Nanopores , RNA , Humans , RNA/genetics , Mutation , Protein Isoforms/genetics , RNA, Viral/genetics , RNA, Viral/chemistry , Sequence Analysis, RNA
18.
Nat Methods ; 20(1): 75-85, 2023 01.
Article in English | MEDLINE | ID: mdl-36536091

ABSTRACT

RNA polyadenylation plays a central role in RNA maturation, fate, and stability. In response to developmental cues, polyA tail lengths can vary, affecting the translation efficiency and stability of mRNAs. Here we develop Nanopore 3' end-capture sequencing (Nano3P-seq), a method that relies on nanopore cDNA sequencing to simultaneously quantify RNA abundance, tail composition, and tail length dynamics at per-read resolution. By employing a template-switching-based sequencing protocol, Nano3P-seq can sequence RNA molecules from their 3' ends, regardless of their polyadenylation status, without the need for PCR amplification or ligation of RNA adapters. We demonstrate that Nano3P-seq provides quantitative estimates of RNA abundance and tail lengths, and captures a wide diversity of RNA biotypes. We find that, in addition to mRNA and long non-coding RNA, polyA tails can be identified in 16S mitochondrial ribosomal RNA in both mouse and zebrafish models. Moreover, we show that mRNA tail lengths are dynamically regulated during vertebrate embryogenesis at an isoform-specific level, correlating with mRNA decay. Finally, we demonstrate the ability of Nano3P-seq in capturing non-A bases within polyA tails of various lengths, and reveal their distribution during vertebrate embryogenesis. Overall, Nano3P-seq is a simple and robust method for accurately estimating transcript levels, tail lengths, and tail composition heterogeneity in individual reads, with minimal library preparation biases, both in the coding and non-coding transcriptome.


Subject(s)
Nanopores , Transcriptome , Animals , Mice , DNA, Complementary/genetics , Zebrafish/genetics , Zebrafish/metabolism , Poly A/genetics , Poly A/metabolism , Gene Expression Profiling , RNA/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, RNA/methods
19.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279646

ABSTRACT

N6-methyladenosine (m6A) is the most abundant internal eukaryotic mRNA modification, and is involved in the regulation of various biological processes. Direct Nanopore sequencing of native RNA (dRNA-seq) emerged as a leading approach for its identification. Several software tools have been published for m6A detection, and there is a strong need for independent studies benchmarking their performance on data from different species, and against various reference datasets. Moreover, a computational workflow is needed to streamline the execution of tools whose installation and execution remain complicated. We developed NanOlympicsMod, a Nextflow pipeline exploiting containerized technology for comparing 14 tools for m6A detection on dRNA-seq data. NanOlympicsMod was tested on dRNA-seq data generated from in vitro (un)modified synthetic oligos. The m6A hits returned by each tool were compared to the m6A position known by design of the oligos. In addition, NanOlympicsMod was used on dRNA-seq datasets from wild-type and m6A-depleted yeast, mouse and human, and each tool's hits were compared to reference m6A sets generated by leading orthogonal methods. The performance of the tools markedly differed across datasets, and methods adopting different approaches showed different preferences in terms of precision and recall. Changing the stringency cut-offs allowed for tuning the precision-recall trade-off towards user preferences. Finally, we determined that precision and recall of tools are markedly influenced by sequencing depth, and that additional sequencing would likely reveal additional m6A sites. Thanks to the possibility of including novel tools, NanOlympicsMod will streamline the benchmarking of m6A detection tools on dRNA-seq data, improving future RNA modification characterization.
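The effect of a stringency cut-off on the precision-recall trade-off can be illustrated with a small sketch. This is not code from NanOlympicsMod. It assumes each tool emits a per-site score where higher means more confident, and that a reference set of true m6A sites is available. The site names, scores, and function name are all hypothetical.

```python
def precision_recall_at_cutoffs(scores, reference_sites, cutoffs):
    """Compute (cutoff, precision, recall) for each score cutoff.

    scores: dict mapping site identifier -> confidence score (higher = more confident)
    reference_sites: set of site identifiers treated as true m6A sites
    cutoffs: iterable of score thresholds; a site is called if score >= cutoff
    Precision is reported as None when no site passes the cutoff.
    """
    results = []
    for cutoff in cutoffs:
        called = {site for site, score in scores.items() if score >= cutoff}
        true_positives = len(called & reference_sites)
        precision = true_positives / len(called) if called else None
        recall = true_positives / len(reference_sites) if reference_sites else 0.0
        results.append((cutoff, precision, recall))
    return results


# Hypothetical per-site scores and reference set, for illustration only.
scores = {"site1": 0.95, "site2": 0.80, "site3": 0.60, "site4": 0.40, "site5": 0.20}
reference_sites = {"site1", "site2", "site4"}

for cutoff, precision, recall in precision_recall_at_cutoffs(
    scores, reference_sites, cutoffs=[0.3, 0.7]
):
    print(f"cutoff={cutoff}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, raising the cutoff removes the false-positive call but also drops one true site, so precision rises while recall falls.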


Subject(s)
Adenine/analogs & derivatives , Nanopore Sequencing , Nanopores , Humans , Animals , Mice , RNA/genetics , Benchmarking , Sequence Analysis, RNA/methods
20.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39226890

ABSTRACT

Nanopore selective sequencing allows the targeted sequencing of DNA of interest using computational approaches rather than experimental methods such as targeted multiplex polymerase chain reaction or hybridization capture. Compared to sequence-alignment strategies, deep learning (DL) models for classifying target and nontarget DNA provide large speed advantages. However, the relatively low accuracy of these DL-based tools hinders their application in nanopore selective sequencing. Here, we present a DL-based tool named ReadCurrent for nanopore selective sequencing, which takes electric currents as inputs. ReadCurrent employs a modified very deep convolutional neural network (VDCNN) architecture, enabling significantly lower computational costs for training and quicker inference compared to conventional VDCNN. We evaluated the performance of ReadCurrent across 10 nanopore sequencing datasets spanning human, yeasts, bacteria, and viruses. We observed that ReadCurrent achieved a mean accuracy of 98.57% for classification, outperforming four other DL-based selective sequencing methods. In experimental validation that selectively sequenced microbial DNA from human DNA, ReadCurrent achieved an enrichment ratio of 2.85, which was higher than the 2.7 ratio achieved by MinKNOW using the sequence-alignment strategy. In summary, ReadCurrent can rapidly classify target and nontarget DNA with high accuracy, providing an alternative in the toolbox for nanopore selective sequencing. ReadCurrent is available at https://github.com/Ming-Ni-Group/ReadCurrent.


Subject(s)
Nanopore Sequencing , Nanopore Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Neural Networks, Computer , Nanopores , Software , Deep Learning , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods