ABSTRACT
High-throughput sequencing dramatically changed our view of transcriptome architectures and allowed for ground-breaking discoveries in RNA biology. Recently, sequencing of full-length transcripts based on the single-molecule sequencing platform from Oxford Nanopore Technologies (ONT) was introduced and is widely used to sequence eukaryotic and viral RNAs. However, experimental approaches implementing this technique for prokaryotic transcriptomes remain scarce. Here, we present an experimental and bioinformatic workflow for ONT RNA-seq in the bacterial model organism Escherichia coli, which can be applied to any microorganism. Our study highlights critical steps of library preparation and computational analysis and compares the results to gold standards in the field. Furthermore, we comprehensively evaluate the applicability and advantages of different ONT-based RNA sequencing protocols, including direct RNA, direct cDNA, and PCR-cDNA. We find that (PCR)-cDNA-seq offers improved yield and accuracy compared to direct RNA sequencing. Notably, (PCR)-cDNA-seq is suitable for quantitative measurements and can be readily used for simultaneous and accurate detection of transcript 5' and 3' boundaries, analysis of transcriptional units, and transcriptional heterogeneity. In summary, based on our comprehensive study, we show nanopore RNA-seq to be a ready-to-use tool allowing rapid, cost-effective, and accurate annotation of multiple transcriptomic features. Thereby nanopore RNA-seq holds the potential to become a valuable alternative method for RNA analysis in prokaryotes.
Subjects
Bacterial DNA/chemistry , Nanopore Sequencing/methods , Bacterial RNA/chemistry , Bacterial DNA/genetics , Escherichia coli , Nanopore Sequencing/standards , Bacterial RNA/genetics
ABSTRACT
Nanopore sequencing has been widely used for the reconstruction of microbial genomes. Owing to higher error rates, errors on the genome are corrected via neural networks trained by Nanopore reads. However, the systematic errors usually remain uncorrected. This paper designs a model that is trained by homologous sequences for the correction of Nanopore systematic errors. The developed program, Homopolish, outperforms Medaka and HELEN in bacteria, viruses, fungi, and metagenomic datasets. When combined with Medaka/HELEN, the genome quality can exceed Q50 on R9.4 flow cells. We show that Nanopore-only sequencing can produce high-quality microbial genomes sufficient for downstream analysis.
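As a rough aid to reading the Q50 figure above: assuming it denotes a Phred-scaled consensus quality (the usual convention for assembly accuracy, although the abstract does not spell it out), Q = -10 * log10(p), where p is the per-base error probability. The Python sketch below is illustrative only; the function names and the 5 Mb genome length are hypothetical and are not taken from Homopolish.

```python
import math


def phred_from_error_rate(error_rate: float) -> float:
    """Convert a per-base error probability into a Phred-scaled quality."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


def error_rate_from_phred(quality: float) -> float:
    """Convert a Phred-scaled quality back into a per-base error probability."""
    return 10 ** (-quality / 10)


def expected_errors(genome_length: int, quality: float) -> float:
    """Expected number of erroneous bases in a genome of the given length."""
    return genome_length * error_rate_from_phred(quality)
```

Under this assumption, Q50 corresponds to about one error per 100,000 bases, so `expected_errors(5_000_000, 50)` gives roughly 50 erroneous bases for a hypothetical 5 Mb bacterial genome.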
Subjects
Computational Biology/methods , Computational Biology/standards , Genomics/methods , Nanopore Sequencing/standards , DNA Sequence Analysis/methods , DNA Sequence Analysis/standards , Bacteria/genetics , Fungi/genetics , Genomics/standards , Metagenome , Metagenomics , Nanopore Sequencing/methods , Viruses/genetics
ABSTRACT
Protein engineering and synthetic biology applications increasingly rely on the assembly of modular libraries composed of thousands of different combinations of DNA building blocks. At present, the validation of such libraries is performed by Sanger sequencing analysis on a small subset of clones on an ad hoc basis. Here, we implement a systematic procedure for the comprehensive evaluation of combinatorial libraries, immediately after their creation in vitro, using long-read sequencing technology. After an initial step of nanopore sequencing, we use straightforward bioinformatics tools to tabulate the composition and synteny of the building blocks in each read. We subsequently use exploratory statistics to assess the library and validate its diversity before carrying out downstream cloning and screening assays.
Subjects
Gene Library , Nanopore Sequencing/methods , Nanopore Sequencing/standards , Statistics as Topic , Quality Control , DNA Sequence Analysis
ABSTRACT
BACKGROUND: We benchmarked the hybrid assembly approaches of MaSuRCA, SPAdes, and Unicycler for bacterial pathogens using Illumina and Oxford Nanopore sequencing by determining genome completeness and accuracy, antimicrobial resistance (AMR), virulence potential, multilocus sequence typing (MLST), phylogeny, and pan genome. Ten bacterial species (10 strains) were tested with simulated reads of both mediocre and low quality, whereas 11 bacterial species (12 strains) were tested with real reads. RESULTS: Unicycler performed best at achieving contiguous genomes, closely followed by MaSuRCA, while all SPAdes assemblies were incomplete. MaSuRCA was less tolerant of low-quality long reads than SPAdes and Unicycler. The hybrid assemblies of five antimicrobial-resistant strains with simulated reads provided AMR genotypes consistent with the reference genomes. The MaSuRCA assembly of Staphylococcus aureus with real reads contained msr(A) and tet(K), while the reference genome and the SPAdes and Unicycler assemblies harbored blaZ. The AMR genotypes of the reference genomes and hybrid assemblies were consistent for the other five antimicrobial-resistant strains with real reads. The numbers of virulence genes in all hybrid assemblies were similar to those of the reference genomes, irrespective of simulated or real reads. The one exception was Pseudomonas aeruginosa: the reference genome and the hybrid assemblies from mediocre-quality long reads carried 241 virulence genes, whereas 184 virulence genes were identified in the hybrid assemblies from low-quality long reads. The MaSuRCA assemblies of Escherichia coli O157:H7 and Salmonella Typhimurium with mediocre-quality long reads contained 126 and 118 virulence genes, respectively, while 110 and 107 virulence genes were detected in their MaSuRCA assemblies with low-quality long reads, respectively. All approaches performed well in our MLST and phylogenetic analyses. The pan genomes of the hybrid assemblies of S. Typhimurium with mediocre-quality long reads were similar to that of the reference genome, while SPAdes and Unicycler were more tolerant of low-quality long reads than MaSuRCA for the pan-genome analysis. All approaches functioned well in the pan-genome analysis of Campylobacter jejuni with real reads. CONCLUSIONS: Our research demonstrates the hybrid assembly pipeline of Unicycler as a superior approach for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing.
Subjects
Bacterial Genome , Genomics/methods , Nanopore Sequencing/methods , Benchmarking , Campylobacter jejuni , Contig Mapping/methods , Contig Mapping/standards , Cronobacter sakazakii , Bacterial Drug Resistance , Genomics/standards , Listeria monocytogenes , Nanopore Sequencing/standards , Pseudomonas aeruginosa , Salmonella typhimurium , Virulence
ABSTRACT
The recent advent of third-generation sequencing technologies brings promise for better characterization of genomic structural variants by virtue of having longer reads. However, long-read applications are still constrained by their high sequencing error rates and low sequencing throughput. Here, we present NanoVar, an optimized structural variant caller utilizing low-depth (8X) whole-genome sequencing data generated by Oxford Nanopore Technologies. NanoVar exhibits higher structural variant calling accuracy when benchmarked against current tools using low-depth simulated datasets. In patient samples, we successfully validate structural variants characterized by NanoVar and uncover normal alternative sequences or alleles which are present in healthy individuals.
Subjects
Genetic Testing/methods , Genomic Structural Variation , Myeloid Leukemia/genetics , Nanopore Sequencing/methods , DNA Sequence Analysis/methods , Cultured Cells , Genetic Testing/standards , HCT116 Cells , Humans , Myeloid Leukemia/pathology , Nanopore Sequencing/standards , Reproducibility of Results , Sensitivity and Specificity , DNA Sequence Analysis/standards
ABSTRACT
BACKGROUND: Klebsiella pneumoniae frequently harbours multidrug resistance, and current diagnostics struggle to rapidly identify appropriate antibiotics to treat these bacterial infections. The MinION device can sequence native DNA and RNA in real time, providing an opportunity to compare the utility of DNA and RNA for prediction of antibiotic susceptibility. However, the effectiveness of bacterial direct RNA sequencing and base-calling has not previously been investigated. This study interrogated the genome and transcriptome of 4 extensively drug-resistant (XDR) K. pneumoniae clinical isolates; however, further antimicrobial susceptibility testing identified 3 isolates as pandrug-resistant (PDR). RESULTS: The majority of acquired resistance (≥75%) resided on plasmids including several megaplasmids (≥100 kb). DNA sequencing detected most resistance genes (≥70%) within 2 hours of sequencing. Neural network-based base-calling of direct RNA achieved up to 86% identity rate, although ≤23% of reads could be aligned. Direct RNA sequencing (with ~6 times slower pore translocation) was able to identify (within 10 hours) ≥35% of resistance genes, including those associated with resistance to aminoglycosides, β-lactams, trimethoprim, and sulphonamide and also quinolones, rifampicin, fosfomycin, and phenicol in some isolates. Direct RNA sequencing also identified the presence of operons containing up to 3 resistance genes. Polymyxin-resistant isolates showed a heightened transcription of phoPQ (≥2-fold) and the pmrHFIJKLM operon (≥8-fold). Expression levels estimated from direct RNA sequencing displayed strong correlation (Pearson: 0.86) with quantitative real-time PCR across 11 resistance genes. CONCLUSION: Overall, MinION sequencing rapidly detected the XDR/PDR K. pneumoniae resistome, and direct RNA sequencing provided accurate estimation of expression levels of these genes.
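The Pearson coefficient quoted above measures the linear agreement between two sets of paired measurements. A minimal sketch of that calculation follows, assuming one value per gene from each method; the function name is hypothetical, and nothing here reproduces the study's data or its exact analysis.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

In a comparison like the one described, `x` would hold the sequencing-based expression estimates and `y` the qPCR values for the same genes, typically on a log scale (an assumption; the abstract does not state the scale used).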
Subjects
Multiple Bacterial Drug Resistance , Klebsiella pneumoniae/genetics , Nanopore Sequencing/methods , RNA-Seq/methods , Bacterial Genome , Klebsiella pneumoniae/drug effects , Nanopore Sequencing/standards , RNA-Seq/standards , Transcriptome
ABSTRACT
BACKGROUND: Metagenomic sequencing is a well-established tool in the modern biosciences. While it promises unparalleled insights into the genetic content of the biological samples studied, conclusions drawn are at risk from biases inherent to the DNA sequencing methods, including inaccurate abundance estimates as a function of genomic guanine-cytosine (GC) contents. RESULTS: We explored such GC biases across many commonly used platforms in experiments sequencing multiple genomes (with mean GC contents ranging from 28.9% to 62.4%) and metagenomes. GC bias profiles varied among different library preparation protocols and sequencing platforms. We found that our workflows using MiSeq and NextSeq were hindered by major GC biases, with problems becoming increasingly severe outside the 45-65% GC range, leading to a falsely low coverage in GC-rich and especially GC-poor sequences, where genomic windows with 30% GC content had >10-fold less coverage than windows close to 50% GC content. We also showed that GC content correlates tightly with coverage biases. The PacBio and HiSeq platforms also evidenced similar profiles of GC biases to each other, which were distinct from those seen in the MiSeq and NextSeq workflows. The Oxford Nanopore workflow was not afflicted by GC bias. CONCLUSIONS: These findings indicate potential sources of difficulty, arising from GC biases, in genome sequencing that could be pre-emptively addressed with methodological optimizations provided that the GC biases inherent to the relevant workflow are understood. Furthermore, it is recommended that a more critical approach be taken in quantitative abundance estimates in metagenomic studies. In the future, metagenomic studies should take steps to account for the effects of GC bias before drawing conclusions, or they should use a demonstrably unbiased workflow.
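A GC bias profile of the kind described above can be approximated by cutting a genome into fixed windows, computing the GC percentage of each window, and averaging the read depth of windows that fall into the same GC bin. The sketch below illustrates that idea only; the function names, the 100-base window, and the 5-point bin width are assumptions and do not reflect the study's actual pipeline.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def gc_percent(seq: str) -> Optional[int]:
    """Integer GC percentage (rounded down) over unambiguous bases, or None."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        return None  # window contains only ambiguous bases such as N
    return (100 * gc) // acgt


def mean_depth_by_gc_bin(
    genome: str, depth: Sequence[float], window: int = 100, bin_width: int = 5
) -> Dict[int, float]:
    """Average per-window depth, grouped by the lower edge of each GC bin."""
    if len(genome) != len(depth):
        raise ValueError("genome and depth must have the same length")
    bins = defaultdict(list)
    # Only complete, non-overlapping windows are considered.
    for start in range(0, len(genome) - window + 1, window):
        pct = gc_percent(genome[start:start + window])
        if pct is None:
            continue
        bin_start = (pct // bin_width) * bin_width
        bins[bin_start].append(sum(depth[start:start + window]) / window)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}
```

Dividing the mean depth of one bin by that of the bin near 50% GC gives a fold difference comparable in spirit to the ">10-fold" figure reported for 30% GC windows.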
Subjects
Base Composition , Bacterial Genome , Metagenome , Metagenomics/standards , Bias , Fusobacterium/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Metagenomics/methods , Nanopore Sequencing/methods , Nanopore Sequencing/standards , Software/standards
ABSTRACT
Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
Subjects
Genomics/methods , Nanopore Sequencing/methods , Whole Genome Sequencing/methods , Animals , Data Science/methods , Data Science/standards , Genomics/standards , Humans , Nanopore Sequencing/standards , Whole Genome Sequencing/standards
ABSTRACT
INTRODUCTION: Bi-allelic mutations in the gene for glucocerebrosidase (GBA) cause Gaucher disease, an autosomal recessive lysosomal storage disorder. Gaucher disease-causing GBA mutations in the heterozygous state are also high-risk factors for Parkinson's disease (PD). GBA analysis is challenging due to a related pseudogene and structural variations (SVs) that can occur at this locus. We have applied and refined a recently developed nanopore DNA sequencing method to analyze GBA variants in a clinically assessed New Zealand longitudinal cohort of PD. METHOD: We examined amplicons encompassing the coding region of GBA (8.9 kb) from 229 PD cases and 50 healthy controls using the GridION nanopore sequencing platform, and Sanger validation. RESULTS: We detected 23 variants in 21 PD cases (9.2% of patients). We detected the modest PD risk variant p.N409S (rs76763715) in one case, p.E365K (rs2230288) in 12 cases, and p.T408M (rs75548401) in seven cases, one of whom also had p.E365K. We additionally detected the possible risk variants p.R78C (rs146774384) in one case, p.D179H (rs147138516) in one case, which occurred on the same haplotype as p.E365K, and one novel variant, c.335C>T or p.(L335=), that potentially impacts splicing of GBA transcripts. Additionally, we found a higher prevalence of dementia among patients with GBA variants. CONCLUSION: This work confirmed the utility of nanopore sequencing as a high-throughput method to identify known and novel GBA variants, and to assign precise haplotypes. Our observations may contribute to improved understanding of the effects of variants on disease pathogenesis, and to the development of more targeted treatments.
Subjects
Dementia/genetics , Glucosylceramidase/genetics , Nanopore Sequencing/standards , Parkinson Disease/genetics , Aged , Aged 80 and over , Cohort Studies , Dementia/etiology , Female , Humans , Male , Middle Aged , New Zealand , Parkinson Disease/complications , Reproducibility of Results , DNA Sequence Analysis
ABSTRACT
Metagenomic sequencing for infectious disease diagnostics is an important tool that holds promise for use in the clinical laboratory. Challenges for implementation so far include high cost, the length of time to results, and the need for technical and bioinformatics expertise. However, the recent technological innovation of nanopore sequencing from Oxford Nanopore Technologies (ONT) has the potential to address these challenges. ONT sequencing is an attractive platform for clinical laboratories to adopt due to its low cost, rapid turnaround time, and user-friendly bioinformatics pipelines. However, this method still faces the problem of base-calling accuracy compared to other platforms. This review highlights the general challenges of pathogen detection in clinical specimens by metagenomic sequencing, the advantages and disadvantages of the ONT platform, and how research to date supports the potential future use of nanopore sequencing in infectious disease diagnostics.
Subjects
Clinical Laboratory Services , Clinical Laboratory Techniques , Communicable Diseases/diagnosis , High-Throughput Nucleotide Sequencing , Nanopore Sequencing , Clinical Laboratory Services/standards , Communicable Diseases/etiology , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Nanopore Sequencing/methods , Nanopore Sequencing/standards
ABSTRACT
BACKGROUND: Long sequencing reads are information-rich: aiding de novo assembly and reference mapping, and consequently have great potential for the study of microbial communities. However, the best approaches for analysis of long-read metagenomic data are unknown. Additionally, rigorous evaluation of bioinformatics tools is hindered by a lack of long-read data from validated samples with known composition. FINDINGS: We sequenced 2 commercially available mock communities containing 10 microbial species (ZymoBIOMICS Microbial Community Standards) with Oxford Nanopore GridION and PromethION. Both communities and the 10 individual species isolates were also sequenced with Illumina technology. We generated 14 and 16 gigabase pairs from 2 GridION flowcells and 150 and 153 gigabase pairs from 2 PromethION flowcells for the evenly distributed and log-distributed communities, respectively. Read length N50 ranged between 5.3 and 5.4 kilobase pairs over the 4 sequencing runs. Basecalls and corresponding signal data are made available (4.2 TB in total). Alignment to Illumina-sequenced isolates demonstrated the expected microbial species at anticipated abundances, with the limit of detection for the lowest abundance species below 50 cells (GridION). De novo assembly of metagenomes recovered long contiguous sequences without the need for pre-processing techniques such as binning. CONCLUSIONS: We present ultra-deep, long-read nanopore datasets from a well-defined mock community. These datasets will be useful for those developing bioinformatics methods for long-read metagenomics and for the validation and comparison of current laboratory and software pipelines.
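The read length N50 reported above is conventionally the length L such that reads of length L or longer together contain at least half of all sequenced bases. A minimal sketch of that definition follows; the function name is hypothetical, and the authors' own tooling may handle ties or filtering differently.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one read length is required")
    total = sum(ordered)
    running = 0
    # Walk from the longest read down until half of the bases are covered.
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety
```

Because N50 is weighted by bases rather than by read count, a few very long reads can raise it well above the median read length.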
Subjects
Metagenome , Metagenomics/methods , Microbiota/genetics , Nanopore Sequencing/methods , Metagenomics/standards , Nanopore Sequencing/standards , Reference Standards
ABSTRACT
We describe a fast and accurate algorithm for IonTorrent read error correction capable of significantly reducing the number of sequencing errors over a wide range of data sets. IonHammer is implemented in C++ and is freely available as part of the SPAdes genome assembler package.