Results 1-8 of 8
1.
Genome Res ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38692839

ABSTRACT

In silico simulation of high-throughput sequencing data is a technique used widely in the genomics field. However, there is currently a lack of effective tools for creating simulated data from nanopore sequencing devices, which measure DNA or RNA molecules in the form of time-series current signal data. Here, we introduce Squigulator, a fast and simple tool for simulation of realistic nanopore signal data. Squigulator takes a reference genome, a transcriptome, or read sequences, and generates corresponding raw nanopore signal data. This is compatible with basecalling software from Oxford Nanopore Technologies (ONT) and other third-party tools, thereby providing a useful substrate for development, testing, debugging, validation, and optimization at every stage of a nanopore analysis workflow. The user may generate data with preset parameters emulating specific ONT protocols or noise-free "ideal" data, or they may deterministically modify a range of experimental variables and/or noise parameters to shape the data to their needs. We present a brief example of Squigulator's use, creating simulated data to model the degree to which different parameters impact the accuracy of ONT basecalling and downstream variant detection. This analysis reveals new insights into the nature of ONT data and basecalling algorithms. We provide Squigulator as an open-source tool for the nanopore community.
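To make signal simulation concrete, here is a minimal Python sketch of the general idea. It is not Squigulator's implementation. Each k-mer of the input sequence is looked up in a pore model to get a mean current level, and that level is emitted for a number of samples (the dwell time), optionally with Gaussian noise. The pore model built by the hypothetical `toy_pore_model` is random; real ONT models are measured tables. The exponential dwell distribution, the `dwell=8` default, and the 3-mer size are also assumptions chosen for brevity.

```python
import itertools
import random

def toy_pore_model(k=3, seed=1):
    """Hypothetical k-mer -> (mean current in pA, stdv) table.

    Values are random; a real pore model is an empirically measured table.
    """
    rng = random.Random(seed)
    return {
        "".join(kmer): (rng.uniform(60.0, 120.0), rng.uniform(1.0, 3.0))
        for kmer in itertools.product("ACGT", repeat=k)
    }

def simulate_signal(seq, model, k=3, dwell=8, ideal=False, seed=0):
    """Return a simulated current trace (pA) for `seq`.

    ideal=True: every k-mer dwells exactly `dwell` samples at its mean level.
    ideal=False: dwell times are drawn from an exponential distribution
    (mean `dwell`, at least 1 sample), and each sample gets Gaussian noise
    with the k-mer's stdv. A fixed `seed` makes the output reproducible.
    """
    rng = random.Random(seed)
    signal = []
    for pos in range(len(seq) - k + 1):
        mean, stdv = model[seq[pos:pos + k]]
        n_samples = dwell if ideal else max(1, round(rng.expovariate(1.0 / dwell)))
        for _ in range(n_samples):
            signal.append(mean if ideal else rng.gauss(mean, stdv))
    return signal

if __name__ == "__main__":
    model = toy_pore_model()
    print(len(simulate_signal("ACGTTGCAAC", model, ideal=True)))  # 8 k-mers x 8 samples = 64
```

Seeding the random generator loosely mirrors the deterministic control over noise parameters described above. The `ideal` flag corresponds to the noise-free "ideal" data option.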

2.
Sci Rep ; 13(1): 20174, 2023 11 17.
Article in English | MEDLINE | ID: mdl-37978244

ABSTRACT

minimap2 is the gold-standard software for reference-based sequence mapping in third-generation long-read sequencing. While minimap2 is relatively fast, further speedup is desirable, especially when processing a multitude of large datasets. In this work, we present minimap2-fpga, a hardware-accelerated version of minimap2 that speeds up the mapping process by integrating an FPGA kernel optimised for chaining. Integrating the FPGA kernel into minimap2 posed significant challenges that we solved by accurately predicting the processing time on hardware while considering data transfer overheads, mitigating hardware scheduling overheads in a multi-threaded environment, and optimizing memory management for processing large realistic datasets. We demonstrate speed-ups in end-to-end run-time for data from both Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio). minimap2-fpga is up to 79% and 53% faster than minimap2 for [Formula: see text] ONT and [Formula: see text] PacBio datasets respectively, when mapping without base-level alignment. When mapping with base-level alignment, minimap2-fpga is up to 62% and 10% faster than minimap2 for [Formula: see text] ONT and [Formula: see text] PacBio datasets respectively. The accuracy is near-identical to that of original minimap2 for both ONT and PacBio data, when mapping both with and without base-level alignment. minimap2-fpga is supported on Intel FPGA-based systems (evaluations performed on an on-premise system) and Xilinx FPGA-based systems (evaluations performed on a cloud system). We also provide a well-documented library for the FPGA-accelerated chaining kernel to be used by future researchers developing sequence alignment software with limited hardware background.


Subjects
Algorithms, Software, DNA Sequence Analysis, High-Throughput Nucleotide Sequencing, Sequence Alignment
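The kernel that minimap2-fpga offloads is chaining. Given seed matches (anchors) between a read and the reference, chaining finds the highest-scoring collinear subset. The CPU-only Python sketch below follows, in simplified form, the chaining recurrence described in the original minimap2 paper:

- f(i) = max(w_i, max_j f(j) + α(j,i) − γ(l))
- α(j,i) = min(Δx, Δy, w_i)
- l = Δy − Δx
- γ(l) = 0.01·w̄·|l| + 0.5·log2|l|

It is not the FPGA kernel or minimap2's C code. The anchor tuple layout, the `max_gap` and `max_pred` defaults, and the omission of minimap2's further heuristics are assumptions of this sketch.

```python
import math

def gap_cost(l, avg_w):
    """Gap penalty gamma(l) for a diagonal shift of l bases (0 if collinear)."""
    if l == 0:
        return 0.0
    return 0.01 * avg_w * abs(l) + 0.5 * math.log2(abs(l))

def chain(anchors, max_gap=5000, max_pred=50):
    """Best chain over anchors (x, y, w): x/y = end positions on reference/query,
    w = seed length. Returns (score, chain as a list of anchors)."""
    anchors = sorted(anchors)                      # sort by reference position
    avg_w = sum(w for _, _, w in anchors) / len(anchors)
    f, parent = [], []
    for i, (xi, yi, wi) in enumerate(anchors):
        best, best_j = float(wi), -1               # option: start a new chain at i
        for j in range(max(0, i - max_pred), i):   # look back at most max_pred anchors
            xj, yj, _ = anchors[j]
            dx, dy = xi - xj, yi - yj
            if dx <= 0 or dy <= 0 or max(dx, dy) > max_gap:
                continue                           # not collinear, or too far apart
            alpha = min(dx, dy, wi)                # new bases contributed by anchor i
            score = f[j] + alpha - gap_cost(dy - dx, avg_w)
            if score > best:
                best, best_j = score, j
        f.append(best)
        parent.append(best_j)
    i = max(range(len(f)), key=f.__getitem__)      # end of the best chain
    best_score, path = f[i], []
    while i != -1:                                 # backtrack through parents
        path.append(anchors[i])
        i = parent[i]
    return best_score, path[::-1]

if __name__ == "__main__":
    seeds = [(100, 100, 15), (120, 120, 15), (140, 140, 15), (130, 400, 15)]
    print(chain(seeds))  # (45.0, [(100, 100, 15), (120, 120, 15), (140, 140, 15)])
```

The inner look-back loop is the data-parallel hot spot: each anchor compares itself with up to `max_pred` predecessors. That structure is what makes chaining a natural candidate for hardware acceleration.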
3.
PLoS One ; 18(9): e0291763, 2023.
Article in English | MEDLINE | ID: mdl-37729154

ABSTRACT

Cinnamomum species have gained worldwide attention because of their economic benefits. Among them, C. verum (synonymous with C. zeylanicum Blume), commonly known as Ceylon Cinnamon or True Cinnamon, is mainly produced in Sri Lanka. In addition, Sri Lanka is home to seven endemic wild cinnamon species: C. capparu-coronde, C. citriodorum, C. dubium, C. litseifolium, C. ovalifolium, C. rivulorum and C. sinharajaense. Proper identification and genetic characterization are fundamental for the conservation and commercialization of these species. While some species can be identified based on distinct morphological or chemical traits, others cannot easily be identified morphologically or chemically. DNA barcoding using the rbcL, matK, and trnH-psbA regions also could not resolve the identification of Cinnamomum species in Sri Lanka. Therefore, we generated Illumina HiSeq data at about 20× coverage for each identified species and for a C. verum sample from India, assembled the chloroplast genome, nuclear ITS regions, and several mitochondrial genes, and conducted a Skmer analysis. Chloroplast genomes of all eight species were assembled using a seed-based method.
According to the Bayesian phylogenomic tree constructed with the complete chloroplast genomes, C. verum (Sri Lanka) is sister to previously sequenced C. verum (NC_035236.1, KY635878.1), C. dubium and C. rivulorum. The C. verum sample from India is sister to C. litseifolium and C. ovalifolium. According to the ITS regions studied, C. verum (Sri Lanka) is sister to C. verum (NC_035236.1), C. dubium and C. rivulorum. Cinnamomum verum (India) shares an identical ITS region with C. ovalifolium, C. litseifolium, C. citriodorum, and C. capparu-coronde. According to the Skmer analysis, C. verum (Sri Lanka) is sister to C. dubium and C. rivulorum, whereas C. verum (India) is sister to C. ovalifolium and C. litseifolium.
The chloroplast gene ycf1 was identified as a chloroplast barcode for the identification of Cinnamomum species. We identified an 18 bp indel region in the ycf1 gene that could differentiate the C. verum (India) and C. verum (Sri Lanka) samples tested.


Subjects
Cinnamomum, Chloroplast Genome, Mitochondrial Genome, Cinnamomum/genetics, Sri Lanka, Bayes Theorem, Cinnamomum zeylanicum
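Skmer estimates distances between genome skims from shared k-mers. The core idea can be illustrated with the closely related Mash-style distance. First compute the Jaccard index J of two k-mer sets, then convert it to an estimated per-base mutation distance D = -(1/k) ln(2J / (1 + J)). The Python sketch below is an illustration of that principle, not Skmer itself. It uses exact k-mer sets rather than MinHash sketches and ignores reverse complements. It also omits Skmer's corrections for low coverage and sequencing error, and the `k=21` default is an assumption.

```python
import math

def kmer_set(seq, k):
    """All distinct k-mers of seq (forward strand only, no canonicalisation)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b, k):
    """Exact Jaccard index |A & B| / |A | B| of the two k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        raise ValueError("both sequences are shorter than k")
    return len(ka & kb) / len(union)

def mash_distance(a, b, k=21):
    """Mash-style distance D = -(1/k) * ln(2J / (1 + J)); 1.0 if no k-mer is shared."""
    j = jaccard(a, b, k)
    if j == 0.0:
        return 1.0
    return -math.log(2.0 * j / (1.0 + j)) / k
```

A single substitution removes up to k k-mers from the shared set, so closely related samples, such as the two C. verum accessions, yield small but non-zero distances.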
4.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37252813

ABSTRACT

MOTIVATION: Nanopore sequencing is emerging as a key pillar in the genomic technology landscape but computational constraints limiting its scalability remain to be overcome. The translation of raw current signal data into DNA or RNA sequence reads, known as 'basecalling', is a major friction in any nanopore sequencing workflow. Here, we exploit the advantages of the recently developed signal data format 'SLOW5' to streamline and accelerate nanopore basecalling on high-performance computing (HPC) and cloud environments. RESULTS: SLOW5 permits highly efficient sequential data access, eliminating a potential analysis bottleneck. To take advantage of this, we introduce Buttery-eel, an open-source wrapper for Oxford Nanopore's Guppy basecaller that enables SLOW5 data access, resulting in performance improvements that are essential for scalable, affordable basecalling. AVAILABILITY AND IMPLEMENTATION: Buttery-eel is available at https://github.com/Psy-Fer/buttery-eel.


Subjects
Nanopores, Software, DNA Sequence Analysis/methods, Genome, Genomics, High-Throughput Nucleotide Sequencing
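Buttery-eel's gain comes from reading SLOW5 records sequentially and handing them to the basecaller in batches. The Python sketch below illustrates that access pattern on the tab-separated ASCII flavour of SLOW5, in three steps:

- It streams records one at a time.
- It converts raw ADC values to picoamperes with the usual (raw + offset) × range / digitisation scaling.
- It groups reads into fixed-size batches.

It is not Buttery-eel's code and does not talk to Guppy. It handles only the columns it reads, ignores the binary BLOW5 encoding, and the batch size is arbitrary. Real pipelines would use the official slow5lib/pyslow5 libraries instead.

```python
import itertools
from typing import Dict, Iterable, Iterator, List

def iter_slow5_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Stream records from SLOW5 ASCII text, one read at a time."""
    columns = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#read_id"):
            columns = line[1:].split("\t")         # column-name header row
            continue
        if not line or line[0] in "#@":
            continue                               # other header/attribute lines
        if columns is None:
            raise ValueError("record found before the #read_id header")
        rec = dict(zip(columns, line.split("\t")))
        raw = [int(v) for v in rec["raw_signal"].split(",")]
        scale = float(rec["range"]) / float(rec["digitisation"])
        offset = float(rec["offset"])
        # Convert raw ADC counts to picoamperes.
        yield {"read_id": rec["read_id"], "pA": [(v + offset) * scale for v in raw]}

def batches(records: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group a record stream into lists of at most `size` reads."""
    it = iter(records)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch
```

Because both functions are generators, only one batch of reads is held in memory at a time, whatever the size of the input file.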
5.
Genome Biol ; 24(1): 69, 2023 04 06.
Article in English | MEDLINE | ID: mdl-37024927

ABSTRACT

Nanopore sequencing is being rapidly adopted in genomics. We recently developed SLOW5, a new file format with advantages for storage and analysis of raw signal data from nanopore experiments. Here we introduce slow5tools, an intuitive toolkit for handling nanopore data in SLOW5 format. Slow5tools enables lossless data conversion and a range of tools for interacting with SLOW5 files. Slow5tools uses multi-threading, multi-processing, and other engineering strategies to achieve fast data conversion and manipulation, including live FAST5-to-SLOW5 conversion during sequencing. We provide examples and benchmarking experiments to illustrate slow5tools usage, and describe the engineering principles underpinning its performance.


Subjects
Nanopore Sequencing, Nanopores, DNA Sequence Analysis, Genomics, Software, High-Throughput Nucleotide Sequencing
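Lossless conversion means that every field survives a write/read round trip unchanged. The Python sketch below is not slow5tools' implementation. It takes hypothetical in-memory read records, one dict per read with a subset of SLOW5 columns, and writes them as SLOW5-style ASCII lines. Batches of reads are formatted on a thread pool whose `map` preserves input order, and the output is parsed back to check that nothing was lost. The header is reduced to the column-name row, and the batch size is arbitrary. The thread pool only demonstrates the batching structure: in CPython, CPU-bound formatting would need processes rather than threads for a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

COLUMNS = ["read_id", "read_group", "digitisation", "offset", "range",
           "sampling_rate", "len_raw_signal", "raw_signal"]
HEADER = "#" + "\t".join(COLUMNS) + "\n"   # simplified: column row only

def format_record(rec: Dict) -> str:
    """One read -> one tab-separated line; repr() keeps floats exact."""
    values = [rec["read_id"], str(rec["read_group"]), repr(rec["digitisation"]),
              repr(rec["offset"]), repr(rec["range"]), repr(rec["sampling_rate"]),
              str(len(rec["raw_signal"])), ",".join(map(str, rec["raw_signal"]))]
    return "\t".join(values) + "\n"

def parse_record(line: str) -> Dict:
    """Inverse of format_record, with a signal-length consistency check."""
    f = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
    signal = [int(v) for v in f["raw_signal"].split(",")] if f["raw_signal"] else []
    if len(signal) != int(f["len_raw_signal"]):
        raise ValueError(f"length mismatch for read {f['read_id']}")
    return {"read_id": f["read_id"], "read_group": int(f["read_group"]),
            "digitisation": float(f["digitisation"]), "offset": float(f["offset"]),
            "range": float(f["range"]), "sampling_rate": float(f["sampling_rate"]),
            "raw_signal": signal}

def convert(records: List[Dict], batch_size: int = 2, workers: int = 4) -> str:
    """Format records batch by batch on a pool; output keeps input order."""
    chunks = [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: "".join(format_record(r) for r in b), chunks)
        return HEADER + "".join(parts)

def read_back(text: str) -> List[Dict]:
    """Parse every record line back into a dict."""
    return [parse_record(line) for line in text.splitlines()
            if line and not line.startswith("#")]
```

Checking `read_back(convert(records)) == records` is a cheap regression test for losslessness. Storing `len_raw_signal` alongside the signal lets the parser detect truncated records.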
6.
Nat Biotechnol ; 40(7): 1026-1029, 2022 07.
Article in English | MEDLINE | ID: mdl-34980914

ABSTRACT

Nanopore sequencing depends on the FAST5 file format, which does not allow efficient parallel analysis. Here we introduce SLOW5, an alternative format engineered for efficient parallelization and acceleration of nanopore data analysis. Using the example of DNA methylation profiling of a human genome, analysis runtime is reduced from more than two weeks to approximately 10.5 h on a typical high-performance computer. SLOW5 is approximately 25% smaller than FAST5 and delivers consistent improvements on different computer architectures.


Subjects
Nanopore Sequencing, Nanopores, Data Analysis, Human Genome/genetics, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis
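SLOW5's suitability for parallel analysis rests on being able to jump straight to any read. A minimal Python sketch of that idea, assuming the ASCII layout with one record per line and the read ID in the first column:

- One pass over the file records each read's byte offset and length.
- Any thread can then seek to its reads and load them independently.

This is only an illustration. The real format has its own index file and a compressed binary variant (BLOW5) that this code does not handle. The toy file written in the demo is hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def build_index(path):
    """Map read_id -> (byte offset, byte length) for every record line."""
    index, offset = {}, 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.strip() and not line.startswith((b"#", b"@")):
                read_id = line.split(b"\t", 1)[0].decode()
                index[read_id] = (offset, len(line))
            offset += len(line)
    return index

def fetch(path, index, read_id):
    """Random access: seek straight to one record and split its fields."""
    start, length = index[read_id]
    # Each call opens its own handle, so threads never share a file position.
    with open(path, "rb") as fh:
        fh.seek(start)
        return fh.read(length).decode().rstrip("\r\n").split("\t")

def fetch_many(path, index, read_ids, threads=4):
    """Load many reads concurrently; results come back in request order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda rid: fetch(path, index, rid), read_ids))

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".slow5", delete=False, newline="") as tmp:
        tmp.write("#slow5_version\t0.2.0\n#read_id\tread_group\traw_signal\n")
        for rid, sig in [("r1", "1,2"), ("r2", "3,4,5"), ("r3", "6")]:
            tmp.write(f"{rid}\t0\t{sig}\n")
    idx = build_index(tmp.name)
    print(fetch_many(tmp.name, idx, ["r3", "r1"]))  # [['r3', '0', '6'], ['r1', '0', '1,2']]
    os.remove(tmp.name)
```

Once the index exists, the cost of loading a read no longer depends on where it sits in the file. That property lets work be split across threads or processes without coordination.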
7.
Commun Biol ; 3(1): 538, 2020 09 29.
Article in English | MEDLINE | ID: mdl-32994472

ABSTRACT

The advent of portable nanopore sequencing devices has enabled DNA and RNA sequencing to be performed in the field or the clinic. However, advances in in situ genomics require parallel development of portable, offline solutions for the computational analysis of sequencing data. Here we introduce Genopo, a mobile toolkit for nanopore sequencing analysis. Genopo packages popular bioinformatics tools into an Android application, enabling fully portable computation. To demonstrate its utility for in situ genome analysis, we use Genopo to determine the complete genome sequence of the human coronavirus SARS-CoV-2 in nine patient isolates sequenced on a nanopore device, with Genopo executing this workflow in less than 30 min per sample on a range of popular smartphones. We further show how Genopo can be used to profile DNA methylation in a human genome sample, illustrating a flexible, efficient architecture that is suitable to run many popular bioinformatics tools and accommodate small or large genomes. As the first ever smartphone application for nanopore sequencing analysis, Genopo enables the genomics community to harness this cheap, ubiquitous computational resource.


Subjects
Betacoronavirus/genetics, Computational Biology/methods, Human Genome, Viral Genome, High-Throughput Nucleotide Sequencing/methods, Whole Genome Sequencing/methods, Betacoronavirus/pathogenicity, COVID-19, Cell Phone/instrumentation, Computational Biology/instrumentation, Coronavirus Infections/diagnosis, Coronavirus Infections/virology, DNA Methylation, High-Throughput Nucleotide Sequencing/instrumentation, Humans, Nanopores, Pandemics, Viral Pneumonia/diagnosis, Viral Pneumonia/virology, SARS-CoV-2, Whole Genome Sequencing/instrumentation
8.
BMC Bioinformatics ; 21(1): 343, 2020 Aug 05.
Article in English | MEDLINE | ID: mdl-32758139

ABSTRACT

BACKGROUND: Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a computationally complex task. The dynamic programming algorithm called Adaptive Banded Event Alignment (ABEA) is a crucial step in polishing sequencing data and in identifying non-standard nucleotides, for example when measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. RESULTS: By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ∼3-5× faster than an optimised version of the original CPU-only implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. CONCLUSIONS: Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c .


Subjects
Computer Graphics, Nanopores, Computer-Assisted Signal Processing, Algorithms, Computational Biology, Databases as Topic, Human Genome, Humans, Sequence Analysis
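ABEA aligns a series of signal events to the expected current levels of consecutive reference k-mers by dynamic programming. Rather than filling the whole matrix, it scores only a fixed-width band of cells on each anti-diagonal. It then moves that band right (next k-mer) or down (next event), depending on which end of the previous band scored higher. The Python sketch below shows that adaptive-band logic under simplifying assumptions:

- Each event is scored with a Gaussian log-likelihood using one shared `sigma`.
- There are three moves (step, stay, skip) with hypothetical fixed log-probabilities.
- Scores are kept in a dictionary rather than compact band arrays.

It is not Nanopolish's or f5c's code. It returns only the best score, without the traceback or the GPU parallelism that f5c adds.

```python
import math

NEG_INF = float("-inf")

def log_normal(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x: the event emission score."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def abea_score(events, levels, sigma=1.0, width=10,
               lp_step=math.log(0.7), lp_stay=math.log(0.2), lp_skip=math.log(0.1)):
    """Best log-score aligning all `events` to all k-mer `levels`.

    Cell (i, j) = first i events aligned to first j k-mers. Band b covers
    `width` cells of anti-diagonal i + j = b, running from its lower-left
    end (i_ll, j_ll) up and to the right. Move probabilities are hypothetical.
    """
    n, m = len(events), len(levels)
    score = {(0, 0): 0.0}
    i_ll, j_ll = width // 2, -(width // 2)         # band 0 is centred on (0, 0)
    band = [(i_ll - k, j_ll + k) for k in range(width)]
    for b in range(1, n + m + 1):
        ll = score.get(band[0], NEG_INF)           # previous band, lower-left end
        ur = score.get(band[-1], NEG_INF)          # previous band, upper-right end
        if ll == NEG_INF and ur == NEG_INF:
            move_right = b % 2 == 1                # no information: alternate
        else:
            move_right = ll < ur                   # follow the better-scoring end
        if move_right:
            j_ll += 1
        else:
            i_ll += 1
        band = [(i_ll - k, j_ll + k) for k in range(width)]
        for i, j in band:
            if not (0 <= i <= n and 1 <= j <= m):
                continue                           # outside matrix, or no k-mer yet
            emit = log_normal(events[i - 1], levels[j - 1], sigma) if i else NEG_INF
            best = max(
                score.get((i - 1, j - 1), NEG_INF) + lp_step + emit,  # next k-mer, new event
                score.get((i - 1, j), NEG_INF) + lp_stay + emit,      # same k-mer, new event
                score.get((i, j - 1), NEG_INF) + lp_skip,             # k-mer skipped, no event
            )
            if best > NEG_INF:
                score[(i, j)] = best
    return score.get((n, m), NEG_INF)
```

Only `width` cells per anti-diagonal are scored, so time and memory grow roughly as O((n + m) × width) rather than O(n × m). A band wide enough to cover every cell reproduces the full dynamic-programming result. A narrow band can only match or underestimate it.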