Results 1 - 10 of 10
1.
Bioinformatics ; 40(Supplement_1): i287-i296, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940135

ABSTRACT

SUMMARY: Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. However, past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of gigabase pairs (Gbp). Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics, all in linear query time without the need for seed-chain-extend. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes. Sigmoni is the first signal-based tool to scale to a complete human genome and pangenome while remaining fast enough for adaptive sampling applications. AVAILABILITY AND IMPLEMENTATION: Sigmoni is implemented in Python, and is available open-source at https://github.com/vshiv18/sigmoni.
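
The quantization step mentioned in this abstract (mapping nanopore current into a discrete alphabet of picoamp ranges) can be illustrated with a minimal sketch. The abstract does not state how the ranges are chosen, so the z-score normalization, the clipping limits, the equal-width bins, and the function name `quantize_signal` are all assumptions made for illustration and are not taken from the Sigmoni code base.

```python
from statistics import mean, pstdev


def quantize_signal(signal_pa, n_bins=6, lo=-2.0, hi=2.0):
    """Map current samples (picoamps) to integer symbols 0..n_bins-1.

    Hypothetical scheme: z-score normalize, clip to [lo, hi], then
    assign each sample to one of n_bins equal-width ranges.
    """
    mu = mean(signal_pa)
    sigma = pstdev(signal_pa) or 1.0  # a flat signal has sigma == 0
    width = (hi - lo) / n_bins
    symbols = []
    for x in signal_pa:
        z = min(max((x - mu) / sigma, lo), hi)
        # z == hi would land in bin n_bins, so cap at the last bin
        symbols.append(min(int((z - lo) / width), n_bins - 1))
    return symbols


if __name__ == "__main__":
    print(quantize_signal([80, 90, 100, 110, 120], n_bins=4))
```

Once a read is a string over a small alphabet like this, it can be queried against a compressed text index in the same way a nucleotide string would be, which is the property the abstract relies on.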


Subject(s)
Algorithms , Humans , Nanopore Sequencing/methods , Software , Nanopores , Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods
2.
bioRxiv ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38826299

ABSTRACT

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO enables both queries that test k-mer presence/absence (membership queries) and queries that count the number of genomes containing k-mers in a window (conservation queries). MEMO's index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 seconds, 2.5× faster than other approaches. MEMO's small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.
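
To make the two query types in this abstract concrete, the following sketch answers membership and conservation queries by brute force with Python sets. It only shows what the queries return. It is not the MEM-based index the abstract describes, and the function names `kmers`, `membership`, and `conservation` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def membership(window, genome, k):
    """For each k-mer start in window: is that k-mer present in genome?"""
    present = kmers(genome, k)
    return [window[i:i + k] in present for i in range(len(window) - k + 1)]


def conservation(window, genomes, k):
    """For each k-mer start in window: how many genomes contain that k-mer?"""
    per_genome = [kmers(g, k) for g in genomes]
    counts = []
    for i in range(len(window) - k + 1):
        kmer = window[i:i + k]
        counts.append(sum(kmer in ks for ks in per_genome))
    return counts


if __name__ == "__main__":
    genomes = ["ACGTT", "CGTAC", "TTTTT"]
    print(membership("ACGTA", genomes[0], 3))
    print(conservation("ACGTA", genomes, 3))
```

This naive version must rebuild its k-mer sets for every value of k, which is the length dependence the abstract says a single MEMO index avoids.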

3.
bioRxiv ; 2024 Mar 10.
Article in English | MEDLINE | ID: mdl-38496646

ABSTRACT

Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic/transcriptomic and epigenetic information without additional library preparation. Presently, only a limited set of modifications can be directly basecalled (e.g. 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis, and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods, and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in ONT's state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open-source at github.com/skovaka/uncalled4.

4.
bioRxiv ; 2023 Aug 30.
Article in English | MEDLINE | ID: mdl-37645873

ABSTRACT

Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. However, past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of gigabase pairs (Gbp). Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes.

6.
Nucleic Acids Res ; 49(21): e124, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34551429

ABSTRACT

Genome copy number is an important source of genetic variation in health and disease. In cancer, Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms, limiting CNA inference accuracy. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that short-molecule nanopore sequencing reproducibly returns high read counts and allows high-quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data show that, compared to traditional approaches such as chromosome analysis/cytogenetics, short-molecule nanopore sequencing returns more sensitive, accurate copy number information in a cost-effective and expeditious manner, including for multiplex samples. Our results provide a framework for short-molecule nanopore sequencing with applications in research and medicine, including, but not limited to, CNA inference.


Subject(s)
DNA Copy Number Variations , DNA/analysis , Medical Oncology/methods , Nanopore Sequencing/methods , Neoplasms/genetics , Cell Line, Tumor , Humans
7.
iScience ; 24(6): 102696, 2021 Jun 25.
Article in English | MEDLINE | ID: mdl-34195571

ABSTRACT

Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject "nontarget" DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 times and 4 times smaller, respectively, than those of minimap2. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.
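
For readers unfamiliar with matching statistics, the sketch below computes them by brute force under the common definition: for each position i of a query, the length of the longest prefix of the query suffix starting at i that occurs somewhere in the reference. This is a naive illustration only. SPUMONI computes exact or approximate matching statistics with a compressed index, which this sketch does not attempt, and the function name `matching_statistics` is hypothetical.

```python
def matching_statistics(pattern, text):
    """Naive matching statistics of pattern against text.

    ms[i] is the length of the longest prefix of pattern[i:] that
    occurs as a substring of text. Runs in far worse than linear time.
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while (i + length < len(pattern)
               and pattern[i:i + length + 1] in text):
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("GATTACA", "ATTAGATC"))
```

Reads drawn from a sequence in the index tend to produce long matching statistics and unrelated reads produce short ones, which is the kind of signal a target/nontarget decision can be based on.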

8.
Nat Biotechnol ; 39(4): 431-441, 2021 04.
Article in English | MEDLINE | ID: mdl-33257863

ABSTRACT

Conventional targeted sequencing methods eliminate many of the benefits of nanopore sequencing, such as the ability to accurately detect structural variants or epigenetic modifications. The ReadUntil method allows nanopore devices to selectively eject reads from pores in real time, which could enable purely computational targeted sequencing. However, this requires rapid identification of on-target reads, while most mapping methods require computationally intensive basecalling. We present UNCALLED (https://github.com/skovaka/UNCALLED), an open source mapper that rapidly matches streaming nanopore current signals to a reference sequence. UNCALLED probabilistically considers k-mers that could be represented by the signal and then prunes the candidates based on the reference encoded within a Ferragina-Manzini index. We used UNCALLED to deplete sequencing of known bacterial genomes within a metagenomics community, enriching the remaining species 4.46-fold. UNCALLED also enriched 148 human genes associated with hereditary cancers to 29.6× coverage using one MinION flowcell, enabling accurate detection of single-nucleotide polymorphisms, insertions and deletions, structural variants and methylation in these genes.
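
The Ferragina-Manzini (FM) index named in this abstract supports a backward search that reports whether, and how often, a string occurs in the reference, which is what makes pruning candidates against the reference possible. The toy sketch below shows that standard backward search only. It is not UNCALLED's implementation, it builds the suffix array by naive sorting, and the names `build_fm_index`, `occ`, and `count_occurrences` are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM-index (BWT string and C table); '$' must not be in text."""
    text += "$"
    # Naive suffix array: fine for a demo, far too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # c_table[c] = number of characters in text strictly smaller than c
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    return bwt, c_table


def occ(bwt, ch, i):
    """Occurrences of ch in bwt[:i] (linear scan; real indexes precompute this)."""
    return bwt[:i].count(ch)


def count_occurrences(bwt, c_table, pattern):
    """Backward search: number of times pattern occurs in the indexed text."""
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ(bwt, ch, lo)
        hi = c_table[ch] + occ(bwt, ch, hi)
        if lo >= hi:
            return 0  # candidate is absent from the reference: prune it
    return hi - lo


if __name__ == "__main__":
    bwt, c_table = build_fm_index("ACGTACGA")
    print(count_occurrences(bwt, c_table, "ACG"))
```

Because the search consumes the pattern one character at a time, a candidate can be discarded as soon as its interval becomes empty, without examining the rest of it.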


Subject(s)
Bacteria/genetics , Computational Biology/methods , Nanopore Sequencing/methods , Neoplasms/congenital , Algorithms , DNA Methylation , Genetic Predisposition to Disease , Genetic Variation , Genome, Bacterial , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Sequence Analysis, DNA , Software
9.
Genome Biol ; 20(1): 278, 2019 12 16.
Article in English | MEDLINE | ID: mdl-31842956

ABSTRACT

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.


Subject(s)
Genetic Techniques , Genomics/methods , Transcriptome , Animals , Arabidopsis , Humans , Sequence Analysis, RNA , Software , Zea mays
10.
Genome Biol Evol ; 10(12): 3250-3261, 2018 12 01.
Article in English | MEDLINE | ID: mdl-30398645

ABSTRACT

Lentinus tigrinus is a species of wood-decaying fungus (Polyporales) that has an agaricoid form (a gilled mushroom) and a secotioid form (puffball-like, with enclosed spore-bearing structures). Previous studies suggested that the secotioid form is conferred by a recessive allele of a single locus. We sequenced the genomes of one agaricoid (Aga) strain and one secotioid (Sec) strain (39.53 and 39.88 Mb, with 15,581 and 15,380 genes, respectively). We mated the Sec and Aga monokaryons, genotyped the progeny, and performed bulked segregant analysis (BSA). We also fruited three Sec/Sec and three Aga/Aga dikaryons, and sampled transcriptomes at four developmental stages. Using BSA, we identified 105 top candidate genes with nonsynonymous SNPs that cosegregate with fruiting body phenotype. Transcriptome analyses of Sec/Sec versus Aga/Aga dikaryons identified 907 differentially expressed genes (DEGs) across four developmental stages. On the basis of BSA and DEGs, the top 25 candidate genes related to fruiting body development span 1.5 Mb (4% of the genome), possibly on a single chromosome, although the precise locus that controls the secotioid phenotype is unresolved. The top candidates include genes encoding a cytochrome P450 and an ATP-dependent RNA helicase, which may play a role in development, based on studies in other fungi.


Subject(s)
Fruiting Bodies, Fungal/genetics , Genome, Fungal , Lentinula/genetics , Biological Evolution , Fruiting Bodies, Fungal/growth & development , Fruiting Bodies, Fungal/metabolism , Gene Expression , Gene Expression Profiling , Lentinula/growth & development , Lentinula/metabolism , Phenotype , Polymorphism, Single Nucleotide , Whole Genome Sequencing