1.
Bioinformatics ; 40(Supplement_1): i287-i296, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940135

ABSTRACT

SUMMARY: Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. However, past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of gigabase pairs (Gbp). Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics, all in linear query time without the need for seed-chain-extend. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes. Sigmoni is the first signal-based tool to scale to a complete human genome and pangenome while remaining fast enough for adaptive sampling applications. AVAILABILITY AND IMPLEMENTATION: Sigmoni is implemented in Python, and is available open-source at https://github.com/vshiv18/sigmoni.


Subjects
Algorithms; Humans; Nanopore Sequencing/methods; Software; Nanopores; Genome, Human; Genomics/methods; Sequence Analysis, DNA/methods
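
The quantization step described in this abstract (mapping nanopore current into a discrete alphabet of picoamp ranges) can be illustrated with a minimal sketch. This is not Sigmoni's code: the function name `quantize_signal`, the per-read z-score normalization, the clipping threshold, and the equal-width bins are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def quantize_signal(signal_pa, n_bins=6, z_clip=2.0):
    """Map current samples (picoamps) to integer symbols 0..n_bins-1.

    Assumptions (not taken from the abstract): per-read z-score
    normalization, clipping to [-z_clip, z_clip], equal-width bins.
    """
    mu = mean(signal_pa)
    sigma = pstdev(signal_pa) or 1.0  # fall back to 1.0 on a flat signal
    width = 2 * z_clip / n_bins
    symbols = []
    for x in signal_pa:
        z = (x - mu) / sigma
        z = max(-z_clip, min(z_clip, z))  # clip outliers
        # The upper edge z == z_clip is folded into the last bin.
        symbols.append(min(n_bins - 1, int((z + z_clip) / width)))
    return symbols


if __name__ == "__main__":
    print(quantize_signal([80.0, 90.0, 100.0, 110.0, 120.0], n_bins=4))
```

Once the signal is a string over a small alphabet, it can be queried against a text index in the same way as a nucleotide sequence, which is what makes index-based matching applicable to raw signal.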
2.
Nucleic Acids Res ; 49(21): e124, 2021 Dec 02.
Article in English | MEDLINE | ID: mdl-34551429

ABSTRACT

Genome copy number is an important source of genetic variation in health and disease. In cancer, copy number alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, nanopore sequencing devices yield fewer sequencing reads/molecules than short-read sequencing platforms, which limits CNA inference accuracy. To address this limitation, we sequenced short DNA molecules loaded at an optimized concentration in order to increase the read/molecule yield from a single nanopore run. We show that short-molecule nanopore sequencing reproducibly returns high read counts and allows high-quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data show that, compared to traditional approaches such as chromosome analysis/cytogenetics, short-molecule nanopore sequencing returns more sensitive and accurate copy number information in a cost-effective and expeditious manner, including for multiplexed samples. Our results provide a framework for short-molecule nanopore sequencing with applications in research and medicine, including, but not limited to, CNA inference.


Subjects
DNA Copy Number Variations; DNA/analysis; Medical Oncology/methods; Nanopore Sequencing/methods; Neoplasms/genetics; Cell Line, Tumor; Humans
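
To show why read yield matters for CNA inference, here is a minimal sketch of one common read-depth approach: count reads per genomic bin and compare a sample with a control as log2 ratios. The abstract does not describe the authors' pipeline, so the function names, the fixed bin size, the pseudocount, and the library-size normalization are assumptions made for illustration.

```python
from math import log2


def bin_read_counts(read_starts, bin_size, n_bins):
    """Count reads per fixed-width bin from their start coordinates."""
    counts = [0] * n_bins
    for pos in read_starts:
        idx = pos // bin_size
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def log2_ratios(sample_counts, control_counts, pseudocount=0.5):
    """Per-bin log2(sample / control) after scaling by total read count.

    The pseudocount (an assumption) keeps empty bins from producing log2(0).
    """
    s_total = sum(sample_counts)
    c_total = sum(control_counts)
    if s_total == 0 or c_total == 0:
        raise ValueError("both samples need at least one read")
    ratios = []
    for s, c in zip(sample_counts, control_counts):
        s_frac = (s + pseudocount) / s_total
        c_frac = (c + pseudocount) / c_total
        ratios.append(log2(s_frac / c_frac))
    return ratios


if __name__ == "__main__":
    sample = [100, 100, 200]   # third bin has doubled depth
    control = [100, 100, 100]
    print(log2_ratios(sample, control, pseudocount=0.0))
```

With few reads per bin, the counts are dominated by sampling noise, so the ratios become unreliable; a higher read count per run allows smaller bins or more confident calls.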
4.
bioRxiv ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38826299

ABSTRACT

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO supports two kinds of queries: membership queries, which test k-mer presence/absence, and conservation queries, which count the number of genomes containing each k-mer in a window. MEMO's index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 seconds, 2.5× faster than other approaches. MEMO's small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.
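
The two query types named in this abstract can be made concrete with a brute-force sketch. It uses plain k-mer sets, which is exactly the fixed-k approach MEMO is designed to avoid, so it only shows what the queries return and not how MEMO's MEM-based index computes them. All function names here are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conservation_query(reference, others, start, end, k):
    """For each k-mer starting in reference[start:end], count how many
    of the other genomes contain it."""
    other_sets = [kmers(g, k) for g in others]
    counts = []
    for i in range(start, min(end, len(reference) - k + 1)):
        kmer = reference[i:i + k]
        counts.append(sum(kmer in s for s in other_sets))
    return counts


def membership_query(reference, genome, start, end, k):
    """For each k-mer starting in the window, report presence in one genome."""
    return [c == 1 for c in conservation_query(reference, [genome], start, end, k)]


if __name__ == "__main__":
    ref = "ACGTAC"
    pangenome = ["ACGTTT", "ACGTAC"]
    print(conservation_query(ref, pangenome, 0, 4, 3))
    print(membership_query(ref, pangenome[0], 0, 4, 3))
```

Note that this sketch must rebuild its k-mer sets for every value of k, whereas the abstract states that a single MEMO index serves arbitrary query lengths.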

5.
bioRxiv ; 2024 Mar 10.
Article in English | MEDLINE | ID: mdl-38496646

ABSTRACT

Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic/transcriptomic and epigenetic information without additional library preparation. Presently, only a limited set of modifications can be directly basecalled (e.g. 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis, and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods, and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in ONT's state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open-source at github.com/skovaka/uncalled4.

6.
bioRxiv ; 2023 Aug 30.
Article in English | MEDLINE | ID: mdl-37645873

ABSTRACT

Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. However, past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of gigabase pairs (Gbp). Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes.

7.
Nat Biotechnol ; 39(4): 431-441, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33257863

ABSTRACT

Conventional targeted sequencing methods eliminate many of the benefits of nanopore sequencing, such as the ability to accurately detect structural variants or epigenetic modifications. The ReadUntil method allows nanopore devices to selectively eject reads from pores in real time, which could enable purely computational targeted sequencing. However, this requires rapid identification of on-target reads, while most mapping methods require computationally intensive basecalling. We present UNCALLED ( https://github.com/skovaka/UNCALLED ), an open source mapper that rapidly matches streaming nanopore current signals to a reference sequence. UNCALLED probabilistically considers k-mers that could be represented by the signal and then prunes the candidates based on the reference encoded within a Ferragina-Manzini index. We used UNCALLED to deplete sequencing of known bacterial genomes within a metagenomics community, enriching the remaining species 4.46-fold. UNCALLED also enriched 148 human genes associated with hereditary cancers to 29.6× coverage using one MinION flowcell, enabling accurate detection of single-nucleotide polymorphisms, insertions and deletions, structural variants and methylation in these genes.


Subjects
Bacteria/genetics; Computational Biology/methods; Nanopore Sequencing/methods; Neoplasms/congenital; Algorithms; DNA Methylation; Genetic Predisposition to Disease; Genetic Variation; Genome, Bacterial; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Sequence Analysis, DNA; Software
8.
iScience ; 24(6): 102696, 2021 Jun 25.
Article in English | MEDLINE | ID: mdl-34195571

ABSTRACT

Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion: as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject "nontarget" DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI's index and peak memory footprint are also 16 times and 4 times smaller than those of minimap2, respectively. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously.
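
Matching statistics, the quantity this abstract relies on, have a simple definition: for each position i of a query, the length of the longest substring starting at i that also occurs somewhere in the reference. The brute-force sketch below computes them directly from that definition. It is illustrative only; SPUMONI uses a compressed index and streaming computation, not the substring search used here, and the function name is hypothetical.

```python
def matching_statistics(query, reference):
    """ms[i] = length of the longest prefix of query[i:] that occurs
    as a substring of reference (brute force, for illustration only)."""
    ms = []
    for i in range(len(query)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while i + length < len(query) and query[i:i + length + 1] in reference:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("ACGTX", "TTACGT"))
```

Long matching statistics along a read suggest that the read comes from a sequence in the index, while consistently short values suggest that it does not, which is the kind of evidence a targeted-sequencing tool can use to decide whether to eject a molecule.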

9.
Genome Biol ; 20(1): 278, 2019 Dec 16.
Article in English | MEDLINE | ID: mdl-31842956

ABSTRACT

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.


Subjects
Genetic Techniques; Genomics/methods; Transcriptome; Animals; Arabidopsis; Humans; Sequence Analysis, RNA; Software; Zea mays
10.
Genome Biol Evol ; 10(12): 3250-3261, 2018 Dec 01.
Article in English | MEDLINE | ID: mdl-30398645

ABSTRACT

Lentinus tigrinus is a species of wood-decaying fungi (Polyporales) that has an agaricoid form (a gilled mushroom) and a secotioid form (puffball-like, with enclosed spore-bearing structures). Previous studies suggested that the secotioid form is conferred by a recessive allele of a single locus. We sequenced the genomes of one agaricoid (Aga) strain and one secotioid (Sec) strain (39.53 Mb and 39.88 Mb, with 15,581 and 15,380 genes, respectively). We mated the Sec and Aga monokaryons, genotyped the progeny, and performed bulked segregant analysis (BSA). We also fruited three Sec/Sec and three Aga/Aga dikaryons, and sampled transcriptomes at four developmental stages. Using BSA, we identified 105 top candidate genes with nonsynonymous SNPs that cosegregate with fruiting body phenotype. Transcriptome analyses of Sec/Sec versus Aga/Aga dikaryons identified 907 differentially expressed genes (DEGs) across four developmental stages. On the basis of BSA and DEGs, the top 25 candidate genes related to fruiting body development span 1.5 Mb (4% of the genome), possibly on a single chromosome, although the precise locus that controls the secotioid phenotype is unresolved. The top candidates include genes encoding a cytochrome P450 and an ATP-dependent RNA helicase, which may play a role in development, based on studies in other fungi.


Subjects
Fruiting Bodies, Fungal/genetics; Genome, Fungal; Lentinula/genetics; Biological Evolution; Fruiting Bodies, Fungal/growth & development; Fruiting Bodies, Fungal/metabolism; Gene Expression; Gene Expression Profiling; Lentinula/growth & development; Lentinula/metabolism; Phenotype; Polymorphism, Single Nucleotide; Whole Genome Sequencing