Search | BVS (Virtual Health Library) - Ministry of Health

Sigmoni: classification of nanopore signal with a compressed pangenome index.

Shivakumar, Vikram S; Ahmed, Omar Y; Kovaka, Sam; Zakeri, Mohsen; Langmead, Ben.

Bioinformatics ; 40(Supplement_1): i287-i296, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38940135

ABSTRACT

SUMMARY: Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. But past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of Gbps. Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics, all in linear query time without the need for seed-chain-extend. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes. Sigmoni is the first signal-based tool to scale to a complete human genome and pangenome while remaining fast enough for adaptive sampling applications. AVAILABILITY AND IMPLEMENTATION: Sigmoni is implemented in Python, and is available open-source at https://github.com/vshiv18/sigmoni.

Subjects

Algorithms , Humans , Nanopore Sequencing/methods , Software , Nanopores , Human Genome , Genomics/methods , DNA Sequence Analysis/methods

Sigmoni: classification of nanopore signal with a compressed pangenome index.

Shivakumar, Vikram S; Ahmed, Omar Y; Kovaka, Sam; Zakeri, Mohsen; Langmead, Ben.

bioRxiv ; 2023 Aug 30.

Article in English | MEDLINE | ID: mdl-37645873

ABSTRACT

Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. But past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of Gbps. Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes.

Leaf Transcriptome Assembly of Protium copal (Burseraceae) and Annotation of Terpene Biosynthetic Genes.

Damasco, Gabriel; Shivakumar, Vikram S; Misciewicz, Tracy M; Daly, Douglas C; Fine, Paul V A.

Genes (Basel) ; 10(5), 2019 05 22.

Article in English | MEDLINE | ID: mdl-31121954

ABSTRACT

Plants in the Burseraceae are globally recognized for producing resins and essential oils with medicinal properties and have economic value. In addition, most of the aromatic and non-aromatic components of Burseraceae resins are derived from a variety of terpene and terpenoid chemicals. Although terpene genes have been identified in model plant crops (e.g., Citrus, Arabidopsis), very few genomic resources are available for non-model groups, including the highly diverse Burseraceae family. Here we report the assembly of a leaf transcriptome of Protium copal, an aromatic tree that has a large distribution in Central America, describe the functional annotation of putative terpene biosynthetic genes and compare terpene biosynthetic genes found in P. copal with those identified in other Burseraceae taxa. The genomic resources of Protium copal can be used to generate novel sequencing markers for population genetics and comparative phylogenetic studies, and to investigate the diversity and evolution of terpene genes in the Burseraceae.

Subjects

Burseraceae/genetics , Plant Leaves/genetics , Terpenes/metabolism , Transcriptome/genetics , Burseraceae/metabolism , Genomics , Molecular Sequence Annotation , Volatile Oils/metabolism , Plant Extracts/genetics , Plant Extracts/metabolism , Plant Leaves/metabolism

Transcriptome analysis of the curry tree (Bergera koenigii L., Rutaceae) during leaf development.

Shivakumar, Vikram S; Johnson, Gabriel; Zimmer, Elizabeth A.

Sci Rep ; 9(1): 4230, 2019 03 12.

Article in English | MEDLINE | ID: mdl-30862864

ABSTRACT

The curry tree (Bergera koenigii L.) is a widely cultivated plant used in South Asian cooking. Next-generation sequencing was used to generate the transcriptome of the curry leaf to detect changes in gene expression during leaf development, such as those genes involved in the production of oils which lend the leaf its characteristic taste, aroma, and medicinal properties. Using abundance estimation (RSEM) and differential expression analysis, genes that were significantly differentially expressed were identified. The transcriptome was annotated with BLASTx using the non-redundant (nr) protein database, and Gene Ontology (GO) terms were assigned based on the top BLAST hit using Blast2GO. Lastly, functional enrichment of the assigned GO terms was analyzed for genes that were significantly differentially expressed. Of the most enriched GO categories, pathways involved in cell wall, membrane, and lignin synthesis were found to be most upregulated in immature leaf tissue, possibly due to the growth and expansion of the leaf tissue. Terpene synthases, which synthesize monoterpenes and sesquiterpenes, which comprise much of the curry essential oil, were found to be significantly upregulated in mature leaf tissue, suggesting that oil production increases later in leaf development. Enzymes involved in pigment production were also significantly upregulated in mature leaves. The findings were based on computational estimates of gene expression from RNA-seq data, and further study is warranted to validate these results using targeted techniques, such as quantitative PCR.

Subjects

Gene Expression Profiling , Plant Gene Expression Regulation/physiology , Plant Leaves/growth & development , Rutaceae/metabolism , Transcriptome/physiology , Plant Leaves/genetics , Rutaceae/genetics

Analysis of whole chloroplast genomes from the genera of the Clauseneae, the curry tribe (Rutaceae, Citrus family).

Shivakumar, Vikram S; Appelhans, Marc S; Johnson, Gabriel; Carlsen, Monica; Zimmer, Elizabeth A.

Mol Phylogenet Evol ; 117: 135-140, 2017 12.

Article in English | MEDLINE | ID: mdl-27965082

ABSTRACT

The Clauseneae (Aurantioideae, Rutaceae) is a tribe in the Citrus family that, although economically important as it contains the culinary and medicinally-useful curry tree (Bergera koenigii), has been relatively understudied. Due to the recent significant taxonomic changes made to this tribe, a closer inspection of the genetic relationships among its genera has been warranted. Whole genome skimming was used to generate chloroplast genomes from six species, representing each of the four genera (Bergera, Clausena, Glycosmis, Micromelum) in the Clauseneae tribe plus one closely related outgroup (Merrillia), using the published plastome sequence of Citrus sinensis as a reference. Phylogenetically informative character (PIC) data were analyzed using a genome alignment of the seven species, and variability frequency among the species was recorded for each coding and non-coding region, with the regions of highest variability identified for future phylogenetic studies. Non-coding regions exhibited a higher percentage of variable characters as expected, and the phylogenetic markers ycf1, matK, rpoC2, ndhF, trnS-trnG spacer, and trnH-psbA spacer proved to be among the most variable regions. Other markers that are frequently used in phylogenetic studies, e.g. rps16, atpB-rbcL, rps4-trnT, and trnL-trnF, proved to be far less variable. Phylogenetic analyses of the aligned sequences were conducted using Bayesian inference (MrBayes) and Maximum Likelihood (RAxML), yielding highly supported divisions among the four genera.

Subjects

Chloroplast Genome/genetics , Phylogeny , Rutaceae/classification , Rutaceae/genetics , Bayes Theorem , Citrus/genetics , Plant Genome/genetics , Likelihood Functions , Murraya/genetics , Sequence Alignment
