Search | Biblioteca Virtual em Saúde (Virtual Health Library)

KCOSS: an ultra-fast k-mer counter for assembled genome analysis.

Tang, Deyou; Li, Yucheng; Tan, Daqiang; Fu, Juan; Tang, Yelei; Lin, Jiabin; Zhao, Rong; Du, Hongli; Zhao, Zhongming.

Bioinformatics; 38(4): 933-940, 2022-01-27.

Article in English | MEDLINE | ID: mdl-34849595

ABSTRACT

MOTIVATION: The k-mer frequency in whole genome sequences provides researchers with an insightful perspective on genomic complexity, comparative genomics, metagenomics and phylogeny. Current k-mer counting tools are typically slow, and they require large amounts of memory and disk space for assembled genome analysis. RESULTS: We propose a novel and ultra-fast k-mer counting algorithm, KCOSS, which counts k-mers mainly for assembled genomes using a segmented Bloom filter, a lock-free queue, a lock-free thread pool and a cuckoo hash table. We optimize running time and memory consumption by recycling memory blocks, merging multiple consecutive first-occurrence k-mers into a C-read, and writing sets of C-reads to disk asynchronously. KCOSS was compared with Jellyfish2, CHTKC and KMC3 on seven assembled genomes and three sequencing datasets in terms of running time, memory consumption and hard disk usage. The experimental results show that KCOSS counts k-mers with less memory and disk space and a shorter running time on assembled genomes. KCOSS can be used to calculate k-mer frequencies not only for assembled genomes but also for sequencing data. AVAILABILITY AND IMPLEMENTATION: The KCOSS software is implemented in C++. It is freely available on GitHub: https://github.com/kcoss-2021/KCOSS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genome; Software; Sequence Analysis, DNA/methods; Algorithms; Genomics/methods
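
The general idea behind pairing a Bloom filter with a hash table, as mentioned in the KCOSS abstract, can be illustrated with a small Python sketch. This is not the KCOSS implementation: it uses a plain (non-segmented) Bloom filter, a Python set and Counter in place of a cuckoo hash table, a single thread, and no C-reads or disk output. It also ignores reverse complements. The names BloomFilter, kmers and count_kmers, the filter size and the choice of BLAKE2b hashing are all assumptions made for illustration. The first sighting of a k-mer only sets bits in the filter, so k-mers that occur once never enter the table; a second exact pass removes any Bloom filter false positives.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Plain (non-segmented) Bloom filter; sizes are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        """Insert item; return True if it was possibly present already."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(seq, k):
    """Return exact counts for k-mers that occur at least twice in seq."""
    bloom = BloomFilter()
    candidates = set()
    # Pass 1: a first sighting only sets Bloom bits; later sightings
    # (or rare false positives) promote the k-mer to the candidate set.
    for kmer in kmers(seq, k):
        if bloom.add(kmer):
            candidates.add(kmer)
    # Pass 2: count candidates exactly, then drop false-positive singletons.
    counts = Counter(km for km in kmers(seq, k) if km in candidates)
    return {km: c for km, c in counts.items() if c > 1}
```

For example, count_kmers("ACGTACGTAC", 4) returns counts of 2 for ACGT, CGTA and GTAC, while TACG, which occurs once, is left out of the table. In this sketch every k-mer absent from the result but present in the sequence has a frequency of exactly one.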

VISDB: a manually curated database of viral integration sites in the human genome.

Tang, Deyou; Li, Bingrui; Xu, Tianyi; Hu, Ruifeng; Tan, Daqiang; Song, Xiaofeng; Jia, Peilin; Zhao, Zhongming.

Nucleic Acids Res; 48(D1): D633-D641, 2020-01-08.

Article in English | MEDLINE | ID: mdl-31598702

ABSTRACT

Virus integration into the human genome occurs frequently and represents a key driving event in human disease. Many studies have reported viral integration sites (VISs) proximal to structural or functional regions of the human genome. Here, we systematically collected and manually curated all VISs reported in the literature and publicly available data resources to construct the Viral Integration Site DataBase (VISDB, https://bioinfo.uth.edu/VISDB). Genomic information including target genes, nearby genes, nearest transcription start site, chromosome fragile sites, CpG islands, viral sequences and target sequences was integrated to annotate VISs. We further curated VIS-involved oncogenes and tumor suppressor genes, virus-host interactions involved in non-coding RNA (ncRNA), and target gene and microRNA expression in five cancers, among others. Moreover, we developed tools to visualize single integration events, VIS clusters, DNA elements proximal to VISs and virus-host interactions involved in ncRNA. The current version of VISDB contains a total of 77,632 integration sites of five DNA viruses and four RNA retroviruses. VISDB is currently the only active comprehensive VIS database, and it provides broad usability for the study of disease, virus-related pathophysiology, virus biology, host-pathogen interactions, sequence motif discovery and pattern recognition, and molecular evolution and adaptation, among others.

Subjects

Chromosome Fragile Sites; CpG Islands; Databases, Genetic; Genome, Human; Virus Diseases/genetics; Virus Integration; Chromosome Mapping; Cluster Analysis; Evolution, Molecular; Genome, Viral; Genomics; Host Microbial Interactions; Humans; Internet; Neoplasms/genetics; Phenotype; RNA, Untranslated; Retroviridae/genetics; Transcription Initiation Site
