Gclust: A Parallel Clustering Tool for Microbial Genomic Data.
Li, Ruilin; He, Xiaoyu; Dai, Chuangchuang; Zhu, Haidong; Lang, Xianyu; Chen, Wei; Li, Xiaodong; Zhao, Dan; Zhang, Yu; Han, Xinyin; Niu, Tie; Zhao, Yi; Cao, Rongqiang; He, Rong; Lu, Zhonghua; Chi, Xuebin; Li, Weizhong; Niu, Beifang.
Affiliation
  • Li R; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China.
  • He X; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China.
  • Dai C; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China.
  • Zhu H; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China.
  • Lang X; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China.
  • Chen W; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China.
  • Li X; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China.
  • Zhao D; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China.
  • Zhang Y; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China.
  • Han X; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China.
  • Niu T; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China.
  • Zhao Y; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China.
  • Cao R; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China.
  • He R; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China.
  • Lu Z; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China.
  • Chi X; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China; Center of Scientific Computing Applications & Research, Chinese Academy of Sciences, Beijing 100190, China.
  • Li W; J. Craig Venter Institute, La Jolla, CA 92037, USA. Electronic address: wli@jcvi.org.
  • Niu B; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China; Guizhou University School of Medicine, Guiyang 550025, China. Electronic address: niubf@cnic.cn.
Genomics Proteomics Bioinformatics; 17(5): 496-502, 2019 Oct.
Article in En | MEDLINE | ID: mdl-31917259
ABSTRACT
The accelerating growth of public microbial genomic data imposes a substantial burden on the research community that uses such resources. Building databases of non-redundant reference sequences from massive microbial genomic data through clustering analysis is essential. However, existing clustering algorithms perform poorly on long genomic sequences. In this article, we present Gclust, a parallel program for clustering complete or draft genomic sequences, in which clustering is accelerated by a novel parallelization strategy and a fast sequence comparison algorithm using sparse suffix arrays (SSAs). Moreover, genome identity measures between two sequences are calculated from their maximal exact matches (MEMs). We demonstrate the high speed and clustering quality of Gclust on four genome sequence datasets. Gclust is freely available for non-commercial use at https://github.com/niu-lab/gclust. We also introduce a web server for clustering user-uploaded genomes at http://niulab.scgrid.cn/gclust.
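To make the notion of maximal exact matches concrete, here is a minimal Python sketch. It is not Gclust's implementation: the abstract states that Gclust finds matches with sparse suffix arrays, whereas this sketch substitutes a naive quadratic scan that is only practical for short strings. The function names, the `min_len` parameter, and the coverage-based score are assumptions made for illustration; the abstract does not give the identity formula Gclust actually uses.

```python
from typing import List, Tuple


def maximal_exact_matches(a: str, b: str, min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start_in_a, start_in_b, length) for each maximal exact match.

    A match is maximal when it cannot be extended to the left or to the
    right without hitting a mismatch or the end of either sequence.
    Naive O(len(a) * len(b)) scan; illustrative only, not a suffix array.
    """
    mems = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Not left-maximal: a longer match already starts one step earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


def mem_coverage_identity(a: str, b: str, min_len: int = 3) -> float:
    """Hypothetical score: fraction of positions in `a` covered by any MEM.

    This is an assumed stand-in, not the identity measure defined by Gclust.
    """
    if not a:
        return 0.0
    covered = [False] * len(a)
    for start_a, _, length in maximal_exact_matches(a, b, min_len):
        for p in range(start_a, start_a + length):
            covered[p] = True
    return sum(covered) / len(a)


if __name__ == "__main__":
    print(maximal_exact_matches("ACGTACGT", "ACGTTCGT"))
    print(mem_coverage_identity("AAAACCCC", "AAAAGGGG"))
```

A coverage score of this kind can overstate similarity when short repeats match at many positions, which is one reason real tools impose a minimum match length and treat overlapping matches more carefully.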

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / User-Computer Interface / Genome Language: En Journal: Genomics Proteomics Bioinformatics Year of publication: 2019 Document type: Article
