EdClust: A heuristic sequence clustering method with higher sensitivity.
Cao, Ming; Peng, Qinke; Wei, Ze-Gang; Liu, Fei; Hou, Yi-Fan.
Affiliation
  • Cao M; Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, P. R. China.
  • Peng Q; School of Mathematics and Statistics, Shaanxi Xueqian Normal University, Xi'an, 710100, P. R. China.
  • Wei ZG; Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, P. R. China.
  • Liu F; Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, P. R. China.
  • Hou YF; Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, P. R. China.
J Bioinform Comput Biol ; 20(1): 2150036, 2022 02.
Article in En | MEDLINE | ID: mdl-34939905
ABSTRACT
The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of the number of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) that groups similar sequences using Edlib, a C/C++ library for fast, exact semi-global sequence alignment. edClust was tested on three large-scale sequence databases and compared with several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on cluster number and seed sensitivity (SS) demonstrate that edClust produces fewer clusters and achieves higher SS than the other methods. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.
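The general idea of greedy, seed-based heuristic clustering with a semi-global edit distance can be illustrated with a minimal sketch. This is not the edClust implementation: the function names, the length-descending processing order, the identity formula, and the 0.8 threshold are assumptions made for illustration, and a plain dynamic-programming distance stands in for Edlib.

```python
from typing import Dict, List


def semiglobal_distance(query: str, target: str) -> int:
    """Edit distance between query and its best-matching substring of target.

    Gaps before and after the aligned region of target are free (an
    infix-style semi-global alignment). Pure-Python stand-in for a fast
    library such as Edlib; runs in O(len(query) * len(target)) time.
    """
    # Row 0 is all zeros: the alignment may start anywhere in target.
    prev = [0] * (len(target) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # i query characters against an empty target prefix
        for j, t in enumerate(target, start=1):
            cost = 0 if q == t else 1
            curr.append(min(prev[j] + 1,          # skip a query character
                            curr[j - 1] + 1,      # skip a target character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # The alignment may end anywhere in target.
    return min(prev)


def greedy_cluster(sequences: List[str], identity: float = 0.8) -> Dict[int, List[int]]:
    """Assign each sequence to the first seed it matches, else make it a seed.

    Assumptions (not taken from the source): sequences are processed
    longest first, and identity is defined as 1 - distance / len(sequence).
    Returns a mapping from seed index to the indices of its members.
    """
    order = sorted(range(len(sequences)), key=lambda i: -len(sequences[i]))
    clusters: Dict[int, List[int]] = {}
    for idx in order:
        seq = sequences[idx]
        for seed in clusters:
            dist = semiglobal_distance(seq, sequences[seed])
            if 1 - dist / max(len(seq), 1) >= identity:
                clusters[seed].append(idx)
                break
        else:
            # No existing seed is similar enough: start a new cluster.
            clusters[idx] = [idx]
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTA", "TTTTTTTTTT", "ACGTACGAAC"]
    print(greedy_cluster(reads, identity=0.8))
```

Each sequence is compared only against existing seeds rather than against every other sequence, which is what keeps the cost of this family of methods low; the choice of seeds and of the matching rule determines how many clusters are produced.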

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Software / Heuristics Type of study: Diagnostic_studies Language: En Journal: J Bioinform Comput Biol Journal subject: BIOLOGIA / INFORMATICA MEDICA Year: 2022 Type: Article