K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

Affiliation

Lin J; Department of Software engineering, College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350108, China.
Adjeroh DA; Department of Computer Science & Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA.
Jiang BH; Department of Pathology, Carver College of Medicine, The University of Iowa, Iowa City, IA 52242, USA.
Jiang Y; Department of Software engineering, College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350108, China.

Bioinformatics; 34(10): 1682-1689, 2018 05 15.

Article in En | MEDLINE | ID: mdl-29253072

ABSTRACT

Motivation:

Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods.

Results:

We propose a new non-parametric alignment-free sequence comparison method, called K2, based on Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating phylogenetic trees, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which determines the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset size.

Availability and implementation:

The K2 and K2* approaches are implemented in the R language as a package, which is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz).

Contact:

yueljiang@163.com

Supplementary information:

Supplementary data are available at Bioinformatics online.
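The abstract does not give the K2 statistic itself, so the following is only a minimal sketch of the general idea behind a Kendall-based alignment-free comparison, not the authors' method or the API of their R package. It assumes that the "length" parameter refers to a k-mer length, counts overlapping k-mers in two DNA sequences, and computes a plain Kendall tau-b between the two count vectors. The function names (`kmer_counts`, `kendall_tau_b`, `kendall_similarity`) and the default `k=3` are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping k-mers of seq, ordered over all 4**k DNA k-mers."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # A fixed ordering of every possible k-mer keeps two vectors comparable.
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length vectors (O(n^2) pair scan)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both vectors: contributes to nothing
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    # tau-b is undefined when one vector is constant; return 0.0 by convention here.
    return 0.0 if denom == 0 else (concordant - discordant) / denom


def kendall_similarity(seq_a, seq_b, k=3):
    """Hypothetical similarity score: tau-b of the two k-mer count vectors."""
    return kendall_tau_b(kmer_counts(seq_a, k), kmer_counts(seq_b, k))
```

A score close to 1 means the two sequences rank their k-mers by frequency in nearly the same order, and a score near 0 means the rankings are unrelated. The quadratic pair scan is kept for readability; it says nothing about how the published implementation achieves its speed.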

Subjects

Phylogeny; Sequence Analysis, DNA/methods; Software; Algorithms; Animals; Sequence Alignment

Full text: 1 Database: MEDLINE Main subject: Phylogeny / Software / Sequence Analysis, DNA Limits: Animals Language: En Publication year: 2018 Document type: Article