Estimating evolutionary distances between genomic sequences from spaced-word matches.
Algorithms Mol Biol
; 10: 5, 2015.
Article
em En
| MEDLINE
| ID: mdl-25685176
ABSTRACT
Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of 'match positions' and 'don't care positions'. Our software is available online and as downloadable source code at http//spaced.gobics.de/.
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Idioma:
En
Ano de publicação:
2015
Tipo de documento:
Article