RESUMEN
Accurate estimation of evolutionary distances between taxa is important for many phylogenetic reconstruction methods. Distances can be estimated using a range of different evolutionary models, from single nucleotide polymorphisms to large-scale genome rearrangements. Corresponding corrections for genome rearrangement distances fall into 3 categories: Empirical computational studies, Bayesian/MCMC approaches, and combinatorial approaches. Here, we introduce a maximum likelihood estimator for the inversion distance between a pair of genomes, using a group-theoretic approach to modelling inversions introduced recently. This MLE functions as a corrected distance: in particular, we show that because of the way sequences of inversions interact with each other, it is quite possible for minimal distance and MLE distance to differently order the distances of two genomes from a third. The second aspect tackles the problem of accounting for the symmetries of circular arrangements. While, generally, a frame of reference is locked, and all computation made accordingly, this work incorporates the action of the dihedral group so that distance estimates are free from any a priori frame of reference. The philosophy of accounting for symmetries can be applied to any existing correction method, for which examples are offered.
Asunto(s)
Evolución Molecular , Genoma/genética , Filogenia , Funciones de Verosimilitud , Análisis EspacialRESUMEN
Measuring the distance between two bacterial genomes under the inversion process is usually done by assuming all inversions to occur with equal probability. Recently, an approach to calculating inversion distance using group theory was introduced, and is effective for the model in which only very short inversions occur. In this paper, we show how to use the group-theoretic framework to establish minimal distance for any weighting on the set of inversions, generalizing previous approaches. To do this we use the theory of rewriting systems for groups, and exploit the Knuth-Bendix algorithm, the first time this theory has been introduced into genome rearrangement problems. The central idea of the approach is to use existing group theoretic methods to find an initial path between two genomes in genome space (for instance using only short inversions), and then to deform this path to optimality using a confluent system of rewriting rules generated by the Knuth-Bendix algorithm.