Information theoretic generalized Robinson-Foulds metrics for comparing phylogenetic trees.
Bioinformatics; 36(20): 5007-5013, 2020 Dec 22.
Article in English | MEDLINE | ID: mdl-32619004
MOTIVATION: The Robinson-Foulds (RF) metric is widely used by biologists, linguists and chemists to quantify similarity between pairs of phylogenetic trees. The measure tallies the number of bipartition splits that occur in both trees, but this conservative approach ignores potential similarities between almost-identical splits, with undesirable consequences. 'Generalized' RF metrics address this shortcoming by pairing splits in one tree with similar splits in the other. Each pair is assigned a similarity score, the sum of which enumerates the similarity between two trees. The challenge lies in quantifying split similarity: existing definitions lack a principled statistical underpinning, resulting in misleading tree distances that are difficult to interpret. Here, I propose probabilistic measures of split similarity, which allow tree similarity to be measured in natural units (bits).

RESULTS: My new information-theoretic metrics outperform alternative measures of tree similarity when evaluated against a broad suite of criteria, even though they do not account for the non-independence of splits within a single tree. Mutual clustering information exhibits none of the undesirable properties that characterize other tree comparison metrics, and should be preferred to the RF metric.

AVAILABILITY AND IMPLEMENTATION: The methods discussed in this article are implemented in the R package 'TreeDist', archived at https://dx.doi.org/10.5281/zenodo.3528123.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
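
The split tally described under MOTIVATION can be illustrated with a minimal Python sketch. It is not the TreeDist implementation: the function names (`normalize`, `rf_distance`), the representation of a tree as a list of leaf sets, and the two eight-leaf example trees are all assumptions made for illustration. The sketch computes the classic RF distance as the number of non-trivial splits found in exactly one of the two trees.

```python
from typing import FrozenSet, Iterable, List, Set

Split = FrozenSet[str]


def normalize(side: Iterable[str], leaves: Set[str]) -> Split:
    """Represent a bipartition by the side that excludes the smallest leaf.

    A split A|B can be written from either side; fixing one convention
    lets identical splits compare equal.
    """
    side = frozenset(side)
    anchor = min(leaves)
    return frozenset(leaves - side) if anchor in side else side


def rf_distance(splits_a: List[Set[str]], splits_b: List[Set[str]],
                leaves: Set[str]) -> int:
    """Classic RF distance: count of splits present in exactly one tree."""
    a = {normalize(s, leaves) for s in splits_a}
    b = {normalize(s, leaves) for s in splits_b}
    return len(a ^ b)  # symmetric difference


# Hypothetical example data: two caterpillar trees on leaves A..H.
leaves = set("ABCDEFGH")

# Tree 1, leaf order A, B, C, D, E, F, G, H.
tree1 = [set("AB"), set("ABC"), set("ABCD"), set("ABCDE"), set("ABCDEF")]

# Tree 2 is tree 1 with the single leaf A moved next to H
# (leaf order B, C, D, E, F, G, H, A).
tree2 = [set("BC"), set("BCD"), set("BCDE"), set("BCDEF"), set("BCDEFG")]

print(rf_distance(tree1, tree1, leaves))  # 0
print(rf_distance(tree1, tree2, leaves))  # 10
```

In this example, moving one leaf leaves no split exactly shared, so the RF distance reaches its maximum of 10 for two binary trees with eight leaves, even though every split in one tree differs from a split in the other only in the placement of leaf A. This is the behaviour that generalized RF metrics aim to soften by scoring similar, rather than only identical, splits.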
Full text: 1
Database: MEDLINE
Main subject: Algorithms / Benchmarking
Language: En
Publication year: 2020
Document type: Article