Parsimony analysis of phylogenomic datasets (II): evaluation of PAUP*, MEGA and MPBoot.

Goloboff, Pablo A; Catalano, Santiago A; Torres, Ambrosio

Goloboff PA; Unidad Ejecutora Lillo, Consejo Nacional de Investigaciones Científicas y Técnicas - Fundación Miguel Lillo, Miguel Lillo 251, San Miguel de Tucumán, Tucumán, 4000, Argentina.
Catalano SA; American Museum of Natural History, 200 Central Park West, New York, NY, 10024, USA.
Torres A; Unidad Ejecutora Lillo, Consejo Nacional de Investigaciones Científicas y Técnicas - Fundación Miguel Lillo, Miguel Lillo 251, San Miguel de Tucumán, Tucumán, 4000, Argentina.

Cladistics; 38(1): 126-146, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35049082

ABSTRACT

This paper examines the implementation of parsimony methods in the programs PAUP*, MEGA and MPBoot, and compares them with TNT. PAUP* implements standard, well-tested algorithms, together with flexible search strategies and options for handling trees; its main drawback is the lack of advanced search algorithms, which makes it difficult to find most parsimonious trees for large and complex datasets. In addition, branch-swapping can be much slower than in TNT for datasets with large numbers of taxa, although this is only occasionally a problem for phylogenomic datasets, which typically have small numbers of taxa. The parsimony implementation of MEGA has major drawbacks. MEGA often fails to find parsimonious trees because it does not perform all possible subtree pruning-regrafting (SPR) or tree bisection-reconnection (TBR) branch-swapping rearrangements. Furthermore, it fails to properly handle ambiguity or multiple equally parsimonious trees, and it uses the same addition sequence for all bootstrap replicates. The latter yields values of group support that depend on the order in which taxa are listed in the dataset. In addition, tree searches are very slow and do not facilitate the exploration of different starting points (as the random seed is fixed). MPBoot searches for optimal trees using the ratchet, but it is based on SPR instead of TBR (and by default evaluates only a subset of the SPR rearrangements). MPBoot approximates bootstrap frequencies by first finding a sample of trees and then selecting from those trees for every replicate, without performing a tree search. The approximation is too rough in many cases, producing serious under- or overestimations of the correct support values and, for most kinds of datasets, slower estimations than can be obtained with TNT. In addition, bootstrapping with PAUP*, MEGA or MPBoot can attribute strong support to groups that have no support at all under any meaningful concept of support, such as likelihood ratios or Bremer supports. In TNT, this problem is reduced by using the strict consensus tree to represent each replicate, or eliminated entirely by using different approximations of the Bremer support.
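
The point about representing each bootstrap replicate by its strict consensus can be illustrated with a toy calculation. The sketch below is not the procedure of any program named in the abstract; the tree representation (a set of clades), the function names and the data are all hypothetical. It assumes that every replicate returns the same two equally parsimonious trees, which disagree on whether A groups with B or with C. Keeping only the first tree of each replicate gives the group (A,B) a frequency of 1.0, whereas summarizing each replicate by its strict consensus gives that group no support.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]
Tree = Set[Clade]  # hypothetical representation: a tree as its set of clades


def strict_consensus(trees: List[Tree]) -> Tree:
    """Return only the clades present in every tree of the list."""
    return set.intersection(*trees)


def group_frequencies(summaries: List[Tree]) -> Dict[Clade, float]:
    """Fraction of replicates whose summary tree contains each clade."""
    counts = Counter(clade for summary in summaries for clade in summary)
    return {clade: n / len(summaries) for clade, n in counts.items()}


# Hypothetical toy data (assumption, not taken from the paper).
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
de = frozenset({"D", "E"})

tree_1: Tree = {ab, de}  # equally parsimonious tree found first
tree_2: Tree = {ac, de}  # equally parsimonious alternative

# Ten replicates, each with the same two optimal trees in the same order.
replicates: List[List[Tree]] = [[tree_1, tree_2] for _ in range(10)]

# Summary 1: keep only the first optimal tree of each replicate.
first_tree_support = group_frequencies([trees[0] for trees in replicates])

# Summary 2: represent each replicate by its strict consensus.
consensus_support = group_frequencies(
    [strict_consensus(trees) for trees in replicates]
)

print("first tree only :", first_tree_support.get(ab, 0.0))
print("strict consensus:", consensus_support.get(ab, 0.0))
```

In this toy case the group (D,E), present in both optimal trees, keeps a frequency of 1.0 under either summary; only the ambiguous group loses its apparent support.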

Subject(s)

Algorithms; Models, Genetic; Phylogeny

Full text: 1 | Database: MEDLINE | Main subject: Algorithms / Models, Genetic | Type of study: Prognostic_studies | Language: English | Year: 2022 | Document type: Article
