Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction.

Ashkenazy, Haim; Sela, Itamar; Levy Karin, Eli; Landan, Giddy; Pupko, Tal

Ashkenazy H; Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel.
Sela I; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Levy Karin E; Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel.
Landan G; Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
Pupko T; Institute of Microbiology, Christian-Albrechts-University of Kiel, 24118 Kiel, Germany.

Syst Biol; 68(1): 117-130, 2019 01 01.

Article in English | MEDLINE | ID: mdl-29771363

ABSTRACT

The classic methodology of inferring a phylogenetic tree from sequence data is composed of two steps. First, a multiple sequence alignment (MSA) is computed. Then, a tree is reconstructed assuming the MSA is correct. Yet inferred MSAs have been shown to be inaccurate, and alignment errors reduce the accuracy of tree inference. It was previously proposed that filtering unreliable alignment regions can increase the accuracy of tree inference. However, it was also demonstrated that the benefit of this filtering is often obscured by the resulting loss of phylogenetic signal. In this work, we explore an approach in which, instead of relying on a single MSA, we generate a large set of alternative MSAs and concatenate them into a single SuperMSA. By doing so, we account for phylogenetic signals contained in columns that are not present in the single MSA computed by alignment algorithms. Using simulations, we demonstrate that this approach results, on average, in more accurate trees compared to 1) using an unfiltered MSA and 2) using a single MSA with weights assigned to columns according to their reliability. Next, we explore in which regions of the MSA space our approach is expected to be beneficial. Finally, we provide a simple criterion for deciding whether or not the extra effort of computing a SuperMSA and inferring a tree from it is beneficial. Based on these assessments, we expect our methodology to be useful for many cases in which diverged sequences are analyzed. The option to generate such a SuperMSA is available at http://guidance.tau.ac.il.
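The concatenation step described in the abstract can be illustrated with a minimal sketch. It assumes each alternative MSA is held as a mapping from sequence name to gapped string, and that all alternatives align the same set of sequences. The function name `concatenate_msas` and the toy alignments are hypothetical and are not taken from the paper or from the GUIDANCE server; how the alternative MSAs are generated is not shown here.

```python
from typing import Dict, List

# Hypothetical representation: sequence name -> aligned (gapped) sequence.
Alignment = Dict[str, str]


def concatenate_msas(msas: List[Alignment]) -> Alignment:
    """Join alternative alignments of the same sequences column-wise."""
    if not msas:
        raise ValueError("need at least one alignment")
    taxa = sorted(msas[0])
    for msa in msas:
        # Every alternative must cover exactly the same sequences.
        if sorted(msa) != taxa:
            raise ValueError("all alignments must contain the same sequences")
        # Within one alignment, all rows must have the same number of columns.
        if len({len(row) for row in msa.values()}) != 1:
            raise ValueError("rows of an alignment must have equal length")
    # Append the columns of each alternative MSA, in order, for every sequence.
    return {name: "".join(msa[name] for msa in msas) for name in taxa}


# Two toy alternative alignments of the same three sequences (assumed data).
msa_a = {"s1": "AC-GT", "s2": "ACAGT", "s3": "A--GT"}
msa_b = {"s1": "ACG-T", "s2": "ACAGT", "s3": "A-G-T"}

super_msa = concatenate_msas([msa_a, msa_b])
for name, row in super_msa.items():
    print(name, row)
```

The resulting rows would then be passed to a tree inference program in place of a single MSA. Columns shared by several alternatives appear more than once in the concatenation.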

Subject(s)

Classification/methods; Phylogeny; Sequence Alignment; Software; Computer Simulation; Reproducibility of Results

Full text: 1 | Database: MEDLINE | Main subject: Phylogeny / Software / Sequence Alignment / Classification | Study type: Prognostic_studies | Language: English | Year: 2019 | Document type: Article
