Please Mind the Gap: Indel-Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment Including Long Indels.
Mol Biol Evol; 41(7), 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38842253
ABSTRACT
Despite having important biological implications, insertion and deletion (indel) events are often disregarded or mishandled during phylogenetic inference. In multiple sequence alignment, indels are represented as gaps and are estimated without considering the distinct evolutionary history of insertions and deletions. Consequently, indels are usually excluded from subsequent inference steps, such as ancestral sequence reconstruction and phylogenetic tree search. Here, we introduce indel-aware parsimony (indelMaP), a novel way to treat gaps under the parsimony criterion by considering insertions and deletions as separate evolutionary events and accounting for long indels. By identifying the precise location of an evolutionary event on the tree, we can separate overlapping indel events and use affine gap penalties for long indel modeling. Our indel-aware approach harnesses the phylogenetic signal from indels, incorporating them into all inference stages. Validation and comparison to state-of-the-art inference tools on simulated data show that indelMaP is most suitable for densely sampled datasets with closely to moderately related sequences, where it can reach alignment quality comparable to probabilistic methods and accurately infer ancestral sequences, including indel patterns. Due to its remarkable speed, our method is well suited for epidemiological datasets, eliminating the need for downsampling and enabling the exploitation of the additional information provided by dense taxonomic sampling. Moreover, indelMaP offers new insights into the indel patterns of biologically significant sequences and advances our understanding of genetic variability by considering gaps as crucial evolutionary signals rather than mere artefacts.
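The abstract mentions affine gap penalties for modeling long indels. As a minimal sketch of that general idea only (not indelMaP's actual implementation), the Python below charges each maximal gap run an opening cost once plus an extension cost for every further column. The function names, the default values `gap_open=2.5` and `gap_extend=0.5`, and the "open + (length - 1) * extend" convention are all assumptions chosen for illustration.

```python
from itertools import groupby


def affine_gap_cost(length, gap_open=2.5, gap_extend=0.5):
    """Cost of one gap run: pay gap_open once, gap_extend per extra column.

    The default penalties are hypothetical, not values from the paper.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


def row_gap_cost(aligned_row, gap_open=2.5, gap_extend=0.5):
    """Sum the affine cost over every maximal run of '-' in one aligned row."""
    return sum(
        affine_gap_cost(len(list(run)), gap_open, gap_extend)
        for is_gap, run in groupby(aligned_row, key=lambda ch: ch == "-")
        if is_gap
    )


if __name__ == "__main__":
    # One three-column gap and one single-column gap in the same row.
    print(row_gap_cost("AC---GT-A"))
```

Under these assumed penalties, a single gap of three columns costs 3.5, whereas three separate one-column gaps would cost 7.5, so the scoring prefers explaining adjacent gap columns as one long indel event.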
Full text:
1
Collection:
01-internacional
Database:
MEDLINE
Main subject:
Phylogeny
/
Sequence Alignment
/
INDEL Mutation
Limits:
Humans
Language:
En
Journal:
Mol Biol Evol
Journal subject:
Molecular Biology
Year:
2024
Document type:
Article
Affiliation country:
Switzerland
Publication country:
United States of America