COATi: Statistical Pairwise Alignment of Protein-Coding Sequences.

García Mesa, Juan José; Zhu, Ziqi; Cartwright, Reed A.

Mol Biol Evol; 41(7), 2024 Jul 03.

Article in English | MEDLINE | ID: mdl-38869090

ABSTRACT

Sequence alignment is an essential method in bioinformatics and the basis of many analyses, including phylogenetic inference, ancestral sequence reconstruction, and gene annotation. Sequencing artifacts and errors made during genome assembly, such as abiological frameshifts and incorrect early stop codons, can impact downstream analyses leading to erroneous conclusions in comparative and functional genomic studies. More significantly, while indels can occur both within and between codons in natural sequences, most amino-acid- and codon-based aligners assume that indels only occur between codons. This mismatch between biology and alignment algorithms produces suboptimal alignments and errors in downstream analyses. To address these issues, we present COATi, a statistical, codon-aware pairwise aligner that supports complex insertion-deletion models and can handle artifacts present in genomic data. COATi allows users to reduce the amount of discarded data while generating more accurate sequence alignments. COATi can infer indels both within and between codons, leading to improved sequence alignments. We applied COATi to a dataset containing orthologous protein-coding sequences from humans and gorillas and conclude that 41% of indels occurred between codons, agreeing with previous work in other species. We also applied COATi to semiempirical benchmark alignments and find that it outperforms several popular alignment programs on several measures of alignment quality and accuracy.
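The distinction the abstract draws between indels that fall between codons and indels that fall within codons can be illustrated with a minimal sketch. This is not COATi's algorithm or interface; it only classifies the gaps in an already-computed pairwise alignment. The function name `indel_phases`, the tuple layout, and the assumption that the ungapped reference is an in-frame coding sequence starting at codon position 0 are all hypothetical choices made for this illustration.

```python
from itertools import groupby


def indel_phases(ref_aln: str, qry_aln: str):
    """Return (kind, ref_pos, length, phase) for each gap run.

    Assumption of this sketch: ref_aln, with its gaps removed, is an
    in-frame coding sequence whose first nucleotide is codon position 0.
    Phase 0 means the gap starts on a codon boundary; phase 1 or 2 means
    it starts inside a codon.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    def label(column):
        ref_base, qry_base = column
        if ref_base == "-" and qry_base == "-":
            raise ValueError("column with gaps in both sequences")
        if ref_base == "-":
            return "insertion"  # extra bases in the query
        if qry_base == "-":
            return "deletion"   # bases missing from the query
        return "match"

    events = []
    ref_pos = 0  # reference nucleotides consumed so far
    # Group consecutive alignment columns that share the same label.
    for kind, run in groupby(zip(ref_aln, qry_aln), key=label):
        length = len(list(run))
        if kind != "match":
            events.append((kind, ref_pos, length, ref_pos % 3))
        if kind != "insertion":
            # Only columns containing a reference base advance ref_pos.
            ref_pos += length
    return events


if __name__ == "__main__":
    # A 3-nt deletion on a codon boundary, then one starting mid-codon.
    print(indel_phases("ATGAAACCCTAA", "ATG---CCCTAA"))
    print(indel_phases("ATGAAACCCTAA", "ATGA---CCTAA"))
```

In this sketch a gap with phase 0 and a length divisible by three sits between codons, while a gap with phase 1 or 2 interrupts a codon, which is the case that aligners restricted to codon-boundary indels cannot represent directly.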

Subjects

INDEL Mutation, Sequence Alignment, Sequence Alignment/methods, Humans, Animals, Software, Algorithms, Codon, Gorilla gorilla/genetics, Computational Biology/methods, Open Reading Frames, Phylogeny
