A novel fast multiple nucleotide sequence alignment method based on FM-index.
Liu, Huan; Zou, Quan; Xu, Yun.
Affiliations
  • Liu H; School of Computer Science, University of Science and Technology of China and Key Laboratory on High Performance Computing of Anhui, China.
  • Zou Q; Institute of Basic and Frontier Sciences, University of Electronic Science and Technology of China and Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Chengdu, Sichuan, China.
  • Xu Y; School of Computer Science, University of Science and Technology of China and Key Laboratory on High Performance Computing of Anhui, China.
Brief Bioinform; 23(1), 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34893794
ABSTRACT
Multiple sequence alignment (MSA) is fundamental to many biological applications. However, most classical MSA algorithms struggle to handle large numbers of sequences, especially long ones. Therefore, some recent aligners adopt an efficient divide-and-conquer strategy that divides long sequences into several short sub-sequences. Selecting the common segments (i.e. anchors) used to divide the sequences is critical, as it directly affects both accuracy and running time. We therefore propose a novel algorithm, FMAlign, to improve the performance of multiple nucleotide sequence alignment. We use an FM-index to extract long common segments at low cost, rather than a space-consuming hash table. Moreover, after the longer optimal common segments are found, the sequences are divided at these segments. FMAlign has been tested on virus, bacteria and human mitochondrial genome datasets, and compared with existing MSA methods such as MAFFT, HAlign and FAME. The experiments show that our method outperforms the existing methods in running time and achieves high accuracy on long sequence sets. All the results demonstrate that our method is applicable to large-scale nucleotide sequence sets in terms of both sequence length and number of sequences. The source code and related data are accessible at https://github.com/iliuh/FMAlign.
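The abstract does not spell out how FMAlign selects its anchors, so the following is only a rough illustration of the general idea, not the authors' implementation. This Python sketch builds a toy FM-index (naive suffix array, BWT, C array and occurrence table) over the sequences joined with an assumed '#' separator and '$' sentinel. It uses backward search to find fixed-length k-mers that occur exactly once in every sequence, chains them greedily so anchor positions increase in all sequences, and cuts each sequence into the sub-sequences between anchors. The fixed k, the separators, the greedy chaining and all function names (`build_fm_index`, `locate`, `find_anchors`, `split_at_anchors`) are hypothetical; FMAlign itself searches for longer, optimal common segments.

```python
from bisect import bisect_right
from collections import Counter


def build_fm_index(text):
    """Toy FM-index over `text`, which must end in a unique '$' sentinel."""
    # Naive suffix array: fine for a demo, far too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = the one preceding each sorted suffix (text[-1] for i == 0).
    bwt = [text[i - 1] for i in sa]
    # C[c]: number of characters in `text` that sort before c.
    counts, C, total = Counter(text), {}, 0
    for c in sorted(counts):
        C[c], total = total, total + counts[c]
    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def locate(pattern, sa, C, occ):
    """Backward search: return the sorted text positions where `pattern` occurs."""
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def find_anchors(seqs, k):
    """Hypothetical anchor selection: k-mers occurring exactly once in every
    sequence, chained greedily so positions increase in all sequences."""
    text = "#".join(seqs) + "$"  # assumes sequences contain only A/C/G/T
    sa, C, occ = build_fm_index(text)
    starts = [0]  # offset of each sequence inside `text`
    for s in seqs[:-1]:
        starts.append(starts[-1] + len(s) + 1)
    anchors, prev_end = [], [0] * len(seqs)
    for p in range(len(seqs[0]) - k + 1):
        hits = locate(seqs[0][p:p + k], sa, C, occ)
        if len(hits) != len(seqs):
            continue
        local = {}  # sequence id -> offset of the hit within that sequence
        for h in hits:
            sid = bisect_right(starts, h) - 1
            local[sid] = h - starts[sid]
        if len(local) != len(seqs):
            continue  # some sequence has 2+ copies while another has none
        pos = [local[i] for i in range(len(seqs))]
        # Keep the anchor only if it starts after the previous anchor everywhere.
        if all(q >= e for q, e in zip(pos, prev_end)):
            anchors.append(pos)
            prev_end = [q + k for q in pos]
    return anchors


def split_at_anchors(seqs, anchors, k):
    """Cut every sequence into the gaps before, between and after the anchors."""
    pieces = []
    for i, s in enumerate(seqs):
        cuts, cur = [], 0
        for pos in anchors:
            cuts.append(s[cur:pos[i]])
            cur = pos[i] + k
        cuts.append(s[cur:])
        pieces.append(cuts)
    return pieces


if __name__ == "__main__":
    seqs = ["AAAGATTACATTTCCGGAGG", "AAGATTACAGTTCCGGAGGC", "CAAGATTACATTCCGGAGG"]
    anchors = find_anchors(seqs, k=6)
    print("anchor offsets per sequence:", anchors)
    print("sub-sequences:", split_at_anchors(seqs, anchors, k=6))
```

In a divide-and-conquer pipeline of the kind the abstract describes, the j-th piece of every sequence would then be aligned separately (for example with a conventional aligner), and the partial alignments joined back together with the anchor segments. The naive suffix sort is quadratic or worse and is used here only to keep the sketch short; a production FM-index would use a linear-time suffix array construction and sampled occurrence tables.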
Full text: 1 | Database: MEDLINE | Main subject: Base Sequence / Sequence Alignment / Sequence Analysis, DNA | Limits: Humans | Language: English | Publication year: 2022 | Document type: Article

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Sequência de Bases / Alinhamento de Sequência / Análise de Sequência de DNA Limite: Humans Idioma: En Ano de publicação: 2022 Tipo de documento: Article