WMSA 2: a multiple DNA/RNA sequence alignment tool implemented with accurate progressive mode and a fast win-win mode combining the center star and progressive strategies.
Chen, Juntao; Chao, Jiannan; Liu, Huan; Yang, Fenglong; Zou, Quan; Tang, Furong.
Affiliation
  • Chen J; Quzhou People's Hospital, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, China, Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China, and the Institute of Fundamental and Frontier Sciences, University of Electronic Scienc
  • Chao J; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China, and the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
  • Liu H; School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, China.
  • Yang F; Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, China.
  • Zou Q; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China and the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
  • Tang F; Quzhou People's Hospital, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, China, and Department of Basic Medical Sciences, School of Medicine, Tsinghua University, Beijing, China.
Brief Bioinform; 24(4), 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37200156
Multiple sequence alignment is widely used in sequence analysis, for example to identify important sites and to support phylogenetic analysis. Traditional methods, such as progressive alignment, are time-consuming. To address this issue, we introduce StarTree, a novel method that quickly constructs a guide tree by combining sequence clustering and hierarchical clustering. Furthermore, we develop a new heuristic algorithm for detecting similar regions using the FM-index and apply k-banded dynamic programming to the profile alignment. We also introduce a win-win alignment algorithm that applies the center star strategy within the clusters to speed up the alignment process, then uses the progressive strategy to align the center-aligned profiles, preserving the accuracy of the final alignment. We present WMSA 2 based on these improvements and compare its speed and accuracy with those of other popular methods. The results show that the guide tree built by the StarTree clustering method leads to better accuracy than that of PartTree while consuming less time and memory than the UPGMA and mBed methods on datasets with thousands of sequences. On simulated datasets, WMSA 2 consumes less time and memory while ranking at the top in Q and TC scores. On real datasets, WMSA 2 remains more time- and memory-efficient and ranks at the top in average sum-of-pairs score. For the alignment of 1 million SARS-CoV-2 genomes, the win-win mode of WMSA 2 significantly reduced the running time compared with the former version. The source code and data are available at https://github.com/malabz/WMSA2.
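The abstract does not spell out how the k-banded dynamic programming is implemented, so the following is only a minimal sketch of the general technique, not the WMSA 2 code. It assumes a pairwise global alignment of two plain sequences (WMSA 2 applies the idea to profiles), a linear gap penalty, and illustrative scoring values; the function name `banded_alignment_score` and all its parameters are hypothetical.

```python
def banded_alignment_score(a, b, k, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b using only cells with |i - j| <= k.

    Scoring values are illustrative assumptions, not taken from WMSA 2.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        # The end cell (n, m) lies outside the band, so no path can reach it.
        raise ValueError("band too narrow: k must be at least |len(a) - len(b)|")

    neg_inf = float("-inf")
    # Row 0: aligning a prefix of b against nothing costs one gap per symbol.
    # Only in-band columns are stored, keyed by column index.
    prev = {j: j * gap for j in range(min(m, k) + 1)}

    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                # Column 0: aligning a prefix of a against nothing.
                cur[j] = i * gap
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev.get(j - 1, neg_inf) + sub  # align a[i-1] with b[j-1]
            up = prev.get(j, neg_inf) + gap        # gap in b
            left = cur.get(j - 1, neg_inf) + gap   # gap in a
            cur[j] = max(diag, up, left)
        prev = cur

    return prev[m]
```

Each row touches at most 2k + 1 cells, so the work is O(k · n) rather than O(n · m). With k at least as large as the longer sequence, the band covers the whole matrix and the result equals the unbanded score under the same scoring scheme.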

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: RNA / COVID-19 Type of study: Prognostic_studies Limits: Humans Language: En Journal: Brief Bioinform Journal subject: BIOLOGY / MEDICAL INFORMATICS Year: 2023 Document type: Article
