ACMGA: a reference-free multiple-genome alignment pipeline for plant species.
Zhou, Huafeng; Su, Xiaoquan; Song, Baoxing.
Affiliation
  • Zhou H; College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, 266071, China.
  • Su X; National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong, 261325, China.
  • Song B; College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, 266071, China. suxq@qdu.edu.cn.
BMC Genomics ; 25(1): 515, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38796435
ABSTRACT

BACKGROUND:

The short-read whole-genome sequencing (WGS) approach has been widely applied to investigate genomic variation in natural populations of many plant species. With rapid advances in long-read sequencing and genome assembly technologies, high-quality genome sequences are now available for multiple varieties of many plant species. These genome sequences are expected to help researchers comprehensively investigate genomic variants of any type that are missed by short-read WGS. However, multiple genome alignment (MGA) tools designed by the human genome research community might be unsuitable for plant genomes.

RESULTS:

To fill this gap, we developed the AnchorWave-Cactus Multiple Genome Alignment (ACMGA) pipeline, which improved the alignment of repeat elements and could identify long (> 50 bp) insertions and deletions (INDELs). We conducted MGA using ACMGA and Cactus for 8 Arabidopsis (Arabidopsis thaliana) and 26 maize (Zea mays) de novo assembled genome sequences and compared the results with previously published short-read variant calls. MGA identified more single nucleotide variants (SNVs) and long INDELs than the previously published WGS variant calls did. Additionally, ACMGA detected significantly more SNVs and long INDELs than Cactus did, both in repetitive regions and across the whole genome. The ACMGA results were also more similar than the Cactus results to the previously published variants called from short reads. Both MGA pipelines identified numerous multi-allelic variants that were missed by the WGS variant calling pipeline.

CONCLUSIONS:

Aligning de novo assembled genome sequences could identify more SNVs and INDELs than mapping short reads. ACMGA combines the advantages of AnchorWave and Cactus and offers a practical solution for plant MGA by integrating global alignment, a 2-piece-affine-gap cost strategy, and the progressive MGA algorithm.
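
To illustrate why a 2-piece affine gap cost helps recover long INDELs, the following is a minimal sketch of the general idea, not the ACMGA or AnchorWave implementation. A gap is charged the cheaper of two affine penalties: one with a low opening cost and steep extension cost (favoring short gaps) and one with a high opening cost and shallow extension cost (favoring long gaps). The function names and the parameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
def one_piece_gap_cost(length, gap_open=4, gap_extend=2):
    """Classic affine gap cost: open + extend * length (illustrative values)."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


def two_piece_gap_cost(length, open1=4, extend1=2, open2=24, extend2=1):
    """Two-piece affine gap cost: the minimum of two affine penalties.

    Piece 1 (open1, extend1) is cheaper for short gaps; piece 2
    (open2, extend2) is cheaper for long gaps. All defaults here are
    hypothetical values used only to show the shape of the function.
    """
    if length <= 0:
        return 0
    short_piece = open1 + extend1 * length
    long_piece = open2 + extend2 * length
    return min(short_piece, long_piece)


if __name__ == "__main__":
    # Compare the two models for a short gap and for a gap longer than 50 bp.
    for gap_length in (10, 100):
        print(gap_length,
              one_piece_gap_cost(gap_length),
              two_piece_gap_cost(gap_length))
```

With these illustrative parameters the two pieces cross at a gap length of 20, so beyond that point a long gap is penalized less steeply than under the single affine model, which makes one long INDEL preferable to a chain of mismatches and short gaps.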

Full text: 1 Database: MEDLINE Main subject: Arabidopsis / Genome, Plant / Zea mays Language: En Publication year: 2024 Document type: Article