Evaluation of variant calling tools for large plant genome re-sequencing.
Yao, Zhen; You, Frank M; N'Diaye, Amidou; Knox, Ron E; McCartney, Curt; Hiebert, Colin W; Pozniak, Curtis; Xu, Wayne.
Affiliation
  • Yao Z; Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100, Morden, Manitoba, R6M 1Y5, Canada.
  • You FM; Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, Ontario, K1A 0C6, Canada.
  • N'Diaye A; Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, S7N 5A8, Canada.
  • Knox RE; Swift Current Research and Development Centre, Agriculture and Agri-Food Canada, Box 1030, Swift Current, Saskatchewan, S9H 3X2, Canada.
  • McCartney C; Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100, Morden, Manitoba, R6M 1Y5, Canada.
  • Hiebert CW; Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100, Morden, Manitoba, R6M 1Y5, Canada.
  • Pozniak C; Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, S7N 5A8, Canada.
  • Xu W; Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100, Morden, Manitoba, R6M 1Y5, Canada. wayne.xu@canada.ca.
BMC Bioinformatics ; 21(1): 360, 2020 Aug 17.
Article in En | MEDLINE | ID: mdl-32807073
ABSTRACT

BACKGROUND:

Discovering single nucleotide polymorphisms (SNPs) from agricultural crop genome sequences is a widely used strategy for developing genetic markers for applications such as marker-assisted breeding, population diversity studies of eco-geographical adaptation, and genotyping of crop germplasm collections. Accurately detecting SNPs in large polyploid crop genomes such as wheat is both crucial and challenging. Several variant calling methods have been developed, but their variant calls show low concordance with one another. A gold-standard variant set generated from one human individual has been established for evaluating variant calling tools, but no comparable gold-standard variant set is yet available for wheat. The intent of this study was to evaluate seven SNP variant calling tools (FreeBayes, GATK, Platypus, Samtools/mpileup, SNVer, VarScan, VarDict) with the two most popular mapping tools (BWA-mem and Bowtie2) on whole exome capture (WEC) re-sequencing data from allohexaploid wheat.

RESULTS:

We found that the BWA-mem mapping tool had both a higher mapping rate and a higher accuracy rate than Bowtie2. With the same mapping quality (MQ) cutoff, BWA-mem detected more variant bases in the mapped reads than Bowtie2. Preprocessing the reads with quality trimming or duplicate removal did not significantly affect the final mapping performance in terms of mapped reads. Based on concordance and receiver operating characteristic (ROC) analysis, the Samtools/mpileup variant calling tool with BWA-mem mapping of raw sequence reads outperformed the other combinations tested, followed by FreeBayes and GATK, in terms of specificity and sensitivity. VarDict and VarScan were the poorest performing variant calling tools on the wheat WEC sequence data.
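As a rough illustration of the kind of comparison described above, the following Python sketch computes pairwise concordance between call sets and a single sensitivity/specificity point for each tool. It is not the study's code. The tool names, the toy SNP sites, the use of a Jaccard index as the concordance measure, and the "called by at least two tools" proxy truth set are all assumptions made for this example, since the abstract does not state how the reference set was defined.

```python
from collections import Counter
from itertools import combinations


def jaccard_concordance(calls_a, calls_b):
    """Fraction of sites called by either tool that were called by both."""
    union = calls_a | calls_b
    if not union:
        return 1.0
    return len(calls_a & calls_b) / len(union)


def consensus_sites(call_sets, min_tools=2):
    """Proxy truth set (assumption): sites reported by at least min_tools tools."""
    counts = Counter(site for sites in call_sets.values() for site in sites)
    return {site for site, n in counts.items() if n >= min_tools}


def sensitivity_specificity(calls, truth, all_sites):
    """One ROC point for a call set, judged against a (proxy) truth set."""
    tp = len(calls & truth)
    fn = len(truth - calls)
    fp = len(calls - truth)
    tn = len(all_sites - calls - truth)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical SNP calls keyed by (chromosome, position, ref, alt).
s1 = ("chr1A", 100, "A", "G")
s2 = ("chr1A", 250, "C", "T")
s3 = ("chr2B", 40, "G", "A")
s4 = ("chr3D", 75, "T", "C")
s5 = ("chr4A", 10, "G", "T")

call_sets = {
    "tool_a": {s1, s2, s3},
    "tool_b": {s1, s3, s4, s5},
    "tool_c": {s1, s2, s3},
}

# Pairwise concordance between every pair of tools.
concordance = {
    (a, b): jaccard_concordance(call_sets[a], call_sets[b])
    for a, b in combinations(sorted(call_sets), 2)
}

# Candidate sites are taken to be every site any tool reported (assumption).
all_sites = set().union(*call_sets.values())
truth = consensus_sites(call_sets, min_tools=2)
roc_points = {
    name: sensitivity_specificity(sites, truth, all_sites)
    for name, sites in call_sets.items()
}

if __name__ == "__main__":
    for pair, value in concordance.items():
        print(pair, round(value, 3))
    for name, (sens, spec) in roc_points.items():
        print(name, round(sens, 3), round(spec, 3))
```

In practice the call sets would be read from each tool's VCF output, and a full ROC curve would come from repeating the sensitivity/specificity calculation across a range of quality thresholds rather than from a single point per tool.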

CONCLUSION:

The BWA-mem and Samtools/mpileup pipeline, which requires no preprocessing of the raw read data before mapping onto the reference genome, was found to be the best-performing option for SNP calling in re-sequencing of the complex wheat genome. These results also provide useful guidelines for reliable variant identification from deep sequencing of other large polyploid crop genomes.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Triticum / Genome, Plant / Whole Genome Sequencing Type of study: Prognostic_studies Limits: Humans Language: En Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year of publication: 2020 Document type: Article Affiliation country: Canada
