A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome.
Park, HyeonSeul; Gim, JungSoo.
Affiliation
  • Park H; Chosun University.
  • Gim J; Chosun University.
Res Sq ; 2023 Mar 06.
Article in En | MEDLINE | ID: mdl-36945432
Most genome benchmark studies use hg38 (built from Caucasian and African samples) as the reference genome and 'NA12878' (sequencing reads from a Caucasian individual) for comparison. Here, we aimed to determine whether 1) an ethnic match or mismatch between the reference genome and the sequencing reads produces distinct results and 2) there is an optimal workflow for single-genome data. We assessed the performance of variant calling pipelines using two reference genomes (hg38 and a Korean genome) and two whole-genome sequencing (WGS) read sets of different ethnic origins: Caucasian (NA12878) and Korean. The pipelines used BWA-mem and Novoalign as mapping tools and GATK4, Strelka2, DeepVariant, and Samtools as variant callers. Using hg38 led to better performance (based on precision and recall), regardless of the ethnic origin of the WGS reads. Novoalign + GATK4 showed the best performance on both WGS datasets. We assessed pipeline efficiency by removing the duplicate-marking (markduplicate) step; all pipelines except Novoalign + DeepVariant maintained their performance. When combined with GATK4, Novoalign identified more variants both overall and in the MHC region of chr6. We found no evidence that a different ethnic reference improves variant calling performance for single WGS reads, which re-validates the utility of hg38. We recommend using Novoalign + GATK4 without duplicate marking for single PCR-free WGS data.
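The abstract ranks pipelines by precision and recall but does not say how these were computed. As a minimal sketch, assuming variants are compared by exact match on (chromosome, position, ref, alt) against a truth set, the two metrics could be derived as follows. The function name `evaluate_calls`, the `Metrics` type, and the toy variants are hypothetical and are not taken from the study; real benchmarks typically rely on dedicated comparison tools that also normalize variant representation.

```python
from typing import NamedTuple, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# Exact-match comparison is a simplifying assumption of this sketch.
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    tp: int            # called variants that are in the truth set
    fp: int            # called variants that are not in the truth set
    fn: int            # truth variants that were not called
    precision: float   # tp / (tp + fp)
    recall: float      # tp / (tp + fn)


def evaluate_calls(called: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare a pipeline's call set with a truth set (hypothetical helper)."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    # Guard against empty sets to avoid division by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(tp, fp, fn, precision, recall)


# Toy data for illustration only; these are not variants from the study.
truth = {
    ("chr6", 100, "A", "G"),
    ("chr6", 200, "C", "T"),
    ("chr6", 300, "G", "A"),
    ("chr1", 50, "T", "C"),
}
called = {
    ("chr6", 100, "A", "G"),
    ("chr6", 200, "C", "T"),
    ("chr1", 50, "T", "C"),
    ("chr2", 10, "G", "C"),
}

m = evaluate_calls(called, truth)
print(f"precision={m.precision:.2f} recall={m.recall:.2f}")
```

Under this scheme, a pipeline that reports more variants raises recall only if the extra calls are in the truth set; otherwise they lower precision, which is why the two metrics are reported together.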

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Res Sq Year: 2023 Document type: Article Country of publication: United States
