Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines.
Weißbach, Stephan; Sys, Stanislav; Hewel, Charlotte; Todorov, Hristo; Schweiger, Susann; Winter, Jennifer; Pfenninger, Markus; Torkamani, Ali; Evans, Doug; Burger, Joachim; Everschor-Sitte, Karin; May-Simera, Helen Louise; Gerber, Susanne.
Affiliation
  • Weißbach S; Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany.
  • Sys S; Institute of Developmental Biology and Neurobiology, Johannes Gutenberg-University Mainz, Mainz, Germany.
  • Hewel C; Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany.
  • Todorov H; Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany.
  • Schweiger S; Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany.
  • Winter J; Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany.
  • Pfenninger M; Leibniz Institute for Resilience Research, Mainz, Germany.
  • Torkamani A; Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany.
  • Evans D; Leibniz Institute for Resilience Research, Mainz, Germany.
  • Burger J; Department of Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325, Frankfurt am Main, Germany.
  • Everschor-Sitte K; Institute for Molecular and Organismic Evolution, Johannes Gutenberg-University Mainz, Johann-Joachim-Becher-Weg 7, 55128, Mainz, Germany.
  • May-Simera HL; LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Biodiversity, and Climate Research Centre, Senckenberganlage 25, 60325, Frankfurt am Main, Germany.
  • Gerber S; Department of Integrative Structural and Computational Biology, Scripps Research Translational Institute, California Campus, San Diego, USA.
BMC Genomics ; 22(1): 62, 2021 Jan 19.
Article in English | MEDLINE | ID: mdl-33468057
ABSTRACT

BACKGROUND:

Next-generation sequencing (NGS) is the foundation of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. To methodically investigate the magnitude of systematic errors in single nucleotide variant calls, we performed a cross-sectional observational study on a genomic cohort of 99 subjects, each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics and processed with the respective bioinformatic pipeline. We also repeated variant calling for the Illumina cohorts with GATK, which allowed us to investigate the effect of the bioinformatics analysis strategy separately from the sequencing platform's impact.

RESULTS:

The number of detected variants and variant classes per individual was highly dependent on the experimental setup. We observed a statistically significant overrepresentation of variants uniquely called by a single setup, indicating potential systematic biases. Insertion/deletion polymorphisms (indels) were associated with decreased concordance compared to single nucleotide polymorphisms (SNPs). The discrepancies in absolute indel numbers were particularly prominent in introns, Alu elements, simple repeats, and regions with medium GC content. Notably, reprocessing sequencing data following the best practice recommendations of GATK considerably improved concordance between the respective setups.
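As an illustration of the two quantities discussed above (variants called by only one setup, and concordance between setups), the following minimal sketch compares toy call sets. It is not the authors' pipeline: the abstract does not state which concordance measure was used, so the Jaccard index here is an assumption, and the setup names, variant tuples, and function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared calls divided by all calls in either set (assumed concordance measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def unique_calls(callsets):
    """For each setup, return the variants that no other setup called."""
    unique = {}
    for name, calls in callsets.items():
        others = set().union(*(c for n, c in callsets.items() if n != name))
        unique[name] = calls - others
    return unique


def pairwise_concordance(callsets):
    """Jaccard concordance for every pair of setups, keyed by sorted name pair."""
    return {
        (a, b): jaccard(callsets[a], callsets[b])
        for a, b in combinations(sorted(callsets), 2)
    }


# Hypothetical toy data: each variant is (chromosome, position, ref, alt).
callsets = {
    "hiseq_x": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")},
    "hiseq": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "complete_genomics": {("chr1", 100, "A", "G"), ("chr3", 10, "T", "C")},
}

concordance = pairwise_concordance(callsets)
unique = unique_calls(callsets)

for pair, value in concordance.items():
    print(pair, round(value, 3))
for name, calls in unique.items():
    print(name, "unique calls:", sorted(calls))
```

In a real comparison, the sets would be built from normalized VCF records per individual, and SNPs and indels would be evaluated separately, since the abstract reports lower concordance for indels.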

CONCLUSION:

We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Furthermore, our results demonstrate the benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies.

Full text: 1 | Databases: MEDLINE | Main subject: Computational Biology / High-Throughput Nucleotide Sequencing | Type of study: Guideline / Observational_studies / Prevalence_studies / Risk_factors_studies | Limits: Humans | Language: English | Journal: BMC Genomics | Journal subject: GENETICS | Year: 2021 | Document type: Article | Country of affiliation: Germany
