Biostatistical Aspects of Whole Genome Sequencing Studies: Preprocessing and Quality Control.
Betschart, Raphael O; Riccio, Cristian; Aguilera-Garcia, Domingo; Blankenberg, Stefan; Guo, Linlin; Moch, Holger; Seidl, Dagmar; Solleder, Hugo; Thalén, Felix; Thiéry, Alexandre; Twerenbold, Raphael; Zeller, Tanja; Zoche, Martin; Ziegler, Andreas.
Affiliation
  • Betschart RO; Cardio-CARE, Medizincampus Davos, Davos, Switzerland.
  • Riccio C; Cardio-CARE, Medizincampus Davos, Davos, Switzerland.
  • Aguilera-Garcia D; Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland.
  • Blankenberg S; Cardio-CARE, Medizincampus Davos, Davos, Switzerland.
  • Guo L; Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
  • Moch H; Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
  • Seidl D; Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
  • Solleder H; Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland.
  • Thalén F; Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland.
  • Thiéry A; Cardio-CARE, Medizincampus Davos, Davos, Switzerland.
  • Twerenbold R; Cardio-CARE, Medizincampus Davos, Davos, Switzerland.
  • Zeller T; Cardio-CARE, Medizincampus Davos, Davos, Switzerland.
  • Zoche M; Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
  • Ziegler A; Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
Biom J; 66(5): e202300278, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38988195
ABSTRACT
Rapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. Before association analysis between phenotypes and genotypes can be performed, the raw sequence data must undergo preprocessing and quality control (QC). Because many biostatisticians have not yet worked with WGS data, we first sketch Illumina's short-read sequencing technology. Second, we explain the general preprocessing pipeline for WGS studies. Third, we provide an overview of important QC metrics, which are applied to WGS data at four stages: on the raw data, after mapping and alignment, after variant calling, and after multisample variant calling. Fourth, we illustrate the QC with data from the GENEtic SequencIng Study Hamburg-Davos (GENESIS-HD), a study involving more than 9000 human whole genomes. All samples were sequenced on an Illumina NovaSeq 6000 with an average coverage of 35× using a PCR-free protocol. For QC, one Genome in a Bottle (GIAB) trio was sequenced in four replicates, and one GIAB sample was successfully sequenced 70 times in different runs. Fifth, we provide empirical data on the compression of raw data using the DRAGEN original read archive (ORA). The most important quality metrics in the application were genetic similarity, sample cross-contamination, deviations from the expected Het/Hom ratio, relatedness, and coverage. The compression ratio of the raw files using DRAGEN ORA was 5.61, and compression time increased linearly with genome coverage. In summary, the preprocessing, joint calling, and QC of large WGS studies are feasible within a reasonable time, and efficient QC procedures are readily available.
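Two of the quantities named in the abstract, the Het/Hom ratio and the compression ratio, are simple to compute once the inputs are available. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and it assumes that the Het/Hom ratio is defined as the number of heterozygous calls divided by the number of homozygous non-reference calls, with genotypes given as VCF-style strings such as `0/1`. The abstract does not state the exact definition or any thresholds used in GENESIS-HD.

```python
from collections import Counter


def het_hom_ratio(genotypes):
    """Heterozygous calls divided by homozygous non-reference calls.

    Assumption: genotypes are VCF-style diploid strings ("0/1", "1|1", "./.").
    Missing and non-diploid calls are skipped; homozygous reference calls
    are not counted in either class.
    """
    counts = Counter()
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # missing or non-diploid call
        if alleles[0] != alleles[1]:
            counts["het"] += 1
        elif alleles[0] != "0":
            counts["hom_alt"] += 1
    if counts["hom_alt"] == 0:
        return float("nan")  # ratio undefined without hom-alt calls
    return counts["het"] / counts["hom_alt"]


def compression_ratio(original_bytes, compressed_bytes):
    """Size of the uncompressed input divided by the compressed size."""
    return original_bytes / compressed_bytes


if __name__ == "__main__":
    calls = ["0/1", "1/1", "0/0", "0|1", "./.", "1/2"]
    print(het_hom_ratio(calls))
    # Illustrative sizes only, chosen to reproduce the reported ratio of 5.61.
    print(compression_ratio(561, 100))
```

In practice the per-sample ratio would be compared against the distribution across all samples in the study, since a marked deviation can point to contamination or other sample problems; the cut-off to use is a study-specific choice that this sketch does not make.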

Full text: 1 Database: MEDLINE Main subject: Quality Control / Whole Genome Sequencing Limits: Humans Language: En Publication year: 2024 Document type: Article
