Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project.
Naj, Adam C; Lin, Honghuang; Vardarajan, Badri N; White, Simon; Lancour, Daniel; Ma, Yiyi; Schmidt, Michael; Sun, Fangui; Butkiewicz, Mariusz; Bush, William S; Kunkle, Brian W; Malamon, John; Amin, Najaf; Choi, Seung Hoan; Hamilton-Nelson, Kara L; van der Lee, Sven J; Gupta, Namrata; Koboldt, Daniel C; Saad, Mohamad; Wang, Bowen; Nato, Alejandro Q; Sohi, Harkirat K; Kuzma, Amanda; Wang, Li-San; Cupples, L Adrienne; van Duijn, Cornelia; Seshadri, Sudha; Schellenberg, Gerard D; Boerwinkle, Eric; Bis, Joshua C; Dupuis, Josée; Salerno, William J; Wijsman, Ellen M; Martin, Eden R; DeStefano, Anita L.
Affiliation
  • Naj AC; Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. Electronic address: adamnaj@pen
  • Lin H; Department of Medicine, Boston University School of Medicine, Boston, MA, USA.
  • Vardarajan BN; Department of Neurology, Columbia University Medical Center, New York, NY, USA.
  • White S; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
  • Lancour D; Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA.
  • Ma Y; Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA.
  • Schmidt M; John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA.
  • Sun F; Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
  • Butkiewicz M; Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA.
  • Bush WS; Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA.
  • Kunkle BW; John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA.
  • Malamon J; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Amin N; Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands.
  • Choi SH; Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
  • Hamilton-Nelson KL; John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA.
  • van der Lee SJ; Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands.
  • Gupta N; Medical and Population Genetics Program, Broad Institute, Cambridge, MA, USA.
  • Koboldt DC; Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA.
  • Saad M; Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA.
  • Wang B; Department of Statistics, University of Washington, Seattle, WA, USA.
  • Nato AQ; Division of Medical Genetics, University of Washington, Seattle, WA, USA.
  • Sohi HK; Division of Medical Genetics, University of Washington, Seattle, WA, USA.
  • Kuzma A; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Wang LS; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Cupples LA; Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA.
  • van Duijn C; Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands.
  • Seshadri S; The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA.
  • Schellenberg GD; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Boerwinkle E; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Human Genetics Center, University of Texas Health Science Center, Houston, TX, USA.
  • Bis JC; Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA.
  • Dupuis J; Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA.
  • Salerno WJ; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
  • Wijsman EM; Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA.
  • Martin ER; John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA.
  • DeStefano AL; Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA.
Genomics; 111(4): 808-818, 2019 Jul.
Article in English | MEDLINE | ID: mdl-29857119
ABSTRACT
The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
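The two ideas the abstract relies on, combining per-genotype calls from two pipelines and flagging Mendelian inconsistencies in parent-offspring pairs, can be illustrated with a minimal Python sketch. This is not the ADSP implementation: the function names are hypothetical, genotypes are assumed to be diploid VCF-style strings such as "0/1" at a bi-allelic site, and the rule that a genotype called by only one pipeline is retained is an assumption inferred from the abstract rather than something it states outright.

```python
from typing import Optional


def normalize(gt: str) -> str:
    """Return an unphased, allele-sorted genotype string, e.g. '1|0' -> '0/1'."""
    alleles = sorted(gt.replace("|", "/").split("/"))
    return "/".join(alleles)


def consensus_genotype(gatk: Optional[str], atlas: Optional[str]) -> Optional[str]:
    """Combine one genotype from each pipeline (hypothetical helper).

    None means the pipeline produced no QC-passing call.
    Assumed rule: concordant calls are kept, a call present in only one
    pipeline is kept, and discordant calls are excluded (returned as None).
    """
    if gatk is None and atlas is None:
        return None
    if gatk is None:
        return normalize(atlas)
    if atlas is None:
        return normalize(gatk)
    g, a = normalize(gatk), normalize(atlas)
    return g if g == a else None


def is_mendelian_inconsistent(parent: str, child: str) -> bool:
    """Flag a parent-offspring pair whose genotypes share no allele.

    With only one parent available, the detectable inconsistency at a
    bi-allelic site is opposite homozygotes (e.g. parent 0/0, child 1/1).
    """
    parent_alleles = set(normalize(parent).split("/"))
    child_alleles = set(normalize(child).split("/"))
    return not (parent_alleles & child_alleles)
```

For example, `consensus_genotype("0/1", "1/0")` treats the two calls as concordant, while `consensus_genotype("0/0", "0/1")` returns `None` because the pipelines disagree. A real workflow would read these calls from the project-level VCFs and apply the pipeline-specific QC filters first.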

Full text: 1 Databases: MEDLINE Main subject: Quality Control / Genome-Wide Association Study / Alzheimer Disease / Genotyping Techniques / Whole Genome Sequencing Limits: Female / Humans / Male Language: En Journal: Genomics Journal subject: GENETICS Year: 2019 Document type: Article