Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data.

Flickinger, Matthew; Jun, Goo; Abecasis, Gonçalo R; Boehnke, Michael; Kang, Hyun Min

Affiliation

Flickinger M; Department of Biostatistics and Center for Statistical Genetics, University of Michigan, School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA.
Jun G; Department of Biostatistics and Center for Statistical Genetics, University of Michigan, School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA; Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, 7000 Fannin Street, Houston,
Abecasis GR; Department of Biostatistics and Center for Statistical Genetics, University of Michigan, School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA.
Boehnke M; Department of Biostatistics and Center for Statistical Genetics, University of Michigan, School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA. Electronic address: boehnke@umich.edu.
Kang HM; Department of Biostatistics and Center for Statistical Genetics, University of Michigan, School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA. Electronic address: hmkang@umich.edu.

Am J Hum Genet; 97(2): 284-90, 2015 Aug 06.

Article in English | MEDLINE | ID: mdl-26235984

ABSTRACT

DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%-20%), contamination-adjusted calls eliminate 48%-77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%.

Subject(s)

DNA Contamination; Genotyping Techniques/methods; Genotyping Techniques/standards; Models, Genetic; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/standards

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Sequence Analysis, DNA / DNA Contamination / Genotyping Techniques / Models, Genetic | Type of study: Evaluation_studies / Prognostic_studies | Language: English | Journal: Am J Hum Genet | Year: 2015 | Document type: Article | Affiliation country: United States
