Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data.
Teng, Mingxiang; Irizarry, Rafael A.
Affiliation
  • Teng M; Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA.
  • Irizarry RA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA.
Genome Res; 27(11): 1930-1938, 2017 Nov.
Article in En | MEDLINE | ID: mdl-29025895
The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of the public catalogs in functional genomics is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than would be expected by chance given the experimental protocol's imperfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, and it is strong enough that a substantial number of peaks are called differently when different laboratories perform experiments on the same cell line. However, accounting for GC-content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To address this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and the signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories.
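As a rough illustration of what a GC-content effect on coverage looks like, the sketch below bins genomic windows by GC fraction and reports the mean read count per bin. It is not the statistical approach introduced in the article, only a descriptive diagnostic. The function names (`gc_fraction`, `mean_coverage_by_gc`), the equal-width binning, and the toy windows are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def gc_fraction(seq):
    """Fraction of G/C among unambiguous (A/C/G/T) bases; 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def mean_coverage_by_gc(windows, n_bins=5):
    """Group (sequence, read_count) windows into equal-width GC bins.

    Returns a dict mapping bin index (0 = lowest GC) to the mean read count
    of the windows falling in that bin. Empty bins are omitted.
    """
    bins = defaultdict(list)
    for seq, count in windows:
        # A GC fraction of exactly 1.0 is clamped into the top bin.
        idx = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        bins[idx].append(count)
    return {idx: mean(counts) for idx, counts in sorted(bins.items())}


# Hypothetical toy windows (sequence, mapped read count); not real data.
windows = [
    ("ATATATATGA", 2),
    ("ATGCATATGA", 4),
    ("GCGCATATGA", 6),
    ("GCGCGCATGA", 9),
    ("GCGCGCGCGA", 14),
]
profile = mean_coverage_by_gc(windows)
for idx, cov in profile.items():
    print(f"GC bin {idx}: mean coverage {cov}")
```

A profile that rises or falls with GC bin in background regions would hint at the kind of bias described above. In real data such a profile mixes technical bias with true binding signal, which is exactly the confounding the abstract says a simple stratification cannot resolve.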

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Base Composition / DNA / Sequence Analysis, DNA Language: En Journal: Genome Res Journal subject: Molecular Biology / Genetics Year: 2017 Type: Article Affiliation country: United States
