Search | VHL Regional Portal

Cell-type specificity of ChIP-predicted transcription factor binding sites.

Håndstad, Tony; Rye, Morten; Mocnik, Rok; Drabløs, Finn; Sætrom, Pål.

BMC Genomics ; 13: 372, 2012 Aug 03.

Article in English | MEDLINE | ID: mdl-22863112

ABSTRACT

BACKGROUND: Context-dependent transcription factor (TF) binding is one reason for differences in gene expression patterns between different cellular states. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) identifies genome-wide TF binding sites for one particular context: the cells used in the experiment. But can such ChIP-seq data predict TF binding in other cellular contexts, and is it possible to distinguish context-dependent from ubiquitous TF binding? RESULTS: We compared ChIP-seq data on TF binding for multiple TFs in two different cell types and found that on average only a third of ChIP-seq peak regions are common to both cell types. As expected, common peaks occur more frequently in certain genomic contexts, such as CpG-rich promoters, whereas chromatin differences characterize cell-type specific TF binding. We also find, however, that genotype differences between the cell types can explain differences in binding. Moreover, ChIP-seq signal intensity and peak clustering are the strongest predictors of common peaks. Compared with strong peaks located in regions containing peaks for multiple transcription factors, weak and isolated peaks are less often shared between the cell types and are less associated with data that indicate regulatory activity. CONCLUSIONS: Together, the results suggest that experimental noise is prevalent among weak peaks, whereas strong and clustered peaks represent high-confidence binding events that often occur in other cellular contexts. Nevertheless, 30-40% of the strongest and most clustered peaks show context-dependent regulation. We show that by combining signal intensity with additional data (ranging from context-independent information, such as binding site conservation and position weight matrix scores, to context-dependent chromatin structure), we can predict whether a ChIP-seq peak is likely to be present in other cellular contexts.

Subject(s)

Binding Sites/genetics , Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Transcription Factors/genetics , Transcription Factors/metabolism , Base Sequence , Cell Line, Tumor , Chromatin/genetics , Chromatin/metabolism , DNA/genetics , DNA/metabolism , Gene Expression , Gene Regulatory Networks , Genotype , HeLa Cells , Histones/genetics , Humans , Polymorphism, Single Nucleotide , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA
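
The abstract's headline figure ("only a third of ChIP-seq peak regions are common to both cell types") is an interval-overlap fraction. The sketch below shows one way to compute such a fraction; it is not the authors' procedure. It assumes peaks are given as hypothetical `(chromosome, start, end)` tuples in half-open, BED-style coordinates, and that a peak in cell type A counts as "common" if it overlaps any peak in cell type B by at least one base. The paper may use a different overlap rule or a symmetric definition.

```python
from bisect import bisect_left
from collections import defaultdict

def merge_by_chrom(peaks):
    """Group (chrom, start, end) peaks by chromosome and merge overlapping
    intervals, returning {chrom: (sorted_starts, sorted_ends)}.
    Coordinates are assumed half-open [start, end), as in BED files."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s < out[-1][1]:  # overlaps the previous merged interval
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged

def overlaps_any(peak, merged):
    """True if peak shares at least one base with any merged interval."""
    chrom, start, end = peak
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    i = bisect_left(starts, end) - 1  # last merged interval starting before `end`
    # Merged intervals are disjoint and sorted, so only interval i can reach `start`.
    return i >= 0 and ends[i] > start

def fraction_common(peaks_a, peaks_b):
    """Fraction of peaks in A that overlap at least one peak in B (asymmetric)."""
    merged_b = merge_by_chrom(peaks_b)
    common = sum(overlaps_any(p, merged_b) for p in peaks_a)
    return common / len(peaks_a) if peaks_a else 0.0
```

For example, with A = `[("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]` and B = `[("chr1", 150, 250), ("chr2", 90, 120)]`, only the first peak in A is shared, so `fraction_common(A, B)` is 1/3. Merging B first keeps each lookup to a single binary search.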

The Triform algorithm: improved sensitivity and specificity in ChIP-Seq peak finding.

Kornacker, Karl; Rye, Morten Beck; Håndstad, Tony; Drabløs, Finn.

BMC Bioinformatics ; 13: 176, 2012 Jul 24.

Article in English | MEDLINE | ID: mdl-22827163

ABSTRACT

BACKGROUND: Chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-Seq) is the most frequently used method to identify the binding sites of transcription factors. Active binding sites can be seen as peaks in enrichment profiles when the sequencing reads are mapped to a reference genome. However, the profiles are normally noisy, making it challenging to identify all significantly enriched regions in a reliable way and with an acceptable false discovery rate. RESULTS: We present the Triform algorithm, an improved approach to automatic peak finding in ChIP-Seq enrichment profiles for transcription factors. The method uses model-free statistics to identify peak-like distributions of sequencing reads, taking advantage of improved peak definition in combination with known characteristics of ChIP-Seq data. CONCLUSIONS: Triform outperforms several existing methods in the identification of representative peak profiles in curated benchmark data sets. We also show that Triform in many cases is able to identify peaks that are more consistent with biological function, compared with other methods. Finally, we show that Triform can be used to generate novel information on transcription factor binding in repeat regions, which represents a particular challenge in many ChIP-Seq experiments. The Triform algorithm has been implemented in R, and is available via http://tare.medisin.ntnu.no/triform.

Subject(s)

Algorithms , Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Transcription Factors/metabolism , Binding Sites , Sensitivity and Specificity
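
To make "peaks in enrichment profiles" concrete, here is a deliberately simple binned peak caller. It is **not** Triform, which the abstract describes as using model-free statistics. Instead it swaps in a generic, model-based test: each bin's treatment count is compared against a Poisson background whose mean is the control count of the same bin. The binned input, the assumption of equal sequencing depth in treatment and control, and the `alpha` and `min_lambda` parameters are all hypothetical choices made for illustration.

```python
import math

def poisson_threshold(lam, alpha):
    """Smallest count k such that P(X >= k) < alpha for X ~ Poisson(lam)."""
    k, cdf, term = 0, 0.0, math.exp(-lam)  # term holds P(X = k)
    while 1.0 - cdf >= alpha:  # 1 - cdf equals P(X >= k)
        cdf += term
        k += 1
        term *= lam / k
    return k

def call_peaks(treatment, control, alpha=1e-5, min_lambda=1.0):
    """Generic Poisson bin test (NOT the Triform algorithm).

    Merges runs of consecutive bins whose treatment count is significantly
    above the control count of the same bin. Assumes both libraries have
    equal sequencing depth. Returns half-open (first_bin, last_bin + 1) ranges.
    """
    peaks, start = [], None
    for i, (t, c) in enumerate(zip(treatment, control)):
        lam = max(c, min_lambda)  # floor avoids a zero background in empty bins
        enriched = t >= poisson_threshold(lam, alpha)
        if enriched and start is None:
            start = i
        elif not enriched and start is not None:
            peaks.append((start, i))
            start = None
    if start is not None:
        peaks.append((start, len(treatment)))
    return peaks
```

With `treatment = [2, 3, 2, 40, 55, 38, 3, 2, 1, 2]` and `control = [2, 2, 3, 2, 3, 2, 2, 3, 2, 2]`, bins 3 to 5 pass the test and `call_peaks` returns `[(3, 6)]`. A per-bin test like this is exactly what tends to produce the noisy profiles the abstract mentions, which is the motivation for methods that also exploit peak shape.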

Clustered ChIP-Seq-defined transcription factor binding sites and histone modifications map distinct classes of regulatory elements.

Rye, Morten; Sætrom, Pål; Håndstad, Tony; Drabløs, Finn.

BMC Biol ; 9: 80, 2011 Nov 24.

Article in English | MEDLINE | ID: mdl-22115494

ABSTRACT

BACKGROUND: Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding, each of which has its own biases. RESULTS: Herein we show that an approach based on clustering of transcription factor peaks from high-throughput sequencing coupled with chromatin immunoprecipitation (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups: those directly involved in transcription, including miRNAs and long noncoding RNAs, and those facilitating transcription through long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less so with acetylation of lysine 27 on histone H3 (H3K27ac) or p300 binding. CONCLUSION: By integrating genome-wide data of transcription factor binding and chromatin structure and using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.

Subject(s)

Chromatin Immunoprecipitation/methods , Chromatin/metabolism , High-Throughput Nucleotide Sequencing/methods , Histones/metabolism , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Binding Sites , Cell Line , Chromatin/chemistry , Chromatin/genetics , Histones/chemistry , Histones/genetics , Humans , Lysine/metabolism , Methylation , Multigene Family , Protein Binding , Transcription Factors/analysis , Transcription Factors/genetics
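
The abstract does not spell out how transcription factor peaks are clustered. As a rough illustration only, the sketch below uses single-linkage merging along the genome: peaks from many TF data sets are sorted, and a peak joins the current cluster if it starts within a hypothetical `max_gap` base pairs of that cluster's end. Each cluster then records the distinct set of TFs it contains, which is the kind of "cluster composition" the abstract refers to. The `Peak` record, the gap rule and the example TF names are assumptions, not the authors' method.

```python
from collections import namedtuple

# Hypothetical peak record: half-open [start, end) plus the TF it came from.
Peak = namedtuple("Peak", "chrom start end tf")

def cluster_peaks(peaks, max_gap=0):
    """Single-linkage clustering of peaks along each chromosome.

    A peak joins the current cluster if it is on the same chromosome and
    starts no more than max_gap bp after the cluster's current end
    (overlapping or touching peaks always join). Returns a list of dicts
    with the cluster span and the set of distinct TFs it contains.
    """
    clusters = []
    for p in sorted(peaks, key=lambda p: (p.chrom, p.start)):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == p.chrom and p.start <= last["end"] + max_gap:
            last["end"] = max(last["end"], p.end)
            last["tfs"].add(p.tf)
        else:
            clusters.append({"chrom": p.chrom, "start": p.start,
                             "end": p.end, "tfs": {p.tf}})
    return clusters
```

With `max_gap=50`, peaks for CTCF at chr1:100-200, MYC at chr1:180-260 and MAX at chr1:300-350 collapse into one three-TF cluster spanning chr1:100-350, while an isolated peak at chr1:1000-1100 stays on its own. Counting distinct TFs per cluster separates "clustered" from "isolated" binding in the sense used in these abstracts.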

A ChIP-Seq benchmark shows that sequence conservation mainly improves detection of strong transcription factor binding sites.

Håndstad, Tony; Rye, Morten Beck; Drabløs, Finn; Sætrom, Pål.

PLoS One ; 6(4): e18430, 2011 Apr 14.

Article in English | MEDLINE | ID: mdl-21533218

ABSTRACT

BACKGROUND: Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial. RESULTS: Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods. CONCLUSIONS: Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.

Subject(s)

Chromatin Immunoprecipitation , Transcription Factors/metabolism , Bayes Theorem , Binding Sites , Conserved Sequence , Promoter Regions, Genetic , ROC Curve
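
The "simpler motif-scanning methods" in this benchmark score genomic sequence against a binding motif. A common form is a log-odds position weight matrix (PWM) scanned over both strands; the sketch below shows that idea and is not any of the five methods evaluated in the paper. The per-position count input, the pseudocount of 0.5, the uniform background and the restriction to uppercase A/C/G/T are assumptions made for this example.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, background=(0.25,) * 4, pseudocount=0.5):
    """Build a log2-odds PWM from per-position base counts.

    counts: list of dicts, one per motif position, mapping base -> count.
    Uses an assumed pseudocount and (by default) a uniform background.
    """
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / bg)
                    for b, bg in zip(BASES, background)})
    return pwm

def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan(seq, pwm):
    """Return (forward-strand position, strand, score) of the best window, or None.

    Windows containing characters other than uppercase A/C/G/T are skipped.
    """
    w = len(pwm)
    best = None
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - w + 1):
            window = s[i:i + w]
            if any(b not in BASES for b in window):
                continue
            score = sum(col[b] for col, b in zip(pwm, window))
            # Map a reverse-complement index back to forward-strand coordinates.
            pos = i if strand == "+" else len(s) - w - i
            if best is None or score > best[2]:
                best = (pos, strand, score)
    return best
```

For a GATA-like matrix built from `[{"G": 9, "A": 1}, {"A": 10}, {"T": 9, "C": 1}, {"A": 10}]`, scanning `"CCCCGATACCCC"` reports the best hit at position 4 on the `+` strand, and scanning `"CCTATCCC"` reports position 2 on the `-` strand. Short, information-poor matrices like this one are exactly the case where the abstract reports that conservation helps most.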

Motif kernel generated by genetic programming improves remote homology and fold detection.

Håndstad, Tony; Hestnes, Arne J H; Saetrom, Pål.

BMC Bioinformatics ; 8: 23, 2007 Jan 25.

Article in English | MEDLINE | ID: mdl-17254344

ABSTRACT

BACKGROUND: Protein remote homology detection is a central problem in computational biology. Most recent methods train support vector machines to discriminate between related and unrelated sequences and these studies have introduced several types of kernels. One successful approach is to base a kernel on shared occurrences of discrete sequence motifs. Still, many protein sequences fail to be classified correctly for lack of a suitable set of motifs for these sequences. RESULTS: We introduce the GPkernel, which is a motif kernel based on discrete sequence motifs where the motifs are evolved using genetic programming. All proteins can be grouped according to evolutionary relations and structure, and the method uses this inherent structure to create groups of motifs that discriminate between different families of evolutionary origin. When tested on two SCOP benchmarks, the superfamily and fold recognition problems, the GPkernel gives significantly better results compared to related methods of remote homology detection. CONCLUSION: The GPkernel gives particularly good results on the more difficult fold recognition problem compared to the other methods. This is mainly because the method creates motif sets that describe similarities among subgroups of both the related and unrelated proteins. This rich set of motifs gives a better description of the similarities and differences between different folds than do previous motif-based methods.

Subject(s)

Algorithms , Amino Acid Motifs , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Protein Folding , Sequence Alignment , Sequence Homology, Amino Acid
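
A kernel "based on shared occurrences of discrete sequence motifs" can be written as an inner product of binary motif-occurrence vectors, so that k(x, y) counts the motifs present in both sequences. Because it is an inner product, it is a valid positive semidefinite kernel for a support vector machine. In the GPkernel the motifs are evolved by genetic programming; the sketch below skips that step entirely and uses a fixed, hypothetical list of motifs written as Python regular expressions, which is an assumption about motif syntax rather than the paper's representation.

```python
import re

def motif_features(seq, motifs):
    """Binary feature vector: 1 if the motif (a regex, by assumption) occurs in seq."""
    return [1 if re.search(m, seq) else 0 for m in motifs]

def motif_kernel(x, y, motifs):
    """Number of motifs that occur in both sequences (inner product of features)."""
    fx, fy = motif_features(x, motifs), motif_features(y, motifs)
    return sum(a * b for a, b in zip(fx, fy))

def gram_matrix(seqs, motifs):
    """Kernel matrix over a list of sequences, e.g. as input to an SVM."""
    feats = [motif_features(s, motifs) for s in seqs]
    return [[sum(a * b for a, b in zip(f, g)) for g in feats] for f in feats]
```

With the hypothetical motifs `["C.H", "G[ST]G", "W..W"]`, the sequence `"ACAHGSGK"` has features `[1, 1, 0]` and `"WKLWCQH"` has `[1, 0, 1]`, so their kernel value is 1 and `gram_matrix` returns `[[2, 1], [1, 2]]`. The quality of such a kernel depends entirely on the motif set, which is the part the GPkernel learns.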
