A synergistic DNA logic predicts genome-wide chromatin accessibility.

Hashimoto, Tatsunori; Sherwood, Richard I; Kang, Daniel D; Rajagopal, Nisha; Barkal, Amira A; Zeng, Haoyang; Emons, Bart J M; Srinivasan, Sharanya; Jaakkola, Tommi; Gifford, David K.

Genome Res ; 26(10): 1430-1440, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27456004

ABSTRACT

Enhancers and promoters commonly occur in accessible chromatin characterized by depleted nucleosome contact; however, it is unclear how chromatin accessibility is governed. We show that log-additive cis-acting DNA sequence features can predict chromatin accessibility at high spatial resolution. We develop a new type of high-dimensional machine learning model, the Synergistic Chromatin Model (SCM), which when trained with DNase-seq data for a cell type is capable of predicting expected read counts of genome-wide chromatin accessibility at every base from DNA sequence alone, with the highest accuracy at hypersensitive sites shared across cell types. We confirm that a SCM accurately predicts chromatin accessibility for thousands of synthetic DNA sequences using a novel CRISPR-based method of highly efficient site-specific DNA library integration. SCMs are directly interpretable and reveal that a logic based on local, nonspecific synergistic effects, largely among pioneer TFs, is sufficient to predict a large fraction of cellular chromatin accessibility in a wide variety of cell types.
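
As a rough illustration of what "log-additive" means in the abstract above, the following minimal sketch predicts a per-base log expected read count as a sum of k-mer effects from nearby positions. It is a toy under stated assumptions, not the SCM itself: the function name, the flat spatial window, the value of k, and the weights in `toy_effects` are all hypothetical and not taken from the paper.

```python
import math

def predict_log_accessibility(seq, kmer_effects, k=3, window=2, baseline=0.0):
    """Return per-base log expected read counts under a toy log-additive model.

    Each k-mer starting at position j adds its effect to every position i
    with |i - j| <= window. The flat window is a simplifying assumption.
    """
    log_rate = [baseline] * len(seq)
    for j in range(len(seq) - k + 1):
        effect = kmer_effects.get(seq[j:j + k], 0.0)
        if effect == 0.0:
            continue  # k-mers without a learned effect contribute nothing
        lo = max(0, j - window)
        hi = min(len(seq), j + window + 1)
        for i in range(lo, hi):
            log_rate[i] += effect  # effects add in log space
    return log_rate

# Hypothetical weights chosen for illustration only.
toy_effects = {"GAT": 0.5, "ATA": 0.25}
log_rate = predict_log_accessibility("CGATAC", toy_effects)

# Adding in log space means the effects multiply in count space.
expected_counts = [math.exp(x) for x in log_rate]
```

Because the effects add in log space, two nearby k-mers multiply each other's contribution to the expected count, which is one simple way to read the "synergistic" behaviour the abstract describes.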

Subject(s)

Chromatin Assembly and Disassembly, Chromatin/genetics, Models, Genetic, Animals, Chromatin/metabolism, Genome, Human, Humans, Machine Learning

GERV: a statistical method for generative evaluation of regulatory variants for transcription factor binding.

Zeng, Haoyang; Hashimoto, Tatsunori; Kang, Daniel D; Gifford, David K.

Bioinformatics ; 32(4): 490-6, 2016 Feb 15.

Article in English | MEDLINE | ID: mdl-26476779

ABSTRACT

MOTIVATION: The majority of disease-associated variants identified in genome-wide association studies reside in noncoding regions of the genome with regulatory roles. Thus being able to interpret the functional consequence of a variant is essential for identifying causal variants in the analysis of genome-wide association studies. RESULTS: We present GERV (generative evaluation of regulatory variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer-based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change of predicted ChIP-seq reads between the reference and alternate allele. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor's canonical motif and associated co-factor motifs. We show that GERV outperforms existing methods in predicting single-nucleotide polymorphisms associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked single-nucleotide polymorphisms and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis. AVAILABILITY AND IMPLEMENTATION: The implementation of GERV and related data are available at http://gerv.csail.mit.edu/.
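
The scoring idea in the abstract above (compare the model's predicted signal on the reference and alternate alleles) can be sketched as follows. This is a minimal illustration, not the GERV implementation: `predicted_signal` is a hypothetical stand-in that simply sums k-mer weights, and the weights, sequence, and function names are invented for the example.

```python
def predicted_signal(seq, kmer_weights, k=4):
    """Toy stand-in for a trained model: sum the weights of all k-mers in seq."""
    return sum(
        kmer_weights.get(seq[i:i + k], 0.0)
        for i in range(len(seq) - k + 1)
    )

def score_variant(ref_seq, pos, alt_base, kmer_weights, k=4):
    """Score a SNP as predicted signal on the alternate allele minus the reference."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_signal = predicted_signal(ref_seq, kmer_weights, k)
    alt_signal = predicted_signal(alt_seq, kmer_weights, k)
    return alt_signal - ref_signal

# Hypothetical k-mer weights; a real model would learn these from data.
toy_weights = {"TGTT": 2.0, "GTTT": 1.0}
ref = "ACTGTTTAC"

# Substituting T with A at 0-based position 4 removes both weighted k-mers.
score = score_variant(ref, 4, "A", toy_weights)
```

In this toy, a negative score means the alternate allele is predicted to reduce the signal, and variants could then be ranked by the magnitude of the score.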

Subject(s)

Algorithms, Computational Biology/methods, Models, Statistical, Polymorphism, Single Nucleotide/genetics, Regulatory Sequences, Nucleic Acid/genetics, Transcription Factors/metabolism, Binding Sites, Chromatin Immunoprecipitation, Genome, Human, Genome-Wide Association Study, High-Throughput Nucleotide Sequencing, Humans, Molecular Sequence Annotation, Protein Binding
