Results 1 - 20 of 55
1.
Cell ; 155(3): 713-24, 2013 Oct 24.
Article in English | MEDLINE | ID: mdl-24243024

ABSTRACT

Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.


Subject(s)
Artificial Intelligence , Sequence Analysis, DNA , Transcription Factors/metabolism , Binding Sites , Cell Line , Chromatin Immunoprecipitation , Gene Regulatory Networks , Humans , Regulatory Sequences, Nucleic Acid
2.
Cell ; 148(6): 1293-307, 2012 Mar 16.
Article in English | MEDLINE | ID: mdl-22424236

ABSTRACT

Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.


Subject(s)
Genome, Human , Genomics , Precision Medicine , Diabetes Mellitus, Type 2/genetics , Female , Gene Expression Profiling , Humans , Male , Metabolomics , Middle Aged , Mutation , Proteomics , Respiratory Syncytial Viruses/isolation & purification , Rhinovirus/isolation & purification
3.
Genome Res ; 33(5): 741-749, 2023 May.
Article in English | MEDLINE | ID: mdl-37156622

ABSTRACT

Recombinant plasmid vectors are versatile tools that have facilitated discoveries in molecular biology, genetics, proteomics, and many other fields. As the enzymatic and bacterial processes used to create recombinant DNA can introduce errors, sequence validation is an essential step in plasmid assembly. Sanger sequencing is the current standard for plasmid validation; however, this method is limited by an inability to sequence through complex secondary structure and lacks scalability when applied to full-plasmid sequencing of multiple plasmids owing to read-length limits. Although high-throughput sequencing does provide full-plasmid sequencing at scale, it is impractical and costly when used outside of library-scale validation. Here, we present Oxford nanopore-based rapid analysis of multiplexed plasmids (OnRamp), an alternative method for routine plasmid validation that combines the advantages of high-throughput sequencing's full-plasmid coverage and scalability with Sanger's affordability and accessibility by leveraging nanopore's long-read sequencing technology. We include customized wet-laboratory protocols for plasmid preparation along with a pipeline designed for analysis of read data obtained using these protocols. This analysis pipeline is deployed on the OnRamp web app, which generates alignments between actual and predicted plasmid sequences, quality scores, and read-level views. OnRamp is designed to be broadly accessible regardless of programming experience to facilitate more widespread adoption of long-read sequencing for routine plasmid validation. Here we describe the OnRamp protocols and pipeline and show our ability to obtain full sequences from pooled plasmids while detecting sequence variation even in regions of high secondary structure at less than half the cost of equivalent Sanger sequencing.


Subject(s)
Genome, Bacterial , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Plasmids/genetics , High-Throughput Nucleotide Sequencing/methods , Proteomics
4.
Nucleic Acids Res ; 50(1): e6, 2022 Jan 11.
Article in English | MEDLINE | ID: mdl-34648033

ABSTRACT

Understanding the functional consequences of genetic variation in the non-coding regions of the human genome remains a challenge. We introduce here a computational tool, TURF, to prioritize regulatory variants with tissue-specific function by leveraging evidence from functional genomics experiments, including over 3000 functional genomics datasets from the ENCODE project provided in the RegulomeDB database. TURF is able to generate prediction scores at both organism and tissue/organ-specific levels for any non-coding variant in the genome. We show that TURF achieves top overall prediction performance when evaluated on validated variants from MPRA experiments. We also demonstrate how TURF can pick out the regulatory variants with tissue-specific function from a candidate list from association studies. Furthermore, we found that various GWAS traits showed enrichment of regulatory variants predicted by TURF scores in the trait-relevant organs, which indicates that these variants can be a valuable source for future studies.


Subject(s)
Genome, Human , Genomics/methods , Software , Cell Line , Data Analysis , Humans
5.
Genome Res ; 30(7): 1040-1046, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32660981

ABSTRACT

Transcription is tightly regulated by cis-regulatory DNA elements where transcription factors (TFs) can bind. Thus, identification of TF binding sites (TFBSs) is key to understanding gene expression and whole regulatory networks within a cell. The standard approaches used for TFBS prediction, such as position weight matrices (PWMs) and chromatin immunoprecipitation followed by sequencing (ChIP-seq), are widely used but have their drawbacks, including high false-positive rates and limited antibody availability, respectively. Several computational footprinting algorithms have been developed to detect TFBSs by investigating chromatin accessibility patterns; however, these also have limitations. We have developed a footprinting method to predict TF footprints in active chromatin elements (TRACE) to improve the prediction of TFBS footprints. TRACE incorporates DNase-seq data and PWMs within a multivariate hidden Markov model (HMM) to detect footprint-like regions with matching motifs. TRACE is an unsupervised method that accurately annotates binding sites for specific TFs automatically with no requirement for pregenerated candidate binding sites or ChIP-seq training data. Compared with published footprinting algorithms, TRACE has the best overall performance with the distinct advantage of targeting multiple motifs in a single model.


Subject(s)
Chromatin/metabolism , DNA Footprinting/methods , Sequence Analysis, DNA , Transcription Factors/metabolism , Binding Sites , Cell Line , Deoxyribonucleases , Humans , K562 Cells , Markov Chains , Nucleotide Motifs
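The TRACE abstract above notes that position weight matrices (PWMs) on their own tend to produce many false positives. As a rough illustration of what a PWM scan does, the following minimal sketch scores every window of a sequence against a log-odds matrix. The count matrix, pseudocount, uniform background, and threshold are all hypothetical values chosen for the example; this is not the TRACE model, which combines such motif scores with DNase-seq signal in a hidden Markov model.

```python
import math

# Hypothetical base counts for a 4-position motif (illustrative only).
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 1, "C": 8, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 1, "T": 9},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring at or above threshold.

    Assumes the sequence contains only uppercase A, C, G, and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = log_odds_matrix(COUNTS)
hits = scan("TTAGCTAA", pwm, threshold=4.0)
```

Lowering the threshold admits more windows, which is one way to see why sequence matching alone over-predicts binding sites.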
6.
Cell ; 132(2): 311-22, 2008 Jan 25.
Article in English | MEDLINE | ID: mdl-18243105

ABSTRACT

Mapping DNase I hypersensitive (HS) sites is an accurate method of identifying the location of genetic regulatory elements, including promoters, enhancers, silencers, insulators, and locus control regions. We employed high-throughput sequencing and whole-genome tiled array strategies to identify DNase I HS sites within human primary CD4+ T cells. Combining these two technologies, we have created a comprehensive and accurate genome-wide open chromatin map. Surprisingly, only 16%-21% of the identified 94,925 DNase I HS sites are found in promoters or first exons of known genes, but nearly half of the most open sites are in these regions. In conjunction with expression, motif, and chromatin immunoprecipitation data, we find evidence of cell-type-specific characteristics, including the ability to identify transcription start sites and locations of different chromatin marks utilized in these cells. In addition, and unexpectedly, our analyses have uncovered detailed features of nucleosome structure.


Subject(s)
Chromatin/genetics , Genome, Human/genetics , Algorithms , Area Under Curve , Binding Sites , CD4-Positive T-Lymphocytes/cytology , Cell Nucleus/metabolism , Chromatin Immunoprecipitation , Chromosome Mapping/methods , Chromosomes, Human , Deoxyribonuclease I/chemistry , Deoxyribonuclease I/pharmacology , Genome, Human/immunology , Histones/chemistry , Humans , Nucleosomes/chemistry , Oligonucleotide Array Sequence Analysis , Promoter Regions, Genetic , ROC Curve , Sensitivity and Specificity , Sequence Analysis, DNA , Transcription Factors/metabolism
7.
Proc Natl Acad Sci U S A ; 117(48): 30799-30804, 2020 Dec 01.
Article in English | MEDLINE | ID: mdl-33199612

ABSTRACT

Eukaryotic genomes are pervasively transcribed, yet most transcribed sequences lack conservation or known biological functions. In Arabidopsis thaliana, RNA polymerase V (Pol V) produces noncoding transcripts, which base pair with small interfering RNA (siRNA) and allow specific establishment of RNA-directed DNA methylation (RdDM) on transposable elements. Here, we show that Pol V transcribes much more broadly than previously expected, including subsets of both heterochromatic and euchromatic regions. At already established RdDM targets, Pol V and siRNA work together to maintain silencing. In contrast, some euchromatic sequences do not give rise to siRNA but are covered by low levels of Pol V transcription, which is needed to establish RdDM de novo if a transposon is reactivated. We propose a model where Pol V surveils the genome to make it competent to silence newly activated or integrated transposons. This indicates that pervasive transcription of nonconserved sequences may serve an essential role in maintenance of genome integrity.


Subject(s)
DNA-Directed RNA Polymerases/metabolism , Genome , RNA, Untranslated , Transcription, Genetic , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , DNA Transposable Elements , Gene Expression Regulation, Plant , Gene Silencing , Models, Biological , Multiprotein Complexes/metabolism , Substrate Specificity
8.
BMC Bioinformatics ; 23(1): 317, 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35927613

ABSTRACT

MOTIVATION: Aberrant DNA methylation in transcription factor binding sites has been shown to lead to anomalous gene regulation that is strongly associated with human disease. However, the majority of methylation-sensitive positions within transcription factor binding sites remain unknown. Here we introduce SEMplMe, a computational tool to generate predictions of the effect of methylation on transcription factor binding strength in every position within a transcription factor's motif. RESULTS: SEMplMe uses ChIP-seq and whole genome bisulfite sequencing to predict effects of methylation within binding sites. SEMplMe validates known methylation-sensitive and methylation-insensitive positions within a binding motif, identifies cell-type-specific transcription factor binding driven by methylation, and outperforms SELEX-based predictions for CTCF. These predictions can be used to identify aberrant sites of DNA methylation contributing to human disease. AVAILABILITY AND IMPLEMENTATION: SEMplMe is available from https://github.com/Boyle-Lab/SEMplMe.


Subject(s)
DNA Methylation , Transcription Factors , Binding Sites , Gene Expression Regulation , Humans , Protein Binding , Transcription Factors/metabolism
9.
Am J Hum Genet ; 102(1): 103-115, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29290336

ABSTRACT

Atrial fibrillation (AF) is a common cardiac arrhythmia and a major risk factor for stroke, heart failure, and premature death. The pathogenesis of AF remains poorly understood, which contributes to the current lack of highly effective treatments. To understand the genetic variation and biology underlying AF, we undertook a genome-wide association study (GWAS) of 6,337 AF individuals and 61,607 AF-free individuals from Norway, including replication in an additional 30,679 AF individuals and 278,895 AF-free individuals. Through genotyping and dense imputation mapping from whole-genome sequencing, we tested almost nine million genetic variants across the genome and identified seven risk loci, including two novel loci. One novel locus (lead single-nucleotide variant [SNV] rs12614435; p = 6.76 × 10⁻¹⁸) comprised intronic and several highly correlated missense variants situated in the I-, A-, and M-bands of titin, which is the largest protein in humans and responsible for the passive elasticity of heart and skeletal muscle. The other novel locus (lead SNV rs56202902; p = 1.54 × 10⁻¹¹) covered a large, gene-dense chromosome 1 region that has previously been linked to cardiac conduction. Pathway and functional enrichment analyses suggested that many AF-associated genetic variants act through a mechanism of impaired muscle cell differentiation and tissue formation during fetal heart development.


Subject(s)
Atrial Fibrillation/genetics , Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Heart/embryology , Regulatory Sequences, Nucleic Acid/genetics , Humans , Inheritance Patterns/genetics , Multifactorial Inheritance/genetics , Organ Specificity/genetics , Physical Chromosome Mapping , Quantitative Trait Loci/genetics , Reproducibility of Results , Risk Factors
10.
Bioinformatics ; 36(2): 364-372, 2020 Jan 15.
Article in English | MEDLINE | ID: mdl-31373606

ABSTRACT

MOTIVATION: Genome-wide association studies have revealed that 88% of disease-associated single-nucleotide polymorphisms (SNPs) reside in noncoding regions. However, noncoding SNPs remain understudied, partly because they are challenging to prioritize for experimental validation. To address this deficiency, we developed the SNP effect matrix pipeline (SEMpl). RESULTS: SEMpl estimates transcription factor-binding affinity by observing differences in chromatin immunoprecipitation followed by deep sequencing signal intensity for SNPs within functional transcription factor-binding sites (TFBSs) genome-wide. By cataloging the effects of every possible mutation within the TFBS motif, SEMpl can predict the consequences of SNPs to transcription factor binding. This knowledge can be used to identify potential disease-causing regulatory loci. AVAILABILITY AND IMPLEMENTATION: SEMpl is available from https://github.com/Boyle-Lab/SEM_CPP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Binding Sites , Chromatin Immunoprecipitation , Protein Binding , Transcription Factors
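The SEMpl abstract above describes cataloging the effect of every possible mutation within a binding-site motif. A minimal sketch of that bookkeeping is shown below: it enumerates every single-base substitution of a site and records the change in a score relative to the unmutated site. The `score` callable and the `GATA` consensus are hypothetical placeholders for illustration; SEMpl itself derives its values from ChIP-seq signal intensity rather than from a toy function like this one.

```python
def single_substitutions(site):
    """Yield (position, ref, alt, mutated_site) for every single-base change."""
    for pos, ref in enumerate(site):
        for alt in "ACGT":
            if alt != ref:
                yield pos, ref, alt, site[:pos] + alt + site[pos + 1:]


def effect_matrix(site, score):
    """Map each position to {alt_base: score(mutant) - score(reference)}.

    `score` is any callable taking a sequence and returning a number;
    it is a stand-in for a measured binding signal.
    """
    baseline = score(site)
    matrix = {}
    for pos, _ref, alt, mutant in single_substitutions(site):
        matrix.setdefault(pos, {})[alt] = score(mutant) - baseline
    return matrix


def toy_score(seq, consensus="GATA"):
    """Hypothetical score: number of positions matching a made-up consensus."""
    return sum(a == b for a, b in zip(seq, consensus))


matrix = effect_matrix("GATA", toy_score)
```

For a site of length n this produces 3n entries, one per alternative base at each position, which is the shape of matrix the abstract refers to.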
11.
BMC Bioinformatics ; 21(1): 416, 2020 Sep 22.
Article in English | MEDLINE | ID: mdl-32962625

ABSTRACT

BACKGROUND: Comparative genomics studies are growing in number partly because of their unique ability to provide insight into shared and divergent biology between species. Of particular interest is the use of phylogenetic methods to infer the evolutionary history of cis-regulatory sequence features, which contribute strongly to phenotypic divergence and are frequently gained and lost in eutherian genomes. Understanding the mechanisms by which cis-regulatory element turnover generates emergent phenotypes is crucial to our understanding of adaptive evolution. Ancestral reconstruction methods can place species-specific cis-regulatory features in their evolutionary context, thus increasing our understanding of the process of regulatory sequence turnover. However, applying these methods to gain and loss of cis-regulatory features historically required complex workflows, preventing widespread adoption by the broad scientific community. RESULTS: MapGL simplifies phylogenetic inference of the evolutionary history of short genomic sequence features by combining the necessary steps into a single piece of software with a simple set of inputs and outputs. We show that MapGL can reliably disambiguate the mechanisms underlying differential regulatory sequence content across a broad range of phylogenetic topologies and evolutionary distances. Thus, MapGL provides the necessary context to evaluate how genomic sequence gain and loss contribute to species-specific divergence. CONCLUSIONS: MapGL makes phylogenetic inference of species-specific sequence gain and loss easy for both expert and non-expert users, making it a powerful tool for gaining novel insights into genome evolution.


Subject(s)
Evolution, Molecular , Genome/genetics , Genomics/methods , Regulatory Sequences, Nucleic Acid , Software , Animals , Humans , Mammals/genetics , Phenotype , Phylogeny
12.
Trends Genet ; 33(1): 34-45, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27939749

ABSTRACT

One of the formative goals of genetics research is to understand how genetic variation leads to phenotypic differences and human disease. Genome-wide association studies (GWASs) bring us closer to this goal by linking variation with disease faster than ever before. Despite this, GWASs alone are unable to pinpoint disease-causing single nucleotide polymorphisms (SNPs). Noncoding SNPs, which represent the majority of GWAS SNPs, present a particular challenge. To address this challenge, an array of computational tools designed to prioritize and predict the function of noncoding GWAS SNPs have been developed. However, fewer than 40% of GWAS publications from 2015 utilized these tools. We discuss several leading methods for annotating noncoding variants and how they can be integrated into research pipelines in hopes that they will be broadly applied in future GWAS analyses.


Subject(s)
Computational Biology , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics , Regulatory Sequences, Nucleic Acid/genetics , Genetic Predisposition to Disease , Humans , Molecular Sequence Annotation
13.
Nature ; 512(7515): 400-5, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164749

ABSTRACT

Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors and regulatory proteins across multiple stages of Caenorhabditis elegans development by performing 241 ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments. Integration of regulatory binding and cellular-resolution expression data produced a spatiotemporally resolved metazoan transcription factor binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of transcription factors, characterizing the genomic coverage and clustering of regulatory binding, the binding preferences of, and biological processes regulated by, transcription factors, the global transcription factor co-associations and genomic subdomains that suggest shared patterns of regulation, and identifying key transcription factors and transcription factor co-associations for fate specification of individual lineages and cell types.


Subject(s)
Caenorhabditis elegans/growth & development , Caenorhabditis elegans/genetics , Gene Expression Regulation, Developmental/genetics , Genome, Helminth/genetics , Spatio-Temporal Analysis , Transcription Factors/metabolism , Animals , Binding Sites , Caenorhabditis elegans/cytology , Caenorhabditis elegans/embryology , Caenorhabditis elegans Proteins/metabolism , Cell Lineage , Chromatin Immunoprecipitation , Genomics , Larva/cytology , Larva/genetics , Larva/growth & development , Larva/metabolism , Protein Binding
14.
Nature ; 515(7527): 371-375, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409826

ABSTRACT

To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.


Subject(s)
Conserved Sequence/genetics , Genome/genetics , Genomics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Animals , Cell Line , Chromatin/genetics , Chromatin/metabolism , Enhancer Elements, Genetic/genetics , Humans , Mice , Polymorphism, Single Nucleotide/genetics
15.
Nature ; 512(7515): 453-6, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164757

ABSTRACT

Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Evolution, Molecular , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Transcription Factors/metabolism , Animals , Binding Sites , Caenorhabditis elegans/growth & development , Chromatin Immunoprecipitation , Conserved Sequence/genetics , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Genome/genetics , Humans , Molecular Sequence Annotation , Nucleotide Motifs/genetics , Organ Specificity/genetics , Transcription Factors/genetics
16.
Nucleic Acids Res ; 46(4): 1878-1894, 2018 Feb 28.
Article in English | MEDLINE | ID: mdl-29361190

ABSTRACT

The mouse is widely used as a system to study human genetic mechanisms. However, extensive rewiring of transcriptional regulatory networks often confounds translation of findings between human and mouse. Site-specific gain and loss of individual transcription factor binding sites (TFBS) has caused functional divergence of orthologous regulatory loci, and so we must look beyond this positional conservation to understand common themes of regulatory control. Fortunately, transcription factor co-binding patterns shared across species often perform conserved regulatory functions. These can be compared to 'regulatory sentences' that retain the same meanings regardless of sequence and species context. By analyzing TFBS co-occupancy patterns observed in four human and mouse cell types, we learned a regulatory grammar: the rules by which TFBS are combined into meaningful regulatory sentences. Different parts of this grammar associate with specific sets of functional annotations regardless of sequence conservation and predict functional signatures more accurately than positional conservation. We further show that both species-specific and conserved portions of this grammar are involved in gene expression divergence and human disease risk. These findings expand our understanding of transcriptional regulatory mechanisms, suggesting that phenotypic divergence and disease risk are driven by a complex interplay between deeply conserved and species-specific transcriptional regulatory pathways.


Subject(s)
Gene Expression Regulation , Mice/genetics , Transcription Factors/metabolism , Animals , Base Sequence , Binding Sites , Chromatin , Conserved Sequence , Disease/genetics , Evolution, Molecular , Genetic Loci , Humans , Immune System , Polymorphism, Single Nucleotide , Species Specificity
17.
Hum Mutat ; 40(9): 1292-1298, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31228310

ABSTRACT

Here we present a computational model, Score of Unified Regulatory Features (SURF), that predicts functional variants in enhancer and promoter elements. SURF is trained on data from massively parallel reporter assays and predicts the effect of variants on reporter expression levels. It achieved the top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" challenge. We also show that features queried through RegulomeDB, which are direct annotations from functional genomics data, help improve prediction accuracy beyond transfer learning features from DNA sequence-based deep learning models. Some of the most important features include DNase footprints, especially when coupled with complementary ChIP-seq data. Furthermore, we found our model achieved good performance in predicting allele-specific transcription factor binding events. As an extension to the current scoring system in RegulomeDB, we expect our computational model to prioritize variants in regulatory regions, thus aiding the understanding of functional variants in noncoding regions that lead to disease.


Subject(s)
Enhancer Elements, Genetic , Genetic Variation , Genomics/methods , Promoter Regions, Genetic , Deep Learning , Genetic Predisposition to Disease , Genome, Human , Humans , Models, Genetic , Sequence Analysis, DNA/methods
18.
Hum Mutat ; 40(9): 1280-1291, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31106481

ABSTRACT

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.


Subject(s)
DNA/chemistry , Epigenomics/methods , Point Mutation , Binding Sites , Cell Line , Chromatin/genetics , DNA/metabolism , Enhancer Elements, Genetic , Genetic Predisposition to Disease , Humans , Machine Learning , Promoter Regions, Genetic , Transcription Factors/metabolism
19.
Trends Genet ; 32(4): 238-249, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26962025

ABSTRACT

The ENCODE project represents a major leap from merely describing and comparing genomic sequences to surveying them for direct indicators of function. The astounding quantity of data produced by the ENCODE consortium can serve as a map to locate specific landmarks, guide hypothesis generation, and lead us to principles and mechanisms underlying genome biology. Despite its broad appeal, the size and complexity of the repository can be intimidating to prospective users. We present here some background about the ENCODE data, survey the resources available for accessing them, and describe a few simple principles to help prospective users choose the data type(s) that best suit their needs, where to get them, and how to use them to their best advantage.


Subject(s)
Genomics , Databases, Genetic , Humans , Internet , Polymorphism, Single Nucleotide
20.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site
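The abstract above mentions feed-forward loops as an example of an enriched network motif. As a small illustration of what that motif looks like in a directed regulatory network, the sketch below finds every triple in which regulator A targets B, B targets C, and A also targets C directly. The edge list and node names are hypothetical, and the brute-force search over node triples is only suitable for small networks; it is not the analysis used in the study.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Return all (a, b, c) with edges a->b, b->c, and a->c present.

    `edges` is an iterable of (regulator, target) pairs. Triples with
    additional edges among the three nodes are not filtered out.
    """
    edge_set = set(edges)
    nodes = sorted({node for edge in edge_set for node in edge})
    loops = []
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            loops.append((a, b, c))
    return loops


# Hypothetical toy network: TF1 regulates TF2, and both regulate GENE.
toy_edges = [
    ("TF1", "TF2"),
    ("TF2", "GENE"),
    ("TF1", "GENE"),
    ("TF3", "GENE"),
]
loops = feed_forward_loops(toy_edges)
```

In the toy network only TF1, TF2, and GENE form such a loop; TF3 regulates GENE without an intermediate regulator.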