Results 1 - 20 of 80
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
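Allele-specific loci are typically called by testing whether the reads covering a heterozygous site split unevenly between the two haplotypes. The abstract does not state which test EN-TEx uses; the sketch below assumes a plain exact two-sided binomial test against a 50:50 expectation, with hypothetical function names and an arbitrary 0.05 cutoff (no multiple-testing correction).

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of every outcome
    that is no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance keeps floating-point ties on the "as extreme" side.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def is_allele_specific(hap1_reads: int, hap2_reads: int, alpha: float = 0.05) -> bool:
    """Hypothetical caller: flag a heterozygous site whose reads deviate
    from an even split between the two haplotypes."""
    n = hap1_reads + hap2_reads
    if n == 0:
        return False
    return binomial_two_sided_p(hap1_reads, n) < alpha


print(is_allele_specific(30, 2))   # True: strongly skewed toward haplotype 1
print(is_allele_specific(12, 10))  # False: consistent with balanced activity
```

A real pipeline would also need to handle reference-mapping bias and correct for testing many sites at once; both are omitted here.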


Subject(s)
Epigenome , Quantitative Trait Loci , Genome-Wide Association Study , Genomics , Phenotype , Polymorphism, Single Nucleotide
2.
Cell ; 177(2): 231-242, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30951667

ABSTRACT

The Extracellular RNA Communication Consortium (ERCC) was launched to accelerate progress in the new field of extracellular RNA (exRNA) biology and to establish whether exRNAs and their carriers, including extracellular vesicles (EVs), can mediate intercellular communication and be utilized for clinical applications. Phase 1 of the ERCC focused on exRNA/EV biogenesis and function, discovery of exRNA biomarkers, development of exRNA/EV-based therapeutics, and construction of a robust set of reference exRNA profiles for a variety of biofluids. Here, we present progress by ERCC investigators in these areas, and we discuss collaborative projects directed at development of robust methods for EV/exRNA isolation and analysis and tools for sharing and computational analysis of exRNA profiling data.


Subject(s)
Cell-Free Nucleic Acids/genetics , Cell-Free Nucleic Acids/metabolism , Extracellular Vesicles/genetics , Biomarkers , Humans , Knowledge Bases , MicroRNAs/genetics , RNA/genetics
3.
Cell ; 177(2): 463-477.e15, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30951672

ABSTRACT

To develop a map of cell-cell communication mediated by extracellular RNA (exRNA), the NIH Extracellular RNA Communication Consortium created the exRNA Atlas resource (https://exrna-atlas.org). The Atlas version 4P1 hosts 5,309 exRNA-seq and exRNA qPCR profiles from 19 studies and a suite of analysis and visualization tools. To analyze variation between profiles, we apply computational deconvolution. The analysis leads to a model with six exRNA cargo types (CT1, CT2, CT3A, CT3B, CT3C, CT4), each detectable in multiple biofluids (serum, plasma, CSF, saliva, urine). Five of the cargo types associate with known vesicular and non-vesicular (lipoprotein and ribonucleoprotein) exRNA carriers. To validate the utility of this model, we re-analyze an exercise response study by deconvolution to identify physiologically relevant response pathways that were not detected previously. To enable wide application of this model, as part of the exRNA Atlas resource, we provide tools for deconvolution and analysis of user-provided case-control studies.
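Deconvolution in this sense models each observed profile as a non-negative mixture of reference cargo-type profiles. The Atlas's actual algorithm is not described in this abstract; the sketch below assumes the reference profiles are already known and solves a non-negative least-squares problem by projected gradient descent. The matrix `S`, the function name, and the toy numbers are hypothetical.

```python
import numpy as np


def deconvolve(S, y, n_iter: int = 5000) -> np.ndarray:
    """Estimate non-negative mixing weights w with S @ w ≈ y.

    S: (n_features, n_cargo_types) reference profiles (hypothetical input)
    y: (n_features,) observed exRNA profile
    Returns weights normalised to sum to 1 (cargo-type proportions).
    """
    S = np.asarray(S, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.full(S.shape[1], 1.0 / S.shape[1])
    # Step size 1/L, where L (largest singular value of S, squared) is the
    # Lipschitz constant of the gradient of 0.5 * ||S w - y||^2.
    L = np.linalg.norm(S, 2) ** 2
    for _ in range(n_iter):
        grad = S.T @ (S @ w - y)
        w = np.maximum(w - grad / L, 0.0)  # project back onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w


# Two toy cargo types measured over three RNA features.
S = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
print(deconvolve(S, [0.3, 0.7, 1.0]))  # approximately [0.3, 0.7]
```

Projected gradient is chosen only because it fits in a few lines; a dedicated NNLS solver would converge faster on large profile matrices.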


Subject(s)
Cell Communication/physiology , RNA/metabolism , Adult , Body Fluids/chemistry , Cell-Free Nucleic Acids/metabolism , Circulating MicroRNA/metabolism , Extracellular Vesicles/metabolism , Female , Humans , Male , Reproducibility of Results , Sequence Analysis, RNA/methods , Software
4.
Nature ; 583(7818): 699-710, 2020 07.
Article in English | MEDLINE | ID: mdl-32728249

ABSTRACT

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE[1] and Roadmap Epigenomics[2] data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.


Subject(s)
DNA/genetics , Databases, Genetic , Genome/genetics , Genomics , Molecular Sequence Annotation , Registries , Regulatory Sequences, Nucleic Acid/genetics , Animals , Chromatin/genetics , Chromatin/metabolism , DNA/chemistry , DNA Footprinting , DNA Methylation/genetics , DNA Replication Timing , Deoxyribonuclease I/metabolism , Genome, Human , Histones/metabolism , Humans , Mice , Mice, Transgenic , RNA-Binding Proteins/genetics , Transcription, Genetic/genetics , Transposases/metabolism
5.
Nat Methods ; 17(8): 807-814, 2020 08.
Article in English | MEDLINE | ID: mdl-32737473

ABSTRACT

Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers from promoters.
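One way to read "shape-matching filter" is as a sliding template match: a meta-profile (the average shape of an epigenetic mark around known enhancers) is slid along a signal track and scored at each offset. The abstract does not give the paper's exact scoring function; Pearson correlation is an assumption here, and the double-peaked toy template is illustrative only.

```python
from statistics import mean, pstdev


def pearson(a, b) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) * sa * sb)


def matched_filter_scores(signal, template):
    """Correlate the template with every full-length window of the signal."""
    w = len(template)
    return [pearson(signal[i:i + w], template) for i in range(len(signal) - w + 1)]


# Hypothetical double-peaked meta-profile flanking a nucleosome-depleted centre.
template = [0, 2, 1, 2, 0]
signal = [0, 0, 0, 2, 1, 2, 0, 0]
scores = matched_filter_scores(signal, template)
print(scores.index(max(scores)))  # 2: the window that matches the shape
```

Correlation rewards the shape rather than the absolute height of the signal, which is one reason a template match can transfer across datasets with different sequencing depths.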


Subject(s)
Epigenesis, Genetic/physiology , Pattern Recognition, Automated/methods , Animals , Cell Line , Drosophila , Histones/genetics , Histones/metabolism , Humans , Mice , Mice, Transgenic , Reproducibility of Results
7.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.
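A "universal model" in this sense means one parameter vector fitted on pooled data and applied unchanged to every organism. The sketch below assumes an ordinary least-squares linear model on promoter chromatin features; the paper's model family and feature set may differ, and the synthetic data stand in for real measurements.

```python
import numpy as np


def fit_universal_model(feature_blocks, expression_blocks) -> np.ndarray:
    """Fit a single intercept + coefficient vector on data pooled across organisms.

    feature_blocks: list of (n_genes_i, n_features) arrays, one per organism
    expression_blocks: list of (n_genes_i,) arrays of (log) expression
    """
    X = np.vstack(feature_blocks)
    y = np.concatenate(expression_blocks)
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X) -> np.ndarray:
    """Apply the shared, organism-independent parameters to new promoters."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


rng = np.random.default_rng(0)
true_coef = np.array([1.0, 2.0, 0.5])  # intercept, feature 1, feature 2
features = [rng.normal(size=(50, 2)) for _ in range(3)]  # three toy "organisms"
expression = [true_coef[0] + X @ true_coef[1:] for X in features]
coef = fit_universal_model(features, expression)
print(np.round(coef, 3))  # recovers the toy coefficients on noise-free data
```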


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA
8.
Nature ; 512(7515): 453-6, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164757

ABSTRACT

Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Evolution, Molecular , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Transcription Factors/metabolism , Animals , Binding Sites , Caenorhabditis elegans/growth & development , Chromatin Immunoprecipitation , Conserved Sequence/genetics , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Genome/genetics , Humans , Molecular Sequence Annotation , Nucleotide Motifs/genetics , Organ Specificity/genetics , Transcription Factors/genetics
9.
Trends Genet ; 32(5): 251-253, 2016 05.
Article in English | MEDLINE | ID: mdl-27005445

ABSTRACT

The emergence of collective creative enterprise such as large scientific consortia is a unique feature in modern scientific research. We analyzed the temporal co-authorship network structures of ENCODE and modENCODE consortia. Our analysis revealed that the consortium members work closely as a community whereas non-members collaborate on the scale of a few laboratories. We also identified a few brokers who play an important role in facilitating collaborations with outside researchers.
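Brokers in a co-authorship network are commonly identified by betweenness centrality: how many shortest paths between other pairs of authors pass through a given author. The abstract does not name the measure used; the sketch below assumes unweighted betweenness computed with Brandes' algorithm, and the two-lab toy network is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unweighted betweenness centrality via Brandes' algorithm.

    adj: dict mapping each author to the set of their co-authors (undirected).
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Back-propagate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def add_edge(adj, a, b):
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


# Two hypothetical labs (cliques) linked only through author "k".
adj = {}
for lab in (["p1", "p2", "p3"], ["q1", "q2", "q3"]):
    for i, a in enumerate(lab):
        for b in lab[i + 1:]:
            add_edge(adj, a, b)
add_edge(adj, "p1", "k")
add_edge(adj, "k", "q1")
scores = betweenness(adj)
print(max(scores, key=scores.get), scores["k"])  # k 9.0
```

In the toy network, "k" lies on all nine shortest paths between the two labs, so it scores highest even though it has only two collaborators.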


Subject(s)
Cooperative Behavior , Peer Review, Research/trends , Humans
10.
Bioinformatics ; 34(1): 1-8, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28961734

ABSTRACT

Motivation: Analysis of RNA sequencing (RNA-Seq) data in human saliva is challenging. Lack of standardization and unification of the bioinformatic procedures undermines saliva's diagnostic potential. This motivated the present study. Results: We applied principal pipelines for bioinformatic analysis of small RNA-Seq data of saliva of 98 healthy Korean volunteers including either direct or indirect mapping of the reads to the human genome using Bowtie1. Analysis of alignments to exogenous genomes by another pipeline revealed that almost all of the reads map to bacterial genomes. Thus, salivary exRNA has fundamental properties that warrant the design of unique additional steps while performing the bioinformatic analysis. Our pipelines can serve as potential guidelines for processing of RNA-Seq data of human saliva. Availability and implementation: Processing and analysis results of the experimental data generated by the exceRpt (v4.6.3) small RNA-seq pipeline (github.gersteinlab.org/exceRpt) are available from exRNA atlas (exrna-atlas.org). Alignment to exogenous genomes and their quantification results were used in this paper for the analyses of small RNAs of exogenous origin. Contact: dtww@ucla.edu.


Subject(s)
Computational Biology/methods , Sequence Analysis, RNA/methods , Software , High-Throughput Nucleotide Sequencing/methods , Humans , RNA , Saliva/chemistry
11.
J Proteome Res ; 17(10): 3431-3444, 2018 10 05.
Article in English | MEDLINE | ID: mdl-30125121

ABSTRACT

Cellular control of gene expression is a complex process that is subject to multiple levels of regulation, but ultimately it is the protein produced that determines the biosynthetic state of the cell. One way that a cell can regulate the protein output from each gene is by expressing alternate isoforms with distinct amino acid sequences. These isoforms may exhibit differences in localization and binding interactions that can have profound functional implications. High-throughput liquid chromatography tandem mass spectrometry proteomics (LC-MS/MS) relies on enzymatic digestion and has lower coverage and sensitivity than transcriptomic profiling methods such as RNA-seq. Digestion results in predictable fragmentation of a protein, which can limit the generation of peptides capable of distinguishing between isoforms. Here we exploit transcript-level expression from RNA-seq to set prior likelihoods and enable protein isoform abundances to be directly estimated from LC-MS/MS, an approach derived from the principle that most genes appear to be expressed as a single dominant isoform in a given cell type or tissue. Through this deep integration of RNA-seq and LC-MS/MS data from the same sample, we show that a principal isoform can be identified in >80% of gene products in homogeneous HEK293 cell culture and >70% of proteins detected in complex human brain tissue. We demonstrate that the incorporation of translatome data from ribosome profiling further refines this process. Defining isoforms in experiments with matched RNA-seq/translatome and proteomic data increases the functional relevance of such data sets and will further broaden our understanding of multilevel control of gene expression.


Subject(s)
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Proteome/metabolism , Proteomics/methods , Algorithms , Alternative Splicing , Chromatography, Liquid/methods , HEK293 Cells , Humans , Protein Biosynthesis/genetics , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteome/genetics , Reproducibility of Results , Ribosomes/genetics , Ribosomes/metabolism , Tandem Mass Spectrometry/methods
12.
BMC Genomics ; 19(1): 331, 2018 May 05.
Article in English | MEDLINE | ID: mdl-29728066

ABSTRACT

BACKGROUND: Evolving interest in comprehensively profiling the full range of small RNAs present in small tissue biopsies and in circulating biofluids, and how the profile differs with disease, has launched small RNA sequencing (RNASeq) into more frequent use. However, known biases associated with small RNASeq, compounded by low RNA inputs, have been both a significant concern and a hurdle to widespread adoption. As RNASeq is becoming a viable choice for the discovery of small RNAs in low input samples and more labs are employing it, there should be benchmark datasets to test and evaluate the performance of new sequencing protocols and operators. In a recent publication from the National Institute of Standards and Technology, Pine et al., 2018, the investigators used a commercially available set of three tissues and tested performance across labs and platforms. RESULTS: In this paper, we further tested the performance of low RNA input in three commonly used and commercially available RNASeq library preparation kits: NEB Next, NEXTFlex, and TruSeq small RNA library preparation. We evaluated the performance of the kits at two different sites, using three different tissues (brain, liver, and placenta) with high (1 µg) and low RNA (10 ng) input from tissue samples, or 5.0, 3.0, 2.0, 1.0, 0.5, and 0.2 ml starting volumes of plasma. As there has been a lack of robust validation platforms for differentially expressed miRNAs, we also compared low input RNASeq data with their expression profiles on three different platforms (Abcam Fireplex, HTG EdgeSeq, and Qiagen miRNome). CONCLUSIONS: The concordance of RNASeq results on these three platforms was dependent on the RNA expression level; the higher the expression, the better the reproducibility. The results provide an extensive analysis of small RNASeq kit performance using low RNA input, and replication of these data on three downstream technologies.


Subject(s)
Gene Library , RNA/metabolism , Brain/metabolism , Female , High-Throughput Nucleotide Sequencing , Humans , Liver/metabolism , MicroRNAs/analysis , MicroRNAs/chemistry , Placenta/metabolism , Pregnancy , Principal Component Analysis , RNA/chemistry , Reagent Kits, Diagnostic , Sequence Analysis, RNA
13.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
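Feed-forward loops are three-node motifs in which a regulator X controls a target Z both directly and through an intermediate regulator Y (X→Y, Y→Z, X→Z). The sketch below simply counts them in a directed edge list under that definition; it is not the paper's enrichment analysis (which would compare against randomized networks), and the toy network is hypothetical.

```python
def count_ffls(edges) -> int:
    """Count feed-forward loops X->Y, Y->Z, X->Z in a directed network.

    edges: iterable of (regulator, target) pairs.
    """
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    targets = {n: set() for n in nodes}
    for regulator, target in edge_set:
        targets[regulator].add(target)
    count = 0
    for x in nodes:
        for y in targets[x]:
            if y == x:
                continue  # ignore autoregulation
            for z in targets[y]:
                # Z must be distinct from X and Y and also regulated directly by X.
                if z not in (x, y) and z in targets[x]:
                    count += 1
    return count


toy = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]
print(count_ffls(toy))  # 2: A->B->C and B->C->D
```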


Subject(s)
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site
14.
Nature ; 489(7414): 101-8, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955620

ABSTRACT

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , Transcriptome/genetics , Alleles , Cell Line , DNA, Intergenic/genetics , Enhancer Elements, Genetic , Exons/genetics , Gene Expression Profiling , Genes/genetics , Genomics , Humans , Polyadenylation/genetics , Protein Isoforms/genetics , RNA/biosynthesis , RNA/genetics , RNA Editing/genetics , RNA Splicing/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, RNA
15.
Stroke ; 48(4): 828-834, 2017 04.
Article in English | MEDLINE | ID: mdl-28289238

ABSTRACT

BACKGROUND AND PURPOSE: There is increasing interest in extracellular RNAs (ex-RNAs), with numerous reports of associations between selected microRNAs (miRNAs) and a variety of cardiovascular disease phenotypes. Previous studies of ex-RNAs in relation to risk for cardiovascular disease have investigated small numbers of patients and assayed only candidate miRNAs. No human studies have investigated links between novel ex-RNAs and stroke. METHODS: We conducted unbiased next-generation sequencing using plasma from 40 participants of the FHS (Framingham Heart Study; Offspring Cohort Exam 8) followed by high-throughput polymerase chain reaction of 471 ex-RNAs. The reverse transcription quantitative polymerase chain reaction included 331 of the most abundant miRNAs, 43 small nucleolar RNAs, and 97 piwi-interacting RNAs in 2763 additional FHS participants and explored the relations of ex-RNAs to prevalent (n=63) and incident (n=51) stroke and coronary heart disease (prevalent=286, incident=69). RESULTS: After adjustment for multiple cardiovascular disease risk factors, 7 ex-RNAs were associated with stroke prevalence or incidence; no ex-RNAs were associated with prevalent or incident coronary heart disease. Statistically significant ex-RNA associations with stroke were specific, with no overlap between prevalent and incident events. CONCLUSIONS: This is the largest study of ex-RNAs in relation to stroke using an unbiased approach in an observational cohort and the first large study to examine human small noncoding RNAs beyond miRNAs. These results demonstrate that when studied in a large observational cohort, extracellular miRNAs are associated with stroke risk.


Subject(s)
Coronary Disease/blood , MicroRNAs/blood , RNA, Small Interfering/blood , RNA, Small Nucleolar/blood , Stroke/blood , Aged , Cohort Studies , Coronary Disease/epidemiology , Female , High-Throughput Nucleotide Sequencing , Humans , Incidence , Male , Massachusetts/epidemiology , Middle Aged , Prevalence , Stroke/epidemiology
16.
Proc Natl Acad Sci U S A ; 111(37): 13361-6, 2014 Sep 16.
Article in English | MEDLINE | ID: mdl-25157146

ABSTRACT

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Phylogeny , Pseudogenes/genetics , Animals , Evolution, Molecular , Genetic Association Studies , Humans , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Sequence Homology, Nucleic Acid
17.
PLoS Comput Biol ; 11(4): e1004132, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25884877

ABSTRACT

The topology of the gene-regulatory network has been extensively analyzed. Now, given the large amount of available functional genomic data, it is possible to go beyond this and systematically study regulatory circuits in terms of logic elements. To this end, we present Loregic, a computational method integrating gene expression and regulatory network data, to characterize the cooperativity of regulatory factors. Loregic uses all 16 possible two-input-one-output logic gates (e.g. AND or XOR) to describe triplets of two factors regulating a common target. We attempt to find the gate that best matches each triplet's observed gene expression pattern across many conditions. We make Loregic available as a general-purpose tool (github.com/gersteinlab/loregic). We validate it with known yeast transcription-factor knockout experiments. Next, using human ENCODE ChIP-Seq and TCGA RNA-Seq data, we are able to demonstrate how Loregic characterizes complex circuits involving both proximally and distally regulating transcription factors (TFs) and also miRNAs. Furthermore, we show that MYC, a well-known oncogenic driving TF, can be modeled as acting independently from other TFs (e.g., using OR gates) but antagonistically with repressing miRNAs. Finally, we inter-relate Loregic's gate logic with other aspects of regulation, such as indirect binding via protein-protein interactions, feed-forward loop motifs and global regulatory hierarchy.
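Because a two-input, one-output gate is fully defined by its outputs on the four input combinations, the 16 possible gates can be enumerated as 4-bit truth tables and each one scored against binarized observations of (factor 1, factor 2, target) across conditions. The sketch below assumes median-based binarization and a simple fraction-of-matches score; Loregic's actual discretization and scoring may differ, and these function names are not Loregic's API.

```python
from itertools import product
from statistics import median

# Input combinations in a fixed order; a gate is its output tuple over these.
INPUTS = list(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)
ALL_GATES = [tuple((code >> bit) & 1 for bit in range(4)) for code in range(16)]
NAMES = {(0, 0, 0, 1): "AND", (0, 1, 1, 1): "OR", (0, 1, 1, 0): "XOR",
         (1, 1, 1, 0): "NAND", (1, 0, 0, 0): "NOR", (1, 0, 0, 1): "XNOR"}


def binarize(values):
    """Hypothetical discretisation: 1 if above the series median, else 0."""
    m = median(values)
    return [int(v > m) for v in values]


def best_gate(tf1, tf2, target):
    """Return (truth_table, fraction_matched) for the best of the 16 gates."""
    observations = list(zip(binarize(tf1), binarize(tf2), binarize(target)))

    def score(gate):
        # Number of conditions where the gate's output equals the target state.
        return sum(gate[INPUTS.index((a, b))] == t for a, b, t in observations)

    gate = max(ALL_GATES, key=score)
    return gate, score(gate) / len(observations)


# Toy expression across eight conditions; the target is high only when both TFs are.
tf1 = [0, 0, 1, 1, 0, 0, 1, 1]
tf2 = [0, 1, 0, 1, 0, 1, 0, 1]
target = [0, 0, 0, 1, 0, 0, 0, 1]
gate, frac = best_gate(tf1, tf2, target)
print(NAMES.get(gate, gate), frac)  # AND 1.0
```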


Subject(s)
Gene Regulatory Networks/genetics , Genes, Regulator/genetics , Logistic Models , Models, Genetic , Transcription Factors/genetics , Transcriptional Activation/genetics , Algorithms , Animals , Computer Simulation , Gene Expression Regulation/genetics , Humans , Leukemia/genetics , MicroRNAs/genetics
18.
Nat Rev Genet ; 11(8): 559-71, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20628352

ABSTRACT

Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.
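The signal, smoothing, segmentation, and clustering steps described above can be sketched on a toy per-base signal. The centred moving-average window, the fixed threshold, and gap-based merging are assumptions chosen for brevity; practical tools often use more elaborate smoothers and segmenters (for example, hidden Markov models). Blocks are half-open `(start, end)` intervals.

```python
def smooth(signal, window: int = 3):
    """Centred moving average; windows are truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        seg = signal[lo:hi]
        out.append(sum(seg) / len(seg))
    return out


def segment(signal, threshold):
    """Turn positions at or above the threshold into contiguous blocks."""
    blocks, start = [], None
    for i, v in enumerate(signal):
        if v >= threshold and start is None:
            start = i
        elif v < threshold and start is not None:
            blocks.append((start, i))
            start = None
    if start is not None:
        blocks.append((start, len(signal)))
    return blocks


def merge_blocks(blocks, max_gap: int):
    """Cluster neighbouring blocks separated by at most max_gap bases."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


raw = [0, 0, 6, 6, 6, 0, 0, 0, 0, 3, 3, 3, 0]
blocks = segment(smooth(raw), threshold=2.5)
print(blocks)                           # [(2, 5), (10, 11)]
print(merge_blocks(blocks, max_gap=5))  # [(2, 11)]
```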


Subject(s)
DNA, Intergenic/genetics , Genome, Human , Genomics/methods , Animals , Chromosome Mapping , Conserved Sequence , DNA Transposable Elements , Genomics/trends , Humans , Pseudogenes , Regulatory Elements, Transcriptional , Sequence Alignment , Sequence Analysis, DNA , Tandem Repeat Sequences
19.
Genome Res ; 22(9): 1658-67, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955978

ABSTRACT

Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSSs can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
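The abstract does not define "high CpG content". A common measure is the CpG observed/expected ratio, (CpG count × sequence length) / (C count × G count); the sketch below assumes that definition plus the conventional Gardiner-Garden and Frommer cutoffs (ratio > 0.6 and GC fraction > 0.5) as a hypothetical high/low split, not the paper's own criterion.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: CpG count * length / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return s.count("CG") * len(s) / (c * g)


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are C or G."""
    s = seq.upper()
    return (s.count("C") + s.count("G")) / len(s) if s else 0.0


def is_high_cpg(seq: str, ratio_cutoff: float = 0.6, gc_cutoff: float = 0.5) -> bool:
    """Hypothetical high-CpG call for a promoter window around a TSS."""
    return cpg_obs_exp(seq) > ratio_cutoff and gc_fraction(seq) > gc_cutoff


print(cpg_obs_exp("CGCGCGCG"), is_high_cpg("CGCGCGCG"))  # 2.0 True
print(cpg_obs_exp("CCCCGGGG"), is_high_cpg("CCCCGGGG"))  # 0.5 False
```

Counting `"CG"` with `str.count` is safe here because the dinucleotide cannot overlap itself.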


Subject(s)
Gene Expression Regulation , Genomics , Transcription Factors/metabolism , Transcription, Genetic , Base Composition , Binding Sites/genetics , Cell Line , Chromatin/genetics , Chromatin/metabolism , Computational Biology/methods , Histones/genetics , Humans , Models, Biological , Promoter Regions, Genetic , Protein Binding/genetics , Transcription Initiation Site
20.
Genome Res ; 22(9): 1813-31, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955991

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.


Subject(s)
Chromatin Immunoprecipitation/methods , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Animals , Genome/genetics , Genomics/methods , Guidelines as Topic , Histones/metabolism , Humans , Internet , Transcription Factors/metabolism