Results 1 - 20 of 21
1.
Front Plant Sci ; 10: 1434, 2019.
Article in English | MEDLINE | ID: mdl-31798605

ABSTRACT

The genome is reprogrammed during development to produce diverse cell types, largely through altered expression and activity of key transcription factors. The accessibility and critical functions of epidermal cells have made them a model for connecting transcriptional events to development in a range of model systems. In Arabidopsis thaliana and many other plants, fertilization triggers differentiation of specialized epidermal seed coat cells that have a unique morphology caused by large extracellular deposits of polysaccharides. Here, we used DNase I-seq to generate regulatory landscapes of A. thaliana seeds at two critical time points in seed coat maturation (4 and 7 DPA), enriching for seed coat cells with the INTACT method. We found over 3,000 developmentally dynamic regulatory DNA elements and explored their relationship with nearby gene expression. The dynamic regulatory elements were enriched for motifs for several transcription factor families, most notably the TCP family at the earlier time point and the MYB family at the later one. To assess the extent to which the observed regulatory sites in seeds added to previously known regulatory sites in A. thaliana, we compared our data to 11 other data sets generated with 7-day-old seedlings for diverse tissues and conditions. Surprisingly, over a quarter of the regulatory, i.e. accessible, bases observed in seeds were novel. Notably, plant regulatory landscapes from different tissues, cell types, or developmental stages were more dynamic than those generated from bulk tissue in response to environmental perturbations, highlighting the importance of extending studies of regulatory DNA to single tissues and cell types during development.
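
A statistic such as the share of accessible bases that are novel relative to earlier data sets can be computed from interval coordinates alone. The following is a minimal sketch, not the authors' pipeline: it assumes accessible regions are given as half-open `(start, end)` pairs on a single chromosome, and the function names `merge` and `novel_fraction` are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def novel_fraction(query, reference):
    """Fraction of bases covered by `query` that `reference` does not cover."""
    q, r = merge(query), merge(reference)
    total = sum(e - s for s, e in q)
    if total == 0:
        return 0.0
    shared = 0
    i = j = 0
    # Sweep both sorted, non-overlapping lists once, summing the overlaps.
    while i < len(q) and j < len(r):
        lo = max(q[i][0], r[j][0])
        hi = min(q[i][1], r[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if q[i][1] < r[j][1]:
            i += 1
        else:
            j += 1
    return (total - shared) / total
```

For example, `novel_fraction([(0, 100), (200, 300)], [(50, 150), (250, 260)])` gives 0.7, because 60 of the 200 query bases are already covered by the reference. A genome-wide version would apply the same sweep per chromosome and pool the totals.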

2.
Bioinformatics ; 35(22): 4767-4769, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31161210

ABSTRACT

SUMMARY: The Illumina Infinium EPIC BeadChip is a new high-throughput array for DNA methylation analysis, extending the earlier 450k array by over 400 000 new sites. Previously, a method named eFORGE was developed to provide insights into cell type-specific and cell-composition effects for 450k data. Here, we present a significantly updated and improved version of eFORGE that can analyze both EPIC and 450k array data. New features include analysis of chromatin states, transcription factor motifs and DNase I footprints, providing tools for epigenome-wide association study interpretation and epigenome editing. AVAILABILITY AND IMPLEMENTATION: eFORGE v2.0 is implemented as a web tool available from https://eforge.altiusinstitute.org and https://eforge-tf.altiusinstitute.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Methylation , Epigenomics , Chromatin , CpG Islands , Deoxyribonuclease I , Oligonucleotide Array Sequence Analysis , Software
3.
Methods Mol Biol ; 1418: 267-81, 2016.
Article in English | MEDLINE | ID: mdl-27008020

ABSTRACT

The bulk of modern genomics research includes, in part, analyses of large data sets, such as those derived from high resolution, high-throughput experiments, that make computations challenging. The BEDOPS toolkit offers a broad spectrum of fundamental analysis capabilities to query, operate on, and compare quantitatively genomic data sets of any size and number. The toolkit facilitates the construction of complex analysis pipelines that remain efficient in both memory and time by chaining together combinations of its complementary components. The principal utilities accept raw or compressed data in a flexible format, and they provide built-in features to expedite parallel computations.


Subject(s)
Computational Biology/methods , Genomics/methods , Software , Algorithms , Data Compression , Genome , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Sequence Analysis/methods
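
The idea of chaining small components so that a pipeline stays efficient in memory can be sketched with Python generators. This is an illustration of the streaming principle only and does not reproduce BEDOPS or its command-line interface. It assumes tab- or space-delimited BED-like lines that are already sorted by chromosome name and then start coordinate; the names `parse_bed`, `union_sorted` and `merge_overlaps` are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def parse_bed(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (chrom, start, end) from the first three columns of each line."""
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)


def union_sorted(*streams: Iterable[Record]) -> Iterator[Record]:
    """Lazily interleave any number of already-sorted record streams."""
    return heapq.merge(*streams)


def merge_overlaps(records: Iterable[Record]) -> Iterator[Record]:
    """Collapse overlapping or touching records from a sorted stream."""
    current = None
    for chrom, start, end in records:
        if current and chrom == current[0] and start <= current[2]:
            current = (chrom, current[1], max(current[2], end))
        else:
            if current:
                yield current
            current = (chrom, start, end)
    if current:
        yield current
```

Because every stage is a generator, `merge_overlaps(union_sorted(parse_bed(a), parse_bed(b)))` holds only one record per input in memory at a time, which is why sorted input matters for this style of processing.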
4.
Nature ; 515(7527): 365-70, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409825

ABSTRACT

The basic body plan and major physiological axes have been highly conserved during mammalian evolution, yet only a small fraction of the human genome sequence appears to be subject to evolutionary constraint. To quantify cis- versus trans-acting contributions to mammalian regulatory evolution, we performed genomic DNase I footprinting of the mouse genome across 25 cell and tissue types, collectively defining ∼8.6 million transcription factor (TF) occupancy sites at nucleotide resolution. Here we show that mouse TF footprints conjointly encode a regulatory lexicon that is ∼95% similar with that derived from human TF footprints. However, only ∼20% of mouse TF footprints have human orthologues. Despite substantial turnover of the cis-regulatory landscape, nearly half of all pairwise regulatory interactions connecting mouse TF genes have been maintained in orthologous human cell types through evolutionary innovation of TF recognition sequences. Furthermore, the higher-level organization of mouse TF-to-TF connections into cellular network architectures is nearly identical with human. Our results indicate that evolutionary selection on mammalian gene regulation is targeted chiefly at the level of trans-regulatory circuitry, enabling and potentiating cis-regulatory plasticity.


Subject(s)
Conserved Sequence/genetics , Evolution, Molecular , Mammals/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Animals , DNA Footprinting , Gene Expression Regulation, Developmental/genetics , Gene Regulatory Networks/genetics , Humans , Mice
5.
Cell Rep ; 8(6): 2015-2030, 2014 Sep 25.
Article in English | MEDLINE | ID: mdl-25220462

ABSTRACT

Our understanding of gene regulation in plants is constrained by our limited knowledge of plant cis-regulatory DNA and its dynamics. We mapped DNase I hypersensitive sites (DHSs) in A. thaliana seedlings and used genomic footprinting to delineate ∼700,000 sites of in vivo transcription factor (TF) occupancy at nucleotide resolution. We show that variation associated with 72 diverse quantitative phenotypes localizes within DHSs. TF footprints encode an extensive cis-regulatory lexicon subject to recent evolutionary pressures, and widespread TF binding within exons may have shaped codon usage patterns. The architecture of A. thaliana TF regulatory networks is strikingly similar to that of animals in spite of diverged regulatory repertoires. We analyzed regulatory landscape dynamics during heat shock and photomorphogenesis, disclosing thousands of environmentally sensitive elements and enabling mapping of key TF regulatory circuits underlying these fundamental responses. Our results provide an extensive resource for the study of A. thaliana gene regulation and functional biology.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Transcription Factors/genetics , Arabidopsis/growth & development , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , Chromatin/metabolism , Chromosome Mapping , Codon , Deoxyribonuclease I/metabolism , Exons , Gene Regulatory Networks , Genome, Plant , Genome-Wide Association Study , Light , Plant Development/genetics , Protein Binding , Regulatory Elements, Transcriptional/genetics , Seedlings/genetics , Transcription Factors/metabolism
6.
Cell ; 154(4): 888-903, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23953118

ABSTRACT

Cellular-state information between generations of developing cells may be propagated via regulatory regions. We report consistent patterns of gain and loss of DNase I-hypersensitive sites (DHSs) as cells progress from embryonic stem cells (ESCs) to terminal fates. DHS patterns alone convey rich information about cell fate and lineage relationships distinct from information conveyed by gene expression. Developing cells share a proportion of their DHS landscapes with ESCs; that proportion decreases continuously in each cell type as differentiation progresses, providing a quantitative benchmark of developmental maturity. Developmentally stable DHSs densely encode binding sites for transcription factors involved in autoregulatory feedback circuits. In contrast to normal cells, cancer cells extensively reactivate silenced ESC DHSs and those from developmental programs external to the cell lineage from which the malignancy derives. Our results point to changes in regulatory DNA landscapes as quantitative indicators of cell-fate transitions, lineage relationships, and dysfunction.


Subject(s)
Cell Lineage , Gene Expression Regulation, Developmental , Animals , Cell Differentiation , Cell Transformation, Neoplastic , Chromatin/metabolism , Embryonic Stem Cells/metabolism , Enhancer Elements, Genetic , Feedback , Humans , Mice , Stem Cells/metabolism
7.
Nat Genet ; 45(8): 852-9, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23793028

ABSTRACT

The precise splicing of genes confers an enormous transcriptional complexity to the human genome. The majority of gene splicing occurs cotranscriptionally, permitting epigenetic modifications to affect splicing outcomes. Here we show that select exonic regions are demarcated within the three-dimensional structure of the human genome. We identify a subset of exons that exhibit DNase I hypersensitivity and are accompanied by 'phantom' signals in chromatin immunoprecipitation and sequencing (ChIP-seq) that result from cross-linking with proximal promoter- or enhancer-bound factors. The capture of structural features by ChIP-seq is confirmed by chromatin interaction analysis that resolves local intragenic loops that fold exons close to cognate promoters while excluding intervening intronic sequences. These interactions of exons with promoters and enhancers are enriched for alternative splicing events, an effect reflected in cell type-specific periexonic DNase I hypersensitivity patterns. Collectively, our results connect local genome topography, chromatin structure and cis-regulatory landscapes with the generation of human transcriptional complexity by cotranscriptional splicing.


Subject(s)
Exons , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Alternative Splicing , Chromatin Immunoprecipitation , Computational Biology , Databases, Nucleic Acid , Deoxyribonuclease I/metabolism , Enhancer Elements, Genetic , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Nucleic Acid Conformation , Organ Specificity/genetics
8.
Cell ; 151(1): 153-66, 2012 Sep 28.
Article in English | MEDLINE | ID: mdl-23021222

ABSTRACT

Regulatory T (Treg) cells, whose identity and function are defined by the transcription factor Foxp3, are indispensable for immune homeostasis. It is unclear whether Foxp3 exerts its Treg lineage specification function through active modification of the chromatin landscape and establishment of new enhancers or by exploiting a pre-existing enhancer landscape. Analysis of the chromatin accessibility of Foxp3-bound enhancers in Treg and Foxp3-negative T cells showed that Foxp3 was bound overwhelmingly to preaccessible enhancers occupied by its cofactors in precursor cells or a structurally related predecessor. Furthermore, the bulk of Foxp3-bound Treg cell enhancers lacking in Foxp3⁻ CD4⁺ cells became accessible upon T cell receptor activation prior to Foxp3 expression, and only a small subset associated with several functionally important genes were exclusively Treg cell specific. Thus, in a late cellular differentiation process, Foxp3 defines Treg cell functionality in an "opportunistic" manner by largely exploiting the preformed enhancer network instead of establishing a new enhancer landscape.


Subject(s)
Forkhead Transcription Factors/metabolism , T-Lymphocytes, Regulatory/cytology , Animals , CD4-Positive T-Lymphocytes/metabolism , Cell Differentiation , Chromatin/metabolism , Enhancer Elements, Genetic , Female , Forkhead Box Protein O1 , Lymphocyte Activation , Mice , Specific Pathogen-Free Organisms , T-Lymphocytes, Regulatory/metabolism
9.
Cell ; 150(6): 1274-86, 2012 Sep 14.
Article in English | MEDLINE | ID: mdl-22959076

ABSTRACT

The combinatorial cross-regulation of hundreds of sequence-specific transcription factors (TFs) defines a regulatory network that underlies cellular identity and function. Here we use genome-wide maps of in vivo DNaseI footprints to assemble an extensive core human regulatory network comprising connections among 475 sequence-specific TFs and to analyze the dynamics of these connections across 41 diverse cell and tissue types. We find that human TF networks are highly cell selective and are driven by cohorts of factors that include regulators with previously unrecognized roles in control of cellular identity. Moreover, we identify many widely expressed factors that impact transcriptional regulatory networks in a cell-selective manner. Strikingly, in spite of their inherent diversity, all cell-type regulatory networks independently converge on a common architecture that closely resembles the topology of living neuronal networks. Together, our results provide an extensive description of the circuitry, dynamics, and organizing principles of the human TF regulatory network.


Subject(s)
Gene Regulatory Networks , Transcription Factors/metabolism , Animals , DNA Footprinting , Deoxyribonuclease I/metabolism , Gene Expression Regulation , Genome-Wide Association Study , Humans , Organ Specificity
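
One simple way to picture a TF-to-TF network and its variation across cell types is as a set of directed edges per cell type. The sketch below is an assumption-laden illustration rather than the method of the study: it supposes that each footprint has already been reduced to a `(tf, target_gene)` pair, keeps only pairs in which both ends are TF genes, and compares two networks with a Jaccard index. The names `build_network` and `shared_edge_fraction`, and the placeholder gene labels, are hypothetical.

```python
def build_network(footprints, tf_genes):
    """Return directed TF-to-TF edges from (tf, target_gene) pairs.

    Pairs whose regulator or target is not in `tf_genes` are dropped,
    so the result contains connections among TF genes only.
    """
    return {(tf, gene) for tf, gene in footprints
            if tf in tf_genes and gene in tf_genes}


def shared_edge_fraction(net_a, net_b):
    """Jaccard index of two edge sets: shared edges over all observed edges."""
    union = net_a | net_b
    return len(net_a & net_b) / len(union) if union else 0.0
```

With placeholder genes `{"A", "B", "C"}`, a cell type with edges A→B and B→C and another with edges A→B and C→A share one of three distinct edges, so `shared_edge_fraction` returns 1/3. Low values across many cell-type pairs would correspond to the cell-selective networks the abstract describes.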
10.
Nature ; 489(7414): 75-82, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955617

ABSTRACT

DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.


Subject(s)
Chromatin/genetics , Chromatin/metabolism , DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , DNA Footprinting , DNA Methylation , DNA-Binding Proteins/metabolism , Deoxyribonuclease I/metabolism , Evolution, Molecular , Genomics , Humans , Mutation Rate , Promoter Regions, Genetic/genetics , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic
11.
Nature ; 489(7414): 83-90, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955618

ABSTRACT

Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.


Subject(s)
DNA Footprinting , DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , DNA Methylation , DNA-Binding Proteins/metabolism , Deoxyribonuclease I/metabolism , Genomic Imprinting , Genomics , Humans , Polymorphism, Single Nucleotide/genetics , Transcription Initiation Site
12.
Science ; 337(6099): 1190-5, 2012 Sep 07.
Article in English | MEDLINE | ID: mdl-22955828

ABSTRACT

Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure-related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn's disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.


Subject(s)
DNA/genetics , Disease/genetics , Genetic Variation , Polymorphism, Single Nucleotide , Regulatory Elements, Transcriptional , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Alleles , Chromatin/metabolism , Chromatin/ultrastructure , Crohn Disease/genetics , Deoxyribonuclease I/metabolism , Electrocardiography , Fetal Development , Fetus/metabolism , Gene Regulatory Networks , Genome, Human , Genome-Wide Association Study , Humans , Multiple Sclerosis/genetics , Phenotype , Promoter Regions, Genetic , Transcription Factors/chemistry , Transcription Factors/genetics
13.
Genome Res ; 22(9): 1689-97, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955981

ABSTRACT

The characteristics and evolutionary forces acting on regulatory variation in humans remain elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA compared with protein-coding regions, although they are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.


Subject(s)
Genetic Variation , Genomics , Regulatory Elements, Transcriptional , Regulatory Sequences, Nucleic Acid , Cell Line , Cell Line, Tumor , Chromosome Mapping , Deoxyribonuclease I/metabolism , Evolution, Molecular , Genetic Heterogeneity , Genome, Human , Genome-Wide Association Study , Humans , Neoplasms/genetics , Nucleotide Motifs , Polymorphism, Genetic , Population Groups/genetics , Selection, Genetic , Transcriptional Activation
14.
Bioinformatics ; 28(14): 1919-20, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22576172

ABSTRACT

UNLABELLED: The large and growing number of genome-wide datasets highlights the need for high-performance feature analysis and data comparison methods, in addition to efficient data storage and retrieval techniques. We introduce BEDOPS, a software suite for common genomic analysis tasks which offers improved flexibility, scalability and execution time characteristics over previously published packages. The suite includes a utility to compress large inputs into a lossless format that can provide greater space savings and faster data extractions than alternatives. AVAILABILITY: http://code.google.com/p/bedops/ includes binaries, source and documentation.


Subject(s)
Data Compression/methods , Genomics/methods , Software
15.
Cell ; 146(4): 645-58, 2011 Aug 19.
Article in English | MEDLINE | ID: mdl-21854988

ABSTRACT

The human mitochondrial genome comprises a distinct genetic system transcribed as precursor polycistronic transcripts that are subsequently cleaved to generate individual mRNAs, tRNAs, and rRNAs. Here, we provide a comprehensive analysis of the human mitochondrial transcriptome across multiple cell lines and tissues. Using directional deep sequencing and parallel analysis of RNA ends, we demonstrate wide variation in mitochondrial transcript abundance and precisely resolve transcript processing and maturation events. We identify previously undescribed transcripts, including small RNAs, and observe the enrichment of several nuclear RNAs in mitochondria. Using high-throughput in vivo DNaseI footprinting, we establish the global profile of DNA-binding protein occupancy across the mitochondrial genome at single-nucleotide resolution, revealing regulatory features at mitochondrial transcription initiation sites and functional insights into disease-associated variants. This integrated analysis of the mitochondrial transcriptome reveals unexpected complexity in the regulation, expression, and processing of mitochondrial RNA and provides a resource for future studies of mitochondrial function (accessed at http://mitochondria.matticklab.com).


Subject(s)
Gene Expression Profiling , Mitochondria/genetics , RNA/analysis , Cell Nucleus/metabolism , DNA Footprinting , DNA-Binding Proteins/analysis , Deoxyribonuclease I/metabolism , Gene Expression Regulation , Genome, Mitochondrial , High-Throughput Nucleotide Sequencing , Humans , Locus Control Region , Mitochondrial Proteins/analysis , Nucleic Acid Conformation , RNA/metabolism , RNA, Mitochondrial , Sequence Analysis, RNA
16.
Nat Methods ; 6(4): 283-9, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19305407

ABSTRACT

The orchestrated binding of transcriptional activators and repressors to specific DNA sequences in the context of chromatin defines the regulatory program of eukaryotic genomes. We developed a digital approach to assay regulatory protein occupancy on genomic DNA in vivo by dense mapping of individual DNase I cleavages from intact nuclei using massively parallel DNA sequencing. Analysis of >23 million cleavages across the Saccharomyces cerevisiae genome revealed thousands of protected regulatory protein footprints, enabling de novo derivation of factor binding motifs and the identification of hundreds of new binding sites for major regulators. We observed striking correspondence between single-nucleotide resolution DNase I cleavage patterns and protein-DNA interactions determined by crystallography. The data also yielded a detailed view of larger chromatin features including positioned nucleosomes flanking factor binding regions. Digital genomic footprinting should be a powerful approach to delineate the cis-regulatory framework of any organism with an available genome sequence.


Subject(s)
DNA Footprinting/methods , DNA/chemistry , DNA/genetics , Protein Interaction Mapping/methods , Sequence Analysis, DNA/methods , Transcription Factors/chemistry , Transcription Factors/genetics , Algorithms , Amino Acid Sequence , Base Sequence , Binding Sites , Molecular Sequence Data , Protein Binding
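
The principle behind a footprint, a short stretch protected from DNase I cleavage and flanked by heavily cut bases, can be illustrated with a simple depletion score. This is a toy sketch and not the detection algorithm used in the study: it assumes per-base cleavage counts in a list, adds a pseudocount of 1 to both means, and uses fixed window and flank widths. The names `footprint_score` and `scan` are hypothetical.

```python
def footprint_score(cleavages, start, end, flank=3):
    """Ratio of mean cleavage in [start, end) to mean cleavage in the flanks.

    Values well below 1 indicate a protected (low-cleavage) core.
    A pseudocount of 1 keeps the ratio defined when counts are zero.
    """
    center = cleavages[start:end]
    flanks = cleavages[max(0, start - flank):start] + cleavages[end:end + flank]
    if not center or not flanks:
        raise ValueError("window or flanks fall outside the data")
    center_mean = sum(center) / len(center)
    flank_mean = sum(flanks) / len(flanks)
    return (center_mean + 1) / (flank_mean + 1)


def scan(cleavages, width=4, flank=3, threshold=0.2):
    """Return start positions of windows whose score is below `threshold`."""
    last_start = len(cleavages) - width - flank
    return [s for s in range(flank, last_start + 1)
            if footprint_score(cleavages, s, s + width, flank) < threshold]
```

For the counts `[9, 9, 9, 9, 0, 0, 0, 0, 9, 9, 9, 9]`, `scan` reports a single window starting at index 4, where the four uncut bases sit between fully cut flanks. Real data would need normalization for sequencing depth and for the sequence preference of the enzyme, which this sketch ignores.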
17.
Genome Biol ; 9(12): R168, 2008.
Article in English | MEDLINE | ID: mdl-19055709

ABSTRACT

BACKGROUND: Conserved non-coding sequences in the human genome are approximately tenfold more abundant than known genes, and have been hypothesized to mark the locations of cis-regulatory elements. However, the global contribution of conserved non-coding sequences to the transcriptional regulation of human genes is currently unknown. Deeply conserved elements shared between humans and teleost fish predominantly flank genes active during morphogenesis and are enriched for positive transcriptional regulatory elements. However, such deeply conserved elements account for <1% of the conserved non-coding sequences in the human genome, which are predominantly mammalian. RESULTS: We explored the regulatory potential of a large sample of these 'common' conserved non-coding sequences using a variety of classic assays, including chromatin remodeling, and enhancer/repressor and promoter activity. When tested across diverse human model cell types, we find that the fraction of experimentally active conserved non-coding sequences within any given cell type is low (approximately 5%), and that this proportion increases only modestly when considered collectively across cell types. CONCLUSIONS: The results suggest that classic assays of cis-regulatory potential are unlikely to expose the functional potential of the substantial majority of mammalian conserved non-coding sequences in the human genome.


Subject(s)
Conserved Sequence/genetics , Genome, Human , Regulatory Sequences, Nucleic Acid , Animals , Cell Line , Evolution, Molecular , Genome , Humans , Mice
18.
Nucleic Acids Res ; 35(14): 4809-19, 2007.
Article in English | MEDLINE | ID: mdl-17621584

ABSTRACT

We applied a computational pipeline based on comparative genomics to bacteria, and identified 22 novel candidate RNA motifs. We predicted six to be riboswitches, which are mRNA elements that regulate gene expression on binding a specific metabolite. In separate studies, we confirmed that two of these are novel riboswitches. Three other riboswitch candidates are upstream of either a putative transporter gene in the order Lactobacillales, citric acid cycle genes in Burkholderiales or molybdenum cofactor biosynthesis genes in several phyla. The remaining riboswitch candidate, the widespread Genes for the Environment, for Membranes and for Motility (GEMM) motif, is associated with genes important for natural competence in Vibrio cholerae and the use of metal ions as electron acceptors in Geobacter sulfurreducens. Among the other motifs, one has a genetic distribution similar to a previously published candidate riboswitch, ykkC/yxkD, but has a different structure. We identified possible non-coding RNAs in five phyla, and several additional cis-regulatory RNAs, including one in epsilon-proteobacteria (upstream of purD, involved in purine biosynthesis), and one in Cyanobacteria (within an ATP synthase operon). These candidate RNAs add to the growing list of RNA motifs involved in multiple cellular processes, and suggest that many additional RNAs remain to be discovered.


Subject(s)
Genomics/methods , RNA, Bacterial/chemistry , Regulatory Sequences, Ribonucleic Acid , Sequence Analysis, RNA/methods , Base Sequence , Computational Biology , Consensus Sequence , Genome, Bacterial , Molecular Sequence Data , Nucleic Acid Conformation , RNA, Messenger/chemistry , RNA, Untranslated/chemistry
19.
PLoS Comput Biol ; 3(7): e126, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17616982

ABSTRACT

Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which improves the quality of the RNA motifs significantly. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair-level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Of the ncRNA candidates not in Rfam, we find compelling evidence that some of them are functional, and analyze several potential ribosomal protein leaders in depth.


Subject(s)
Computational Biology/methods , RNA, Untranslated/analysis , Sequence Homology, Nucleic Acid , Artificial Intelligence , Base Sequence , Conserved Sequence , Databases, Nucleic Acid , Genes, Regulator , Genome, Bacterial , Molecular Sequence Data , Nucleic Acid Conformation , Pattern Recognition, Automated , RNA, Bacterial/analysis
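
Base-pair-level accuracy of a predicted secondary structure against a curated one can be scored by comparing sets of paired positions. The sketch below assumes both structures are given as dot-bracket strings of equal length without pseudoknots. It reports sensitivity, TP/(TP+FN), and positive predictive value, TP/(TP+FP); the latter is what RNA structure benchmarks often label "specificity", though whether the paper uses exactly that definition is an assumption here. The names `base_pairs` and `pair_accuracy` are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) paired positions in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def pair_accuracy(predicted, reference):
    """Return (sensitivity, ppv) of predicted base pairs against a reference."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    ppv = true_pos / len(pred) if pred else 0.0
    return sensitivity, ppv
```

For a reference of `((..))..` and a prediction of `(....)..`, the prediction recovers one of the two reference pairs and predicts no wrong pair, so the result is `(0.5, 1.0)`.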
20.
Nature ; 447(7146): 799-816, 2007 Jun 14.
Article in English | MEDLINE | ID: mdl-17571346

ABSTRACT

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.


Subject(s)
Genome, Human/genetics , Genomics , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation , Conserved Sequence/genetics , DNA Replication , Evolution, Molecular , Exons/genetics , Genetic Variation/genetics , Heterozygote , Histones/metabolism , Humans , Pilot Projects , Protein Binding , RNA, Messenger/genetics , RNA, Untranslated/genetics , Transcription Factors/metabolism , Transcription Initiation Site