1.
Res Sq ; 2023 Jul 19.
Article En | MEDLINE | ID: mdl-37503119

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscapes of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines to promote data provenance and reproducibility and to allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access for a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

2.
bioRxiv ; 2023 Apr 06.
Article En | MEDLINE | ID: mdl-37066421

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscapes of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines to promote data provenance and reproducibility and to allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access for a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

3.
Genome Res ; 29(11): 1900-1909, 2019 11.
Article En | MEDLINE | ID: mdl-31645363

MicroRNAs (miRNAs) play a critical role as posttranscriptional regulators of gene expression. The ENCODE Project profiled the expression of miRNAs in an extensive set of organs during a time-course of mouse embryonic development and captured the expression dynamics of 785 miRNAs. We found distinct organ-specific and developmental stage-specific miRNA expression clusters, with an overall pattern of increasing organ-specific expression as embryonic development proceeds. Comparative analysis of conserved miRNAs in mouse and human revealed stronger clustering of expression patterns by organ type rather than by species. An analysis of messenger RNA expression clusters compared with miRNA expression clusters identifies the potential role of specific miRNA expression clusters in suppressing the expression of mRNAs specific to other developmental programs in the organ in which these miRNAs are expressed during embryonic development. Our results provide the most comprehensive time-course of miRNA expression as part of an integrated ENCODE reference data set for mouse embryonic development.

MeSH terms: Embryonic Development/genetics , MicroRNAs/genetics , Animals , Female , Gene Expression Regulation, Developmental , Mice , Pregnancy , RNA, Messenger/genetics

4.
Genome Biol ; 17(1): 151, 2016 07 08.
Article En | MEDLINE | ID: mdl-27391956

BACKGROUND: A comparison of transcriptional profiles derived from different tissues in a given species or among different species assumes that commonalities reflect evolutionarily conserved programs and that differences reflect species or tissue responses to environmental conditions or developmental program staging. Apparently conflicting results have been published regarding whether organ-specific transcriptional patterns dominate over species-specific patterns, or vice versa, making it unclear to what extent the biology of a given organism can be extrapolated to another. These studies have in common that they treat the transcriptomes monolithically, implicitly ignoring that each gene is likely to have a specific pattern of transcriptional variation across organs and species.

RESULTS: We use linear models to quantify this pattern. We find a continuum in the spectrum of expression variation: the expression of some genes varies considerably across species and little across organs, and simply reflects evolutionary distance. At the other extreme are genes whose expression varies considerably across organs and little across species; these genes are much more likely to be associated with diseases than are genes whose expression varies predominantly across species.

CONCLUSIONS: Whether transcriptomes, when considered globally, cluster preferentially according to one component or the other may not be a property of the transcriptomes, but rather a consequence of the dominant behavior of a subset of genes. Therefore, the values of the components of the variance of expression for each gene could become a useful resource when planning, interpreting, and extrapolating experimental data from mouse to humans.

MeSH terms: Evolution, Molecular , Gene Expression Regulation, Developmental/genetics , Organ Specificity/genetics , Transcriptome/genetics , Animals , Gene Expression Profiling , Humans , Mice , Oligonucleotide Array Sequence Analysis , Sequence Analysis, RNA , Species Specificity

5.
Nat Commun ; 6: 5903, 2015 Jan 13.
Article En | MEDLINE | ID: mdl-25582907

Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.

MeSH terms: Evolution, Molecular , Gene Expression Regulation , Transcriptome , Alternative Splicing , Animals , Biological Evolution , Cell Line , Epigenesis, Genetic , Gene Expression Profiling , Gene Library , Genome , Histones/chemistry , Humans , Mice , Mice, Inbred C57BL , Models, Genetic , Oligonucleotides, Antisense , Phenotype , Sequence Analysis, RNA

6.
Nature ; 515(7527): 355-64, 2014 Nov 20.
Article En | MEDLINE | ID: mdl-25409824

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

MeSH terms: Genome/genetics , Genomics , Mice/genetics , Molecular Sequence Annotation , Animals , Cell Lineage/genetics , Chromatin/genetics , Chromatin/metabolism , Conserved Sequence/genetics , DNA Replication/genetics , Deoxyribonuclease I/metabolism , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genome-Wide Association Study , Humans , RNA/genetics , Regulatory Sequences, Nucleic Acid/genetics , Species Specificity , Transcription Factors/metabolism , Transcriptome/genetics

7.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article En | MEDLINE | ID: mdl-25164755

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.

MeSH terms: Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA

8.
Nature ; 489(7414): 101-8, 2012 Sep 06.
Article En | MEDLINE | ID: mdl-22955620

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

MeSH terms: DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , Transcriptome/genetics , Alleles , Cell Line , DNA, Intergenic/genetics , Enhancer Elements, Genetic , Exons/genetics , Gene Expression Profiling , Genes/genetics , Genomics , Humans , Polyadenylation/genetics , Protein Isoforms/genetics , RNA/biosynthesis , RNA/genetics , RNA Editing/genetics , RNA Splicing/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, RNA

9.
PLoS One ; 7(1): e28213, 2012.
Article En | MEDLINE | ID: mdl-22238572

The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of transcriptomes from yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches, we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5' and 3' transcriptional termini of 492 protein-coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well-annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of the genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connections between the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.

MeSH terms: Cells/metabolism , Gene Regulatory Networks/physiology , RNA/physiology , Transcriptome/physiology , Algorithms , Chimerin Proteins/chemistry , Chimerin Proteins/genetics , Chromosomes, Human, Pair 1/genetics , Female , Gene Expression Profiling , Gene Regulatory Networks/genetics , Humans , Male , Microarray Analysis/methods , Models, Biological , Nucleic Acid Amplification Techniques/methods , RNA/genetics , RNA Isoforms/chemistry , RNA Isoforms/genetics , RNA Isoforms/metabolism , Transcription, Genetic/genetics , Validation Studies as Topic

10.
Nature ; 471(7339): 473-9, 2011 Mar 24.
Article En | MEDLINE | ID: mdl-21179090

Drosophila melanogaster is one of the best-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.

MeSH terms: Drosophila melanogaster/growth & development , Drosophila melanogaster/genetics , Gene Expression Profiling , Gene Expression Regulation, Developmental/genetics , Transcription, Genetic/genetics , Alternative Splicing/genetics , Animals , Base Sequence , Drosophila Proteins/genetics , Drosophila melanogaster/embryology , Exons/genetics , Female , Genes, Insect/genetics , Genome, Insect/genetics , Male , MicroRNAs/genetics , Oligonucleotide Array Sequence Analysis , Protein Isoforms/genetics , RNA Editing/genetics , RNA, Messenger/analysis , RNA, Messenger/genetics , RNA, Small Untranslated/analysis , RNA, Small Untranslated/genetics , Sequence Analysis , Sex Characteristics

...