1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article En | MEDLINE | ID: mdl-37001506

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Epigenome , Quantitative Trait Loci , Genome-Wide Association Study , Genomics , Phenotype , Polymorphism, Single Nucleotide
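Allele-specific loci like those catalogued in the EN-TEx record above are commonly called by testing whether reads at a heterozygous site split evenly between the two haplotypes. The abstract does not state which test EN-TEx uses, so the sketch below assumes a plain two-sided exact binomial test against p = 0.5; the function names are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Two-sided exact binomial test of reference vs. alternative read counts.

    Sums the probabilities of every outcome that is no more likely than the
    observed one. Uses plain floats, so it suits the modest depths of a single
    site; very deep sites would need a log-space version or a library routine.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads: no evidence of imbalance
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    # Small relative tolerance so outcomes tied with the observed one are counted.
    cutoff = probs[ref_reads] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= cutoff))
```

For example, `allelic_imbalance_pvalue(15, 5)` gives about 0.041, while an even 10/10 split gives 1.0.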
3.
Genome Res ; 30(7): 1047-1059, 2020 07.
Article En | MEDLINE | ID: mdl-32759341

We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.


Transcription, Genetic , Cell Line , Endothelial Cells/metabolism , Epithelial Cells/metabolism , Female , Gene Expression Profiling , Gynecomastia/genetics , Gynecomastia/metabolism , Humans , Male , Mesoderm/cytology , Mesoderm/metabolism , Neoplasms/genetics , Organ Specificity , Sequence Analysis, RNA
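The record above says cell-type contributions to tissues were estimated from cell-type-specific genes but does not give the method. As a minimal sketch, assuming the simplest marker-averaging approach (score each cell type by the mean expression of its marker genes, then normalize the scores to fractions), the idea looks like this; the function name and toy values are hypothetical, and regression-based deconvolution is a common, more rigorous alternative.

```python
from statistics import mean


def estimate_composition(tissue_expr: dict[str, float],
                         markers: dict[str, list[str]]) -> dict[str, float]:
    """Return an estimated fraction for each cell type in one tissue sample.

    tissue_expr: gene -> expression value (e.g. TPM) in the tissue sample.
    markers: cell type -> non-empty list of genes assumed specific to it.
    """
    scores = {}
    for cell_type, genes in markers.items():
        # Marker genes missing from the sample are treated as unexpressed.
        scores[cell_type] = mean(tissue_expr.get(g, 0.0) for g in genes)
    total = sum(scores.values())
    if total == 0:
        return {cell_type: 0.0 for cell_type in markers}
    return {cell_type: s / total for cell_type, s in scores.items()}


# Toy example with made-up expression values.
markers = {"epithelial": ["KRT8", "EPCAM"], "endothelial": ["PECAM1", "CDH5"]}
expr = {"KRT8": 30, "EPCAM": 10, "PECAM1": 5, "CDH5": 15}
print(estimate_composition(expr, markers))
```

With these toy values the epithelial score is 20 and the endothelial score is 10, giving fractions of about 0.67 and 0.33.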
4.
Nature ; 583(7818): 699-710, 2020 07.
Article En | MEDLINE | ID: mdl-32728249

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE¹ and Roadmap Epigenomics² data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.


DNA/genetics , Databases, Genetic , Genome/genetics , Genomics , Molecular Sequence Annotation , Registries , Regulatory Sequences, Nucleic Acid/genetics , Animals , Chromatin/genetics , Chromatin/metabolism , DNA/chemistry , DNA Footprinting , DNA Methylation/genetics , DNA Replication Timing , Deoxyribonuclease I/metabolism , Genome, Human , Histones/metabolism , Humans , Mice , Mice, Transgenic , RNA-Binding Proteins/genetics , Transcription, Genetic/genetics , Transposases/metabolism
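As a rough back-of-the-envelope check on the registry figures above, the quoted genome coverage implies an average candidate cis-regulatory element length of a few hundred base pairs. The genome sizes used here (about 3.1 Gb for human and 2.7 Gb for mouse) are assumptions for this calculation only, not figures from the abstract, and the estimate further assumes that elements do not overlap.

```python
# Assumed approximate genome sizes in base pairs (not stated in the abstract).
GENOME_SIZE_BP = {"human": 3.1e9, "mouse": 2.7e9}

# Figures quoted in the ENCODE phase III abstract.
CCRE_COUNT = {"human": 926_535, "mouse": 339_815}
COVERAGE_FRACTION = {"human": 0.079, "mouse": 0.034}


def mean_ccre_length(species: str) -> float:
    """Average bp per element implied by the coverage, assuming no overlaps."""
    covered_bp = COVERAGE_FRACTION[species] * GENOME_SIZE_BP[species]
    return covered_bp / CCRE_COUNT[species]


for sp in ("human", "mouse"):
    print(f"{sp}: ~{mean_ccre_length(sp):.0f} bp per element")
```

Under these assumptions the script prints roughly 264 bp for human and 270 bp for mouse.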
5.
Genome Res ; 29(11): 1900-1909, 2019 11.
Article En | MEDLINE | ID: mdl-31645363

MicroRNAs (miRNAs) play a critical role as posttranscriptional regulators of gene expression. The ENCODE Project profiled the expression of miRNAs in an extensive set of organs during a time-course of mouse embryonic development and captured the expression dynamics of 785 miRNAs. We found distinct organ-specific and developmental stage-specific miRNA expression clusters, with an overall pattern of increasing organ-specific expression as embryonic development proceeds. Comparative analysis of conserved miRNAs in mouse and human revealed stronger clustering of expression patterns by organ type rather than by species. An analysis of messenger RNA expression clusters compared with miRNA expression clusters identifies the potential role of specific miRNA expression clusters in suppressing the expression of mRNAs specific to other developmental programs in the organ in which these miRNAs are expressed during embryonic development. Our results provide the most comprehensive time-course of miRNA expression as part of an integrated ENCODE reference data set for mouse embryonic development.


Embryonic Development/genetics , MicroRNAs/genetics , Animals , Female , Gene Expression Regulation, Developmental , Mice , Pregnancy , RNA, Messenger/genetics
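The miRNA record above reports organ-specific expression clusters but does not name a specificity metric. One widely used summary is the tau index, which is 0 for uniform expression across organs and 1 for expression in a single organ; the sketch below uses it as an illustrative stand-in, not as the study's method.

```python
def tau_specificity(expression: list[float]) -> float:
    """Tau tissue-specificity index over non-negative per-organ expression values.

    tau = sum(1 - x_i / max(x)) / (n - 1); 0 = uniform, 1 = single organ.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two organs")
    peak = max(expression)
    if peak <= 0:
        return 0.0  # not expressed anywhere: treated as non-specific
    return sum(1 - x / peak for x in expression) / (n - 1)


print(tau_specificity([5, 5, 5, 5]))  # uniform across four organs
print(tau_specificity([0, 0, 8, 0]))  # expressed in one organ only
```

Values are usually computed on log-transformed expression; tracking tau per developmental stage would be one way to quantify the increasing organ specificity the abstract describes.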
7.
Nucleic Acids Res ; 46(D1): D794-D801, 2018 01 04.
Article En | MEDLINE | ID: mdl-29126249

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to meta(data) from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.


DNA/genetics , Databases, Genetic , Gene Components , Genomics , High-Throughput Nucleotide Sequencing , Metadata , Animals , Caenorhabditis elegans/genetics , Data Display , Datasets as Topic , Drosophila melanogaster/genetics , Forecasting , Genome, Human , Humans , Mice/genetics , User-Computer Interface
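The Portal described in the record above can also be queried programmatically. The sketch below only builds a search URL and sends no request; it assumes the portal's `/search/` endpoint accepts `type`, `searchTerm`, `limit` and `format=json` query parameters, which should be confirmed against the portal's REST API help before use. The helper name is hypothetical.

```python
from typing import Optional, Union
from urllib.parse import urlencode

PORTAL = "https://www.encodeproject.org"


def encode_search_url(object_type: str,
                      search_term: Optional[str] = None,
                      limit: Union[int, str] = 25) -> str:
    """Build a portal search URL requesting JSON output.

    Parameter names are assumptions based on the portal's public search pages.
    """
    params = {"type": object_type}
    if search_term:
        params["searchTerm"] = search_term
    params["limit"] = limit
    params["format"] = "json"
    # urlencode escapes values, e.g. spaces become '+'.
    return f"{PORTAL}/search/?{urlencode(params)}"


print(encode_search_url("Experiment", "K562"))
```

The resulting URL could then be fetched with any HTTP client, such as `urllib.request.urlopen`.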
8.
Sci Data ; 4: 170112, 2017 08 29.
Article En | MEDLINE | ID: mdl-28850106

In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at single-base-pair resolution, and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, comprising a variety of primary cells, tissues, cell lines, and time-series samples collected during cell activation and development, were processed through a uniform CAGE data production pipeline. The pipeline began by assessing the quality of the RNA extracts and continued with CAGE library production (using a robotic or a manual workflow), single-molecule sequencing, and computational processing to generate transcription initiation frequencies. The resulting data represent the consequences of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles (approximately 200,000 for the human genome and 150,000 for the mouse genome) were identified and annotated to provide the precise locations of known and novel promoters and to quantify their activities.


Gene Expression Profiling , Genome , Animals , Gene Expression Regulation , Humans , Mice , Promoter Regions, Genetic , Species Specificity
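The FANTOM5 record above reports non-overlapping peaks over CAGE profiles but does not describe the peak caller. As a deliberately simpler stand-in, the sketch below groups single-base transcription start site counts on one strand into clusters whenever neighboring positions lie within a fixed distance, and reports each cluster's span, total tags and dominant position. The `max_gap` value and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TagCluster:
    start: int  # first position in the cluster (inclusive)
    end: int    # last position in the cluster (inclusive)
    count: int  # total CAGE tags in the cluster
    peak: int   # position with the most tags (dominant start site)


def cluster_tss(tag_counts: dict[int, int], max_gap: int = 20) -> list[TagCluster]:
    """Group per-position tag counts on one strand into non-overlapping clusters.

    A position joins the current cluster if it lies at most max_gap bp
    beyond the cluster's last position; otherwise it starts a new cluster.
    """
    clusters: list[TagCluster] = []
    for pos in sorted(tag_counts):
        n = tag_counts[pos]
        if n <= 0:
            continue
        if clusters and pos - clusters[-1].end <= max_gap:
            c = clusters[-1]
            c.end = pos
            c.count += n
            if n > tag_counts[c.peak]:
                c.peak = pos
        else:
            clusters.append(TagCluster(pos, pos, n, pos))
    return clusters


print(cluster_tss({100: 5, 105: 12, 110: 3, 400: 7, 402: 1}))
```

This toy input yields two clusters: positions 100 to 110 (20 tags, peak at 105) and 400 to 402 (8 tags, peak at 400).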
9.
Vet Anaesth Analg ; 44(4): 727-737, 2017 Jul.
Article En | MEDLINE | ID: mdl-28624496

OBJECTIVE: To determine the effect of fentanyl on the induction dose of propofol and the minimum infusion rate required to prevent movement in response to noxious stimulation (MIRNM) in dogs. STUDY DESIGN: Crossover experimental design. ANIMALS: Six healthy, adult intact male Beagle dogs, mean±standard deviation 12.6±0.4 kg. METHODS: Dogs were administered 0.9% saline (treatment P), fentanyl (5 µg kg⁻¹) (treatment PLDF) or fentanyl (10 µg kg⁻¹) (treatment PHDF) intravenously over 5 minutes. Five minutes later, anesthesia was induced with propofol (2 mg kg⁻¹, followed by 1 mg kg⁻¹ every 15 seconds to achieve intubation) and maintained for 90 minutes by constant rate infusions (CRIs) of propofol alone or with fentanyl: P, propofol (0.5 mg kg⁻¹ minute⁻¹); PLDF, propofol (0.35 mg kg⁻¹ minute⁻¹) and fentanyl (0.1 µg kg⁻¹ minute⁻¹); PHDF, propofol (0.3 mg kg⁻¹ minute⁻¹) and fentanyl (0.2 µg kg⁻¹ minute⁻¹). Propofol CRI was increased or decreased based on the response to stimulation (50 V, 50 Hz, 10 mA), with 20 minutes between adjustments. Data were analyzed using a mixed-model ANOVA and presented as mean±standard error. RESULTS: Propofol induction doses were 6.16±0.31, 3.67±0.21 and 3.33±0.42 mg kg⁻¹ for P, PLDF and PHDF, respectively. Doses for PLDF and PHDF were significantly decreased from P (p<0.05) but did not differ from each other. Propofol MIRNM was 0.60±0.04, 0.29±0.02 and 0.22±0.02 mg kg⁻¹ minute⁻¹ for P, PLDF and PHDF, respectively. MIRNM in PLDF and PHDF was significantly decreased from P. MIRNM did not differ between PLDF and PHDF, but their respective percent decreases of 51±3 and 63±2% differed (p=0.035). CONCLUSIONS AND CLINICAL RELEVANCE: Fentanyl, at the doses studied, caused statistically significant and clinically important decreases in the propofol induction dose and MIRNM.


Anesthesia, Intravenous/veterinary , Anesthetics, Intravenous , Fentanyl/pharmacology , Propofol , Anesthesia, Intravenous/methods , Anesthetics, Combined/administration & dosage , Anesthetics, Combined/pharmacology , Anesthetics, Intravenous/administration & dosage , Animals , Dogs , Infusions, Intravenous/veterinary , Male , Movement/drug effects , Propofol/administration & dosage
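The percent decreases in MIRNM reported in the record above can be approximately recovered from the group means. The published 51±3 and 63±2% are presumably averages of per-dog decreases, so the ratio of group means need not match them exactly.

```python
# Group mean propofol MIRNM (mg/kg/min) as reported in the abstract.
MIRNM = {"P": 0.60, "PLDF": 0.29, "PHDF": 0.22}


def percent_decrease(baseline: float, treated: float) -> float:
    """Percent reduction of the treated value relative to the baseline."""
    return (baseline - treated) / baseline * 100


for group in ("PLDF", "PHDF"):
    print(f"{group}: {percent_decrease(MIRNM['P'], MIRNM[group]):.1f}% lower than P")
```

This prints 51.7% for PLDF and 63.3% for PHDF, consistent with the reported values.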
11.
Genome Biol ; 17(1): 151, 2016 07 08.
Article En | MEDLINE | ID: mdl-27391956

BACKGROUND: A comparison of transcriptional profiles derived from different tissues in a given species or among different species assumes that commonalities reflect evolutionarily conserved programs and that differences reflect species or tissue responses to environmental conditions or developmental program staging. Apparently conflicting results have been published regarding whether organ-specific transcriptional patterns dominate over species-specific patterns, or vice versa, making it unclear to what extent the biology of a given organism can be extrapolated to another. These studies have in common that they treat the transcriptomes monolithically, implicitly ignoring that each gene is likely to have a specific pattern of transcriptional variation across organs and species. RESULTS: We use linear models to quantify this pattern. We find a continuum in the spectrum of expression variation: the expression of some genes varies considerably across species and little across organs, and simply reflects evolutionary distance. At the other extreme are genes whose expression varies considerably across organs and little across species; these genes are much more likely to be associated with diseases than are genes whose expression varies predominantly across species. CONCLUSIONS: Whether transcriptomes, when considered globally, cluster preferentially according to one component or the other may not be a property of the transcriptomes, but rather a consequence of the dominant behavior of a subset of genes. Therefore, the values of the components of the variance of expression for each gene could become a useful resource when planning, interpreting, and extrapolating experimental data from mouse to humans.


Evolution, Molecular , Gene Expression Regulation, Developmental/genetics , Organ Specificity/genetics , Transcriptome/genetics , Animals , Gene Expression Profiling , Humans , Mice , Oligonucleotide Array Sequence Analysis , Sequence Analysis, RNA , Species Specificity
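The record above uses linear models to split each gene's expression variation into species and organ components. A minimal sketch, assuming a complete, balanced table with one value per species × organ cell and an additive two-way model without an interaction term (the authors' exact model may differ), computes the fraction of the total sum of squares explained by each factor.

```python
from statistics import mean


def variance_components(expr: list[list[float]]) -> dict[str, float]:
    """Split one gene's expression variation into species and organ shares.

    expr[i][j] is the gene's expression in species i, organ j.
    Returns fractions of the total sum of squares for each term.
    """
    n_species, n_organs = len(expr), len(expr[0])
    grand = mean(v for row in expr for v in row)
    species_means = [mean(row) for row in expr]
    organ_means = [mean(expr[i][j] for i in range(n_species)) for j in range(n_organs)]

    ss_total = sum((v - grand) ** 2 for row in expr for v in row)
    if ss_total == 0:
        return {"species": 0.0, "organ": 0.0, "residual": 0.0}
    ss_species = n_organs * sum((m - grand) ** 2 for m in species_means)
    ss_organ = n_species * sum((m - grand) ** 2 for m in organ_means)
    ss_resid = ss_total - ss_species - ss_organ
    return {"species": ss_species / ss_total,
            "organ": ss_organ / ss_total,
            "residual": ss_resid / ss_total}


# Rows: two species; columns: three organs (toy values).
print(variance_components([[1, 5, 9], [1, 5, 9]]))  # varies only across organs
print(variance_components([[2, 2, 2], [8, 8, 8]]))  # varies only across species
```

The first toy gene is fully explained by the organ term and the second by the species term; real genes fall along the continuum the abstract describes.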
13.
Genome Biol ; 17: 74, 2016 Apr 23.
Article En | MEDLINE | ID: mdl-27107712

Obtaining RNA-seq measurements involves a complex data analytical process with a large number of competing algorithms as options. There is much debate about which of these methods provides the best approach. Unfortunately, it is currently difficult to evaluate their performance due in part to a lack of sensitive assessment metrics. We present a series of statistical summaries and plots to evaluate the performance in terms of specificity and sensitivity, available as an R/Bioconductor package (http://bioconductor.org/packages/rnaseqcomp). Using two independent datasets, we assessed seven competing pipelines. Performance was generally poor, with two methods clearly underperforming and RSEM slightly outperforming the rest.


Algorithms , Sequence Analysis, RNA/methods , Animals , Humans , Reference Values , Sensitivity and Specificity , Sequence Analysis, RNA/standards
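The record above evaluates pipelines in terms of specificity and sensitivity. The rnaseqcomp package's own functions are not reproduced here; this plain-Python sketch only shows how the two quantities are computed when a pipeline's calls (for example, genes flagged as differentially expressed) are compared with a known truth set. All names are hypothetical.

```python
def sensitivity_specificity(called: set[str], truly_positive: set[str],
                            universe: set[str]) -> tuple[float, float]:
    """Compare a pipeline's calls with a truth set drawn from `universe`.

    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    negatives = universe - truly_positive
    tp = len(called & truly_positive)
    fn = len(truly_positive - called)
    fp = len(called & negatives)
    tn = len(negatives - called)
    sens = tp / (tp + fn) if truly_positive else float("nan")
    spec = tn / (tn + fp) if negatives else float("nan")
    return sens, spec


genes = {f"g{i}" for i in range(1, 11)}
truth = {"g1", "g2", "g3", "g4"}
calls = {"g1", "g2", "g3", "g5"}
print(sensitivity_specificity(calls, truth, genes))
```

Here three of four true positives are recovered (sensitivity 0.75) and one of six negatives is wrongly called (specificity 5/6).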
14.
Nat Commun ; 6: 5903, 2015 Jan 13.
Article En | MEDLINE | ID: mdl-25582907

Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.


Evolution, Molecular , Gene Expression Regulation , Transcriptome , Alternative Splicing , Animals , Biological Evolution , Cell Line , Epigenesis, Genetic , Gene Expression Profiling , Gene Library , Genome , Histones/chemistry , Humans , Mice , Mice, Inbred C57BL , Models, Genetic , Oligonucleotides, Antisense , Phenotype , Sequence Analysis, RNA
15.
Proc Natl Acad Sci U S A ; 111(48): 17224-9, 2014 Dec 02.
Article En | MEDLINE | ID: mdl-25413365

Although the morphological and genetic similarities between humans and mice are typically highlighted, there are many differences. To better understand these two species on a molecular level, we performed a comparison of the expression profiles of 15 tissues by deep RNA sequencing and examined the similarities and differences in the transcriptome for both protein-coding and -noncoding transcripts. Although commonalities are evident in the expression of tissue-specific genes between the two species, the expression for many sets of genes was found to be more similar in different tissues within the same species than between species. These findings were further corroborated by associated epigenetic histone mark analyses. We also find that many noncoding transcripts are expressed at a low level and are not detectable at appreciable levels across individuals. Moreover, the majority lack obvious sequence homologs between species, even when we restrict our attention to those which are most highly reproducible across biological replicates. Overall, our results indicate that there is considerable RNA expression diversity between humans and mice, well beyond what was described previously, likely reflecting the fundamental physiological differences between these two organisms.


DNA, Intergenic/genetics , Gene Expression Profiling/methods , Organ Specificity/genetics , Proteins/genetics , Animals , Epigenomics/methods , Evolution, Molecular , High-Throughput Nucleotide Sequencing , Humans , Mice, Inbred C57BL , Sequence Analysis, RNA , Species Specificity , Transcriptome/genetics
16.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article En | MEDLINE | ID: mdl-25164755

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA
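The record above predicts expression from promoter chromatin features with a "universal model" whose parameters are shared by all three organisms. The abstract refers to several chromatin features; this toy reduces the idea to one hypothetical feature and an ordinary least-squares line fitted to data pooled across organisms, so the same intercept and slope are then applied to every organism. It illustrates organism-independent parameters, not the published model.

```python
from statistics import mean


def fit_pooled_ols(data: dict[str, list[tuple[float, float]]]) -> tuple[float, float]:
    """Fit y = a + b*x by least squares on (x, y) pairs pooled across organisms.

    data: organism -> list of (promoter chromatin signal, expression) pairs.
    Returns (intercept, slope), shared by every organism.
    """
    pairs = [p for pts in data.values() for p in pts]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict(intercept: float, slope: float, x: float) -> float:
    """Predicted expression for a promoter with chromatin signal x."""
    return intercept + slope * x


# Toy values lying exactly on y = 1 + 2x.
toy = {"human": [(0.0, 1.0), (1.0, 3.0)], "worm": [(2.0, 5.0)], "fly": [(3.0, 7.0)]}
a, b = fit_pooled_ols(toy)
print(a, b, predict(a, b, 4.0))
```

The toy data recover intercept 1 and slope 2, and the shared model predicts 9 for a signal of 4.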
17.
Cell ; 157(2): 382-394, 2014 Apr 10.
Article En | MEDLINE | ID: mdl-24725405

Missense mutations in the p53 tumor suppressor inactivate its antiproliferative properties but can also promote metastasis through a gain-of-function activity. We show that sustained expression of mutant p53 is required to maintain the prometastatic phenotype of a murine model of pancreatic cancer, a highly metastatic disease that frequently displays p53 mutations. Transcriptional profiling and functional screening identified the platelet-derived growth factor receptor b (PDGFRb) as both necessary and sufficient to mediate these effects. Mutant p53 induced PDGFRb through a cell-autonomous mechanism involving inhibition of a p73/NF-Y complex that represses PDGFRb expression in p53-deficient, noninvasive cells. Blocking PDGFRb signaling by RNA interference or by small molecule inhibitors prevented pancreatic cancer cell invasion in vitro and metastasis formation in vivo. Finally, high PDGFRb expression correlates with poor disease-free survival in pancreatic, colon, and ovarian cancer patients, implicating PDGFRb as a prognostic marker and possible target for attenuating metastasis in p53 mutant tumors.


Carcinoma, Pancreatic Ductal/metabolism , Neoplasm Metastasis , Pancreatic Neoplasms/metabolism , Receptor, Platelet-Derived Growth Factor beta/metabolism , Tumor Suppressor Protein p53/metabolism , Animals , Carcinoma, Pancreatic Ductal/pathology , Disease Models, Animal , Gene Expression Profiling , Humans , Mice , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/pathology , Tumor Suppressor Protein p53/genetics
18.
Nature ; 512(7515): 393-9, 2014 Aug 28.
Article En | MEDLINE | ID: mdl-24670639

Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)+ RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long non-coding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, with this complexity arising from combinatorial usage of promoters, splice sites and polyadenylation sites.


Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Alternative Splicing/genetics , Animals , Drosophila melanogaster/anatomy & histology , Drosophila melanogaster/cytology , Female , Male , Molecular Sequence Annotation , Nerve Tissue/metabolism , Organ Specificity , Poly A/genetics , Polyadenylation , Promoter Regions, Genetic/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sex Characteristics , Stress, Physiological/genetics
19.
Bioinformatics ; 29(1): 15-21, 2013 Jan 01.
Article En | MEDLINE | ID: mdl-23104886

MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning 550 million 2 × 76 bp paired-end reads per hour to the human genome on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. AVAILABILITY AND IMPLEMENTATION: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.


Sequence Alignment/methods , Software , Algorithms , Cluster Analysis , Gene Expression Profiling , Genome, Human , Humans , RNA Splicing , Sequence Analysis, RNA/methods
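The seed search named in the STAR record above, a sequential maximal mappable prefix (MMP) search in an uncompressed suffix array, can be illustrated on toy sequences. This is a simplified sketch under stated assumptions, not STAR's C++ implementation: naive suffix-array construction, exact matches only, a single hit per seed, one base skipped when nothing matches, and no seed clustering or stitching. All names are hypothetical.

```python
def build_suffix_array(genome: str) -> list[int]:
    """Uncompressed suffix array: suffix start positions in sorted order.

    Naive O(n^2 log n) construction, adequate only for toy sequences.
    """
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def common_prefix_length(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def maximal_mappable_prefix(query: str, genome: str, sa: list[int]) -> tuple[int, int]:
    """Return (length, genome_position) of the longest prefix of query in genome."""
    # Binary search for the first suffix that sorts at or after the query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if genome[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # The longest shared prefix is with one of the two neighboring suffixes.
    best_len, best_pos = 0, -1
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            length = common_prefix_length(query, genome[sa[idx]:])
            if length > best_len:
                best_len, best_pos = length, sa[idx]
    return best_len, best_pos


def sequential_seeds(read: str, genome: str, sa: list[int]) -> list[tuple[int, int, int]]:
    """Restart the MMP search at the first read base not yet mapped.

    Returns (read_start, genome_start, length) seeds; a spliced read yields
    one seed per exon, which a full aligner would then cluster and stitch.
    """
    seeds = []
    start = 0
    while start < len(read):
        length, pos = maximal_mappable_prefix(read[start:], genome, sa)
        if length == 0:
            start += 1  # base absent from the genome: skip it
            continue
        seeds.append((start, pos, length))
        start += length
    return seeds


genome = "TTTTGATTACAGGGGGCCTGAACCCC"
read = "GATTACA" + "CCTGAA"  # two "exons" that are not contiguous in the genome
print(sequential_seeds(read, genome, build_suffix_array(genome)))
```

The toy read splits into two seeds, `(0, 4, 7)` and `(7, 16, 6)`: the first MMP stops where the read jumps to the second "exon", and the search resumes from there, which is how the sequential MMP search detects splice junctions without prior annotation.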
20.
Nature ; 489(7414): 101-8, 2012 Sep 06.
Article En | MEDLINE | ID: mdl-22955620

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.


DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , Transcriptome/genetics , Alleles , Cell Line , DNA, Intergenic/genetics , Enhancer Elements, Genetic , Exons/genetics , Gene Expression Profiling , Genes/genetics , Genomics , Humans , Polyadenylation/genetics , Protein Isoforms/genetics , RNA/biosynthesis , RNA/genetics , RNA Editing/genetics , RNA Splicing/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, RNA
...