ABSTRACT
Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs using whole-mount embryonic imaging of stably integrated reporter constructs chosen throughout our prediction rank-list showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
Subjects
Drosophila Proteins , Embryo, Nonmammalian/embryology , Embryonic Development/physiology , Enhancer Elements, Genetic/physiology , Sequence Analysis, DNA , Transcription Factors , Animals , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster , Genome-Wide Association Study , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Genomics has revolutionized biology, enabling the interrogation of whole transcriptomes, genome-wide binding sites for proteins, and many other molecular processes. However, individual genomic assays measure elements that interact in vivo as components of larger molecular machines. Understanding how these high-order interactions drive gene expression presents a substantial statistical challenge. Building on random forests (RFs) and random intersection trees (RITs) and through extensive, biologically inspired simulations, we developed the iterative random forest algorithm (iRF). iRF trains a feature-weighted ensemble of decision trees to detect stable, high-order interactions with the same order of computational cost as the RF. We demonstrate the utility of iRF for high-order interaction discovery in two prediction problems: enhancer activity in the early Drosophila embryo and alternative splicing of primary transcripts in human-derived cell lines. In Drosophila, among the 20 pairwise transcription factor interactions iRF identifies as stable (returned in more than half of bootstrap replicates), 80% have been previously reported as physical interactions. Moreover, third-order interactions, e.g., between Zelda (Zld), Giant (Gt), and Twist (Twi), suggest high-order relationships that are candidates for follow-up experiments. In human-derived cells, iRF rediscovered a central role of H3K36me3 in chromatin-mediated splicing regulation and identified interesting fifth- and sixth-order interactions, indicative of multivalent nucleosomes with specific roles in splicing regulation. By decoupling the order of interactions from the computational cost of identification, iRF opens additional avenues of inquiry into the molecular mechanisms underlying genome biology.
Subjects
Drosophila/genetics , Models, Genetic , Algorithms , Alternative Splicing , Animals , Computational Biology , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Genome-Wide Association Study
ABSTRACT
BACKGROUND: Daphnia species reproduce by cyclic parthenogenesis involving both sexual and asexual reproduction. The sex of the offspring is environmentally determined and mediated via endocrine signalling by the mother. Interestingly, male and female Daphnia can be genetically identical, yet display large differences in behaviour, morphology, lifespan and metabolic activity. Our goal was to integrate multiple omics datasets, including gene expression, splicing, histone modification and DNA methylation data generated from genetically identical female and male Daphnia pulex under controlled laboratory settings, with the aim of achieving a better understanding of the underlying epigenetic factors that may contribute to the phenotypic differences observed between the two genders. RESULTS: In this study we demonstrate that gene expression level is positively correlated with increased DNA methylation and with histone H3 trimethylation at lysine 4 (H3K4me3) at predicted promoter regions. Conversely, elevated histone H3 trimethylation at lysine 27 (H3K27me3), distributed across the entire transcript length, is negatively correlated with gene expression level. Interestingly, male Daphnia are dominated by epigenetic modifications that globally promote elevated gene expression, while female Daphnia are dominated by epigenetic modifications that reduce gene expression globally. For example, CpG methylation (positively correlated with gene expression level) is significantly higher in almost all differentially methylated sites in male compared to female Daphnia. Furthermore, H3K4me3 modifications are higher in male compared to female Daphnia in more than 3/4 of the differentially regulated promoters. On the other hand, H3K27me3 is higher in female compared to male Daphnia in more than 5/6 of differentially modified sites. However, both sexes demonstrate a roughly equal number of genes that are up-regulated in one gender compared to the other sex.
This may be because gene expression analyses typically assume that most genes are expressed at equal levels across samples and conditions, and thus cannot detect global changes affecting most genes. CONCLUSIONS: The epigenetic differences between male and female Daphnia pulex are vast and dominated by changes that promote elevated gene expression in male Daphnia. Furthermore, the differences observed in both gene expression changes and epigenetic modifications between the genders relate to pathways that are physiologically relevant to the observed phenotypic differences.
Subjects
DNA Methylation , Daphnia/genetics , Epigenesis, Genetic , Epigenomics/methods , Promoter Regions, Genetic/genetics , Animals , Daphnia/anatomy & histology , Daphnia/metabolism , Female , Gene Expression , Histones/genetics , Histones/metabolism , Lysine/metabolism , Male , Methylation , Phenotype , Sex Factors
ABSTRACT
This study quantified eight small-molecule neurotransmitters collected simultaneously from prefrontal cortex of C57BL/6J mice (n = 23) during wakefulness and during isoflurane anesthesia (1.3%). Using isoflurane anesthesia as an independent variable enabled evaluation of the hypothesis that isoflurane anesthesia differentially alters concentrations of multiple neurotransmitters and their interactions. Machine learning was applied to reveal higher order interactions among neurotransmitters. Using a between-subjects design, microdialysis was performed during wakefulness and during anesthesia. Concentrations (nM) of acetylcholine, adenosine, dopamine, GABA, glutamate, histamine, norepinephrine, and serotonin in the dialysis samples are reported (means ± SD). Relative to wakefulness, acetylcholine concentration was lower during isoflurane anesthesia (1.254 ± 1.118 vs. 0.401 ± 0.134, P = 0.009), and concentrations of adenosine (29.456 ± 29.756 vs. 101.321 ± 38.603, P < 0.001), dopamine (0.0578 ± 0.0384 vs. 0.113 ± 0.084, P = 0.036), and norepinephrine (0.126 ± 0.080 vs. 0.219 ± 0.066, P = 0.010) were higher during anesthesia. Isoflurane reconfigured neurotransmitter interactions in prefrontal cortex, and the state of isoflurane anesthesia was reliably predicted by prefrontal cortex concentrations of adenosine, norepinephrine, and acetylcholine. A novel finding to emerge from machine learning analyses is that neurotransmitter concentration profiles in mouse prefrontal cortex undergo functional reconfiguration during isoflurane anesthesia. Adenosine, norepinephrine, and acetylcholine showed high feature importance, supporting the interpretation that interactions among these three transmitters may play a key role in modulating levels of cortical and behavioral arousal. NEW & NOTEWORTHY: This study discovered that interactions between neurotransmitters in mouse prefrontal cortex were altered during isoflurane anesthesia relative to wakefulness.
Machine learning further demonstrated that, relative to wakefulness, higher order interactions among neurotransmitters were disrupted during isoflurane administration. These findings extend to the neurochemical domain the concept that anesthetic-induced loss of wakefulness results from a disruption of neural network connectivity.
Subjects
Acetylcholine/metabolism , Adenosine/metabolism , Anesthesia , Anesthetics, Inhalation/pharmacology , Isoflurane/pharmacology , Machine Learning , Nerve Net , Norepinephrine/metabolism , Prefrontal Cortex , Unconsciousness/metabolism , Wakefulness/physiology , Animals , Male , Mice , Mice, Inbred C57BL , Microdialysis , Nerve Net/drug effects , Nerve Net/metabolism , Nerve Net/physiopathology , Prefrontal Cortex/drug effects , Prefrontal Cortex/metabolism , Prefrontal Cortex/physiopathology
ABSTRACT
Engineered nanoparticles (NPs) undergo physical, chemical, and biological transformation after environmental release, resulting in different properties of the "aged" versus "pristine" forms. While many studies have investigated the ecotoxicological effects of silver (Ag) NPs, the majority focus on "pristine" Ag NPs in simple exposure media, rather than investigating realistic environmental exposure scenarios with transformed NPs. Here, the effects of "pristine" and "aged" Ag NPs with different surface coatings are systematically evaluated on Daphnia magna over four generations, comparing continuous exposure with parental-only exposure to assess recovery potential over three generations. Biological endpoints including survival, growth and reproduction, and genetic effects associated with Ag NP exposure, are investigated. Parental exposure to "pristine" Ag NPs has an inhibitory effect on reproduction, inducing expression of antioxidant stress related genes and reducing survival. Pristine Ag NPs also induce morphological changes including tail losses and lipid accumulation associated with aging phenotypes in the heart, abdomen, and abdominal claw. These effects are epigenetic, persisting for two generations post-maternal exposure (F2 and F3). Exposure to identical Ag NPs (same concentrations) aged for 6 months in environmentally realistic water containing natural organic matter shows considerably reduced toxicological effects in both the continuously exposed generations and the recovery generations.
Subjects
Aging , Daphnia , Epigenesis, Genetic , Metal Nanoparticles , Silver , Aging/drug effects , Animals , Daphnia/drug effects , Environmental Exposure , Epigenesis, Genetic/drug effects , Female , Maternal Exposure , Metal Nanoparticles/toxicity , Silver/toxicity , Water Pollutants, Chemical/toxicity
ABSTRACT
During terminal erythropoiesis, the splicing machinery in differentiating erythroblasts executes a robust intron retention (IR) program that impacts expression of hundreds of genes. We studied IR mechanisms in the SF3B1 splicing factor gene, which expresses ~50% of its transcripts in late erythroblasts as a nuclear isoform that retains intron 4. RNA-seq analysis of nonsense-mediated decay (NMD)-inhibited cells revealed previously undescribed splice junctions, rare or not detected in normal cells, that connect constitutive exons 4 and 5 to highly conserved cryptic cassette exons within the intron. Minigene splicing reporter assays showed that these cassettes promote IR. Genome-wide analysis of splice junction reads demonstrated that cryptic noncoding cassettes are much more common in large (>1 kb) retained introns than they are in small retained introns or in nonretained introns. Functional assays showed that heterologous cassettes can promote retention of intron 4 in the SF3B1 splicing reporter. Although many of these cryptic exons were spliced inefficiently, they exhibited substantial binding of U2AF1 and U2AF2 adjacent to their splice acceptor sites. We propose that these exons function as decoys that engage the intron-terminal splice sites, thereby blocking cross-intron interactions required for excision. Developmental regulation of decoy function underlies a major component of the erythroblast IR program.
Subjects
Alternative Splicing , Erythroblasts/cytology , RNA Splicing Factors/genetics , Sequence Analysis, RNA/methods , Cell Differentiation , Cells, Cultured , Erythroblasts/chemistry , Exons , Humans , Introns , Nonsense Mediated mRNA Decay , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA Splice Sites , RNA Splicing Factors/metabolism , Splicing Factor U2AF/metabolism
ABSTRACT
Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)+ RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long non-coding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, with this complexity arising from combinatorial usage of promoters, splice sites and polyadenylation sites.
Subjects
Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Alternative Splicing/genetics , Animals , Drosophila melanogaster/anatomy & histology , Drosophila melanogaster/cytology , Female , Male , Molecular Sequence Annotation , Nerve Tissue/metabolism , Organ Specificity , Poly A/genetics , Polyadenylation , Promoter Regions, Genetic/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sex Characteristics , Stress, Physiological/genetics
ABSTRACT
The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.
Subjects
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA
ABSTRACT
Objectives: To document medical educators' experience and initiatives in training international medical graduates (IMGs) to become general practitioners (GPs). Design: Qualitative social-constructivist emergent design with descriptive and interpretive analyses. Setting: GP vocational training in Australia, Canada, Ireland, New Zealand, the Netherlands, and UK. Participants: Twenty-eight leaders of GP training. Intervention: Data collected from public documents, published literature and 27 semi-structured interviews. Main outcome measures: Tensions in training and innovations in response to these tensions. Results: Medical educators identified tension in teaching IMGs as it could be different to teaching domestic graduates in any or all aspects of a training program. They felt an ethical responsibility to support IMGs to provide quality health care in their adopted country but faced multiple challenges to achieve this. They described initiatives to address these throughout GP training. Conclusions: IMGs, with their differing educational needs, will benefit from flexible, individualized adaptation of training programs.
Subjects
Attitude of Health Personnel , Education, Medical, Undergraduate/methods , Faculty, Medical/psychology , Foreign Medical Graduates/psychology , General Practice/education , Physicians, Family/education , Australia , Canada , Humans , Interviews as Topic , Ireland , Leadership , Netherlands , New Zealand , Physicians, Family/psychology , Transients and Migrants , United Kingdom
ABSTRACT
The modENCODE (Model Organism Encyclopedia of DNA Elements) Consortium aimed to map functional elements (including transcripts, chromatin marks, regulatory factor binding sites, and origins of DNA replication) in the model organisms Drosophila melanogaster and Caenorhabditis elegans. During its five-year span, the consortium conducted more than 2,000 genome-wide assays in developmentally staged animals, dissected tissues, and homogeneous cell lines. Analysis of these data sets provided foundational insights into genome, epigenome, and transcriptome structure and the evolutionary turnover of regulatory pathways. These studies facilitated a comparative analysis with similar data types produced by the ENCODE Consortium for human cells. Genome organization differs drastically in these distant species, and yet quantitative relationships among chromatin state, transcription, and cotranscriptional RNA processing are deeply conserved. Of the many biological discoveries of the modENCODE Consortium, we highlight insights that emerged from integrative studies. We focus on operational and scientific lessons that may aid future projects of similar scale or aims in other, emerging model systems.
Subjects
Caenorhabditis elegans/genetics , Databases, Factual , Drosophila melanogaster/genetics , Genomics/methods , Animals , Binding Sites , Chromatin/genetics , Chromatin/metabolism , DNA-Binding Proteins/genetics , Genome, Helminth , Genome, Insect , Transcriptome
ABSTRACT
In eukaryotic cells, RNAs exist as ribonucleoprotein particles (RNPs). Despite the importance of these complexes in many biological processes, including splicing, polyadenylation, stability, transportation, localization, and translation, their compositions are largely unknown. We affinity-purified 20 distinct RNA-binding proteins (RBPs) from cultured Drosophila melanogaster cells under native conditions and identified both the RNA and protein compositions of these RNP complexes. We identified "high occupancy target" (HOT) RNAs that interact with the majority of the RBPs we surveyed. HOT RNAs encode components of the nonsense-mediated decay and splicing machinery, as well as RNA-binding and translation initiation proteins. The RNP complexes contain proteins and mRNAs involved in RNA binding and post-transcriptional regulation. Ultracomplex genes, which have the capacity to produce hundreds of mRNA isoforms, interact extensively with heterogeneous nuclear ribonucleoproteins (hnRNPs). Our data are consistent with a model in which subsets of RNPs include mRNA and protein products from the same gene, indicating the widespread existence of auto-regulatory RNPs. From the simultaneous acquisition and integrative analysis of protein and RNA constituents of RNPs, we identify extensive cross-regulatory and hierarchical interactions in post-transcriptional control.
Subjects
Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Gene Expression Regulation , RNA-Binding Proteins/metabolism , Animals , Drosophila Proteins/genetics , Heterogeneous-Nuclear Ribonucleoproteins/genetics , Heterogeneous-Nuclear Ribonucleoproteins/metabolism , RNA Splicing/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA-Binding Proteins/genetics , Sequence Analysis, RNA , Transfection
ABSTRACT
Natural habitats are exposed to an increasing number of environmental stressors that cause important ecological consequences. However, the multifarious nature of environmental change, together with the strength and relative timing of each stressor, largely limits our understanding of biological responses to environmental change. In particular, the early response to unpredictable environmental change, critical to survival and fitness in later life stages, is largely uncharacterized. Here, we characterize the early transcriptional response of the keystone species Daphnia magna to twelve environmental perturbations, including biotic and abiotic stressors. We first perform a differential expression analysis aimed at identifying differential regulation of individual genes in response to stress. This preliminary analysis revealed that a few individual genes were responsive to environmental perturbations and that they were modulated in a stressor- and genotype-specific manner. Given the limited number of differentially regulated genes, we were unable to identify pathways involved in stress response. Hence, to gain a better understanding of the genetic and functional foundation of tolerance to multiple environmental stressors, we leveraged the correlative nature of networks and performed a weighted gene co-expression network analysis. We discovered that approximately one-third of the Daphnia genes, enriched for metabolism, cell signalling and general stress response, drive the early transcriptional response to environmental stress, and that this response is shared among genetic backgrounds. This initial response is followed by a genotype- and/or condition-specific transcriptional response with a strong genotype-by-environment interaction. Intriguingly, the genotype- and condition-specific transcriptional response is found in genes not conserved beyond crustaceans, suggesting niche-specific adaptation.
Subjects
Daphnia/genetics , Gene Regulatory Networks , Transcription, Genetic , Animals , Conserved Sequence , Gene Expression Regulation , Genome , Genotype , Multigene Family
ABSTRACT
Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the Drosophila phylogeny by generating draft genomes for eight new species. For comparative transcriptome analysis, we generated mRNA expression profiles on 81 samples from multiple tissues and developmental stages of 15 Drosophila species, and we performed cap analysis of gene expression in D. melanogaster and D. pseudoobscura. We also describe conservation of four distinct core promoter structures composed of combinations of elements at three positions. Overall, each type of genomic feature shows a characteristic divergence rate relative to neutral models, highlighting the value of multispecies alignment in annotating a target genome that should prove useful in the annotation of other high priority genomes, especially human and other mammalian genomes that are rich in noncoding sequences. We report that the vast majority of elements in the annotation are evolutionarily conserved, indicating that the annotation will be an important springboard for functional genetic testing by the Drosophila community.
Subjects
Computational Biology/methods , Drosophila melanogaster/genetics , Gene Expression Profiling , Molecular Sequence Annotation , Transcriptome , Animals , Cluster Analysis , Drosophila melanogaster/classification , Evolution, Molecular , Exons , Female , Genome, Insect , Humans , Male , Nucleotide Motifs , Phylogeny , Position-Specific Scoring Matrices , Promoter Regions, Genetic , RNA Editing , RNA Splice Sites , RNA Splicing , Reproducibility of Results , Transcription Initiation Site
ABSTRACT
Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
Subjects
Drosophila melanogaster/growth & development , Drosophila melanogaster/genetics , Gene Expression Profiling , Gene Expression Regulation, Developmental/genetics , Transcription, Genetic/genetics , Alternative Splicing/genetics , Animals , Base Sequence , Drosophila Proteins/genetics , Drosophila melanogaster/embryology , Exons/genetics , Female , Genes, Insect/genetics , Genome, Insect/genetics , Male , MicroRNAs/genetics , Oligonucleotide Array Sequence Analysis , Protein Isoforms/genetics , RNA Editing/genetics , RNA, Messenger/analysis , RNA, Messenger/genetics , RNA, Small Untranslated/analysis , RNA, Small Untranslated/genetics , Sequence Analysis , Sex Characteristics
ABSTRACT
Data from the Encyclopedia of DNA Elements (ENCODE) project show over 9640 human genome loci classified as long noncoding RNAs (lncRNAs), yet only ~100 have been deeply characterized to determine their role in the cell. To measure the protein-coding output from these RNAs, we jointly analyzed two recent data sets produced in the ENCODE project: tandem mass spectrometry (MS/MS) data mapping expressed peptides to their encoding genomic loci, and RNA-seq data generated by ENCODE in long polyA+ and polyA- fractions in the cell lines K562 and GM12878. We used the machine-learning algorithm RuleFit3 to regress the peptide data against RNA expression data. The most important covariate for predicting translation was, surprisingly, the Cytosol polyA- fraction in both cell lines. LncRNAs are ~13-fold less likely to produce detectable peptides than similar mRNAs, indicating that ~92% of GENCODE v7 lncRNAs are not translated in these two ENCODE cell lines. Intersecting 9640 lncRNA loci with 79,333 peptides yielded 85 unique peptides matching 69 lncRNAs. Most cases were due to a coding transcript misannotated as lncRNA. Two exceptions were an unprocessed pseudogene and a bona fide lncRNA gene, both with open reading frames (ORFs) compromised by upstream stop codons. All potentially translatable lncRNA ORFs had only a single peptide match, indicating low protein abundance and/or false-positive peptide matches. We conclude that with very few exceptions, ribosomes are able to distinguish coding from noncoding transcripts and, hence, that ectopic translation and cryptic mRNAs are rare in the human lncRNAome.
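The ~92% figure in the abstract above can be read as simple arithmetic on the ~13-fold reduction. The sketch below is one possible interpretation, not necessarily the authors' calculation: it assumes that a comparable mRNA is essentially always detected (a baseline rate of 1.0, which the abstract does not state), so a 13-fold lower chance leaves 1 - 1/13 of lncRNAs untranslated. The function name `untranslated_fraction` is illustrative only. The same block also computes the share of the 9640 lncRNA loci that had any peptide match (69 loci).

```python
def untranslated_fraction(fold_reduction: float, baseline_rate: float = 1.0) -> float:
    """Fraction of transcripts expected to yield no detectable peptide.

    baseline_rate is the (assumed) detection rate for comparable mRNAs;
    fold_reduction is how many times less likely lncRNAs are to be detected.
    """
    if fold_reduction <= 0:
        raise ValueError("fold_reduction must be positive")
    return 1.0 - baseline_rate / fold_reduction


# Values taken from the abstract: ~13-fold reduction, 69 of 9640 loci matched.
not_translated = untranslated_fraction(13.0)
matched_share = 69 / 9640

print(f"Estimated untranslated fraction: {not_translated:.1%}")
print(f"lncRNA loci with any peptide match: {matched_share:.2%}")
```

Under these assumptions the first value rounds to the ~92% quoted in the abstract, while the raw peptide-match share is below 1%; the two numbers answer different questions (model-based estimate versus observed matches).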
Subjects
Protein Biosynthesis , RNA, Long Noncoding/genetics , Amino Acid Sequence , Base Sequence , Cell Line , Gene Expression , Gene Expression Profiling , Gene Expression Regulation , Humans , K562 Cells , Molecular Sequence Annotation , Molecular Sequence Data , Peptides/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Alignment , Tandem Mass Spectrometry/methods
ABSTRACT
The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to those of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally expressed at lower levels than protein-coding genes and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.
Subjects
Databases, Genetic , RNA, Long Noncoding/genetics , Alternative Splicing , Animals , Cell Nucleus/genetics , Cell Nucleus/metabolism , Cluster Analysis , Evolution, Molecular , Exons , Gene Expression Profiling , Gene Expression Regulation , Histones/metabolism , Humans , Molecular Sequence Annotation , Open Reading Frames , Organ Specificity/genetics , Primates/genetics , RNA Processing, Post-Transcriptional , RNA Splice Sites , RNA, Messenger/genetics , Selection, Genetic , Transcription, Genetic
ABSTRACT
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
Subjects
Chromatin Immunoprecipitation/methods; Databases, Genetic; High-Throughput Nucleotide Sequencing/methods; Animals; Genome/genetics; Genomics/methods; Guidelines as Topic; Histones/metabolism; Humans; Internet; Transcription Factors/metabolism
ABSTRACT
In animals, each sequence-specific transcription factor typically binds to thousands of genomic regions in vivo. Our previous studies of 20 transcription factors show that most genomic regions bound at high levels in Drosophila blastoderm embryos are known or probable functional targets, but genomic regions occupied only at low levels have characteristics suggesting that most are not involved in the cis-regulation of transcription. Here we use transgenic reporter gene assays to directly test the transcriptional activity of 104 genomic regions bound at different levels by the 20 transcription factors. Fifteen genomic regions were selected based solely on the DNA occupancy level of the transcription factor Kruppel. Five of the six most highly bound regions drive blastoderm patterns of reporter transcription. In contrast, only one of the nine lowly bound regions drives transcription at this stage and four of them are not detectably active at any stage of embryogenesis. A larger set of 89 genomic regions chosen using criteria designed to identify functional cis-regulatory regions supports the same trend: genomic regions occupied at high levels by transcription factors in vivo drive patterned gene expression, whereas those occupied only at lower levels mostly do not. These results support studies that indicate that the high cellular concentrations of sequence-specific transcription factors drive extensive, low-occupancy, nonfunctional interactions within the accessible portions of the genome.
Subjects
DNA/metabolism; Drosophila Proteins/metabolism; Drosophila melanogaster/genetics; Gene Expression Regulation, Developmental; Genes, Reporter/genetics; Transcription Factors/metabolism; Animals; Animals, Genetically Modified; Drosophila Proteins/genetics; Drosophila melanogaster/embryology; Embryo, Nonmammalian/metabolism; Female; Genome, Insect/genetics; Kruppel-Like Transcription Factors/metabolism; Male; Protein Binding/genetics
ABSTRACT
Core promoters are critical regions for gene regulation in higher eukaryotes. However, the boundaries of promoter regions, the relative rates of initiation at the transcription start sites (TSSs) distributed within them, and the functional significance of promoter architecture remain poorly understood. We produced a high-resolution map of promoters active in the Drosophila melanogaster embryo by integrating data from three independent and complementary methods: 21 million cap analysis of gene expression (CAGE) tags, 1.2 million RNA ligase-mediated rapid amplification of cDNA ends (RLM-RACE) reads, and 50,000 cap-trapped expressed sequence tags (ESTs). We defined 12,454 promoters of 8037 genes. Our analysis indicates that, due to non-promoter-associated RNA background signal, previous studies have likely overestimated the number of promoter-associated CAGE clusters by fivefold. We show that TSS distributions form a complex continuum of shapes, and that promoters active in the embryo and adult have highly similar shapes in 95% of cases. This suggests that these distributions are generally determined by static elements such as local DNA sequence and are not modulated by dynamic signals such as histone modifications. Transcription factor binding motifs are differentially enriched as a function of promoter shape, and peaked promoter shape is correlated with both temporal and spatial regulation of gene expression. Our results contribute to the emerging view that core promoters are functionally diverse and control patterning of gene expression in Drosophila and mammals.