ABSTRACT
Short polypeptides encoded by small open reading frames (smORFs) are ubiquitously found in eukaryotic genomes and are important regulators of physiology, development, and mitochondrial processes. Here, we focus on a subset of 298 smORFs that are evolutionarily conserved between Drosophila melanogaster and humans. Many of these smORFs are conserved broadly in the bilaterian lineage, and ~182 are conserved in plants. We observe remarkably heterogeneous spatial and temporal expression patterns of smORF transcripts, indicating widespread tissue-specific and stage-specific mitochondrial architectures. In addition, an analysis of annotated functional domains reveals a predicted enrichment of smORF polypeptides localizing to mitochondria. We conduct an embryonic ribosome profiling experiment and find support for translation of 137 of these smORFs during embryogenesis. We further embark on functional characterization using CRISPR knockout/activation, RNAi knockdown, and cDNA overexpression, revealing diverse phenotypes. This study underscores the importance of identifying smORF function in disease and phenotypic diversity.
Subjects
Drosophila melanogaster, Peptides, Animals, Humans, Drosophila melanogaster/genetics, Drosophila melanogaster/metabolism, Peptides/metabolism, Genome, Open Reading Frames/genetics
ABSTRACT
Transcription factors (TFs) play a key role in development and in cellular responses to the environment by activating or repressing the transcription of target genes in precise spatial and temporal patterns. To develop a catalog of target genes of Drosophila melanogaster TFs, the modERN consortium systematically knocked down the expression of TFs using RNAi in whole embryos followed by RNA-seq. We generated data for 45 TFs, which have 18 different DNA-binding domains and are expressed in 15 of the 16 organ systems. Inactivation of the targeted TFs by RNAi ranged from a log2 fold change of -3.52 to +0.49. The TFs also showed remarkable heterogeneity in the numbers of candidate target genes identified, with some generating thousands of candidates and others only tens. We present detailed analysis from five experiments, including those for three TFs that have been the focus of previous functional studies (ERR, sens, and zfh2) and two previously uncharacterized TFs (sens-2 and CG32006), as well as short vignettes for selected additional experiments to illustrate the utility of this resource. The RNA-seq datasets are available through the ENCODE DCC (http://encodeproject.org) and the Sequence Read Archive (SRA). TF and target gene expression patterns can be found here: https://insitu.fruitfly.org. These studies provide data that facilitate scientific inquiries into the functions of individual TFs in key developmental, metabolic, defensive, and homeostatic regulatory pathways, as well as provide a broader perspective on how individual TFs work together in local networks during embryogenesis.
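To make the reported knockdown range concrete, the log2 fold changes quoted in this abstract can be converted to linear expression ratios. The snippet below is a minimal sketch of that arithmetic only; the function name `log2fc_to_ratio` is hypothetical and is not taken from any modERN pipeline.

```python
def log2fc_to_ratio(log2fc: float) -> float:
    """Convert a log2 fold change into a linear ratio (knockdown / control)."""
    return 2.0 ** log2fc

# The two extremes of the range reported in the abstract.
strongest = log2fc_to_ratio(-3.52)  # strongest knockdown
weakest = log2fc_to_ratio(0.49)     # slight apparent increase

print(f"log2FC -3.52 -> ratio {strongest:.3f} ({(1 - strongest) * 100:.0f}% reduction)")
print(f"log2FC +0.49 -> ratio {weakest:.3f} ({(weakest - 1) * 100:.0f}% increase)")
```

Read this way, the strongest knockdown leaves roughly 9% of control expression, while the weakest experiment shows no reduction at all.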
Subjects
Drosophila Proteins, Drosophila, Animals, Drosophila/metabolism, Drosophila melanogaster/metabolism, RNA Interference, Transcription Factors/metabolism, Drosophila Proteins/genetics, Drosophila Proteins/metabolism, Estrogen Receptors/genetics, Estrogen Receptors/metabolism, DNA-Binding Proteins/genetics
ABSTRACT
The gut microbiome produces vitamins, nutrients, and neurotransmitters and helps to modulate the host immune system; it also plays a major role in the metabolism of many exogenous compounds, including drugs and chemical toxicants. However, the extent to which specific microbial species or communities modulate hazard upon exposure to chemicals remains largely opaque. Focusing on the effects of collateral dietary exposure to the widely used herbicide atrazine, we applied integrated omics and phenotypic screening to assess the role of the gut microbiome in modulating host resilience in Drosophila melanogaster. Transcriptional and metabolic responses to these compounds are sex-specific and depend strongly on the presence of the commensal microbiome. Sequencing the genomes of all abundant microbes in the fly gut revealed an enzymatic pathway responsible for atrazine detoxification unique to Acetobacter tropicalis. We find that Acetobacter tropicalis alone, in gnotobiotic animals, is sufficient to rescue increased atrazine toxicity to wild-type, conventionally reared levels. This work points toward the derivation of biotic strategies to improve host resilience to environmental chemical exposures, and illustrates the power of integrative omics to identify pathways responsible for adverse health outcomes.
Subjects
Atrazine/toxicity, Drosophila melanogaster/drug effects, Gastrointestinal Microbiome/drug effects, Host Microbial Interactions/drug effects, Insecticides/toxicity, Acetobacter/genetics, Acetobacter/metabolism, Animals, Drosophila melanogaster/microbiology, Female, Metabolic Inactivation, Male
ABSTRACT
Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs using whole-mount embryonic imaging of stably integrated reporter constructs chosen throughout our prediction rank-list showed >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
Subjects
Drosophila Proteins, Nonmammalian Embryo/embryology, Embryonic Development/physiology, Genetic Enhancer Elements/physiology, DNA Sequence Analysis, Transcription Factors, Animals, Drosophila Proteins/genetics, Drosophila Proteins/metabolism, Drosophila melanogaster, Genome-Wide Association Study, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
During terminal erythropoiesis, the splicing machinery in differentiating erythroblasts executes a robust intron retention (IR) program that impacts expression of hundreds of genes. We studied IR mechanisms in the SF3B1 splicing factor gene, which expresses ~50% of its transcripts in late erythroblasts as a nuclear isoform that retains intron 4. RNA-seq analysis of nonsense-mediated decay (NMD)-inhibited cells revealed previously undescribed splice junctions, rare or not detected in normal cells, that connect constitutive exons 4 and 5 to highly conserved cryptic cassette exons within the intron. Minigene splicing reporter assays showed that these cassettes promote IR. Genome-wide analysis of splice junction reads demonstrated that cryptic noncoding cassettes are much more common in large (>1 kb) retained introns than they are in small retained introns or in nonretained introns. Functional assays showed that heterologous cassettes can promote retention of intron 4 in the SF3B1 splicing reporter. Although many of these cryptic exons were spliced inefficiently, they exhibited substantial binding of U2AF1 and U2AF2 adjacent to their splice acceptor sites. We propose that these exons function as decoys that engage the intron-terminal splice sites, thereby blocking cross-intron interactions required for excision. Developmental regulation of decoy function underlies a major component of the erythroblast IR program.
Subjects
Alternative Splicing, Erythroblasts/cytology, RNA Splicing Factors/genetics, RNA Sequence Analysis/methods, Cell Differentiation, Cultured Cells, Erythroblasts/chemistry, Exons, Humans, Introns, Nonsense-Mediated mRNA Decay, Protein Isoforms/genetics, Protein Isoforms/metabolism, RNA Splice Sites, RNA Splicing Factors/metabolism, Splicing Factor U2AF/metabolism
ABSTRACT
Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)+ RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long non-coding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, with this complexity arising from combinatorial usage of promoters, splice sites and polyadenylation sites.
Subjects
Drosophila melanogaster/genetics, Gene Expression Profiling, Transcriptome/genetics, Alternative Splicing/genetics, Animals, Drosophila melanogaster/anatomy & histology, Drosophila melanogaster/cytology, Female, Male, Molecular Sequence Annotation, Nerve Tissue/metabolism, Organ Specificity, Poly A/genetics, Polyadenylation, Genetic Promoter Regions/genetics, Long Noncoding RNA/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Sex Characteristics, Physiological Stress/genetics
ABSTRACT
BACKGROUND: Site-specific transcription factors (TFs) bind DNA regulatory elements to control expression of target genes, forming the core of gene regulatory networks. Despite decades of research, most studies focus on only a small number of TFs and the roles of many remain unknown. RESULTS: We present a systematic characterization of spatiotemporal gene expression patterns for all known or predicted Drosophila TFs throughout embryogenesis, the first such comprehensive study for any metazoan animal. We generated RNA expression patterns for all 708 TFs by in situ hybridization, annotated the patterns using an anatomical controlled vocabulary, and analyzed TF expression in the context of organ system development. Nearly all TFs are expressed during embryogenesis and more than half are specifically expressed in the central nervous system. Compared to other genes, TFs are enriched early in the development of most organ systems, and throughout the development of the nervous system. Of the 535 TFs with spatially restricted expression, 79% are dynamically expressed in multiple organ systems while 21% show single-organ specificity. Of those expressed in multiple organ systems, 77 TFs are restricted to a single organ system either early or late in development. Expression patterns for 354 TFs are characterized for the first time in this study. CONCLUSIONS: We produced a reference TF dataset for the investigation of gene regulatory networks in embryogenesis, and gained insight into the expression dynamics of the full complement of TFs controlling the development of each organ system.
Subjects
Drosophila Proteins/genetics, Drosophila melanogaster/embryology, Drosophila melanogaster/metabolism, Transcription Factors/genetics, Animals, Central Nervous System/metabolism, Gene Expression Profiling, Developmental Gene Expression Regulation, Gene Regulatory Networks, In Situ Hybridization, Organ Specificity
ABSTRACT
Extracellular domains of cell surface receptors and ligands mediate cell-cell communication, adhesion, and initiation of signaling events, but most existing protein-protein "interactome" data sets lack information for extracellular interactions. We probed interactions between receptor extracellular domains, focusing on a set of 202 proteins composed of the Drosophila melanogaster immunoglobulin superfamily (IgSF), fibronectin type III (FnIII), and leucine-rich repeat (LRR) families, which are known to be important in neuronal and developmental functions. Out of 20,503 candidate protein pairs tested, we observed 106 interactions, 83 of which were previously unknown. We "deorphanized" the 20-member subfamily of defective-in-proboscis-response IgSF proteins, showing that they selectively interact with an 11-member subfamily of previously uncharacterized IgSF proteins. Both subfamilies interact with a single common "orphan" LRR protein. We also observed interactions between Hedgehog and EGFR pathway components. Several of these interactions could be visualized in live-dissected embryos, demonstrating that this approach can identify physiologically relevant receptor-ligand pairs.
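The figure of 20,503 candidate pairs is consistent with testing every unordered pair among the 202 proteins, including each protein against itself. The sketch below checks that arithmetic; the assumption that self-pairs were part of the screen is inferred from the numbers and is not stated in the abstract.

```python
from math import comb

n_proteins = 202

# Unordered pairs of distinct proteins, plus each protein paired with itself
# (assumption: homophilic pairs were included in the screen).
heterophilic = comb(n_proteins, 2)
homophilic = n_proteins
candidate_pairs = heterophilic + homophilic

observed = 106
previously_unknown = 83

print(candidate_pairs)                      # 20503
print(observed - previously_unknown)        # 23 interactions already reported
print(f"{observed / candidate_pairs:.2%}")  # fraction of tested pairs that interact
```

The same count follows from the closed form n(n + 1)/2 with n = 202.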
Subjects
Drosophila Proteins/metabolism, Drosophila melanogaster/cytology, Drosophila melanogaster/metabolism, Fibronectins/metabolism, Immunoglobulins/metabolism, Protein Interaction Maps, Proteins/metabolism, Amino Acid Sequence, Animals, Drosophila Proteins/chemistry, Drosophila melanogaster/embryology, Fibronectins/chemistry, Leucine-Rich Repeat Proteins, Ligands, Molecular Sequence Data, Phylogeny, Tertiary Protein Structure, Cell Surface Receptors/chemistry, Cell Surface Receptors/metabolism, Sequence Alignment
ABSTRACT
In animals, each sequence-specific transcription factor typically binds to thousands of genomic regions in vivo. Our previous studies of 20 transcription factors show that most genomic regions bound at high levels in Drosophila blastoderm embryos are known or probable functional targets, but genomic regions occupied only at low levels have characteristics suggesting that most are not involved in the cis-regulation of transcription. Here we use transgenic reporter gene assays to directly test the transcriptional activity of 104 genomic regions bound at different levels by the 20 transcription factors. Fifteen genomic regions were selected based solely on the DNA occupancy level of the transcription factor Kruppel. Five of the six most highly bound regions drive blastoderm patterns of reporter transcription. In contrast, only one of the nine lowly bound regions drives transcription at this stage and four of them are not detectably active at any stage of embryogenesis. A larger set of 89 genomic regions chosen using criteria designed to identify functional cis-regulatory regions supports the same trend: genomic regions occupied at high levels by transcription factors in vivo drive patterned gene expression, whereas those occupied only at lower levels mostly do not. These results support studies that indicate that the high cellular concentrations of sequence-specific transcription factors drive extensive, low-occupancy, nonfunctional interactions within the accessible portions of the genome.
Subjects
DNA/metabolism, Drosophila Proteins/metabolism, Drosophila melanogaster/genetics, Developmental Gene Expression Regulation, Reporter Genes/genetics, Transcription Factors/metabolism, Animals, Genetically Modified Animals, Drosophila Proteins/genetics, Drosophila melanogaster/embryology, Nonmammalian Embryo/metabolism, Female, Insect Genome/genetics, Kruppel-Like Transcription Factors/metabolism, Male, Protein Binding/genetics
ABSTRACT
We describe a high-throughput protocol for RNA in situ hybridization (ISH) to Drosophila embryos in a 96-well format. cDNA or genomic DNA templates are amplified by PCR and then digoxigenin-labeled ribonucleotides are incorporated into antisense RNA probes by in vitro transcription. The quality of each probe is evaluated before ISH using an RNA probe quantification (dot blot) assay. RNA probes are hybridized to fixed, mixed-staged Drosophila embryos in 96-well plates. The resulting stained embryos can be examined and photographed immediately or stored at 4 °C for later analysis. Starting with fixed, staged embryos, the protocol takes 6 d from probe template production through hybridization. Preparation of fixed embryos requires a minimum of 2 weeks to collect embryos representing all stages. The method has been used to determine the expression patterns of over 6,000 genes throughout embryogenesis.
Subjects
Drosophila/genetics, Embryonic Development/genetics, Gene Expression Profiling/methods, In Situ Hybridization/methods, RNA/analysis, Animals, Molecular Cloning, Drosophila/embryology, Embryo Culture Techniques, Nonmammalian Embryo, Developmental Gene Expression Regulation, Polymerase Chain Reaction, RNA Probes
ABSTRACT
To fully understand animal transcription networks, it is essential to accurately measure the spatial and temporal expression patterns of transcription factors and their targets. We describe a registration technique that takes image-based data from hundreds of Drosophila blastoderm embryos, each costained for a reference gene and one of a set of genes of interest, and builds a model VirtualEmbryo. This model captures in a common framework the average expression patterns for many genes in spite of significant variation in morphology and expression between individual embryos. We establish the method's accuracy by showing that relationships between a pair of genes' expression inferred from the model are nearly identical to those measured in embryos costained for the pair. We present a VirtualEmbryo containing data for 95 genes at six time cohorts. We show that known gene-regulatory interactions can be automatically recovered from this data set and predict hundreds of new interactions.
Subjects
Drosophila melanogaster/genetics, Gene Regulatory Networks, Genetic Models, Animals, Blastoderm, Drosophila melanogaster/metabolism, Nonmammalian Embryo/metabolism, Developmental Gene Expression Regulation, Insect Genes
ABSTRACT
Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. We used whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over 40 well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. 
Together these observations suggest that many of these poorly bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.
Subjects
Blastoderm/metabolism, Drosophila melanogaster/embryology, Transcription Factors/metabolism, Animals, Binding Sites, DNA/metabolism, Molecular Evolution, MicroRNAs/metabolism
ABSTRACT
BACKGROUND: Cell- and tissue-specific gene expression is a defining feature of embryonic development in multi-cellular organisms. However, the range of gene expression patterns, the extent of the correlation of expression with function, and the classes of genes whose spatial expression is tightly regulated have been unclear due to the lack of an unbiased, genome-wide survey of gene expression patterns. RESULTS: We determined and documented embryonic expression patterns for 6,003 (44%) of the 13,659 protein-coding genes identified in the Drosophila melanogaster genome with over 70,000 images and controlled vocabulary annotations. Individual expression patterns are extraordinarily diverse, but by supplementing qualitative in situ hybridization data with quantitative microarray time-course data using a hybrid clustering strategy, we identify groups of genes with similar expression. Of 4,496 genes with detectable expression in the embryo, 2,549 (57%) fall into 10 clusters representing broad expression patterns. The remaining 1,947 (43%) genes fall into 29 clusters representing restricted expression, 20% patterned as early as blastoderm, with the majority restricted to differentiated cell types, such as epithelia, nervous system, or muscle. We investigate the relationship between expression clusters and known molecular and cellular-physiological functions. CONCLUSION: Nearly 60% of the genes with detectable expression exhibit broad patterns reflecting quantitative rather than qualitative differences between tissues. The other 40% show tissue-restricted expression; the expression patterns of over 1,500 of these genes are documented here for the first time. Within each of these categories, we identified clusters of genes associated with particular cellular and developmental functions.
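The percentages quoted in this abstract can be re-derived from the reported gene counts. The following is a small arithmetic check rather than any part of the study's analysis; the variable names and the helper `pct` are illustrative.

```python
total_genes = 13_659   # protein-coding genes identified in the genome
surveyed = 6_003       # genes with documented embryonic expression patterns
detectable = 4_496     # genes with detectable expression in the embryo
broad = 2_549          # genes in the 10 broad-expression clusters
restricted = 1_947     # genes in the 29 restricted-expression clusters

def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# The two cluster groups together account for every detectably expressed gene.
assert broad + restricted == detectable

print(pct(surveyed, total_genes))   # 44
print(pct(broad, detectable))       # 57
print(pct(restricted, detectable))  # 43
```

The "nearly 60%" and "other 40%" in the conclusion are rounded restatements of the 57% and 43% figures.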
Subjects
Drosophila melanogaster/embryology, Drosophila melanogaster/genetics, Embryonic Development/genetics, Gene Expression Profiling, Animals
ABSTRACT
BACKGROUND: Cell-fate specification and tissue differentiation during development are largely achieved by the regulation of gene transcription. RESULTS: As a first step to creating a comprehensive atlas of gene-expression patterns during Drosophila embryogenesis, we examined 2,179 genes by in situ hybridization to fixed Drosophila embryos. Of the genes assayed, 63.7% displayed dynamic expression patterns that were documented with 25,690 digital photomicrographs of individual embryos. The photomicrographs were annotated using controlled vocabularies for anatomical structures that are organized into a developmental hierarchy. We also generated a detailed time course of gene expression during embryogenesis using microarrays to provide an independent corroboration of the in situ hybridization results. All image, annotation and microarray data are stored in a publicly available database. We found that the RNA transcripts of about 1% of genes show clear subcellular localization. Nearly all the annotated expression patterns are distinct. We present an approach for organizing the data by hierarchical clustering of annotation terms that allows us to group tissues that express similar sets of genes as well as genes displaying similar expression patterns. CONCLUSIONS: Analyzing gene-expression patterns by in situ hybridization to whole-mount embryos provides an extremely rich dataset that can be used to identify genes involved in developmental processes that have been missed by traditional genetic analysis. Systematic analysis of rigorously annotated patterns of gene expression will complement and extend the types of analyses carried out using expression microarrays.