ABSTRACT
Extracellular domains of cell surface receptors and ligands mediate cell-cell communication, adhesion, and initiation of signaling events, but most existing protein-protein "interactome" data sets lack information for extracellular interactions. We probed interactions between receptor extracellular domains, focusing on a set of 202 proteins composed of the Drosophila melanogaster immunoglobulin superfamily (IgSF), fibronectin type III (FnIII), and leucine-rich repeat (LRR) families, which are known to be important in neuronal and developmental functions. Out of 20,503 candidate protein pairs tested, we observed 106 interactions, 83 of which were previously unknown. We "deorphanized" the 20-member subfamily of defective-in-proboscis-response IgSF proteins, showing that they selectively interact with an 11-member subfamily of previously uncharacterized IgSF proteins. Both subfamilies interact with a single common "orphan" LRR protein. We also observed interactions between Hedgehog and EGFR pathway components. Several of these interactions could be visualized in live-dissected embryos, demonstrating that this approach can identify physiologically relevant receptor-ligand pairs.
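The figure of 20,503 candidate pairs matches the number of unordered pairs that can be drawn from 202 proteins when each protein is also tested against itself. This reading is inferred from the numbers in the abstract and is an assumption about the screen design, not something the abstract states. A minimal Python sketch of the arithmetic (the function name is illustrative):

```python
from itertools import combinations_with_replacement

N_PROTEINS = 202          # size of the IgSF/FnIII/LRR set given in the abstract
N_INTERACTIONS = 106      # interactions reported in the abstract


def count_unordered_pairs(n: int) -> int:
    """Number of unordered pairs from n items, self-pairs included: n(n+1)/2."""
    return n * (n + 1) // 2


pairs = count_unordered_pairs(N_PROTEINS)

# Cross-check the closed form by explicit enumeration.
enumerated = sum(1 for _ in combinations_with_replacement(range(N_PROTEINS), 2))

# Fraction of tested pairs that scored as interactions.
hit_rate = N_INTERACTIONS / pairs

print(pairs, enumerated, f"{hit_rate:.4%}")
```

Under this assumption, roughly half a percent of the tested pairs were scored as interactions.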
Subjects
Drosophila Proteins/metabolism, Drosophila melanogaster/cytology, Drosophila melanogaster/metabolism, Fibronectins/metabolism, Immunoglobulins/metabolism, Protein Interaction Maps, Proteins/metabolism, Amino Acid Sequence, Animals, Drosophila Proteins/chemistry, Drosophila melanogaster/embryology, Fibronectins/chemistry, Leucine-Rich Repeat Proteins, Ligands, Molecular Sequence Data, Phylogeny, Tertiary Protein Structure, Cell Surface Receptors/chemistry, Cell Surface Receptors/metabolism, Sequence Alignment
ABSTRACT
A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here we present the culmination of the efforts of the Model Organism ENCyclopedia Of DNA Elements (modENCODE) and the model organism Encyclopedia of Regulatory Networks (modERN) consortia to systematically assay TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). These datasets comprise 605 TFs identifying 3.6M sites in the fly and 356 TFs identifying 0.9M sites in the worm and represent the majority of the regulatory space in each genome. We demonstrate that TFs associate with chromatin in clusters termed "metapeaks", that larger metapeaks have characteristics of high occupancy target (HOT) regions, and that the importance of consensus sequence motifs bound by TFs depends on metapeak size and complexity. Combining ChIP-seq data with single cell RNA-seq data in a machine learning model identifies TFs with a prominent role in promoting target gene expression in specific cell types, even differentiating between parent-daughter cells during embryogenesis. These data are a rich resource for the community that should fuel and guide future investigations into TF function. To facilitate data accessibility and utility, all strains expressing GFP-tagged TFs are available at the stock centers for each organism. The chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center, GEO, and through a direct interface that provides rapid access to processed data sets and summary analyses, as well as widgets to probe the cell type-specific TF-target relationships.
ABSTRACT
To gain insight into the transcription programs activated during the formation of Drosophila larval structures, we carried out single cell RNA sequencing during two periods of Drosophila embryogenesis: stages 10-12, when most organs are first specified and initiate morphological and physiological specialization; and stages 13-16, when organs achieve their final mature architectures and begin to function. Our data confirm previous findings with regard to functional specialization of some organs (the salivary gland and trachea) and clarify the embryonic functions of another (the plasmatocytes). We also identify two early developmental trajectories in germ cells and uncover a potential role for proteolysis during germline stem cell specialization. We identify the likely cell type of origin for key components of the Drosophila matrisome and several commonly used Drosophila embryonic cell culture lines. Finally, we compare our findings with other recent related studies and with other modalities for identifying tissue-specific gene expression patterns. These data provide a useful community resource for identifying many new players in tissue-specific morphogenesis and functional specialization of developing organs.
Subjects
Drosophila Proteins, Drosophila, Animals, Drosophila/metabolism, Transcriptome/genetics, Organogenesis, Drosophila Proteins/metabolism, Embryonic Development/genetics, Developmental Gene Expression Regulation
ABSTRACT
Determining the composition of protein complexes is an essential step toward understanding the cell as an integrated system. Using coaffinity purification coupled to mass spectrometry analysis, we examined protein associations involving nearly 5,000 individual, FLAG-HA epitope-tagged Drosophila proteins. Stringent analysis of these data, based on a statistical framework designed to define individual protein-protein interactions, led to the generation of a Drosophila protein interaction map (DPiM) encompassing 556 protein complexes. The high quality of the DPiM and its usefulness as a paradigm for metazoan proteomes are apparent from the recovery of many known complexes, significant enrichment for shared functional attributes, and validation in human cells. The DPiM defines potential novel members for several important protein complexes and assigns functional links to 586 protein-coding genes lacking previous experimental annotation. The DPiM represents, to our knowledge, the largest metazoan protein complex map and provides a valuable resource for analysis of protein complex evolution.
Subjects
Drosophila Proteins/metabolism, Drosophila melanogaster/metabolism, Protein Interaction Mapping, Animals, Drosophila Proteins/genetics, Proteasome Endopeptidase Complex/metabolism, Proteomics, SNARE Proteins/metabolism
ABSTRACT
Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs, chosen throughout our prediction rank-list and tested by whole-mount embryonic imaging of stably integrated reporter constructs, showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
Subjects
Drosophila Proteins, Nonmammalian Embryo/embryology, Embryonic Development/physiology, Genetic Enhancer Elements/physiology, DNA Sequence Analysis, Transcription Factors, Animals, Drosophila Proteins/genetics, Drosophila Proteins/metabolism, Drosophila melanogaster, Genome-Wide Association Study, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
Recursive splicing is a process in which large introns are removed in multiple steps by re-splicing at ratchet points, which are 5' splice sites recreated after splicing. Recursive splicing was first identified in the Drosophila Ultrabithorax (Ubx) gene and only three additional Drosophila genes have since been experimentally shown to undergo recursive splicing. Here we identify 197 zero nucleotide exon ratchet points in 130 introns of 115 Drosophila genes from total RNA sequencing data generated from developmental time points, dissected tissues and cultured cells. The sequential nature of recursive splicing was confirmed by identification of lariat introns generated by splicing to and from the ratchet points. We also show that recursive splicing is a constitutive process, that depletion of U2AF inhibits recursive splicing, and that the sequence and function of ratchet points are evolutionarily conserved in Drosophila. Finally, we identify four recursively spliced human genes, one of which is also recursively spliced in Drosophila. Together, these results indicate that recursive splicing is commonly used in Drosophila, occurs in humans, and provides insight into the mechanisms by which some large introns are removed.
Subjects
Drosophila melanogaster/genetics, Insect Genome/genetics, Nucleotides/genetics, RNA Splicing/genetics, Animals, Base Sequence, Cultured Cells, Exons/genetics, Female, Insect Genes/genetics, Humans, Introns/genetics, Male, Nuclear Proteins/deficiency, Nuclear Proteins/genetics, Nuclear Proteins/metabolism, RNA Splice Sites/genetics, Reproducibility of Results, Ribonucleoproteins/deficiency, Ribonucleoproteins/genetics, Ribonucleoproteins/metabolism, Splicing Factor U2AF
ABSTRACT
Azoxymethane (AOM) is a carcinogen widely used to study chemically induced colorectal carcinogenesis and is also an agent for studying fulminant hepatic failure. Inter-strain differences in susceptibility to acute AOM toxicity have been reported, but their association with host genetics or gut microbiota remains largely unexplored. Here a cohort of genetically diverse Collaborative Cross (CC) mice was used to assess the contribution of host genetics and the gut microbiome to AOM-induced acute toxicity. We observed variation in AOM-induced acute liver failure across CC strains. Quantitative trait loci (QTL) analysis revealed three chromosome regions significantly associated with AOM toxicity. Genes located within these QTL, including peroxisome proliferator-activated receptor alpha (Ppara), were enriched for enzyme activator and nucleoside-triphosphatase regulator activity. We further demonstrated that the protein level of PPARα in liver tissues from sensitive strains was markedly lower than in resistant strains, consistent with a protective role of the PPAR family in liver injury. We discovered that the abundance levels of the gut microbial families Anaeroplasmataceae, Ruminococcaceae, Lactobacillaceae, Akkermansiaceae and Clostridiaceae were significantly higher in the sensitive strains than in the resistant strains. Using a random forest classifier, we determined that the relative abundance levels of these microbial families predicted AOM toxicity with an area under the receiver-operating curve (AUC) of 0.75. Combining the three genetic loci and five microbial families increased the predictive accuracy for AOM toxicity (AUC of 0.99). Moreover, we found that Ruminococcaceae and Lactobacillaceae acted as mediators between host genetics and AOM toxicity. In conclusion, this study shows that host genetics and specific microbiome members play a critical role in AOM-induced acute toxicity, which provides a framework for analysis of the health effects from environmental toxicants.
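The AUC values quoted above can be read as the probability that a randomly chosen sensitive sample receives a higher classifier score than a randomly chosen resistant one. The sketch below computes AUC from that pairwise definition in plain Python. The labels and scores are made-up illustrative values, not data from the study, and the function name is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = sensitive strain, 0 = resistant strain.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.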
Subjects
Azoxymethane/toxicity, Carcinogens/toxicity, Chemical and Drug Induced Liver Injury/etiology, Gastrointestinal Microbiome, Animals, Chemical and Drug Induced Liver Injury/genetics, Chemical and Drug Induced Liver Injury/microbiology, Collaborative Cross Mice, Acute Liver Failure/chemically induced, Acute Liver Failure/genetics, Acute Liver Failure/microbiology, Male, Mice, Quantitative Trait Loci, Species Specificity
ABSTRACT
During terminal erythropoiesis, the splicing machinery in differentiating erythroblasts executes a robust intron retention (IR) program that impacts expression of hundreds of genes. We studied IR mechanisms in the SF3B1 splicing factor gene, which expresses ~50% of its transcripts in late erythroblasts as a nuclear isoform that retains intron 4. RNA-seq analysis of nonsense-mediated decay (NMD)-inhibited cells revealed previously undescribed splice junctions, rare or not detected in normal cells, that connect constitutive exons 4 and 5 to highly conserved cryptic cassette exons within the intron. Minigene splicing reporter assays showed that these cassettes promote IR. Genome-wide analysis of splice junction reads demonstrated that cryptic noncoding cassettes are much more common in large (>1 kb) retained introns than they are in small retained introns or in nonretained introns. Functional assays showed that heterologous cassettes can promote retention of intron 4 in the SF3B1 splicing reporter. Although many of these cryptic exons were spliced inefficiently, they exhibited substantial binding of U2AF1 and U2AF2 adjacent to their splice acceptor sites. We propose that these exons function as decoys that engage the intron-terminal splice sites, thereby blocking cross-intron interactions required for excision. Developmental regulation of decoy function underlies a major component of the erythroblast IR program.
Subjects
Alternative Splicing, Erythroblasts/cytology, RNA Splicing Factors/genetics, RNA Sequence Analysis/methods, Cell Differentiation, Cultured Cells, Erythroblasts/chemistry, Exons, Humans, Introns, Nonsense-Mediated mRNA Decay, Protein Isoforms/genetics, Protein Isoforms/metabolism, RNA Splice Sites, RNA Splicing Factors/metabolism, Splicing Factor U2AF/metabolism
ABSTRACT
Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)+ RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long non-coding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, with this complexity arising from combinatorial usage of promoters, splice sites and polyadenylation sites.
Subjects
Drosophila melanogaster/genetics, Gene Expression Profiling, Transcriptome/genetics, Alternative Splicing/genetics, Animals, Drosophila melanogaster/anatomy & histology, Drosophila melanogaster/cytology, Female, Male, Molecular Sequence Annotation, Nerve Tissue/metabolism, Organ Specificity, Poly A/genetics, Polyadenylation, Genetic Promoter Regions/genetics, Long Noncoding RNA/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Sex Characteristics, Physiological Stress/genetics
ABSTRACT
OBJECTIVE: The Collaborative Cross (CC) is a mouse population model with diverse and reproducible genetic backgrounds used to identify novel disease models and genes that contribute to human disease. Since spontaneous tumour susceptibility in CC mice remains unexplored, we assessed tumour incidence and spectrum. DESIGN: We monitored 293 mice from 18 CC strains for tumour development. Genetic association analysis and RNA sequencing were used to identify susceptibility loci and candidate genes. We analysed genomes of patients with gastric cancer to evaluate the relevance of genes identified in the CC mouse model and measured the expression levels of ISG15 by immunohistochemical staining using a gastric adenocarcinoma tissue microarray. Association of gene expression with overall survival (OS) was assessed by Kaplan-Meier analysis. RESULTS: CC mice displayed a wide range in the incidence and types of spontaneous tumours. More than 40% of CC036 mice developed gastric tumours within 1 year. Genetic association analysis identified Nfκb1 as a candidate susceptibility gene, while RNA sequencing analysis of non-tumour gastric tissues from CC036 mice showed significantly higher expression of inflammatory response genes. In human gastric cancers, the majority of human orthologues of the 166 mouse genes were preferentially altered by amplification or deletion and were significantly associated with OS. Higher expression of the CC036 inflammatory response gene signature is associated with poor OS. Finally, ISG15 protein is elevated in gastric adenocarcinomas and correlated with shortened patient OS. CONCLUSIONS: CC strains exhibit tremendous variation in tumour susceptibility, and we present CC036 as a spontaneous laboratory mouse model for studying human gastric tumourigenesis.
Subjects
Carcinogenesis/pathology, Animal Disease Models, Genetic Predisposition to Disease/etiology, Stomach Neoplasms/etiology, Animals, Carcinogenesis/genetics, Collaborative Cross Mice, Female, Male, Mice, Stomach Neoplasms/pathology
ABSTRACT
Variations in oral bacterial communities have been linked to oral cancer, suggesting that the oral microbiome is an etiological factor that can influence oral cancer development. The 4-nitroquinoline 1-oxide (4-NQO)-induced murine oral and esophageal cancer model is frequently used to assess the effects of preventive and/or therapeutic agents. We used this model to assess the impact of the microbiome on tumorigenesis using axenic (germ-free) and conventionally housed mice. Increased toxicity was observed in germ-free mice; however, no difference in tumor incidence, multiplicity, or size was observed. Transcriptional profiling of liver tissue from germ-free and conventionally housed mice identified 254 differentially expressed genes, including ten cytochrome P450 enzymes, the largest family of phase-1 drug metabolizing enzymes in the liver. Gene ontology revealed that differentially expressed genes were enriched for liver steatosis, inflammation, and oxidative stress in livers of germ-free mice. Our observations emphasize the importance of the microbiome in mediating chemical toxicity, at least in part by altering host gene expression. Studies on the role of the microbiome in chemical-induced cancer using germ-free animal models should consider the potential difference in dose due to microbiome-mediated changes in host metabolizing capacity, which might influence the ability to draw conclusions, especially for tumor induction models that are dose dependent.
Subjects
4-Nitroquinoline-1-oxide/toxicity, Carcinogenesis/pathology, Carcinogens/toxicity, Neoplastic Cell Transformation/pathology, Esophageal Neoplasms/pathology, Microbiota, Mouth Neoplasms/pathology, Animals, Carcinogenesis/chemically induced, Carcinogenesis/genetics, Squamous Cell Carcinoma/chemically induced, Squamous Cell Carcinoma/genetics, Squamous Cell Carcinoma/pathology, Neoplastic Cell Transformation/chemically induced, Neoplastic Cell Transformation/genetics, Animal Disease Models, Esophageal Neoplasms/chemically induced, Esophageal Neoplasms/genetics, Humans, Male, Mice, Inbred C57BL Mice, Mouth Neoplasms/chemically induced, Mouth Neoplasms/genetics, Tongue Neoplasms/chemically induced, Tongue Neoplasms/genetics, Tongue Neoplasms/pathology
ABSTRACT
Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes it a challenge to extract biological information. We developed staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. Our analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, we used the PP to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP that span along the embryonic anterior-posterior axis. Using a two-tail 5% cutoff on correlation, we reproduced 10 of the 11 links in the well-studied gap gene network. The performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.
Subjects
Genetic Databases, Gene Expression Regulation/physiology, Gene Regulatory Networks/physiology, Genetic Models, Animals, Drosophila melanogaster
ABSTRACT
The modENCODE (Model Organism Encyclopedia of DNA Elements) Consortium aimed to map functional elements (including transcripts, chromatin marks, regulatory factor binding sites, and origins of DNA replication) in the model organisms Drosophila melanogaster and Caenorhabditis elegans. During its five-year span, the consortium conducted more than 2,000 genome-wide assays in developmentally staged animals, dissected tissues, and homogeneous cell lines. Analysis of these data sets provided foundational insights into genome, epigenome, and transcriptome structure and the evolutionary turnover of regulatory pathways. These studies facilitated a comparative analysis with similar data types produced by the ENCODE Consortium for human cells. Genome organization differs drastically in these distant species, and yet quantitative relationships among chromatin state, transcription, and cotranscriptional RNA processing are deeply conserved. Of the many biological discoveries of the modENCODE Consortium, we highlight insights that emerged from integrative studies. We focus on operational and scientific lessons that may aid future projects of similar scale or aims in other, emerging model systems.
Subjects
Caenorhabditis elegans/genetics, Factual Databases, Drosophila melanogaster/genetics, Genomics/methods, Animals, Binding Sites, Chromatin/genetics, Chromatin/metabolism, DNA-Binding Proteins/genetics, Helminth Genome, Insect Genome, Transcriptome
ABSTRACT
In eukaryotic cells, RNAs exist as ribonucleoprotein particles (RNPs). Despite the importance of these complexes in many biological processes, including splicing, polyadenylation, stability, transportation, localization, and translation, their compositions are largely unknown. We affinity-purified 20 distinct RNA-binding proteins (RBPs) from cultured Drosophila melanogaster cells under native conditions and identified both the RNA and protein compositions of these RNP complexes. We identified "high occupancy target" (HOT) RNAs that interact with the majority of the RBPs we surveyed. HOT RNAs encode components of the nonsense-mediated decay and splicing machinery, as well as RNA-binding and translation initiation proteins. The RNP complexes contain proteins and mRNAs involved in RNA binding and post-transcriptional regulation. Genes with the capacity to produce hundreds of mRNA isoforms, termed ultracomplex genes, interact extensively with heterogeneous nuclear ribonucleoproteins (hnRNPs). Our data are consistent with a model in which subsets of RNPs include mRNA and protein products from the same gene, indicating the widespread existence of auto-regulatory RNPs. From the simultaneous acquisition and integrative analysis of protein and RNA constituents of RNPs, we identify extensive cross-regulatory and hierarchical interactions in post-transcriptional control.
Subjects
Drosophila Proteins/metabolism, Drosophila melanogaster/genetics, Gene Expression Regulation, RNA-Binding Proteins/metabolism, Animals, Drosophila Proteins/genetics, Heterogeneous-Nuclear Ribonucleoproteins/genetics, Heterogeneous-Nuclear Ribonucleoproteins/metabolism, RNA Splicing/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, RNA-Binding Proteins/genetics, RNA Sequence Analysis, Transfection
ABSTRACT
Alternative splicing is regulated by RNA binding proteins (RBPs) that recognize pre-mRNA sequence elements and activate or repress adjacent exons. Here, we used RNA interference and RNA-seq to identify splicing events regulated by 56 Drosophila proteins, some previously unknown to regulate splicing. Nearly all proteins affected alternative first exons, suggesting that RBPs play important roles in first exon choice. Half of the splicing events were regulated by multiple proteins, demonstrating extensive combinatorial regulation. We observed that SR and hnRNP proteins tend to act coordinately with each other, not antagonistically. We also identified a cross-regulatory network where splicing regulators affected the splicing of pre-mRNAs encoding other splicing regulators. This large-scale study substantially enhances our understanding of recent models of splicing regulation and provides a resource of thousands of exons that are regulated by 56 diverse RBPs.
Subjects
Alternative Splicing, Drosophila Proteins/genetics, Drosophila/genetics, RNA-Binding Proteins/genetics, TATA-Binding Protein Associated Factors/genetics, Animals, Drosophila Proteins/metabolism, Exons, Heterogeneous-Nuclear Ribonucleoproteins/genetics, Heterogeneous-Nuclear Ribonucleoproteins/metabolism, RNA Interference, RNA Precursors/genetics, RNA Precursors/metabolism, RNA-Binding Proteins/metabolism, RNA Sequence Analysis, TATA-Binding Protein Associated Factors/metabolism
ABSTRACT
Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.
Subjects
Drosophila melanogaster/genetics, Genome, Animals, Chromosome Mapping, Bacterial Artificial Chromosomes, Computational Biology, Contig Mapping, High-Throughput Nucleotide Sequencing, Fluorescence In Situ Hybridization, Molecular Sequence Data, Polytene Chromosomes, Restriction Mapping
ABSTRACT
Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
Subjects
Drosophila melanogaster/growth & development, Drosophila melanogaster/genetics, Gene Expression Profiling, Developmental Gene Expression Regulation/genetics, Genetic Transcription/genetics, Alternative Splicing/genetics, Animals, Base Sequence, Drosophila Proteins/genetics, Drosophila melanogaster/embryology, Exons/genetics, Female, Insect Genes/genetics, Insect Genome/genetics, Male, MicroRNAs/genetics, Oligonucleotide Array Sequence Analysis, Protein Isoforms/genetics, RNA Editing/genetics, Messenger RNA/analysis, Messenger RNA/genetics, Small Untranslated RNA/analysis, Small Untranslated RNA/genetics, Sequence Analysis, Sex Characteristics
ABSTRACT
Cys2-His2 zinc finger proteins (ZFPs) are the largest group of transcription factors in higher metazoans. A complete characterization of these ZFPs and their associated target sequences is pivotal to fully annotate transcriptional regulatory networks in metazoan genomes. As a first step in this process, we have characterized the DNA-binding specificities of 129 zinc finger sets from Drosophila using a bacterial one-hybrid system. This data set contains the DNA-binding specificities for at least one encoded ZFP from 70 unique genes and 23 alternate splice isoforms representing the largest set of characterized ZFPs from any organism described to date. These recognition motifs can be used to predict genomic binding sites for these factors within the fruit fly genome. Subsets of fingers from these ZFPs were characterized to define their orientation and register on their recognition sequences, thereby allowing us to define the recognition diversity within this finger set. We find that the characterized fingers can specify 47 of the 64 possible DNA triplets. To confirm the utility of our finger recognition models, we employed subsets of Drosophila fingers in combination with an existing archive of artificial zinc finger modules to create ZFPs with novel DNA-binding specificity. These hybrids of natural and artificial fingers can be used to create functional zinc finger nucleases for editing vertebrate genomes.
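The "64 possible DNA triplets" in this abstract are the 4^3 three-base combinations of A, C, G and T, so the 47 triplets covered by the characterized fingers amount to roughly 73% of the space. A minimal Python sketch of that count follows. The variable names are illustrative, and the abstract does not list which 47 triplets are covered, so only the totals are computed.

```python
from itertools import product

BASES = "ACGT"
N_SPECIFIED = 47  # triplets the characterized fingers can specify, per the abstract

# Enumerate every three-base DNA word: 4 choices at each of 3 positions.
all_triplets = ["".join(t) for t in product(BASES, repeat=3)]

# Fraction of triplet space covered by the characterized fingers.
coverage = N_SPECIFIED / len(all_triplets)

print(len(all_triplets), f"{coverage:.1%}")
```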
Subjects
Binding Sites, Drosophila Proteins/genetics, Drosophila/genetics, Nucleotide Motifs, Zinc Fingers/genetics, Alternative Splicing, Animals, Base Sequence, Cluster Analysis, Computational Biology/methods, Drosophila Proteins/chemistry, Drosophila Proteins/classification, Molecular Models, Phylogeny, Position-Specific Scoring Matrices, Protein Binding, Protein Conformation
ABSTRACT
ChIP-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high-throughput method to understand transcriptional regulation, especially on a global scale. This has led to great interest in the underlying biochemical mechanisms that direct TF-DNA binding, with the ultimate goal of computationally predicting a TF's occupancy profile in any cellular condition. In this study, we examined the influence of various potential determinants of TF-DNA binding on a much larger scale than previously undertaken. We used a thermodynamics-based model of TF-DNA binding, called "STAP," to analyze 45 TF-ChIP data sets from Drosophila embryonic development. We built a cross-validation framework that compares a baseline model, based on the ChIP'ed ("primary") TF's motif, to more complex models where binding by secondary TFs is hypothesized to influence the primary TF's occupancy. Candidate interacting TFs were chosen based on RNA-seq expression data from the time point of the ChIP experiment. We found widespread evidence of both cooperative and antagonistic effects by secondary TFs, and explicitly quantified these effects. We were able to identify multiple classes of interactions, including (1) long-range interactions between primary and secondary motifs (separated by ≤150 bp), suggestive of indirect effects such as chromatin remodeling, (2) short-range interactions with specific inter-site spacing biases, suggestive of direct physical interactions, and (3) overlapping binding sites suggesting competitive binding. Furthermore, by factoring out the previously reported strong correlation between TF occupancy and DNA accessibility, we were able to categorize the effects into those that are likely to be mediated by the secondary TF's effect on local accessibility and those that utilize accessibility-independent mechanisms. Finally, we conducted in vitro pull-down assays to test model-based predictions of short-range cooperative interactions, and found that seven of the eight TF pairs tested physically interact and that some of these interactions mediate cooperative binding to DNA.