ABSTRACT
Drosophila melanogaster is one of the best-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
Subjects
Drosophila melanogaster/growth & development , Drosophila melanogaster/genetics , Gene Expression Profiling , Gene Expression Regulation, Developmental/genetics , Transcription, Genetic/genetics , Alternative Splicing/genetics , Animals , Base Sequence , Drosophila Proteins/genetics , Drosophila melanogaster/embryology , Exons/genetics , Female , Genes, Insect/genetics , Genome, Insect/genetics , Male , MicroRNAs/genetics , Oligonucleotide Array Sequence Analysis , Protein Isoforms/genetics , RNA Editing/genetics , RNA, Messenger/analysis , RNA, Messenger/genetics , RNA, Small Untranslated/analysis , RNA, Small Untranslated/genetics , Sequence Analysis , Sex Characteristics
ABSTRACT
Drosophila melanogaster cell lines are important resources for cell biologists. Here, we catalog the expression of exons, genes, and unannotated transcriptional signals for 25 lines. Unannotated transcription is substantial (typically 19% of euchromatic signal). Conservatively, we identify 1405 novel transcribed regions; 684 of these appear to be new exons of neighboring, often distant, genes. Sixty-four percent of genes are expressed detectably in at least one line, but only 21% are detected in all lines. Each cell line expresses, on average, 5885 genes, including a common set of 3109. Expression levels vary over several orders of magnitude. Major signaling pathways are well represented: most differentiation pathways are "off" and survival/growth pathways "on." Roughly 50% of the genes expressed by each line are not part of the common set, and these show considerable individuality. Thirty-one percent are expressed at a higher level in at least one cell line than in any single developmental stage, suggesting that each line is enriched for genes characteristic of small sets of cells. Most remarkable is that imaginal disc-derived lines can generally be assigned, on the basis of expression, to small territories within developing discs. These mappings reveal unexpected stability of even fine-grained spatial determination. No two cell lines show identical transcription factor expression. We conclude that each line has retained features of an individual founder cell superimposed on a common "cell line" gene expression pattern.
Subjects
Drosophila melanogaster/genetics , Genetic Variation , Transcription, Genetic , Animals , Cell Line , Cluster Analysis , Exons , Female , Gene Expression Profiling , Male , Molecular Sequence Data , Signal Transduction/genetics , Transcription Factors/genetics
ABSTRACT
The ability to design and engineer organisms demands the ability to predict kinetic responses of novel regulatory networks built from well-characterized biological components. Surprisingly, few validated kinetic models of complex regulatory networks have been derived by combining models of the network components. A major bottleneck in producing such models is the difficulty of measuring in vivo rate constants for components of complex networks. We demonstrate that a simple, genetic approach to measuring rate constants in vivo produces an accurate kinetic model of the complex network that Saccharomyces cerevisiae employs to regulate the expression of genes encoding glucose transporters. The model predicts a transient pulse of transcription of HXT4 (but not HXT2 or HXT3) in response to addition of a small amount of glucose to cells, an outcome we observed experimentally. Our model also provides a mechanistic explanation for this result: HXT2-4 are governed by a type 2, incoherent feed forward regulatory loop involving the Rgt1 and Mig2 transcriptional repressors. The efficiency with which Rgt1 and Mig2 repress expression of each HXT gene determines which of them have a pulse of transcription in response to glucose. Finally, the model correctly predicts how lesions in the feed forward loop change the kinetics of induction of HXT4 expression.
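The pulse behavior described above can be illustrated with a toy simulation. This is a minimal sketch and not the authors' model: it assumes the standard type 2 incoherent feed-forward wiring (Rgt1 represses both MIG2 and HXT4, and Mig2 represses HXT4), treats glucose addition as an instantaneous loss of Rgt1 activity, and uses arbitrary illustrative rate constants and thresholds rather than measured ones. The function names `hill_repression` and `simulate_hxt` and the parameter `k_mig2` are hypothetical.

```python
def hill_repression(repressor, threshold):
    """Fraction of maximal promoter activity under a repressor (Hill coefficient 2)."""
    return 1.0 / (1.0 + (repressor / threshold) ** 2)


def simulate_hxt(k_mig2, t_end=30.0, dt=0.01):
    """Euler-integrate a toy type 2 incoherent feed-forward loop.

    k_mig2 is the (assumed) Mig2 level giving half-maximal repression of the
    HXT promoter; a small value means efficient repression by Mig2.
    Returns the HXT mRNA level at each time step after glucose addition.
    """
    k_rgt1 = 0.1                          # assumed Rgt1 repression threshold
    rgt1_before, rgt1_after = 1.0, 0.0    # glucose is assumed to switch Rgt1 off

    # Start at the pre-glucose steady state (production equals decay, rates of 1).
    mig2 = hill_repression(rgt1_before, k_rgt1)
    hxt = hill_repression(rgt1_before, k_rgt1) * hill_repression(mig2, k_mig2)

    trace = [hxt]
    for _ in range(int(t_end / dt)):
        # Both promoters are released from Rgt1; Mig2 then feeds back on HXT.
        d_mig2 = hill_repression(rgt1_after, k_rgt1) - mig2
        d_hxt = (hill_repression(rgt1_after, k_rgt1)
                 * hill_repression(mig2, k_mig2)) - hxt
        mig2 += dt * d_mig2
        hxt += dt * d_hxt
        trace.append(hxt)
    return trace


pulsed = simulate_hxt(k_mig2=0.2)      # efficient Mig2 repression
sustained = simulate_hxt(k_mig2=10.0)  # inefficient Mig2 repression
print("peak / final, efficient Mig2 repression:", max(pulsed) / pulsed[-1])
print("peak / final, weak Mig2 repression:", max(sustained) / sustained[-1])
```

In this sketch only the Mig2 repression threshold is varied; a promoter that Mig2 represses efficiently shows a transient peak well above its final level, while a weakly repressed promoter rises to a sustained plateau.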
Subjects
Glucose/metabolism , Models, Biological , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , DNA-Binding Proteins/metabolism , Gene Expression Regulation, Fungal , Genes, Fungal , Glucose Transport Proteins, Facilitative/genetics , Kinetics , Metabolic Networks and Pathways , RNA, Fungal/genetics , RNA, Fungal/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Repressor Proteins/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction , Systems Biology , Transcription Factors/metabolism , Transcription, Genetic
ABSTRACT
Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.
Subjects
Cloning, Molecular/methods , Computational Biology/methods , DNA, Complementary/genetics , Gene Library , Genes/genetics , Mammals/genetics , Animals , DNA/biosynthesis , Humans , Mice , National Institutes of Health (U.S.) , Rats , Reverse Transcriptase Polymerase Chain Reaction , United States
ABSTRACT
Trace Recalling is a novel method for deconvoluting double traces that result from simultaneously sequencing two DNA templates. Trace Recalling identifies up to two bases at each position of such a trace. The resulting ambiguity sequence is aligned to the genome, identifying one template sequence. A second template sequence is then inferred from this alignment. This technique makes possible many exciting biological applications. Here we present two such applications, alternate splice finding and elucidation of multiple insertion sites in a random insertional mutagenesis library. Our results demonstrate that RT-PCR followed by Trace Recalling is a more efficient and cost-effective way to find alternate splices than traditional methods. We also present a method for mapping double-insertion events in a random insertional-mutagenesis library.
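The deconvolution steps described above can be sketched in a few lines. This is a simplified illustration, not the published Trace Recalling implementation: it assumes the two bases at each position have already been called and encoded as standard IUPAC ambiguity codes, that the first template matches the genome exactly and without gaps, and that both templates are in register. The function names `find_first_template` and `infer_second_template` are hypothetical.

```python
# IUPAC nucleotide codes covering one or two bases per position.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def find_first_template(ambiguity_seq, genome):
    """Return (offset, sequence) of the first ungapped genome window in which
    every base is allowed by the ambiguity code at that position, else None."""
    n = len(ambiguity_seq)
    for offset in range(len(genome) - n + 1):
        window = genome[offset:offset + n]
        if all(base in IUPAC[code] for base, code in zip(window, ambiguity_seq)):
            return offset, window
    return None


def infer_second_template(ambiguity_seq, first_template):
    """Subtract the first template from the ambiguity sequence: at two-base
    positions keep the other base, at one-base positions both templates agree."""
    second = []
    for code, base in zip(ambiguity_seq, first_template):
        remaining = IUPAC[code].replace(base, "")
        second.append(remaining if remaining else base)
    return "".join(second)


# Toy example: a double trace of ACGTAC and ACTTGC read as ACKTRC.
genome = "TTGACGTACGG"
hit = find_first_template("ACKTRC", genome)
if hit is not None:
    offset, first = hit
    print(offset, first, infer_second_template("ACKTRC", first))
```

A real double trace would also require tolerance for miscalled bases and for templates that differ by insertions or deletions, which this exact-match scan does not handle.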
Subjects
Alternative Splicing , DNA/genetics , Reverse Transcriptase Polymerase Chain Reaction/methods , Algorithms , Genetic Techniques , Humans , Molecular Sequence Data , Mutagenesis, Insertional , Sequence Alignment/methods , Sequence Analysis, DNA/methods
ABSTRACT
A complete and accurate set of human protein-coding gene annotations is perhaps the single most important resource for genomic research after the human-genome sequence itself, yet the major gene catalogs remain incomplete and imperfect. Here we describe a genome-wide effort, carried out as part of the Mammalian Gene Collection (MGC) project, to identify human genes not yet in the gene catalogs. Our approach was to produce gene predictions by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, then to test predicted novel genes by RT-PCR. We have identified 734 novel gene fragments (NGFs) containing 2188 exons with, at most, weak prior cDNA support. These NGFs correspond to an estimated 563 distinct genes, of which >160 are completely absent from the major gene catalogs, while hundreds of others represent significant extensions of known genes. The NGFs appear to be predominantly protein-coding genes rather than noncoding RNAs, unlike novel transcribed sequences identified by technologies such as tiling arrays and CAGE. They tend to be expressed at low levels and in a tissue-specific manner, and they are enriched for roles in motor activity, cell adhesion, connective tissue, and central nervous system development. Our results demonstrate that many important genes and gene fragments have been missed by traditional approaches to gene discovery but can be identified by their evolutionary signatures using comparative sequence data. However, they suggest that hundreds, not thousands, of protein-coding genes are completely missing from the current gene catalogs.