ABSTRACT
Since 1992, FlyBase (flybase.org) has been an essential online resource for the Drosophila research community. Concentrating on the most extensively studied species, Drosophila melanogaster, FlyBase includes information on genes (molecular and genetic), transgenic constructs, phenotypes, genetic and physical interactions, and reagents such as stocks and cDNAs. Access to data is provided through a number of tools, reports, and bulk-data downloads. Looking to the future, FlyBase is expanding its focus to serve a broader scientific community. In this update, we describe new features, datasets, reagent collections, and data presentations that address this goal, including enhanced orthology data, Human Disease Model Reports, protein domain search and visualization, concise gene summaries, a portal for external resources, video tutorials and the FlyBase Community Advisory Group.
Subjects
Computational Biology/methods , Databases, Genetic , Drosophila/genetics , Genomics/methods , Animals , Disease Models, Animal , Genetic Association Studies , Humans , Web Browser
ABSTRACT
Release 6, the latest reference genome assembly of the fruit fly Drosophila melanogaster, was released by the Berkeley Drosophila Genome Project in 2014; it replaces their previous Release 5 genome assembly, which had been the reference genome assembly for over 7 years. With the enormous amount of information now attached to the D. melanogaster genome in public repositories and individual laboratories, the replacement of the previous assembly by the new one is a major event requiring careful migration of annotations and genome-anchored data to the new, improved assembly. In this report, we describe the attributes of the new Release 6 reference genome assembly, the migration of FlyBase genome annotations to this new assembly, how genome features on this new assembly can be viewed in FlyBase (http://flybase.org) and how users can convert coordinates for their own data to the corresponding Release 6 coordinates.
Subjects
Databases, Genetic , Drosophila melanogaster/genetics , Genome, Insect , Molecular Sequence Annotation , Animals , Genomics/standards , High-Throughput Nucleotide Sequencing , Internet , Models, Genetic , Molecular Sequence Data , Reference Standards , Sequence Alignment , Software
ABSTRACT
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
Subjects
Drosophila/classification , Drosophila/genetics , Evolution, Molecular , Genome, Insect/genetics , Genomics , Animals , Base Sequence , Binding Sites , Conserved Sequence , Drosophila Proteins/genetics , Exons/genetics , Gene Expression Regulation/genetics , Genes, Insect/genetics , MicroRNAs/genetics , Molecular Sequence Data , Organ Specificity , Phylogeny , Untranslated Regions/genetics
ABSTRACT
FlyBase provides a centralized resource for the genetic and genomic data of Drosophila melanogaster. As FlyBase enters our fourth decade of service to the research community, we reflect on our unique aspects and look forward to our continued collaboration with the larger research and model organism communities. In this study, we emphasize the dedicated reports and tools we have constructed to meet the specialized needs of fly researchers but also to facilitate use by other research communities. We also highlight ways that we support the fly community, including an external resources page, help resources, and multiple avenues by which researchers can interact with FlyBase.
Subjects
Databases, Genetic , Drosophila melanogaster , Animals , Drosophila melanogaster/genetics , Genome , Genomics
ABSTRACT
Whole genome sequencing of the model organisms has created increased demand for efficient tools to facilitate the genome annotation efforts. Accordingly, we report the further implementations and analyses stemming from our publicly available P{wHy} library for Drosophila melanogaster. A two-step regime (large-scale transposon mutagenesis followed by hobo-induced nested deletions) allows mutation saturation and provides significant enhancements to existing genomic coverage. We previously showed that, for a given starting insert, deletion saturation is readily obtained over a 60-kb interval; here, we perform a breakdown analysis of efficiency to identify rate-limiting steps in the process. Transrecombination, the hobo-induced recombination between two P{wHy} half molecules, was shown to further expand the P{wHy} mutational range, pointing to a potent, iterative process of transrecombination-reconstitution-transrecombination for alternating between very large and very fine-grained deletions in a self-contained manner. A number of strains also showed partial or complete repression of P{wHy} markers, depending on chromosome location, whereby asymmetric marker silencing allowed continuous phenotypic detection, indicating that P{wHy}-based saturational mutagenesis should be useful for the study of heterochromatin/positional effects.
Subjects
DNA Transposable Elements/genetics , Drosophila melanogaster/genetics , Genome, Insect/genetics , Mutagenesis, Insertional , Animals , Binding Sites/genetics , Chromosome Mapping , Databases, Genetic , Genetic Complementation Test , Models, Genetic , Recombination, Genetic , Sequence Deletion
ABSTRACT
FlyBase (http://flybase.org/) is the primary database of genetic and genomic data for the insect family Drosophilidae. Historically, Drosophila melanogaster has been the most extensively studied species in this family, but recent determination of the genomic sequences of an additional 11 Drosophila species opens up new avenues of research for other Drosophila species. This extensive sequence resource, encompassing species with well-defined phylogenetic relationships, provides a model system for comparative genomic analyses. FlyBase has developed tools to facilitate access to and navigation through this invaluable new data collection.
Subjects
Databases, Genetic , Drosophilidae/genetics , Genome, Insect , Animals , Drosophila/classification , Drosophila/genetics , Drosophilidae/classification , Genomics , Internet , Phylogeny , Software , User-Computer Interface
ABSTRACT
FlyBase (http://flybase.org) is the primary repository of genetic and molecular data of the insect family Drosophilidae. For the most extensively studied species, Drosophila melanogaster, a wide range of data are presented in integrated formats. Data types include mutant phenotypes, molecular characterization of mutant alleles and aberrations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models and molecular classification of gene product functions. There is a growing body of data for other Drosophila species; this is expected to increase dramatically over the next year, with the completion of draft-quality genomic sequences of an additional 11 Drosophila species.
Subjects
Databases, Genetic , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Drosophila/genetics , Animals , Chromosome Mapping , Drosophila Proteins/chemistry , Drosophila Proteins/physiology , Drosophila melanogaster/physiology , Genes, Insect , Genome , Models, Genetic
ABSTRACT
For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic, and high-throughput technologies add to the quantity and diversity of available data and resources. FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets, or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback. This chapter provides an overview of the data content, organization, and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries.
Subjects
Databases, Genetic/statistics & numerical data , Drosophila melanogaster/genetics , Genes, Insect , Genome, Insect , Proteomics/methods , Software , Animals , Cell Line , Databases, Genetic/history , Datasets as Topic , Disease Models, Animal , Drosophila melanogaster/metabolism , History, 20th Century , History, 21st Century , Humans , Internet , Translational Research, Biomedical
ABSTRACT
The use of Drosophila melanogaster as a model for studying human disease is well established, reflected by the steady increase in both the number and proportion of fly papers describing human disease models in recent years. In this article, we highlight recent efforts to improve the availability and accessibility of the disease model information in FlyBase (http://flybase.org), the model organism database for Drosophila. FlyBase has recently introduced Human Disease Model Reports, each of which presents background information on a specific disease, a tabulation of related disease subtypes, and summaries of experimental data and results using fruit flies. Integrated presentations of relevant data and reagents described in other sections of FlyBase are incorporated into these reports, which are specifically designed to be accessible to non-fly researchers in order to promote collaboration across model organism communities working in translational science. Another key component of disease model information in FlyBase is that data are collected in a consistent format, using the evolving Disease Ontology (an open-source standardized ontology for human-disease-associated biomedical data), to allow robust and intuitive searches. To facilitate this, FlyBase has developed a dedicated tool for querying and navigating relevant data, which include mutations that model a disease and any associated interacting modifiers. In this article, we describe how data related to fly models of human disease are presented in individual Gene Reports and in the Human Disease Model Reports. Finally, we discuss search strategies and new query tools that are available to access the disease model data in FlyBase.
Subjects
Biomedical Research , Databases, Genetic , Disease Models, Animal , Disease , Drosophila melanogaster/physiology , Amyotrophic Lateral Sclerosis/pathology , Animals , Humans
ABSTRACT
FlyBase (flybase.org) is the primary online database of genetic, genomic, and functional information about Drosophila species, with a major focus on the model organism Drosophila melanogaster. The long and rich history of Drosophila research, combined with recent surges in genomic-scale and high-throughput technologies, means that FlyBase now houses a huge quantity of data. Researchers need to be able to rapidly and intuitively query these data, and the QuickSearch tool has been designed to meet these needs. This tool is conveniently located on the FlyBase homepage and is organized into a series of simple tabbed interfaces that cover the major data and annotation classes within the database. This unit describes the functionality of all aspects of the QuickSearch tool. With this knowledge, FlyBase users will be equipped to take full advantage of all QuickSearch features and thereby gain improved access to data relevant to their research. © 2016 by John Wiley & Sons, Inc.
Subjects
Databases, Genetic , Genomics/methods , Animals , Drosophila melanogaster/genetics , Genome/genetics
ABSTRACT
In the context of the FlyBase annotated gene models in Drosophila melanogaster, we describe the many exceptional cases we have curated from the literature or identified in the course of FlyBase analysis. These range from atypical but common examples such as dicistronic and polycistronic transcripts, noncanonical splices, trans-spliced transcripts, noncanonical translation starts, and stop-codon readthroughs, to single exceptional cases such as ribosomal frameshifting and HAC1-type intron processing. In FlyBase, exceptional genes and transcripts are flagged with Sequence Ontology terms and/or standardized comments. Because some of the rule-benders create problems for handlers of high-throughput data, we discuss plans for flagging these cases in bulk data downloads.
Subjects
Drosophila melanogaster/genetics , Molecular Sequence Annotation , Animals , Base Sequence , Codon, Terminator , Databases, Genetic , Mitochondria/genetics , Mitochondria/metabolism , Models, Genetic , Protein Biosynthesis , RNA Editing , RNA Splice Sites
ABSTRACT
We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3' UTRs (up to 15-18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.
Subjects
Drosophila melanogaster/genetics , Molecular Sequence Annotation , 3' Untranslated Regions , Animals , Databases, Genetic , Exons , Female , Male , Models, Genetic , RNA, Small Untranslated/chemistry , RNA, Small Untranslated/metabolism , Sequence Analysis, RNA , Transcription Initiation Site , Transcriptome
ABSTRACT
The availability of sequenced genomes from 12 Drosophila species has enabled the use of comparative genomics for the systematic discovery of functional elements conserved within this genus. We have developed quantitative metrics for the evolutionary signatures specific to protein-coding regions and applied them genome-wide, resulting in 1193 candidate new protein-coding exons in the D. melanogaster genome. We have reviewed these predictions by manual curation and validated a subset by directed cDNA screening and sequencing, revealing both new genes and new alternative splice forms of known genes. We also used these evolutionary signatures to evaluate existing gene annotations, resulting in the validation of 87% of genes lacking descriptive names and identifying 414 poorly conserved genes that are likely to be spurious predictions, noncoding, or species-specific genes. Furthermore, our methods suggest a variety of refinements to hundreds of existing gene models, such as modifications to translation start codons and exon splice boundaries. Finally, we performed directed genome-wide searches for unusual protein-coding structures, discovering 149 possible examples of stop codon readthrough, 125 new candidate ORFs of polycistronic mRNAs, and several candidate translational frameshifts. These results affect >10% of annotated fly genes and demonstrate the power of comparative genomics to enhance our understanding of genome organization, even in a model organism as intensively studied as Drosophila melanogaster.
Subjects
Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Genes, Insect , Genome, Insect , Animals , Base Sequence , Codon/genetics , Conserved Sequence , Drosophila Proteins/chemistry , Evolution, Molecular , Molecular Sequence Data , Reading Frames , Sequence Alignment
ABSTRACT
We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25-55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement and high coadaptation of both male genes and cis-regulatory sequences emerge as important themes of genome divergence between these species of Drosophila.
Subjects
Chromosomes/genetics , Drosophila/genetics , Evolution, Molecular , Genes, Insect/genetics , Genome , Sequence Analysis, DNA/methods , Animals , Chromosome Breakage/genetics , Chromosome Inversion/genetics , Chromosome Mapping/methods , Conserved Sequence/genetics , Drosophila melanogaster/genetics , Enhancer Elements, Genetic , Gene Rearrangement/genetics , Genetic Variation/genetics , Molecular Sequence Data , Predictive Value of Tests , Repetitive Sequences, Nucleic Acid/genetics
ABSTRACT
With the available eukaryotic genome sequences, there are predictions of thousands of previously uncharacterized genes without known function or available mutational variant. Thus, there is an urgent need for efficient genetic tools for genomewide phenotypic analysis. Here we describe such a tool: a deletion-generator technology that exploits properties of a double transposable element to produce molecularly defined deletions at high density and with high efficiency. This double element, called P[wHy], is composed of a "deleter" element hobo, bracketed by two genetic markers and inserted into a "carrier" P element. We have used this P[wHy] element in Drosophila melanogaster to generate sets of nested deletions of sufficient coverage to discriminate among every transcription unit within 60 kb of the starting insertion site. Because these two types of mobile elements, carrier and deleter, can be found in other species, our strategy should be applicable to phenotypic analysis in a variety of model organisms.
Subjects
Chromosome Mapping , Drosophila melanogaster/genetics , Gene Deletion , Genome , Sequence Deletion , Animals , DNA Replication , Molecular Sequence Data , Phenotype
ABSTRACT
BACKGROUND: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences. RESULTS: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes. CONCLUSIONS: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.