ABSTRACT
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
Subjects
Genetic Association Studies , Animals , Computational Biology , Data Curation , Databases, Factual/standards , Gene-Environment Interaction , Genomics , Humans , Phenotype , Reference Standards , Reproducibility of Results , Terminology as Topic
ABSTRACT
During evolution, gene repatterning across eukaryotic genomes is not uniform. Some genomic regions exhibit a gene organization conserved phylogenetically, while others are recurrently involved in chromosomal rearrangement, resulting in breakpoint reuse. Both gene order conservation and breakpoint reuse can result from the existence of functional constraints on where chromosomal breakpoints occur or from the existence of regions that are susceptible to breakage. The balance between these two mechanisms is still poorly understood. Drosophila species have very dynamic genomes and, therefore, can be very informative. We compared the gene organization of the five main chromosomal elements (Muller's elements A-E) of nine Drosophila species. Under a parsimonious evolutionary scenario, we estimate that 6116 breakpoints differentiate the gene orders of the species and that breakpoint reuse is associated with approximately 80% of the orthologous landmarks. The comparison of the observed patterns of change in gene organization with those predicted under different simulated modes of evolution shows that fragile regions alone can explain the key patterns observed for Muller's element A (X chromosome) more often than those of any other Muller's element. High levels of fragility plus constraints operating on approximately 15% of the genome are sufficient to explain the observed patterns of change and conservation across species. The orthologous landmarks more likely to be under constraint exhibit both a remarkable internal functional heterogeneity and a lack of common functional themes, with the exception of the presence of highly conserved noncoding elements. Fragile regions rather than functional constraints have been the main determinant of the evolution of the Drosophila chromosomes.
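As a minimal illustration of what a breakpoint means in this kind of comparison, the sketch below counts the breakpoints between two gene orders: adjacencies of orthologous landmarks present in one order but absent from the other. It is a pairwise simplification under stated assumptions, not the parsimony reconstruction across nine species used in the study. Landmarks are hypothetical signed integers (a negative sign marks reversed orientation), and chromosome ends are ignored.

```python
from typing import List, Set, Tuple

Adjacency = Tuple[int, int]


def canonical(a: int, b: int) -> Adjacency:
    """Return one fixed representative for the junction a|b.

    Reading the same junction from the other strand gives (-b, -a),
    so the smaller of the two tuples is used as the key.
    """
    return min((a, b), (-b, -a))


def adjacencies(order: List[int]) -> Set[Adjacency]:
    """Collect the canonical adjacencies of a signed landmark order."""
    return {canonical(a, b) for a, b in zip(order, order[1:])}


def breakpoints(order_a: List[int], order_b: List[int]) -> int:
    """Count adjacencies of order_a that are not conserved in order_b."""
    return len(adjacencies(order_a) - adjacencies(order_b))


# Hypothetical landmark orders (not data from the study).
reference = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]  # segment 2..3 inverted
print(breakpoints(reference, inverted))  # 2
```

A single inversion produces two breakpoints, one at each end of the inverted segment, while the junction inside the segment (2|3, read as -3|-2) still counts as conserved. Breakpoint reuse would show up as the same landmark junction being broken independently in more than one lineage.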
Subjects
Chromosome Fragile Sites/genetics , Drosophila/genetics , Gene Order , Genome, Insect , Animals , Base Sequence , Chromosome Breakpoints , Chromosome Inversion/genetics , Evolution, Molecular , Female , Gene Expression , Male , X Chromosome/genetics
ABSTRACT
Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. 
This annotation and search method provides a novel and efficient means to identify gene candidates and animal models of human disease, which may shorten the lengthy path to identification and understanding of the genetic basis of human disease.
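A minimal sketch of how EQ-annotated phenotypes can be compared through the ontology hierarchy, assuming toy is_a hierarchies invented for illustration (the terms below are hypothetical stand-ins, not real anatomy-ontology or PATO identifiers). Each EQ pair is expanded to every more general (E, Q) pair that subsumes it, and two phenotype profiles are scored with the Jaccard index of these expanded sets. Jaccard is only one common ontology-based measure; the study's four metrics are not reproduced here, and frequency-based measures such as information content would weight rare shared terms more heavily.

```python
from itertools import product
from typing import Dict, Set, Tuple

# Hypothetical toy is_a hierarchies (term -> parents); illustrative only.
ANATOMY: Dict[str, Set[str]] = {
    "anatomical structure": set(),
    "appendage": {"anatomical structure"},
    "sense organ": {"anatomical structure"},
    "fin": {"appendage"},
    "limb": {"appendage"},
    "eye": {"sense organ"},
}
QUALITY: Dict[str, Set[str]] = {
    "quality": set(),
    "size": {"quality"},
    "decreased size": {"size"},
    "absent": {"quality"},
}

EQ = Tuple[str, str]  # (entity, quality)


def ancestors(term: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with all of its is_a ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in graph[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(profile: Set[EQ]) -> Set[EQ]:
    """Expand each EQ pair to every (E', Q') pair that subsumes it."""
    expanded: Set[EQ] = set()
    for entity, quality in profile:
        expanded.update(product(ancestors(entity, ANATOMY), ancestors(quality, QUALITY)))
    return expanded


def jaccard(p1: Set[EQ], p2: Set[EQ]) -> float:
    """Jaccard index of the two expanded annotation sets."""
    c1, c2 = closure(p1), closure(p2)
    return len(c1 & c2) / len(c1 | c2)


fish = {("fin", "decreased size")}
mouse = {("limb", "decreased size")}
other = {("eye", "absent")}
print(jaccard(fish, mouse))  # 0.5
print(jaccard(fish, other))  # 1/14, about 0.071
```

Here a small fin and a small limb share the "appendage" ancestor (standing in for a cross-species anatomy ontology that groups homologous or analogous structures), so they score well above an absent eye, which shares only the root pair.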
Subjects
Disease Models, Animal , Genetic Association Studies , Phenotype , Alleles , Animals , Hedgehog Proteins/genetics , Humans , Signal Transduction/genetics , Zebrafish , Zebrafish Proteins/genetics
ABSTRACT
Wolbachia are vertically transmitted, obligatory intracellular bacteria that infect a great number of species of arthropods and nematodes. In insects, they are mainly known for disrupting the reproductive biology of their hosts in order to increase their transmission through the female germline. In Drosophila melanogaster, however, a strong and consistent effect of Wolbachia infection has not been found. Here we report that a bacterial infection renders D. melanogaster more resistant to Drosophila C virus, reducing the load of viruses in infected flies. We identify these resistance-inducing bacteria as Wolbachia. Furthermore, we show that Wolbachia also increases resistance of Drosophila to two other RNA virus infections (Nora virus and Flock House virus) but not to a DNA virus infection (Insect Iridescent Virus 6). These results identify a new major factor regulating D. melanogaster resistance to infection by RNA viruses and contribute to the idea that the response of a host to a particular pathogen also depends on its interactions with other microorganisms. This is also, to our knowledge, the first report of a strong beneficial effect of Wolbachia infection in D. melanogaster. The induced resistance to natural viral pathogens may explain Wolbachia prevalence in natural populations and represents a novel Wolbachia-host interaction.
Subjects
Drosophila melanogaster/microbiology , Drosophila melanogaster/virology , RNA Viruses/physiology , Symbiosis , Virus Diseases/immunology , Virus Diseases/microbiology , Wolbachia/physiology , Animals , Drosophila melanogaster/drug effects , Drosophila melanogaster/immunology , Female , Immunity, Innate/drug effects , Male , RNA Viruses/drug effects , Reproducibility of Results , Symbiosis/drug effects , Tetracycline/pharmacology , Wolbachia/drug effects
ABSTRACT
FlyBase (http://flybase.org) is a database of Drosophila genetic and genomic information. Gene Ontology (GO) terms are used to describe three attributes of wild-type gene products: their molecular function, the biological processes in which they play a role, and their subcellular location. This article describes recent changes to the FlyBase GO annotation strategy that are improving the quality of the GO annotation data. Many of these changes stem from our participation in the GO Reference Genome Annotation Project--a multi-database collaboration producing comprehensive GO annotation sets for 12 diverse species.
Subjects
Databases, Genetic , Drosophila Proteins/genetics , Drosophila/genetics , Genes, Insect , Animals , Genome, Insect , Genomics , Vocabulary, Controlled
ABSTRACT
That closely related species often differ by chromosomal inversions was discovered by Sturtevant and Plunkett in 1926. Our knowledge of how these inversions originate is still very limited, although a prevailing view is that they are facilitated by ectopic recombination events between inverted repetitive sequences. The availability of genome sequences of related species now allows us to study in detail the mechanisms that generate interspecific inversions. We have analyzed the breakpoint regions of the 29 inversions that differentiate the chromosomes of Drosophila melanogaster and two closely related species, D. simulans and D. yakuba, and reconstructed the molecular events that underlie their origin. Experimental and computational analysis revealed that the breakpoint regions of 59% of the inversions (17/29) are associated with inverted duplications of genes or other nonrepetitive sequences. In only two cases do we find evidence for inverted repetitive sequences in inversion breakpoints. We propose that the presence of inverted duplications associated with inversion breakpoint regions is the result of staggered breaks, either isochromatid or chromatid, and that this, rather than ectopic exchange between inverted repetitive sequences, is the prevalent mechanism for the generation of inversions in the melanogaster species group. Outgroup analysis also revealed evidence for widespread breakpoint recycling. Lastly, we have found that expression domains in D. melanogaster may be disrupted in D. yakuba, bringing into question their potential adaptive significance.
Subjects
Biological Evolution , Chromosome Inversion , Drosophila/genetics , Genome, Insect , Animals , Chromosome Breakage , Gene Duplication , Molecular Sequence Data
ABSTRACT
The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or 'ontologies'. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.
Subjects
Information Storage and Retrieval/standards , Terminology as Topic , Vocabulary, Controlled , Humans , Nervous System/anatomy & histology , Nervous System Physiological Phenomena
ABSTRACT
Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds. The molecular entities in question are either natural products or synthetic products used to intervene in the processes of living organisms. Genome-encoded macromolecules (nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI. In addition to molecular entities, ChEBI contains groups (parts of molecular entities) and classes of entities. ChEBI includes an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified. ChEBI is available online at http://www.ebi.ac.uk/chebi/
Subjects
Databases, Factual , Dictionaries, Chemical as Topic , Agrochemicals/chemistry , Biological Products/chemistry , Indicators and Reagents/chemistry , Internet , Isotopes/chemistry , Pharmaceutical Preparations/chemistry , User-Computer Interface , Vocabulary, Controlled
ABSTRACT
We describe a second-generation deficiency kit for Drosophila melanogaster composed of molecularly mapped deletions on an isogenic background, covering approximately 77% of the Release 5.1 genome. Using a previously reported collection of FRT-bearing P-element insertions, we have generated 655 new deletions and verified a set of 209 deletion-bearing fly stocks. In addition to deletions, we demonstrate how the P elements may also be used to generate a set of custom inversions and duplications, particularly useful for balancing difficult regions of the genome carrying haplo-insufficient loci. We describe a simple computational resource that facilitates selection of appropriate elements for generating custom deletions. Finally, we provide a computational resource that facilitates selection of other mapped FRT-bearing elements that, when combined with the DrosDel collection, can theoretically generate over half a million precisely mapped deletions.
Subjects
Chromosome Aberrations , DNA Transposable Elements , Drosophila melanogaster/genetics , Genome , Sequence Deletion , Animals , Molecular Sequence Data
ABSTRACT
The National Center for Biomedical Ontology is a consortium that comprises leading informaticians, biologists, clinicians, and ontologists, funded by the National Institutes of Health (NIH) Roadmap, to develop innovative technology and methods that allow scientists to record, manage, and disseminate biomedical information and knowledge in machine-processable form. The goals of the Center are (1) to help unify the divergent and isolated efforts in ontology development by promoting high quality open-source, standards-based tools to create, manage, and use ontologies, (2) to create new software tools so that scientists can use ontologies to annotate and analyze biomedical data, (3) to provide a national resource for the ongoing evaluation, integration, and evolution of biomedical ontologies and associated tools and theories in the context of driving biomedical projects (DBPs), and (4) to disseminate the tools and resources of the Center and to identify, evaluate, and communicate best practices of ontology development to the biomedical community. Through the research activities within the Center, collaborations with the DBPs, and interactions with the biomedical community, our goal is to help scientists work more effectively in the e-science paradigm, enhancing experiment design, experiment execution, data analysis, information synthesis, and hypothesis generation and testing, and to improve the understanding of human disease.
Subjects
Biomedical Research/standards , National Institutes of Health (U.S.) , Software , Internet , Semantics , United States
ABSTRACT
Hybrid daughters of crosses between Drosophila melanogaster females and males from the D. simulans species clade are fully viable at low temperature but have agametic ovaries and are thus sterile. We report here that mutations in the D. melanogaster gene Hybrid male rescue (Hmr), along with unidentified polymorphic factors, rescue this agametic phenotype in both D. melanogaster/D. simulans and D. melanogaster/D. mauritiana F1 female hybrids. These hybrids produced small numbers of progeny in backcrosses, their low fecundity being caused by incomplete rescue of oogenesis as well as by zygotic lethality. F1 hybrid males from these crosses remained fully sterile. Hmr(+) is the first Drosophila gene shown to cause hybrid female sterility. These results also suggest that, while there is some common genetic basis to hybrid lethality and female sterility in D. melanogaster, hybrid females are more sensitive to fertility defects than to lethality.
Subjects
Drosophila/genetics , Hybridization, Genetic/physiology , Infertility, Female/genetics , Animals , Crosses, Genetic , Drosophila/metabolism , Drosophila melanogaster/genetics , Female , Genes, Lethal , Infertility, Female/metabolism , Mutation
ABSTRACT
Myosin VIIs provide motor function for a wide range of eukaryotic processes. We demonstrate that mutations in crinkled (ck) disrupt the Drosophila myosin VIIA heavy chain. The ck/myoVIIA protein is present at a low level throughout fly development and at the same level in heads, thoraxes, and abdomens. Severe ck mutants, likely to be molecular nulls, die as embryos or larvae, but all allelic combinations tested thus far yield a small fraction of adult "escapers" that are weak and infertile. Scanning electron microscopy shows that escapers have defects in bristles and hairs, indicating that this motor protein plays a role in the structure of the actin cytoskeleton. We generate a homology model for the structure of the ck/myosin VIIA head that indicates that myosin VIIAs, like myosin IIs, have a spectrin-like SH3 subdomain fronting their N terminus. In addition, we establish that the two myosin VIIA FERM repeats share high sequence similarity with only the first two subdomains of the three-lobed structure that is typical of canonical FERM domains. Nevertheless, the approximately 100 and approximately 75 amino acids that follow the first two lobes of the first and second FERM domains, respectively, are highly conserved among myosin VIIs, suggesting that they compose a conserved myosin tail homology 7 (MyTH7) domain that may be an integral part of the FERM domain or may function independently of it. Together, our data suggest a key role for ck/myoVIIA in the formation of cellular projections and other actin-based functions required for viability.
Subjects
Drosophila melanogaster/genetics , Myosins/genetics , Amino Acid Sequence , Animals , Conserved Sequence , Drosophila melanogaster/metabolism , Dyneins , Genes, Lethal , Models, Molecular , Molecular Sequence Data , Mutation , Myosin VIIa , Myosins/metabolism , Phenotype , Protein Structure, Tertiary , Sequence Analysis, Protein
ABSTRACT
We describe a collection of P-element insertions that have considerable utility for generating custom chromosomal aberrations in Drosophila melanogaster. We have mobilized a pair of engineered P elements, p[RS3] and p[RS5], to collect 3243 lines unambiguously mapped to the Drosophila genome sequence. The collection contains, on average, an element every 35 kb. We demonstrate the utility of the collection for generating custom chromosomal deletions that have their end points mapped, with base-pair resolution, to the genome sequence. The collection was generated in an isogenic strain, thus affording a uniform background for screens where sensitivity to genetic background is high. The entire collection, along with a computational and genetic toolbox for designing and generating custom deletions, is publicly available. Using the collection it is theoretically possible to generate >12,000 deletions between 1 bp and 1 Mb in size by simple eye color selection. In addition, a further 37,000 deletions, selectable by molecular screening, may be generated. We are now using the collection to generate a second-generation deficiency kit that is precisely mapped to the genome sequence.
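Estimates such as the >12,000 deletions between 1 bp and 1 Mb presumably rest on enumerating pairs of mapped insertions whose spacing falls within the size window. A minimal sketch of that counting step, assuming hypothetical insertion coordinates grouped by chromosome arm and treating the distance between two insertion sites as the deletion size. It deliberately ignores element type (p[RS3] vs. p[RS5]) and FRT orientation, which in practice decide whether a pair yields a deletion selectable by eye color, so it gives an upper bound rather than the collection's actual count.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List


def count_candidate_deletions(
    positions: Dict[str, List[int]],
    min_size: int = 1,
    max_size: int = 1_000_000,
) -> int:
    """Count insertion pairs on the same arm spaced min_size..max_size bp apart.

    Hypothetical simplification: element type and FRT orientation are not
    checked, so every pair inside the size window is counted.
    """
    total = 0
    for coords in positions.values():
        coords = sorted(coords)
        for i, start in enumerate(coords):
            # Partners must come after index i (so each pair is counted once)
            # and lie between start + min_size and start + max_size, inclusive.
            lo = bisect_left(coords, start + min_size, i + 1)
            hi = bisect_right(coords, start + max_size, i + 1)
            total += hi - lo
    return total


# Hypothetical coordinates (bp), not taken from the actual collection.
example = {
    "2L": [100, 200, 1_500_000],
    "3R": [0, 1_000_000],
}
print(count_candidate_deletions(example))  # 2
```

Sorting each arm once and using binary search keeps the count at O(n log n) per arm, which matters for a collection of thousands of insertions; filtering by element type and orientation would be an extra predicate on each candidate pair.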
Subjects
Chromosome Aberrations , DNA Transposable Elements/genetics , Drosophila melanogaster/genetics , Animals , Genetic Techniques , Mutagenesis, Insertional/methods
ABSTRACT
The mantra of the 'post-genomic' era is 'gene function'. Yet surprisingly little attention has been given to how functional and other information concerning genes is to be captured, made accessible to biologists or structured in a computable form. The aim of the Gene Ontology (GO) Consortium is to provide a framework for both the description and the organisation of such information. The GO Consortium is presently building three structured controlled vocabularies that describe three discrete biological domains: the molecular function, biological roles and cellular locations of gene products.
Subjects
Genetic Techniques , Models, Genetic , Animals , Databases as Topic , Models, Biological , Software
ABSTRACT
BACKGROUND: Hormones frequently guide animal development via the induction of cascades of gene activities, whose products further amplify an initial hormonal stimulus. In Drosophila the transformation of the larva into the pupa and the subsequent metamorphosis to the adult stage is triggered by changes in the titer of the steroid hormone 20-hydroxyecdysone. singed wings (swi) is the only gene known in Drosophila melanogaster for which mutations specifically interrupt the transmission of the regulatory signal from early to late ecdysone inducible genes. RESULTS: We have characterized the singed wings locus, showing it to correspond to EG:171E4.2 (CG3095). swi encodes a predicted 68.5-kDa protein that contains N-terminal histidine-rich and threonine-rich domains, a cysteine-rich C-terminal region and two leucine-rich repeats. The SWI protein has a close homolog in D. melanogaster, defining a new family of SWI-like proteins, and is conserved in D. pseudoobscura. A lethal mutation, swit476, shows a severe disruption of the ecdysone pathway and is a C>Y substitution in one of the two conserved CysXCys motifs that are common to SWI and the Drosophila Toll-4 protein. CONCLUSIONS: It is not entirely clear from the present molecular analysis how the SWI protein may function in the ecdysone induced cascade. Currently all predictions agree that SWI is very unlikely to be a nuclear protein. Thus it probably exercises its control of "late" ecdysone genes indirectly. Apparently the genetic regulation of ecdysone signaling is much more complex than was previously anticipated.
Subjects
Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Genetic Markers/genetics , Amino Acid Sequence/genetics , Animals , Base Sequence/genetics , Cloning, Molecular/methods , Contig Mapping/methods , Female , Male , Molecular Sequence Data , X Chromosome/genetics
ABSTRACT
Phenotype ontologies are typically constructed to serve the needs of a particular community, such as annotation of genotype-phenotype associations in mouse or human. Here we demonstrate how these ontologies can be improved through assignment of logical definitions using a core ontology of phenotypic qualities and multiple additional ontologies from the Open Biological Ontologies library. We also show how these logical definitions can be used for data integration when combined with a unified multi-species anatomy ontology.
Subjects
Chromosome Mapping/methods , Genome , Species Specificity , Algorithms , Animals , Automation , Computational Biology/methods , Genome, Human , Genotype , Humans , Mice , Models, Biological , Phenotype , Software
ABSTRACT
This paper describes an approach to providing computer-interpretable logical definitions for the terms of the Human Phenotype Ontology (HPO) using PATO, the ontology of phenotypic qualities, to link terms of the HPO to the anatomic and other entities that are affected by abnormal phenotypic qualities. This approach will allow improved computerized reasoning as well as a facility to compare phenotypes between different species. The PATO mapping will also provide direct links between phenotypic abnormalities and the underlying anatomic structures encoded using the Foundational Model of Anatomy, which will be a valuable resource for computational investigations of the links between anatomical components and concepts representing diseases with abnormal phenotypes and associated genes.
Subjects
Models, Anatomic , Phenotype , Animals , Biomedical Engineering , Computational Biology , Humans , Marfan Syndrome/pathology , Mice , Species Specificity , Vocabulary, Controlled
ABSTRACT
WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at http://www.wikiprofessional.org.
Subjects
Databases, Protein , Proteins/genetics , Software , Information Storage and Retrieval , Internet
ABSTRACT
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.