ABSTRACT
In biology and biomedicine, relating phenotypic outcomes to genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regard to data (new organisms, more sources, better modeling); new API and standards; ontologies (new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.
Subjects
Computational Biology/methods , Genotype , Phenotype , Algorithms , Animals , Biological Ontologies , Genetic Databases , Exome , Genetic Association Studies , Genetic Variation , Genomics , Humans , Internet , Software , Translational Biomedical Research , User-Computer Interface
ABSTRACT
PomBase (www.pombase.org), the model organism database for the fission yeast Schizosaccharomyces pombe, has undergone a complete redevelopment, resulting in a more fully integrated, better-performing service. The new infrastructure supports daily data updates as well as fast, efficient querying and smoother navigation within and between pages. New pages for publications and genotypes provide routes to all data curated from a single source and to all phenotypes associated with a specific genotype, respectively. For ontology-based annotations, improved displays balance comprehensive data coverage with ease of use. The default view now uses ontology structure to provide a concise, non-redundant summary that can be expanded to reveal underlying details and metadata. The phenotype annotation display also offers filtering options to allow users to focus on specific areas of interest. An instance of the JBrowse genome browser has been integrated, facilitating loading of, and intuitive access to, genome-scale datasets. Taken together, the new data and pages, along with improvements in annotation display and querying, allow users to probe connections among different types of data to form a comprehensive view of fission yeast biology. The new PomBase implementation also provides a rich set of modular, reusable tools that can be deployed to create new, or enhance existing, organism-specific databases.
Subjects
Genetic Databases , Fungal Genome/genetics , Schizosaccharomyces/genetics , Internet , Software , User-Computer Interface
ABSTRACT
PomBase (http://www.pombase.org) is the model organism database for the fission yeast Schizosaccharomyces pombe. PomBase provides a central hub for the fission yeast community, supporting both exploratory and hypothesis-driven research. It provides users easy access to data ranging from the sequence level, to molecular and phenotypic annotations, through to the display of genome-wide high-throughput studies. Recent improvements to the site extend annotation specificity, improve usability and allow for monthly data updates. Both in-house curators and community researchers provide manually curated data to PomBase. The genome browser provides access to published high-throughput data sets and the genomes of three additional Schizosaccharomyces species (Schizosaccharomyces cryophilus, Schizosaccharomyces japonicus and Schizosaccharomyces octosporus).
Subjects
Genetic Databases , Schizosaccharomyces/genetics , Gene Expression , Gene Ontology , Fungal Genes , Genomics , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation
ABSTRACT
Modern biomedical research depends critically on access to databases that house and disseminate genetic, genomic, molecular, and cell biological knowledge. Even as the explosion of available genome sequences and associated genome-scale data continues apace, the sustainability of professionally maintained biological databases is under threat due to policy changes by major funding agencies. Here, we focus on model organism databases to demonstrate the myriad ways in which biological databases not only act as repositories but actively facilitate advances in research. We present data that show that reducing financial support to model organism databases could prove to be not just scientifically, but also economically, unsound.
Subjects
Biomedical Research , Genetic Databases , Fungal Genome , Genomics , Molecular Biology , Molecular Sequence Annotation , Schizosaccharomyces/genetics
ABSTRACT
MOTIVATION: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species. AVAILABILITY: Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/).
Subjects
Factual Databases , Software , Biological Ontologies , Internet , Schizosaccharomyces
ABSTRACT
BACKGROUND: The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations. RESULTS: The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector-target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions. CONCLUSIONS: The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism's gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.
Subjects
Gene Ontology , Molecular Sequence Annotation , Computational Biology/methods , Humans , Proteins/genetics
ABSTRACT
MOTIVATION: To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast. RESULTS: The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species. AVAILABILITY: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry Web site (http://obofoundry.org/).
Subjects
Phenotype , Schizosaccharomyces/genetics , Biological Ontologies , Genetic Databases , Gene Ontology
ABSTRACT
PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.
Subjects
Genetic Databases , Schizosaccharomyces/genetics , Fungal Genome , Genomics , Internet , Molecular Sequence Annotation , Phenotype
ABSTRACT
Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses. A major impediment in phenomics is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are collected and curated using free text, single terms or combinations of terms, using multiple vocabularies, terminologies, or ontologies. Integrating these heterogeneous and often siloed data enables the application of biological knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic knowledge representation that captures the full range of cross-species phenomics data is much needed. We have developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies, as a single, unified, logical representation. uPheno comprises (1) a system for consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. This harmonized representation supports use cases such as cross-species integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.
ABSTRACT
BACKGROUND: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. RESULTS: We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. CONCLUSIONS: The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.
Subjects
Biology , Chemistry , Genes , Controlled Vocabulary
ABSTRACT
MOTIVATION: The systematic observation of phenotypes has become a crucial tool of functional genomics, and several large international projects are currently underway to identify and characterize the phenotypes that are associated with genotypes in several species. To integrate phenotype descriptions within and across species, phenotype ontologies have been developed. Applying ontologies to unify phenotype descriptions in the domain of physiology has been a particular challenge due to the high complexity of the underlying domain. RESULTS: In this study, we present the outline of a theory and its implementation for an ontology of physiology-related phenotypes. We provide a formal description of process attributes and relate them to the attributes of their temporal parts and participants. We apply our theory to create the Cellular Phenotype Ontology (CPO). The CPO is an ontology of morphological and physiological phenotypic characteristics of cells, cell components and cellular processes. Its prime application is to provide terms and uniform definition patterns for the annotation of cellular phenotypes. The CPO can be used for the annotation of observed abnormalities in domains, such as systems microscopy, in which cellular abnormalities are observed and for which no phenotype ontology has been created. AVAILABILITY AND IMPLEMENTATION: The CPO and the source code we generated to create the CPO are freely available on http://cell-phenotype.googlecode.com.
Subjects
Cell Physiological Phenomena , Phenotype , Controlled Vocabulary , Semantics
ABSTRACT
The fission yeast Schizosaccharomyces japonicus has recently emerged as a powerful system for studying the evolution of essential cellular processes, drawing on similarities as well as key differences between S. japonicus and the related, well-established model Schizosaccharomyces pombe. We have deployed the open-source, modular code and tools originally developed for PomBase, the S. pombe model organism database (MOD), to create JaponicusDB (www.japonicusdb.org), a new MOD dedicated to S. japonicus. By providing a central resource with ready access to a growing body of experimental data, ontology-based curation, seamless browsing and querying, and the ability to integrate new data with existing knowledge, JaponicusDB supports fission yeast biologists to a far greater extent than any other source of S. japonicus data. JaponicusDB thus enables S. japonicus researchers to realize the full potential of studying a newly emerging model species and illustrates the widely applicable power and utility of harnessing reusable PomBase code to build a comprehensive, community-maintainable repository of species-relevant knowledge.
Subjects
Schizosaccharomyces , Factual Databases , Schizosaccharomyces/genetics
ABSTRACT
PomBase (www.pombase.org), the model organism database (MOD) for the fission yeast Schizosaccharomyces pombe, supports research within and beyond the S. pombe community by integrating and presenting genetic, molecular, and cell biological knowledge into intuitive displays and comprehensive data collections. With new content, novel query capabilities, and biologist-friendly data summaries and visualization, PomBase also drives innovation in the MOD community.
Subjects
Schizosaccharomyces , Biology , Factual Databases , Schizosaccharomyces/genetics
ABSTRACT
BACKGROUND: Maintaining a bio-ontology in the long term requires improving and updating its contents so that it adequately captures what is known about biological phenomena. This paper illustrates how these processes are carried out, by studying the ways in which curators at the Gene Ontology have hitherto incorporated new knowledge into their resource. RESULTS: Five types of circumstances are singled out as warranting changes in the ontology: (1) the emergence of anomalies within GO; (2) the extension of the scope of GO; (3) divergence in how terminology is used across user communities; (4) new discoveries that change the meaning of the terms used and their relations to each other; and (5) the extension of the range of relations used to link entities or processes described by GO terms. CONCLUSION: This study illustrates the difficulties involved in applying general standards to the development of a specific ontology. Ontology curation aims to produce a faithful representation of knowledge domains as they keep developing, which requires the translation of general guidelines into specific representations of reality and an understanding of how scientific knowledge is produced and constantly updated. In this context, it is important that trained curators with technical expertise in the scientific field(s) in question are involved in supervising ontology shifts and identifying inaccuracies.
Subjects
Genetics , Knowledge Bases , Terminology as Topic , Genes
ABSTRACT
The Gene Ontology (GO) consists of nearly 30,000 classes for describing the activities and locations of gene products. Manual maintenance of an ontology of this size is a considerable effort, and errors and inconsistencies inevitably arise. Reasoners can be used to assist with ontology development, automatically placing classes in a subsumption hierarchy based on their properties. However, the historic lack of computable definitions within the GO has prevented the use of these tools. In this paper, we present preliminary results of an ongoing effort to normalize the GO by explicitly stating the definitions of compositional classes in a form that can be used by reasoners. These definitions are partitioned into mutually exclusive cross-product sets, many of which reference other OBO Foundry candidate ontologies for chemical entities, proteins, biological qualities and anatomical entities. Using these logical definitions we are gradually beginning to automate many aspects of ontology development, detecting errors and filling in missing relationships. These definitions also enhance the GO by weaving it into the fabric of a wider collection of interoperating ontologies, increasing opportunities for data integration and enhancing genomic analyses.
Subjects
Database Management Systems , Genetic Databases , Genetics , Controlled Vocabulary , Anatomy , Animals , Cell Biology , Genes , Humans , Molecular Biology
ABSTRACT
Maximizing the impact and value of scientific research requires efficient knowledge distribution, which increasingly depends on the integration of standardized published data into online databases. To make data integration more comprehensive and efficient for fission yeast research, PomBase has pioneered a community curation effort that engages publication authors directly in FAIR-sharing of data representing detailed biological knowledge from hypothesis-driven experiments. Canto, an intuitive online curation tool that enables biologists to describe their detailed functional data using shared ontologies, forms the core of PomBase's system. With 8 years' experience, and as the author response rate reaches 50%, we review community curation progress and the insights we have gained from the project. We highlight incentives and nudges we deploy to maximize participation, and summarize project outcomes, which include increased knowledge integration and dissemination as well as the unanticipated added value arising from co-curation by publication authors and professional curators.
Subjects
Schizosaccharomyces , Data Curation , Data Management , Factual Databases , Schizosaccharomyces/genetics
ABSTRACT
Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
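The rule-based check described in this abstract can be illustrated with a minimal sketch. It assumes each rule is simply a pair of process labels that should not be co-annotated to the same gene product, and it ignores the ontology hierarchy (a real workflow would work with GO term identifiers and would also need to consider annotations to descendant terms). The gene names, the rule set, and the function name `find_violations` are hypothetical and are not taken from the published workflow.

```python
# Hypothetical rule set: each rule is a pair of biological processes that are
# assumed never to be correctly co-annotated to the same gene product.
RULES = [
    frozenset({"amino acid metabolism", "cytokinesis"}),
]


def find_violations(annotations, rules):
    """Return sorted (gene, process_a, process_b) tuples for every gene
    annotated to both processes named in a rule."""
    violations = []
    for gene, processes in annotations.items():
        for rule in rules:
            # A rule fires only when both of its processes are annotated.
            if rule <= processes:
                process_a, process_b = sorted(rule)
                violations.append((gene, process_a, process_b))
    return sorted(violations)


# Illustrative annotations (invented for this example).
annotations = {
    "geneA": {"amino acid metabolism", "cytokinesis"},
    "geneB": {"cytokinesis"},
    "geneC": {"amino acid metabolism", "translation"},
}

for gene, process_a, process_b in find_violations(annotations, RULES):
    print(f"{gene}: suspicious co-annotation to '{process_a}' and '{process_b}'")
```

A flagged gene is only a candidate for review: as the abstract notes, each suspicious co-annotation still has to be traced to its source before it is corrected.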
Subjects
Computational Biology/methods , Gene Ontology , Molecular Sequence Annotation , Genetic Databases , Molecular Evolution , Fungal Genome , Genomics/methods , Quality Control , Schizosaccharomyces/genetics , Web Browser , Workflow
ABSTRACT
MOTIVATION: The advent of sequencing and structural genomics projects has provided a dramatic boost in the number of uncharacterized protein structures and sequences. Consequently, many computational tools have been developed to help elucidate protein function. However, such services are spread throughout the world, often with standalone web pages. Integration of these methods is needed, and so far this has not been possible as there was no common vocabulary available that could be used as a standard language. RESULTS: The Protein Feature Ontology has been developed to provide a structured controlled vocabulary for features on a protein sequence or structure and comprises approximately 100 positional terms, now integrated into the Sequence Ontology (SO), and 40 non-positional terms which describe features relating to the whole-protein sequence. In addition, post-translational modifications are described by using a pre-existing ontology, the Protein Modification Ontology (MOD). This ontology is being used to integrate over 150 distinct annotations provided by the BioSapiens Network of Excellence, a consortium comprising 19 partner sites in Europe. AVAILABILITY: The Protein Feature Ontology can be browsed by accessing the ontology lookup service at the European Bioinformatics Institute (http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS).
Subjects
Computational Biology/methods , Proteins/chemistry , Software , Controlled Vocabulary , Protein Databases , Internet , Proteins/metabolism , Proteome/genetics
ABSTRACT
The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes (BP). To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology BP terms to define characterized and uncharacterized proteins for human, budding yeast and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalogue of proteins' biological roles.
Subjects
Eukaryotic Cells/metabolism , Proteome/metabolism , Proteomics/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Schizosaccharomyces pombe Proteins/metabolism , Schizosaccharomyces/metabolism , Gene Expression Profiling , Gene Ontology , Humans , Proteome/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Schizosaccharomyces/genetics , Schizosaccharomyces pombe Proteins/genetics , Species Specificity
ABSTRACT
High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, when the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community.