ABSTRACT
Echinobase (www.echinobase.org) is a third-generation web resource supporting genomic research on echinoderms. The new version was built by cloning the mature Xenopus model organism knowledgebase, Xenbase, refactoring data ingestion pipelines and modifying the user interface to adapt to multispecies echinoderm content. This approach leveraged over 15 years of previous database and web application development to generate a new fully featured informatics resource in a single year. In addition to the software stack, Echinobase uses the private cloud and physical hosts that support Xenbase. Echinobase currently supports six echinoderm species, focused on those used for genomics, developmental biology and gene regulatory network analyses. Over 38,000 gene pages, 18,000 publications, new improved genome assemblies, a JBrowse genome browser and BLAST+ services are available and supported by the development of a new echinoderm anatomical ontology, uniformly applied formal gene nomenclature, and consistent orthology predictions. A novel feature of Echinobase is its integrated support for multiple, disparate species. New genomes from the diverse echinoderm phylum will be added and supported as data become available. The common code development design of the integrated knowledgebases ensures parallel improvements as each resource evolves. This approach is widely applicable for developing new model organism informatics resources.
Subject(s)
Databases, Genetic; Echinodermata/genetics; Gene Regulatory Networks; Genome; User-Computer Interface; Animals; Echinodermata/classification; Genomics; Internet; Knowledge Bases; Molecular Sequence Annotation; Phylogeny; Xenopus/genetics
ABSTRACT
Proteomics methodology has expanded to include protein structural analysis, primarily through cross-linking mass spectrometry (XL-MS) and hydrogen-deuterium exchange mass spectrometry (HX-MS). However, while the structural proteomics community has effective tools for primary data analysis, there is a need for structure modeling pipelines that are accessible to the proteomics specialist. Integrative structural biology requires the aggregation of multiple distinct types of data to generate models that satisfy all inputs. Here, we describe IMProv, an app in the Mass Spec Studio that combines XL-MS data with other structural data, such as cryo-EM densities and crystallographic structures, for integrative structure modeling on high-performance computing platforms. The resource provides an easily deployed bundle that includes the open-source Integrative Modeling Platform program (IMP) and its dependencies. IMProv also provides functionality to adjust cross-link distance restraints according to the underlying dynamics of cross-linked sites, as characterized by HX-MS. A dynamics-driven conditioning of restraint values can improve structure modeling precision, as illustrated by an integrative structure of the five-membered Polycomb Repressive Complex 2. IMProv is extensible to additional types of data.
Subject(s)
Models, Molecular; Proteomics/methods; Software; Mass Spectrometry; Polycomb Repressive Complex 2/chemistry; Protein Conformation
ABSTRACT
BACKGROUND: Ontologies of precisely defined, controlled vocabularies are essential to curate the results of biological experiments such that the data are machine searchable, can be computationally analyzed, and are interoperable across the biomedical research continuum. There is also an increasing need for methods to interrelate phenotypic data easily and accurately from experiments in animal models with human development and disease. RESULTS: Here we present the Xenopus phenotype ontology (XPO) to annotate phenotypic data from experiments in Xenopus, one of the major vertebrate model organisms used to study gene function in development and disease. The XPO implements design patterns from the Unified Phenotype Ontology (uPheno), and the principles outlined by the Open Biological and Biomedical Ontologies (OBO Foundry) to maximize interoperability with other species and facilitate ongoing ontology management. Constructed in Web Ontology Language (OWL), the XPO combines the existing uPheno library of ontology design patterns with additional terms from the Xenopus Anatomy Ontology (XAO), the Phenotype and Trait Ontology (PATO) and the Gene Ontology (GO). The integration of these different ontologies into the XPO enables rich phenotypic curation, whilst the uPheno bridging axioms allow phenotypic data from Xenopus experiments to be related to phenotype data from other model organisms and human disease. Moreover, the simple post-composed uPheno design patterns facilitate ongoing XPO development, as the generation of new terms and classes of terms can be substantially automated. CONCLUSIONS: The XPO serves as an example of current best practices to help overcome many of the inherent challenges in harmonizing phenotype data between different species. The XPO currently consists of approximately 22,000 terms and is being used to curate phenotypes by Xenbase, the Xenopus Model Organism Knowledgebase, forming a standardized corpus of genotype-phenotype data that can be directly related to other uPheno compliant resources.
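The automated term generation mentioned in the abstract can be pictured as filling a fixed template with terms from an existing ontology. The following is a minimal sketch of that idea only, not the XPO build pipeline: the pattern name, the label and definition templates, the `ANAT:` and `EXAMPLE:` identifiers, and the `generate_terms` helper are all hypothetical and do not correspond to real uPheno patterns or XAO/XPO identifiers.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Term:
    """An existing ontology term used to fill a pattern (IDs here are made up)."""
    term_id: str
    label: str


@dataclass(frozen=True)
class Pattern:
    """A hypothetical design pattern: fixed templates with one variable slot."""
    name: str
    label_template: str
    definition_template: str


def generate_terms(pattern, entities, prefix="EXAMPLE", start=1):
    """Create one phenotype term per entity by filling the pattern templates."""
    ids = count(start)
    terms = []
    for entity in entities:
        terms.append({
            "id": f"{prefix}:{next(ids):07d}",  # sequential placeholder ID
            "label": pattern.label_template.format(entity=entity.label),
            "definition": pattern.definition_template.format(entity=entity.label),
            "pattern": pattern.name,            # records which pattern produced the term
            "filler": entity.term_id,           # records which term filled the slot
        })
    return terms


# Hypothetical pattern and anatomy terms, for illustration only.
abnormal_morphology = Pattern(
    name="abnormalMorphologyExample",
    label_template="abnormal {entity} morphology",
    definition_template="Any structural anomaly of the {entity}.",
)
anatomy = [Term("ANAT:0000001", "heart"), Term("ANAT:0000002", "kidney")]

terms = generate_terms(abnormal_morphology, anatomy)
for term in terms:
    print(term["id"], term["label"])
```

Because each generated record keeps the pattern name and the filler identifier, adding a new anatomy term or a new pattern in this sketch only requires rerunning the generator rather than authoring each phenotype term by hand.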
Subject(s)
Biological Ontologies; Animals; Gene Ontology; Humans; Phenotype; Xenopus laevis
ABSTRACT
Xenbase (www.xenbase.org) is a knowledge base for researchers and biomedical scientists who employ the amphibian Xenopus as a model organism in biomedical research to gain a deeper understanding of developmental and disease processes. Through expert curation and automated data provisioning from various sources, Xenbase strives to integrate the body of knowledge on Xenopus genomics and biology together with the visualization of biologically significant interactions. Most current studies utilize next-generation sequencing (NGS), but until now the results of different experiments were difficult to compare and not integrated with other Xenbase content. Xenbase has developed a suite of tools, interfaces and data processing pipelines that transforms NCBI Gene Expression Omnibus (GEO) NGS content into deeply integrated gene expression and chromatin data, mapping all aligned reads to the most recent genome builds. This content can be queried and visualized via multiple tools and also provides the basis for future automated 'gene expression as a phenotype' and gene regulatory network analyses.
Subject(s)
Databases, Genetic; Gene Regulatory Networks/genetics; Genomics; Software; Xenopus/genetics; Animals; Chromatin Immunoprecipitation Sequencing; Gene Expression/genetics; High-Throughput Nucleotide Sequencing; RNA-Seq; User-Computer Interface
ABSTRACT
Xenbase (www.xenbase.org) is an online resource for researchers utilizing Xenopus laevis and Xenopus tropicalis, and for biomedical scientists seeking access to data generated with these model systems. Content is aggregated from a variety of external resources and also generated by in-house curation of scientific literature and bioinformatic analyses. Over the past two years many new types of content have been added along with new tools and functionalities to reflect the impact of high-throughput sequencing. These include new genomes for both supported species (each with chromosome scale assemblies), new genome annotations, genome segmentation, dynamic and interactive visualization for RNA-Seq data, updated ChIP-Seq mapping, GO terms, protein interaction data, ORFeome support, and improved connectivity to other biomedical and bioinformatic resources.
Subject(s)
Databases, Genetic; Epigenomics; Genome; Transcriptome; Xenopus/genetics; Animals; Base Sequence; CRISPR-Cas Systems; Chromatin Immunoprecipitation; Computational Biology/organization & administration; Databases, Nucleic Acid; Gene Ontology; Genomics; MicroRNAs/genetics; Molecular Sequence Annotation; Open Reading Frames/genetics; RNA/genetics; Software; User-Computer Interface; Web Browser; Xenopus laevis/genetics
ABSTRACT
Rhythms of various periodicities drive cyclical processes in organisms ranging from single cells to the largest mammals on earth, and on scales from cellular physiology to global migrations. The molecular mechanisms that generate circadian behaviours in model organisms have been well studied, but longer phase cycles and interactions between cycles with different periodicities remain poorly understood. Broadcast spawning corals are one of the best examples of an organism integrating inputs from multiple environmental parameters, including seasonal temperature, the lunar phase and hour of the day, to calibrate their annual reproductive event. We present a deep RNA-sequencing experiment utilizing multiple analyses to differentiate transcriptomic responses modulated by the interactions between the three aforementioned environmental parameters. Acropora millepora was sampled over multiple 24-hr periods throughout a full lunar month and at two seasonal temperatures. Temperature, lunar and diurnal cycles produce distinct transcriptomic responses, with interactions between all three variables identifying a core set of genes. These core genes include mef2, a developmental master regulator, and two heterogeneous nuclear ribonucleoproteins, one of which is known to post-transcriptionally interact with mef2 and with biological clock-regulating mRNAs. Interactions between diurnal and temperature differences impacted a range of core processes ranging from biological clocks to stress responses. Genes involved with developmental processes and transcriptional regulation were impacted by the lunar phase and seasonal temperature differences. Lastly, there was a diurnal and lunar phase interaction in which genes involved with RNA-processing and translational regulation were differentially regulated. These data illustrate the extraordinary levels of transcriptional variation across time in a simple radial cnidarian in response to the environment under normal conditions.
Subject(s)
Anthozoa/genetics; Circadian Rhythm; Moon; Seasons; Temperature; Animals; Anthozoa/physiology; Australia; Biological Clocks/genetics; Gene Expression Regulation; Reproduction; Transcriptome
ABSTRACT
Echinobase (www.echinobase.org) is a model organism knowledgebase serving as a resource for the community that studies echinoderms, a phylum of marine invertebrates that includes sea urchins and sea stars. Echinoderms have been important experimental models for over 100 years and continue to make important contributions to environmental, evolutionary, and developmental studies, including research on developmental gene regulatory networks. As a centralized resource, Echinobase hosts genomes and collects functional genomic data, reagents, literature, and other information for the community. This third-generation site is based on the Xenbase knowledgebase design and utilizes gene-centric pages to minimize the time and effort required to access genomic information. Summary gene pages display gene symbols and names, functional data, links to the JBrowse genome browser, and orthology to other organisms and reagents, and tabs from the Summary gene page contain more detailed information concerning mRNAs, proteins, diseases, and protein-protein interactions. The gene pages also display 1:1 orthologs between the fully supported species Strongylocentrotus purpuratus (purple sea urchin), Lytechinus variegatus (green sea urchin), Patiria miniata (bat star), and Acanthaster planci (crown-of-thorns sea star). JBrowse tracks are available for visualization of functional genomic data from both fully supported species and the partially supported species Anneissia japonica (feather star), Asterias rubens (sugar star), and L. pictus (painted sea urchin). Echinobase serves a vital role by providing researchers with annotated genomes including orthology, functional genomic data aligned to the genomes, and curated reagents and data. The Echinoderm Anatomical Ontology provides a framework for standardizing developmental data across the phylum, and knowledgebase content is formatted to be findable, accessible, interoperable, and reusable by the research community.
Subject(s)
Databases, Genetic; Echinodermata; Animals; Echinodermata/genetics; Genome; Genomics/methods; Sea Urchins/genetics; Knowledge Bases
ABSTRACT
Xenbase (https://www.xenbase.org/), the Xenopus model organism knowledgebase, is a web-accessible resource that integrates the diverse genomic and biological data from research on the laboratory frogs Xenopus laevis and Xenopus tropicalis. The goal of Xenbase is to accelerate discovery and empower Xenopus research, to enhance the impact of Xenopus research data, and to facilitate the dissemination of these data. Xenbase also enhances the value of Xenopus data through high-quality curation, data integration, bioinformatics tools optimized for Xenopus experiments, and links from Xenopus data to human data and to data from other model organisms. Xenbase also plays an indispensable role in making Xenopus data interoperable and accessible to the broader biomedical community in accordance with FAIR principles. Xenbase provides annotated data updates to organizations such as NCBI, UniProtKB, Ensembl, the Gene Ontology consortium, and most recently, the Alliance of Genome Resources, a common clearing house for data from humans and model organisms. This article provides a brief overview of key and recently added features of Xenbase. New features include processing of Xenopus high-throughput sequencing data from the NCBI Gene Expression Omnibus; curation of anatomical, physiological, and expression phenotypes with the newly created Xenopus Phenotype Ontology; Xenopus Gene Ontology annotations; new anatomical drawings of the Normal Table of Xenopus development; and integration of the latest Xenopus laevis v10.1 genome annotations. Finally, we highlight areas for future development at Xenbase as we continue to support the Xenopus research community.
Subject(s)
Databases, Genetic; Genomics; Animals; Humans; Xenopus laevis/genetics; Xenopus/genetics; Computational Biology
ABSTRACT
A keyword-based search of comprehensive databases such as PubMed may return irrelevant papers, especially if the keywords are used in multiple fields of study. In such cases, domain experts (curators) need to verify the results and remove the irrelevant articles. Automating this filtering process will save time, but it has to be done well enough to ensure few relevant papers are rejected and few irrelevant papers are accepted. A good solution would be fast, work with the limited amount of data freely available (full paper body may be missing), handle ambiguous keywords and be as domain-neutral as possible. In this paper, we evaluate a number of classification algorithms for identifying a domain-specific set of papers about echinoderm species and show that the resulting tool satisfies most of the abovementioned requirements. Echinoderms consist of a number of very different organisms, including brittle stars, sea stars (starfish), sea urchins and sea cucumbers. While their taxonomic identifiers are specific, the common names are used in many other contexts, creating ambiguity and making a keyword search prone to error. We try classifiers using Linear, Naïve Bayes, Nearest Neighbor, Tree, SVM, Bagging, AdaBoost and Neural Network learning models and compare their performance. We show how effective the resulting classifiers are in filtering irrelevant articles returned from PubMed. The methodology depends largely on a good selection of training data and is a practical solution that can be applied to other fields of study facing similar challenges. Database URL: The code and data reported in this paper are freely available at http://xenbaseturbofrog.org/pub/Text-Topic-Classifier/.
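The filtering task described above can be illustrated with one of the listed model families. The following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, written with the Python standard library only. It is not the authors' released code: the class name, the tokenizer, and the six toy training titles are invented for illustration, and a real curation filter would be trained on a much larger labeled set of titles and abstracts.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                smoothed = self.word_counts[label][token] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training titles: ambiguous common names appear in both classes.
train_texts = [
    "Sea urchin embryo gene regulatory network during skeletogenesis",
    "Brittle star arm regeneration and larval development",
    "Sea star oocyte maturation in the echinoderm Patiria miniata",
    "Star schema design for clinical data warehouse queries",
    "Sea level rise and coastal flooding risk models",
    "Brittle fracture of ceramic materials under load",
]
train_labels = ["relevant"] * 3 + ["irrelevant"] * 3

classifier = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(classifier.predict("Gene expression in sea urchin larval development"))
print(classifier.predict("Fracture risk models for ceramic materials"))
```

The toy data deliberately reuses "sea", "star" and "brittle" in both classes, so the decision rests on the surrounding words rather than on the ambiguous common names, which is the kind of ambiguity the abstract describes.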
Subject(s)
Algorithms; Echinodermata; Animals; Bayes Theorem; Databases, Factual; PubMed
ABSTRACT
Structural Mass Spectrometry (SMS) provides a comprehensive toolbox for the analysis of protein structure and function. It offers multiple sources of structural information that are increasingly useful for integrative structural modeling of complex protein systems. As MS-based structural workflows scale to larger systems, consistent and coherent data interpretation resources are needed to better support modeling. Unlike the proteomics community, practitioners of SMS lack adequate computational tools. Here, we review new developments in the Mass Spec Studio: an expandable ecosystem of workflows for the analysis of complementary SMS techniques with linkages to modeling. Current functionality in the Studio (version 2) supports three major SMS workflows (crosslinking, hydrogen/deuterium exchange and covalent labeling) and two pipelines for structural modeling, with a special focus on data integration. The Mass Spec Studio is an architecture focused on rapid and robust extension of functionality by a community of developers. SIGNIFICANCE: This review surveys the new data analysis capabilities within the Mass Spec Studio, a rich framework for rapid software development specifically targeting the community of structural proteomics and structural mass spectrometry. Updates to crosslinking, hydrogen/deuterium exchange and covalent labeling apps are provided, as well as a utility for translating such analyses into restraints that support integrative structural modeling. These new capabilities, together with the underlying design tools and content, provide the community with a wealth of resources to tackle complex structural problems and design new approaches to data analysis.
Subject(s)
Ecosystem; Proteins; Mass Spectrometry; Proteomics; Software
ABSTRACT
At a fundamental level, most genes, signaling pathways, biological functions and organ systems are highly conserved between man and all vertebrate species. Leveraging this conservation, researchers are increasingly using the experimental advantages of the amphibian Xenopus to model human disease. The online Xenopus resource, Xenbase, enables human disease modeling by curating the Xenopus literature published in PubMed and integrating these Xenopus data with orthologous human genes, anatomy, and more recently with links to the Online Mendelian Inheritance in Man resource (OMIM) and the Human Disease Ontology (DO). Here we review how Xenbase supports disease modeling and report on a meta-analysis of the published Xenopus research providing an overview of the different types of diseases being modeled in Xenopus and the variety of experimental approaches being used. Text mining of over 50,000 Xenopus research articles imported into Xenbase from PubMed identified approximately 1,000 putative disease-modeling articles. These articles were manually assessed and annotated with disease ontologies, which were then used to classify papers based on disease type. We found that Xenopus is being used to study a diverse array of diseases with three main experimental approaches: cell-free egg extracts to study fundamental aspects of cellular and molecular biology, oocytes to study ion transport and channel physiology, and embryo experiments focused on congenital diseases. We integrated these data into Xenbase Disease Pages to allow easy navigation to disease information on external databases. Results of this analysis will equip Xenopus researchers with a suite of experimental approaches available to model or dissect a pathological process. Ideally, clinicians and basic researchers will use this information to foster collaborations necessary to interrogate the development and treatment of human diseases.
ABSTRACT
Xenbase is the Xenopus model organism database (www.xenbase.org), a web-accessible resource that integrates the diverse genomic and biological data for Xenopus research. It hosts a variety of content including current and archived genomes for both X. laevis and X. tropicalis, bioinformatic tools for comparative genetic analyses including BLAST and GBrowse, annotated Xenopus literature, and catalogs of reagents including antibodies, ORFeome clones, morpholinos, and transgenic lines. Xenbase compiles gene-specific pages which include manually curated gene expression images, functional information including gene ontology (GO), disease associations, and links to other major data sources such as NCBI:Entrez, UniProtKB, and Ensembl. We also maintain the Xenopus Anatomy Ontology (XAO) which describes anatomy throughout embryonic development. This chapter provides a full description of the many features of Xenbase, and offers a guide on how to use various tools to perform a variety of common tasks such as identifying nucleic acid or protein sequences, finding gene expression patterns for specific genes, stages or tissues, identifying literature on a specific gene or tissue, locating useful reagents and downloading our extensive content, including Xenopus gene-Human gene disease mapping files.