ABSTRACT
Earth's biodiversity is increasingly threatened and at risk. We propose a passive lunar biorepository for long-term storage of prioritized taxa of live cryopreserved samples to safeguard Earth's biodiversity and to support future space exploration and planet terraforming. Our initial focus will be on cryopreserving animal skin samples with fibroblast cells. An exemplar system has been developed using cryopreserved fish fins from the Starry Goby, Asterropteryx semipunctata. Samples will be expanded into fibroblast cells, recryopreserved, and then tested in an Earth-based laboratory for robust packaging and sensitivity to radiation. Two key factors for this biorepository are the needs to reduce damage from radiation and to maintain the samples near -196° Celsius. Certain lunar sites near the poles may meet these criteria. If possible, further testing would occur on the International Space Station prior to storage on the Moon. To secure a positive shared future, this is an open call to participate in this decades-long program.
ABSTRACT
Morphology remains a primary source of phylogenetic information for many groups of organisms, and the only one for most fossil taxa. Organismal anatomy is not a collection of randomly assembled and independent "parts", but instead a set of dependent and hierarchically nested entities resulting from ontogeny and phylogeny. How do we make sense of these dependent and at times redundant characters? One promising approach is using ontologies-structured controlled vocabularies that summarize knowledge about different properties of anatomical entities, including developmental and structural dependencies. Here, we assess whether evolutionary patterns can explain the proximity of ontology-annotated characters within an ontology. To do so, we measure phylogenetic information across characters and evaluate if it matches the hierarchical structure given by ontological knowledge-in much the same way as across-species diversity structure is given by phylogeny. We implement an approach to evaluate the Bayesian phylogenetic information (BPI) content and phylogenetic dissonance among ontology-annotated anatomical data subsets. We applied this to data sets representing two disparate animal groups: bees (Hexapoda: Hymenoptera: Apoidea, 209 chars) and characiform fishes (Actinopterygii: Ostariophysi: Characiformes, 463 chars). For bees, we find that BPI is not substantially explained by anatomy since dissonance is often high among morphologically related anatomical entities. For fishes, we find substantial information for two clusters of anatomical entities instantiating concepts from the jaws and branchial arch bones, but among-subset information decreases and dissonance increases substantially moving to higher-level subsets in the ontology. We further applied our approach to address particular evolutionary hypotheses with an example of morphological evolution in miniature fishes. While we show that phylogenetic information does match ontology structure for some anatomical entities, additional relationships and processes, such as convergence, likely play a substantial role in explaining BPI and dissonance, and merit future investigation. Our work demonstrates how complex morphological data sets can be interrogated with ontologies by allowing one to access how information is spread hierarchically across anatomical concepts, how congruent this information is, and what sorts of processes may play a role in explaining it: phylogeny, development, or convergence. [Apidae; Bayesian phylogenetic information; Ostariophysi; Phenoscape; phylogenetic dissonance; semantic similarity.].
Subject(s)
Arthropods , Characiformes , Animals , Bayes Theorem , Fossils , PhylogenyABSTRACT
There is a growing body of research on the evolution of anatomy in a wide variety of organisms. Discoveries in this field could be greatly accelerated by computational methods and resources that enable these findings to be compared across different studies and different organisms and linked with the genes responsible for anatomical modifications. Homology is a key concept in comparative anatomy; two important types are historical homology (the similarity of organisms due to common ancestry) and serial homology (the similarity of repeated structures within an organism). We explored how to most effectively represent historical and serial homology across anatomical structures to facilitate computational reasoning. We assembled a collection of homology assertions from the literature with a set of taxon phenotypes for the skeletal elements of vertebrate fins and limbs from the Phenoscape Knowledgebase. Using seven competency questions, we evaluated the reasoning ramifications of two logical models: the Reciprocal Existential Axioms (REA) homology model and the Ancestral Value Axioms (AVA) homology model. The AVA model returned all user-expected results in addition to the search term and any of its subclasses. The AVA model also returns any superclass of the query term in which a homology relationship has been asserted. The REA model returned the user-expected results for five out of seven queries. We identify some challenges of implementing complete homology queries due to limitations of OWL reasoning. This work lays the foundation for homology reasoning to be incorporated into other ontology-based tools, such as those that enable synthetic supermatrix construction and candidate gene discovery. [Homology; ontology; anatomy; morphology; evolution; knowledgebase; phenoscape.].
Subject(s)
Classification/methods , Models, Biological , Animal Fins/anatomy & histology , Animals , Extremities/anatomy & histology , Vertebrates/anatomy & histologyABSTRACT
BACKGROUND: Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are high in resource and time consumption. Protein-protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can be a limitation for their usage in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions. RESULTS: According to evaluations of candidate gene prediction performance tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, showed better performances by having higher area under the curve values for receiver operating characteristic and precision-recall curves than PPI networks for both zebrafish and mouse. CONCLUSION: Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized them for predicting candidate genes for anatomical entities.
Subject(s)
Protein Interaction Mapping/methods , Protein Interaction Maps , Animals , Area Under Curve , Databases, Protein , Gene Regulatory Networks , Mice , Phenotype , ROC Curve , User-Computer Interface , Zebrafish/metabolismABSTRACT
[This corrects the article DOI: 10.1093/biosci/biz140.].
ABSTRACT
Data synthesis required for large-scale macroevolutionary studies is challenging with the current tools available for integration. Using a classic question regarding the frequency of paired fin loss in teleost fishes as a case study, we sought to create automated methods to facilitate the integration of broad-scale trait data with a sizable species-level phylogeny. Similar to the evolutionary pattern previously described for limbs, pelvic and pectoral fin reduction and loss are thought to have occurred independently multiple times in the evolution of fishes. We developed a bioinformatics pipeline to identify the presence and absence of pectoral and pelvic fins of 12,582 species. To do this, we integrated a synthetic morphological supermatrix of phenotypic data for the pectoral and pelvic fins for teleost fishes from the Phenoscape Knowledgebase (two presence/absence characters for 3047 taxa) with a species-level tree for teleost fishes from the Open Tree of Life project (38,419 species). The integration method detailed herein harnessed a new combined approach by utilizing data based on ontological inference, as well as phylogenetic propagation, to reduce overall data loss. Using inference enabled by ontology-based annotations, missing data were reduced from 98.0% to 85.9%, and further reduced to 34.8% by phylogenetic data propagation. These methods allowed us to extend the data to an additional 11,293 species for a total of 12,582 species with trait data. The pectoral fin appears to have been independently lost in a minimum of 19 lineages and the pelvic fin in 48. Though interpretation is limited by lack of phylogenetic resolution at the species level, it appears that following loss, both pectoral and pelvic fins were regained several (3) to many (14) times respectively. Focused investigation into putative regains of the pectoral fin, all within one clade (Anguilliformes), showed that the pectoral fin was regained at least twice following loss. Overall, this study points to specific teleost clades where strategic phylogenetic resolution and genetic investigation will be necessary to understand the pattern and frequency of pectoral fin reversals.
Subject(s)
Animal Fins/anatomy & histology , Biological Evolution , Computational Biology/methods , Fishes/anatomy & histology , Animal Fins/growth & development , Animals , Body Patterning , Fishes/growth & development , PhylogenyABSTRACT
Phenotypes resulting from mutations in genetic model organisms can help reveal candidate genes for evolutionarily important phenotypic changes in related taxa. Although testing candidate gene hypotheses experimentally in nonmodel organisms is typically difficult, ontology-driven information systems can help generate testable hypotheses about developmental processes in experimentally tractable organisms. Here, we tested candidate gene hypotheses suggested by expert use of the Phenoscape Knowledgebase, specifically looking for genes that are candidates responsible for evolutionarily interesting phenotypes in the ostariophysan fishes that bear resemblance to mutant phenotypes in zebrafish. For this, we searched ZFIN for genetic perturbations that result in either loss of basihyal element or loss of scales phenotypes, because these are the ancestral phenotypes observed in catfishes (Siluriformes). We tested the identified candidate genes by examining their endogenous expression patterns in the channel catfish, Ictalurus punctatus. The experimental results were consistent with the hypotheses that these features evolved through disruption in developmental pathways at, or upstream of, brpf1 and eda/edar for the ancestral losses of basihyal element and scales, respectively. These results demonstrate that ontological annotations of the phenotypic effects of genetic alterations in model organisms, when aggregated within a knowledgebase, can be used effectively to generate testable, and useful, hypotheses about evolutionary changes in morphology.
Subject(s)
Catfishes/genetics , Evolution, Molecular , Gene Expression , Models, Genetic , Phenotype , Animals , Computational Biology , Gene Expression/genetics , Gene Expression/physiology , SoftwareABSTRACT
The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in noncomputer readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character-subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, that is, cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, providing the data are semantically annotated. Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships.
Subject(s)
Biological Ontologies , Computational Biology/methods , Phenotype , Amphibians/anatomy & histology , Amphibians/classification , Animals , Biological Evolution , Classification , Data Interpretation, StatisticalABSTRACT
The abundance of phenotypic diversity among species can enrich our knowledge of development and genetics beyond the limits of variation that can be observed in model organisms. The Phenoscape Knowledgebase (KB) is designed to enable exploration and discovery of phenotypic variation among species. Because phenotypes in the KB are annotated using standard ontologies, evolutionary phenotypes can be compared with phenotypes from genetic perturbations in model organisms. To illustrate the power of this approach, we review the use of the KB to find taxa showing evolutionary variation similar to that of a query gene. Matches are made between the full set of phenotypes described for a gene and an evolutionary profile, the latter of which is defined as the set of phenotypes that are variable among the daughters of any node on the taxonomic tree. Phenoscape's semantic similarity interface allows the user to assess the statistical significance of each match and flags matches that may only result from differences in annotation coverage between genetic and evolutionary studies. Tools such as this will help meet the challenge of relating the growing volume of genetic knowledge in model organisms to the diversity of phenotypes in nature. The Phenoscape KB is available at http://kb.phenoscape.org.
Subject(s)
Databases, Genetic , Genetic Association Studies/methods , Animals , Biological Evolution , Computational Biology/methods , Humans , Knowledge Bases , PhenotypeABSTRACT
Evolutionary phenotypic transitions, such as the fin-to-limb transition in vertebrates, result from modifications in related proteins and their interactions, often in response to changing environment. Identifying these alterations in protein networks is crucial for a more comprehensive understanding of these transitions. However, previous research has not attempted to compare protein-protein interaction (PPI) networks associated with evolutionary transitions, and most experimental studies concentrate on a limited set of proteins. Therefore, the goal of this work was to develop a network-based platform for investigating the fin-to-limb transition using PPI networks. Quality-enhanced protein networks, constructed by integrating PPI networks with anatomy ontology data, were leveraged to compare protein modules for paired fins (pectoral fin and pelvic fin) of fishes (zebrafish) to those of the paired limbs (forelimb and hindlimb) of mammals (mouse). This also included prediction of novel protein candidates and their validation by enrichment and homology analyses. Hub proteins such as shh and bmp4, which are crucial for module stability, were identified, and their changing roles throughout the transition were examined. Proteins with preserved roles during the fin-to-limb transition were more likely to be hub proteins. This study also addressed hypotheses regarding the role of non-preserved proteins associated with the transition.
Subject(s)
Animal Fins , Perciformes , Animals , Mice , Animal Fins/anatomy & histology , Zebrafish/anatomy & histology , Protein Interaction Maps , Biological Evolution , Perciformes/physiology , Proteins , Extremities/physiology , MammalsABSTRACT
Omic BON is a thematic Biodiversity Observation Network under the Group on Earth Observations Biodiversity Observation Network (GEO BON), focused on coordinating the observation of biomolecules in organisms and the environment. Our founding partners include representatives from national, regional, and global observing systems; standards organizations; and data and sample management infrastructures. By coordinating observing strategies, methods, and data flows, Omic BON will facilitate the co-creation of a global omics meta-observatory to generate actionable knowledge. Here, we present key elements of Omic BON's founding charter and first activities.
Subject(s)
Biodiversity , KnowledgeABSTRACT
The rich knowledge of morphological variation among organisms reported in the systematic literature has remained in free-text format, impractical for use in large-scale synthetic phylogenetic work. This noncomputable format has also precluded linkage to the large knowledgebase of genomic, genetic, developmental, and phenotype data in model organism databases. We have undertaken an effort to prototype a curated, ontology-based evolutionary morphology database that maps to these genetic databases (http://kb.phenoscape.org) to facilitate investigation into the mechanistic basis and evolution of phenotypic diversity. Among the first requirements in establishing this database was the development of a multispecies anatomy ontology with the goal of capturing anatomical data in a systematic and computable manner. An ontology is a formal representation of a set of concepts with defined relationships between those concepts. Multispecies anatomy ontologies in particular are an efficient way to represent the diversity of morphological structures in a clade of organisms, but they present challenges in their development relative to single-species anatomy ontologies. Here, we describe the Teleost Anatomy Ontology (TAO), a multispecies anatomy ontology for teleost fishes derived from the Zebrafish Anatomical Ontology (ZFA) for the purpose of annotating varying morphological features across species. To facilitate interoperability with other anatomy ontologies, TAO uses the Common Anatomy Reference Ontology as a template for its upper level nodes, and TAO and ZFA are synchronized, with zebrafish terms specified as subtypes of teleost terms. We found that the details of ontology architecture have ramifications for querying, and we present general challenges in developing a multispecies anatomy ontology, including refinement of definitions, taxon-specific relationships among terms, and representation of taxonomically variable developmental pathways.
Subject(s)
Biological Evolution , Fishes/anatomy & histology , Fishes/genetics , Animals , Classification , Computational Biology , Databases, Factual , GenomicsABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Synthesizing trait observations and knowledge across the Tree of Life remains a grand challenge for biodiversity science. Species traits are widely used in ecological and evolutionary science, and new data and methods have proliferated rapidly. Yet accessing and integrating disparate data sources remains a considerable challenge, slowing progress toward a global synthesis to integrate trait data across organisms. Trait science needs a vision for achieving global integration across all organisms. Here, we outline how the adoption of key Open Science principles-open data, open source and open methods-is transforming trait science, increasing transparency, democratizing access and accelerating global synthesis. To enhance widespread adoption of these principles, we introduce the Open Traits Network (OTN), a global, decentralized community welcoming all researchers and institutions pursuing the collaborative goal of standardizing and integrating trait data across organisms. We demonstrate how adherence to Open Science principles is key to the OTN community and outline five activities that can accelerate the synthesis of trait data across the Tree of Life, thereby facilitating rapid advances to address scientific inquiries and environmental issues. Lessons learned along the path to a global synthesis of trait data will provide a framework for addressing similarly complex data science and informatics challenges.
Subject(s)
Biodiversity , Ecology , Biological Evolution , Phenotype , ResearchABSTRACT
Natural language descriptions of organismal phenotypes, a principal object of study in biology, are abundant in the biological literature. Expressing these phenotypes as logical statements using ontologies would enable large-scale analysis on phenotypic information from diverse systems. However, considerable human effort is required to make these phenotype descriptions amenable to machine reasoning. Natural language processing tools have been developed to facilitate this task, and the training and evaluation of these tools depend on the availability of high quality, manually annotated gold standard data sets. We describe the development of an expert-curated gold standard data set of annotated phenotypes for evolutionary biology. The gold standard was developed for the curation of complex comparative phenotypes for the Phenoscape project. It was created by consensus among three curators and consists of entity-quality expressions of varying complexity. We use the gold standard to evaluate annotations created by human curators and those generated by the Semantic CharaParser tool. Using four annotation accuracy metrics that can account for any level of relationship between terms from two phenotype annotations, we found that machine-human consistency, or similarity, was significantly lower than inter-curator (human-human) consistency. Surprisingly, allowing curatorsaccess to external information did not significantly increase the similarity of their annotations to the gold standard or have a significant effect on inter-curator consistency. We found that the similarity of machine annotations to the gold standard increased after new relevant ontology terms had been added. Evaluation by the original authors of the character descriptions indicated that the gold standard annotations came closer to representing their intended meaning than did either the curator or machine annotations. These findings point toward ways to better design software to augment human curators and the use of the gold standard corpus will allow training and assessment of new tools to improve phenotype annotation accuracy at scale.
Subject(s)
Data Curation/methods , Data Mining/methods , Gene Ontology , Natural Language Processing , Phenotype , HumansABSTRACT
Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications ('monographs') and those used in phylogenetic analyses ('matrices'). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life.
Subject(s)
Databases, Factual , Quantitative Trait, Heritable , Animals , Phenotype , Phylogeny , VertebratesABSTRACT
BACKGROUND: In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. DEVELOPMENT AND TESTING OF THE ONTOLOGY: Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable, definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. RESULTS AND SIGNIFICANCE: Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.
Subject(s)
Databases as Topic , Pharyngeal Muscles/anatomy & histology , Animals , Humans , Oropharynx/anatomy & histology , Search EngineABSTRACT
The diverse phenotypes of living organisms have been described for centuries, and though they may be digitized, they are not readily available in a computable form. Using over 100 morphological studies, the Phenoscape project has demonstrated that by annotating characters with community ontology terms, links between novel species anatomy and the genes that may underlie them can be made. But given the enormity of the legacy literature, how can this largely unexploited wealth of descriptive data be rendered amenable to large-scale computation? To identify the bottlenecks, we quantified the time involved in the major aspects of phenotype curation as we annotated characters from the vertebrate phylogenetic systematics literature. This involves attaching fully computable logical expressions consisting of ontology terms to the descriptions in character-by-taxon matrices. The workflow consists of: (i) data preparation, (ii) phenotype annotation, (iii) ontology development and (iv) curation team discussions and software development feedback. Our results showed that the completion of this work required two person-years by a team of two post-docs, a lead data curator, and students. Manual data preparation required close to 13% of the effort. This part in particular could be reduced substantially with better community data practices, such as depositing fully populated matrices in public repositories. Phenotype annotation required â¼40% of the effort. We are working to make this more efficient with Natural Language Processing tools. Ontology development (40%), however, remains a highly manual task requiring domain (anatomical) expertise and use of specialized software. The large overhead required for data preparation and ontology development contributed to a low annotation rate of approximately two characters per hour, compared with 14 characters per hour when activity was restricted to character annotation. Unlocking the potential of the vast stores of morphological descriptions requires better tools for efficiently processing natural language, and better community practices towards a born-digital morphology. Database URL: http://kb.phenoscape.org
Subject(s)
Anatomy, Comparative , Biological Ontologies , Data Curation/methods , Data Mining/methods , Databases, Factual , Natural Language Processing , Animals , HumansABSTRACT
Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardize and link data about phenotypes and environments. Specifying and linking data through ontologies will allow researchers to increase the scope and flexibility of large-scale analyses aided by modern computing methods. Investments in this area would advance diverse fields such as ecology, phylogenetics, and conservation biology. While several biological ontologies are well-developed, using them to link phenotypes and environments is rare because of gaps in ontological coverage and limits to interoperability among ontologies and disciplines. In this manuscript, we present (1) use cases from diverse disciplines to illustrate questions that could be answered more efficiently using a robust linkage between phenotypes and environments, (2) two proof-of-concept analyses that show the value of linking phenotypes to environments in fishes and amphibians, and (3) two proposed example data models for linking phenotypes and environments using the extensible observation ontology (OBOE) and the Biological Collections Ontology (BCO); these provide a starting point for the development of a data model linking phenotypes and environments.