ABSTRACT
The association of organisms with their environments is a key issue in exploring biodiversity patterns. This knowledge has traditionally been scattered, but textual descriptions of taxa and their habitats are now being consolidated in centralized resources. However, structured annotations are needed to facilitate large-scale analyses. Therefore, we developed ENVIRONMENTS, a fast dictionary-based tagger capable of identifying Environment Ontology (ENVO) terms in text. We evaluate the accuracy of the tagger on a new manually curated corpus of 600 Encyclopedia of Life (EOL) species pages. We use the tagger to associate taxa with environments by tagging EOL text content monthly, and integrate the results into the EOL to disseminate them to a broad audience of users. AVAILABILITY AND IMPLEMENTATION: The software and the corpus are available under the open-source BSD and the CC-BY-NC-SA 3.0 licenses, respectively, at http://environments.hcmr.gr.
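To make the idea of dictionary-based tagging concrete, here is a minimal Python sketch. It is not the ENVIRONMENTS implementation: the dictionary entries, the placeholder identifiers, and the function name `tag_environments` are all assumptions chosen for illustration, and a real tagger would also need to handle plurals, synonyms, and ambiguous names.

```python
import re

# Hypothetical mini-dictionary: surface form -> ENVO-style identifier.
# The identifiers are placeholders, not real ENVO accessions.
ENV_DICTIONARY = {
    "coral reef": "ENVO:EXAMPLE_1",
    "forest": "ENVO:EXAMPLE_2",
    "hot spring": "ENVO:EXAMPLE_3",
}


def tag_environments(text, dictionary=ENV_DICTIONARY):
    """Return (start, end, matched_text, term_id) for each dictionary hit."""
    # Try longer names first so multi-word entries win over shorter ones.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


matches = tag_environments("The species inhabits a Coral Reef near a hot spring.")
for start, end, surface, term_id in matches:
    print(start, end, surface, term_id)
```

Matching is case-insensitive and restricted to whole words, so this sketch misses inflected forms such as "coral reefs".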
Subjects
Biodiversity, Biological Ontologies, Software, Animals, Data Mining/methods, Ecosystem, Internet
ABSTRACT
Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. The other finds taxon names in text using GNRD in order to build a species association network. Both workflows perform well: the annotation workflow has an F1 score of 0.941 and the association algorithm has an F1 score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics.
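For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The Python sketch below shows the standard calculation. The function name and the counts in the usage line are hypothetical and are not taken from the evaluation reported above.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Compute F1 as the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # With no predictions or no gold annotations, F1 is reported as 0 here.
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```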
Subjects
Knowledge, Semantics, Biodiversity
ABSTRACT
MOTIVATION: Despite substantial efforts to develop and populate the back-ends of biological databases, front-ends to these systems often rely on taxonomic expertise. This research applies techniques from human-computer interaction research to the biodiversity domain. RESULTS: We developed an interactive node-link tool, TaxonTree, illustrating the value of a carefully designed interaction model, animation, and integrated searching and browsing towards retrieval of biological names and other information. Users tested the tool using a new, large integrated dataset of animal names with phylogenetic-based and classification-based tree structures. These techniques also translated well for a tool, DoubleTree, to allow comparison of trees using coupled interaction. Our approaches will be useful not only for biological data but as general portal interfaces.