
Integrating and visualizing primary data from prospective and legacy taxonomic literature.

Miller, Jeremy A; Agosti, Donat; Penev, Lyubomir; Sautter, Guido; Georgiev, Teodor; Catapano, Terry; Patterson, David; King, David; Pereira, Serrano; Vos, Rutger Aldo; Sierra, Soraya.

Biodivers Data J ; (3): e5063, 2015.

Article in English | MEDLINE | ID: mdl-26023286

ABSTRACT

Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data, can extract structured primary biodiversity data which can be aggregated with and jointly queried with data from other Darwin Core-compatible sources, and show how visualization of these data can communicate key information contained in biodiversity literature. We complement recent studies on aspects of biodiversity knowledge using XML structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions.

Enriched biodiversity data as a resource and service.

Vos, Rutger Aldo; Biserkov, Jordan Valkov; Balech, Bachir; Beard, Niall; Blissett, Matthew; Brenninkmeijer, Christian; van Dooren, Tom; Eades, David; Gosline, George; Groom, Quentin John; Hamann, Thomas D; Hettling, Hannes; Hoehndorf, Robert; Holleman, Ayco; Hovenkamp, Peter; Kelbert, Patricia; King, David; Kirkup, Don; Lammers, Youri; DeMeulemeester, Thibaut; Mietchen, Daniel; Miller, Jeremy A; Mounce, Ross; Nicolson, Nicola; Page, Rod; Pawlik, Aleksandra; Pereira, Serrano; Penev, Lyubomir; Richards, Kevin; Sautter, Guido; Shorthouse, David Peter; Tähtinen, Marko; Weiland, Claus; Williams, Alan R; Sierra, Soraya.

Biodivers Data J ; (2): e1125, 2014.

Article in English | MEDLINE | ID: mdl-25057255

ABSTRACT

BACKGROUND: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open source "data enrichment" workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: The Biodiversity Data Enrichment Hackathon.
RESULTS: The hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals who gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists. One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain.
CONCLUSIONS: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.
