ABSTRACT
There is increasing awareness of the importance of data publication, data sharing, and open science in supporting research, monitoring, and control of vector-borne diseases (VBDs). Here we describe the efforts of the Global Biodiversity Information Facility (GBIF) and the World Health Organization Special Programme on Research and Training in Diseases of Poverty (TDR) to promote the publication of data related to disease vectors. In 2020, a GBIF task group of experts was formed to provide advice and to support efforts aimed at enhancing the coverage and accessibility of data on vectors of human diseases within GBIF. Various strategies, such as organizing training courses and publishing data papers, were used to increase this content. This editorial introduces the outcome of a second call for data papers, run in partnership by TDR, GBIF, and GigaScience Press in the journal GigaByte. Biodiversity and infectious diseases are linked in complex ways. These links can involve changes at every level from the microorganism to the habitat, and these factors interact in many ways to affect human health. One way to tackle disease control, and possibly elimination, is to give stakeholders access to a wide range of data shared under the FAIR principles, in order to support early detection, analysis, and evaluation, and to promote policy improvement and/or development.
ABSTRACT
The unprecedented generation of large volumes of biodiversity data is contributing to a wide range of disciplines, including disease ecology. Emerging infectious diseases are usually zoonoses caused by multi-host pathogens. Understanding them may therefore require access to biodiversity data on the ecology and occurrence of the species involved. Nevertheless, despite several data-mobilization initiatives, biodiversity data have not yet been fully leveraged for research into disease dynamics. To explore current contributions and trends, and to identify limitations, we characterized biodiversity data usage in scientific publications related to human health, contrasting patterns in studies citing the Global Biodiversity Information Facility (GBIF) with those in studies obtaining data from other sources. We found that the studies mainly obtained data from the scientific literature and from other sources that were neither aggregated nor standardized. Most of the studies explored pathogen species, and studies using GBIF-mediated data in particular tended to explore and reuse data on multiple species (>2). Data sources varied according to the taxa and epidemiological roles of the species involved. Biodiversity data repositories were mainly used for species acting as hosts, reservoirs, and vectors, and were rarely used as a source of pathogen data, which were usually obtained from human and animal health institutions. While studies using GBIF-mediated data and those using non-GBIF-mediated data explored similar diseases and topics, they showed discipline biases and took different analytical approaches. Research on emerging infectious diseases may require access to geographical and ecological data on multiple species. The One Health challenge requires interdisciplinary collaboration and data sharing, which are facilitated by aggregated repositories and platforms. The contribution of biodiversity data to understanding infectious disease dynamics should be acknowledged, strengthened, and promoted.
ABSTRACT
The increasing availability of digitized biodiversity data worldwide, provided by a growing number of institutions and researchers, and the expanding use of those data for a variety of purposes have raised concerns about the "fitness for use" of such data and the impact of data quality (DQ) on the outcomes of analyses, reports, and decisions. A consistent approach to assessing and managing data quality is critical for biodiversity data users. However, achieving this goal has been particularly challenging because of idiosyncrasies inherent in the concept of quality. DQ assessment and management cannot be performed unless the quality needs have been clearly established from a data user's standpoint. This paper defines a formal conceptual framework that supports the biodiversity informatics community by allowing the meaning of "fitness for use" to be described from a data user's perspective in a common and standardized manner. The proposed framework defines nine concepts organized into three classes: DQ Needs, DQ Solutions, and DQ Report. The framework is intended to formalize human thinking into well-defined components so that concepts of DQ needs, solutions, and reports can be shared and reused in a common way among user communities. With this framework, we establish common ground for the collaborative development of solutions for DQ assessment and management based on fitness-for-use principles. To validate the framework, we present a proof of concept based on a case study at the Museum of Comparative Zoology of Harvard University. In future work, we will use the framework to engage the biodiversity informatics community in formalizing and sharing DQ profiles related to DQ needs across the community.