ABSTRACT
Premise: Digitized biodiversity data offer extensive information; however, obtaining and processing biodiversity data can be daunting. Complexities arise during data cleaning, such as identifying and removing problematic records. To address these issues, we created the R package Geographic And Taxonomic Occurrence R-based Scrubbing (gatoRs). Methods and Results: The gatoRs workflow includes functions that streamline downloading records from the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio). We also created functions to clean downloaded specimen records. Unlike previous R packages, gatoRs accounts for differences in download structure between GBIF and iDigBio and allows for user control via interactive cleaning steps. Conclusions: Our pipeline enables the scientific community to process biodiversity data efficiently and is accessible to the R coding novice. We anticipate that gatoRs will be useful for both established and beginning users. Furthermore, we expect our package will facilitate the introduction of biodiversity-related concepts into the classroom via the use of herbarium specimens.
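The point about differing download structures can be illustrated with a minimal sketch. This is not the gatoRs API (gatoRs is an R package); it is a Python illustration under stated assumptions: the function names are hypothetical, the sample records are invented, and the field names (`scientificName`, `decimalLatitude`, `decimalLongitude` for GBIF-style records; `scientificname` and a nested `geopoint` for iDigBio-style records) are assumed for the purpose of the example.

```python
# Hypothetical sketch: map two differently structured record formats onto one
# schema, then drop records without usable coordinates and exact duplicates.

def harmonize_gbif(rec):
    # Assumed GBIF-style layout: flat Darwin Core-like keys.
    return {
        "source": "GBIF",
        "name": rec.get("scientificName"),
        "lat": rec.get("decimalLatitude"),
        "lon": rec.get("decimalLongitude"),
    }


def harmonize_idigbio(rec):
    # Assumed iDigBio-style layout: lowercase keys and a nested coordinate pair.
    point = rec.get("geopoint") or {}
    return {
        "source": "iDigBio",
        "name": rec.get("scientificname"),
        "lat": point.get("lat"),
        "lon": point.get("lon"),
    }


def clean(records):
    seen = set()
    kept = []
    for r in records:
        # Remove records with missing or out-of-range coordinates.
        if r["lat"] is None or r["lon"] is None:
            continue
        if not (-90 <= r["lat"] <= 90 and -180 <= r["lon"] <= 180):
            continue
        # Treat records with the same name and rounded coordinates as duplicates.
        name = r["name"].lower() if r["name"] else None
        key = (name, round(r["lat"], 4), round(r["lon"], 4))
        if key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept


# Invented sample records, for illustration only.
gbif_records = [
    {"scientificName": "Galax urceolata", "decimalLatitude": 35.6, "decimalLongitude": -83.5},
    {"scientificName": "Galax urceolata", "decimalLatitude": None, "decimalLongitude": None},
]
idigbio_records = [
    {"scientificname": "galax urceolata", "geopoint": {"lat": 35.6, "lon": -83.5}},
    {"scientificname": "galax urceolata", "geopoint": {"lat": 36.1, "lon": -81.7}},
]

harmonized = [harmonize_gbif(r) for r in gbif_records]
harmonized += [harmonize_idigbio(r) for r in idigbio_records]
cleaned = clean(harmonized)
print(len(harmonized), "records harmonized;", len(cleaned), "kept after cleaning")
```

In this sketch the first record seen wins when two aggregators hold the same specimen; an interactive workflow of the kind the abstract describes would instead let the user decide how each cleaning step is applied.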
ABSTRACT
BACKGROUND: Morphological leaf traits are frequently used to quantify, understand and predict plant and vegetation functional diversity and ecology, including environmental and climate change responses. Although morphological leaf traits are easy to measure, their coverage for characterising variation within species and across temporal scales is limited. At the same time, there are about 3100 herbaria worldwide, containing approximately 390 million plant specimens dating from the 16th to 21st century, which can potentially be used to extract morphological leaf traits. Globally, plant specimens are rapidly being digitised and images are made openly available via various biodiversity data platforms, such as iDigBio and GBIF. Based on a pilot study to identify the availability and appropriateness of herbarium specimen images for comprehensive trait data extraction, we developed a spatio-temporal dataset on intraspecific trait variability containing 128,036 morphological leaf trait measurements for seven selected species. NEW INFORMATION: After scrutinising the metadata of digitised herbarium specimen images available from iDigBio and GBIF (21.9 million and 31.6 million images for Tracheophyta, respectively; accessed December 2020), we identified approximately 10 million images potentially appropriate for our study. From the 10 million images, we selected seven species (Salix bebbiana Sarg., Alnus incana (L.) Moench, Viola canina L., Salix glauca L., Chenopodium album L., Impatiens capensis Meerb. and Solanum dulcamara L.), which have a simple leaf shape, are well represented in space and time and have high availability of specimens per species. We downloaded 17,383 images. Out of these, we discarded 5779 images due to quality issues. We used the remaining 11,604 images to measure the area, length, width and perimeter on 32,009 individual leaf blades using the semi-automated tool TraitEx.
The resulting dataset contains 128,036 trait records. We demonstrate its comparability to trait data measured in natural environments following standard protocols by comparing it with trait values from the TRY database. We conclude that the herbarium specimens provide valuable information on leaf sizes. The dataset created in our study, by extracting leaf traits from the digitised herbarium specimen images of seven selected species, is a promising opportunity to improve ecological knowledge about the adaptation of size-related leaf traits to environmental changes in space and time.
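The counts reported in this abstract are internally consistent, which a short arithmetic check makes explicit. The sketch below assumes (the abstract does not state this directly) that each measured leaf blade contributes exactly one record for each of the four traits.

```python
# Figures taken from the abstract.
downloaded_images = 17_383
discarded_images = 5_779
leaf_blades = 32_009
traits = ("area", "length", "width", "perimeter")

# Images remaining after quality filtering.
usable_images = downloaded_images - discarded_images

# Assumption: one record per trait per leaf blade.
trait_records = leaf_blades * len(traits)

print(usable_images, trait_records)
```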
ABSTRACT
The first two decades of the twenty-first century have seen a rapid rise in the mobilization of digital biodiversity data. This has thrust natural history museums into the forefront of biodiversity research, underscoring their central role in the modern scientific enterprise. The advent of mobilization initiatives such as the United States National Science Foundation's Advancing Digitization of Biodiversity Collections (ADBC), Australia's Atlas of Living Australia (ALA), Mexico's National Commission for the Knowledge and Use of Biodiversity (CONABIO), Brazil's Centro de Referência em Informação (CRIA) and China's National Specimen Information Infrastructure (NSII) has led to a rapid rise in data aggregators and an exponential increase in digital data for scientific research; these data arguably provide the best evidence of where species live. The international Global Biodiversity Information Facility (GBIF) now serves about 131 million museum specimen records, and Integrated Digitized Biocollections (iDigBio) in the USA has amassed more than 115 million. These resources expose collections to a wider audience of researchers, provide the best biodiversity data in the modern era outside of nature itself and ensure the primacy of specimen-based research. Here, we provide a brief history of worldwide data mobilization, their impact on biodiversity research, challenges for ensuring data quality, their contribution to scientific publications and evidence of the rising profiles of natural history collections. This article is part of the theme issue 'Biological collections for understanding biodiversity in the Anthropocene'.
Subjects
Biodiversity , Museums , Specimen Handling/methods
ABSTRACT
Large-scale analysis of the fossil record requires aggregation of palaeontological data from individual fossil localities. Prior to computers, these synoptic datasets were compiled by hand, a laborious undertaking that took years of effort and forced palaeontologists to make difficult choices about what types of data to tabulate. The advent of desktop computers ushered in palaeontology's first digital revolution: online literature-based databases, such as the Paleobiology Database (PBDB). However, the published literature represents only a small proportion of the palaeontological data housed in museum collections. Although this issue has long been appreciated, the magnitude, and thus potential significance, of these so-called 'dark data' has been difficult to determine. Here, in the early phases of a second digital revolution in palaeontology (the digitization of museum collections), we provide an estimate of the magnitude of palaeontology's dark data. Digitization of our nine institutions' holdings of Cenozoic marine invertebrate collections from California, Oregon and Washington in the USA reveals that they represent 23 times the number of unique localities currently available in the PBDB. These data, and the vast quantity of similarly untapped dark data in other museum collections, will, when digitally mobilized, enhance palaeontologists' ability to make inferences about the patterns and processes of past evolutionary and ecological changes.
Subjects
Databases, Factual/statistics & numerical data , Fossils , Invertebrates , Animals , California , Museums/statistics & numerical data , Oregon , Paleontology/methods , Washington
ABSTRACT
This paper describes and illustrates five major clusters of related tasks (herein referred to as task clusters) that are common to efficient and effective practices in the digitization of biological specimen data and media. Examples of these clusters come from the observation of diverse digitization processes. The staff of iDigBio (The U.S. National Science Foundation's National Resource for Advancing Digitization of Biological Collections) visited active biological and paleontological collections digitization programs for the purpose of documenting and assessing current digitization practices and tools. These observations identified five task clusters that comprise the digitization process leading up to data publication: (1) pre-digitization curation and staging, (2) specimen image capture, (3) specimen image processing, (4) electronic data capture, and (5) georeferencing locality descriptions. While not all institutions are completing each of these task clusters for each specimen, these clusters describe a composite picture of digitization of biological and paleontological specimens across the programs that were observed. We describe these clusters and three workflow patterns that dominate their implementation, and we offer a set of workflow recommendations for digitization programs.