ABSTRACT
Premise: Species distribution models (SDMs) are widely used to guide conservation decisions. The complexity of available data and SDM methodologies requires careful consideration of how data are chosen and processed for modeling, both to enhance model accuracy and to support biological interpretations and ecological applications. Methods: We built SDMs for the invasive aquatic plant European frog-bit using aggregated and field data that span multiple scales, data sources, and data types. We tested how model results were affected by five modeler decision points: the exclusion of (1) missing and (2) correlated data and the (3) scale (large-scale aggregated data or systematic field data), (4) source (specimens or observations), and (5) type (presence-background or presence-absence) of occurrence data. Results: Decisions about the exclusion of missing and correlated data, as well as the scale and type of occurrence data, significantly affected metrics of model performance. The source and type of occurrence data led to differences in the importance of specific explanatory variables as drivers of species distribution and in the predicted probability of suitable habitat. Discussion: Our findings for European frog-bit illustrate how specific data selection and processing decisions can influence the outcomes and interpretation of SDMs. Data-centric protocols that incorporate data exploration into model building can help ensure that models are reproducible and can be accurately interpreted in light of biological questions.
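The first two decision points (excluding missing and correlated data) can be illustrated with a minimal sketch. It is not the authors' workflow: the predictor names, the example values, and the 0.7 correlation cutoff are all assumptions chosen for illustration, and Pearson correlation is only one of several possible screening criteria.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)


def drop_missing(records, predictors):
    """Decision point 1: keep only records with a value for every predictor."""
    return [r for r in records if all(r.get(p) is not None for p in predictors)]


def drop_correlated(records, predictors, threshold=0.7):
    """Decision point 2: greedily keep a predictor only if its absolute
    correlation with every predictor kept so far is below the threshold."""
    kept = []
    for p in predictors:
        col = [r[p] for r in records]
        if all(abs(pearson(col, [r[k] for r in records])) < threshold for k in kept):
            kept.append(p)
    return kept


# Hypothetical site records; names and values are illustrative only.
predictors = ["depth", "temp", "temp_max"]
records = [
    {"depth": 1.0, "temp": 10.0, "temp_max": 20.5},
    {"depth": 3.0, "temp": 12.0, "temp_max": 24.0},
    {"depth": 2.0, "temp": None, "temp_max": 22.0},  # missing value
    {"depth": 0.5, "temp": 15.0, "temp_max": 30.5},
    {"depth": 2.5, "temp": 18.0, "temp_max": 35.0},
]

complete = drop_missing(records, predictors)
kept = drop_correlated(complete, predictors)
print(len(complete), kept)
```

In this sketch the greedy filter depends on the order of the predictor list, which is itself a modeler decision of the kind the abstract describes.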
ABSTRACT
BACKGROUND: European frog-bit (Hydrocharis morsus-ranae L.; EFB) is a free-floating aquatic plant invasive in Canada, the United States and India. It is native to Europe and northern and western Asia and is believed to have first been introduced to North America in Ottawa, Ontario in 1932. It has since spread by way of the St. Lawrence River and connected waterways to southern Ontario and Quebec and parts of the northern United States. Invasive European frog-bit occurs in freshwater coastal wetlands and inland waters, where it can form dense mats that have the potential to limit recreational and commercial use of waterways, alter water chemistry and impact native species and ecosystems. Data on the past and present distribution of this invasive species provide geospatial information that can be used to infer the pattern of invasion and inform management and monitoring targeted at preventing secondary spread. Our EFB dataset contains 12,037 preserved specimen and observation-based occurrence records, including 9,994 presence records spanning two Canadian provinces and ten U.S. states and 2,043 absence records spanning five U.S. states. The aggregated EFB dataset provides a curated resource that has been used to guide a Michigan management strategy and provide information for ongoing efforts to develop invasion risk assessments, species distribution models and decision-support tools for conservation and management. NEW INFORMATION: Specimen-based and observation-based occurrence data were accessed through nine digital data repositories or aggregators and three primary sources. Twenty-six percent of the data are new records not previously published to a data repository or aggregator prior to this study. We removed duplicate data and excluded records with incorrect species identifications. Occurrence records without coordinates were georeferenced from recorded locality descriptions. Data were standardised according to Darwin Core. This aggregated dataset is the most complete account of EFB occurrence records in its North American invasive range.
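The curation steps named above (removing duplicates, excluding misidentified records, and finding records that still need georeferencing) can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the field names are standard Darwin Core terms, but the choice of terms that define a duplicate, the exact-match test on the species name, and the example records are assumptions.

```python
TARGET_NAME = "Hydrocharis morsus-ranae L."

# Assumed duplicate key: records that agree on all of these Darwin Core
# terms are treated as the same occurrence.
KEY_TERMS = ("scientificName", "eventDate", "decimalLatitude",
             "decimalLongitude", "recordedBy")


def remove_duplicates(records, key_terms=KEY_TERMS):
    """Keep the first record seen for each combination of key terms."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec.get(term) for term in key_terms)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def is_target_species(rec, name=TARGET_NAME):
    """Exclude records identified as any other taxon (exact match only)."""
    return rec.get("scientificName") == name


def needs_georeferencing(rec):
    """True when a record lacks either coordinate."""
    return rec.get("decimalLatitude") is None or rec.get("decimalLongitude") is None


# Hypothetical records for illustration only.
raw = [
    {"scientificName": TARGET_NAME, "eventDate": "2015-07-01",
     "decimalLatitude": 45.4, "decimalLongitude": -75.7, "recordedBy": "A"},
    {"scientificName": TARGET_NAME, "eventDate": "2015-07-01",
     "decimalLatitude": 45.4, "decimalLongitude": -75.7, "recordedBy": "A"},
    {"scientificName": "Limnobium spongia", "eventDate": "2016-08-12",
     "decimalLatitude": 42.1, "decimalLongitude": -83.2, "recordedBy": "B"},
    {"scientificName": TARGET_NAME, "eventDate": "1998-06-20",
     "decimalLatitude": None, "decimalLongitude": None, "recordedBy": "C",
     "locality": "hypothetical locality description"},
]

curated = [r for r in remove_duplicates(raw) if is_target_species(r)]
to_georeference = [r for r in curated if needs_georeferencing(r)]
print(len(curated), len(to_georeference))
```

A real aggregation would likely need fuzzier matching, since the same specimen can appear in several repositories with slightly different coordinates or collector strings.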
ABSTRACT
PREMISE: Heterogeneity of biodiversity data from the collections, research, and management communities presents challenges for data findability, accessibility, interoperability, and reusability. Workflows designed with data collection, standards, dissemination, and reuse in mind will generate better information across geopolitical, administrative, and institutional boundaries. Here, we present our data workflow as a case study of how we collected, shared, and used data from multiple sources. METHODS: In 2012, we initiated the collection of biodiversity data relating to Michigan prairie fens, including data on plant communities and the federally endangered Poweshiek skipperling (Oarisma poweshiek). RESULTS: Over 23,000 occurrence records were compiled in a database following Darwin Core standards. The records were linked with media and biological, chemical, and geometric measurements. We published the data as Global Biodiversity Information Facility data sets and in Symbiota SEINet portals. DISCUSSION: We highlight data collection techniques that optimized transcription time, including the use of predetermined and controlled vocabulary, Darwin Core terms, and data dictionaries. The validity and longevity of our data were supported by voucher specimens, metadata with measurement records, and published manuscripts detailing methods and data sets. Key to our data dissemination was cooperation among partners and the utilization of dynamic tools. To increase data interoperability, we need flexible and customizable data collection templates, coding, and enhanced communication among communities using biodiversity data.