Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse.

Soranno, Patricia A; Bissell, Edward G; Cheruvelil, Kendra S; Christel, Samuel T; Collins, Sarah M; Fergus, C Emi; Filstrup, Christopher T; Lapierre, Jean-Francois; Lottig, Noah R; Oliver, Samantha K; Scott, Caren E; Smith, Nicole J; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A; Gries, Corinna; Henry, Emily N; Skaff, Nick K; Stanley, Emily H; Stow, Craig A; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E

Affiliation

Soranno PA; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Bissell EG; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Cheruvelil KS; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Christel ST; Center for Limnology, University of Wisconsin-Madison, Madison, WI 53706 USA.
Collins SM; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Fergus CE; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Filstrup CT; Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA.
Lapierre JF; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Lottig NR; Center for Limnology Trout Lake Station, University of Wisconsin-Madison, Boulder Junction, WI 54512 USA.
Oliver SK; Center for Limnology, University of Wisconsin-Madison, Madison, WI 53706 USA.
Scott CE; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Smith NJ; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Stopyak S; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Yuan S; School of Natural Sciences, Trinity College Dublin, Dublin, Ireland.
Bremigan MT; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Downing JA; Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA.
Gries C; Center for Limnology, University of Wisconsin-Madison, Madison, WI 53706 USA.
Henry EN; Oregon State University, Tillamook County, Tillamook, OR 97141 USA.
Skaff NK; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824 USA.
Stanley EH; Center for Limnology, University of Wisconsin-Madison, Madison, WI 53706 USA.
Stow CA; NOAA Great Lakes Laboratory, Ann Arbor, MI 48108 USA.
Tan PN; Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA.
Wagner T; US Geological Survey, Pennsylvania Cooperative Fish and Wildlife Research Unit, Pennsylvania State University, University Park, PA 16802 USA.
Webster KE; School of Natural Sciences, Trinity College Dublin, Dublin, Ireland.

Gigascience; 4: 28, 2015.

Article in En | MEDLINE | ID: mdl-26140212

ABSTRACT

Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km²). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
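
The integration procedures listed in the abstract (a common schema, provenance tracking, and quality control across heterogeneous sources) can be illustrated with a minimal Python sketch. It is not the LAGOS implementation: the source names (`agency_a`, `agency_b`), column names, units, and the `Observation` record are all hypothetical and chosen only to show how differently formatted water quality tables might be mapped to one schema while keeping a record of where each value came from.

```python
from dataclasses import dataclass

# Hypothetical per-source mappings (not from LAGOS):
# source column name -> (common variable name, factor converting to ug/L).
SOURCE_MAPPINGS = {
    "agency_a": {"TP_mgL": ("total_phosphorus_ugl", 1000.0)},
    "agency_b": {"tot_phos_ugl": ("total_phosphorus_ugl", 1.0)},
}


@dataclass(frozen=True)
class Observation:
    lake_id: str
    variable: str
    value: float
    source: str         # provenance: dataset the value came from
    source_column: str  # provenance: original column name
    qc_flag: str        # "ok" or "negative_value"


def harmonize(source, rows):
    """Convert one source's rows (a list of dicts) to the common schema."""
    mapping = SOURCE_MAPPINGS[source]
    observations = []
    for row in rows:
        for column, (variable, factor) in mapping.items():
            if row.get(column) is None:
                continue  # the source did not report this variable
            value = float(row[column]) * factor
            # Simple quality-control rule: concentrations cannot be negative.
            flag = "ok" if value >= 0 else "negative_value"
            observations.append(
                Observation(row["lake_id"], variable, value, source, column, flag)
            )
    return observations


# Two made-up source tables with different column names and units.
integrated = harmonize("agency_a", [{"lake_id": "L1", "TP_mgL": 0.02}]) + harmonize(
    "agency_b",
    [{"lake_id": "L2", "tot_phos_ugl": 35}, {"lake_id": "L3", "tot_phos_ugl": -1}],
)

for obs in integrated:
    print(obs)
```

Keeping the per-source mapping in a data structure rather than in code is one way to make adding a new dataset a matter of authoring metadata, which is in the spirit of the extensibility the abstract describes.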

Subject(s)

Database Management Systems; Ecology; Geographic Information Systems

Key words

Data harmonization; Data reuse; Data sharing; Database documentation; Ecoinformatics; Integrated database; LAGOS; Landscape limnology; Macrosystems ecology; Water quality

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Database Management Systems / Geographic Information Systems / Ecology Language: En Journal: Gigascience Year: 2015 Type: Article