1.
PeerJ ; 12: e16972, 2024.
Article in English | MEDLINE | ID: mdl-38495753

ABSTRACT

The article presents results of using remote sensing images and machine learning to map and assess land potential based on time-series of potential Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) composites. Land potential here refers to the potential vegetation productivity in the hypothetical absence of short-term anthropogenic influence, such as intensive agriculture and urbanization. Knowledge of this ecological land potential could support the assessment of levels of land degradation as well as restoration potentials. Monthly aggregated FAPAR time-series of three percentiles (0.05, 0.50 and 0.95 probability) at 250 m spatial resolution were derived from the 8-day GLASS FAPAR V6 product for 2000-2021 and used to determine long-term trends in FAPAR, as well as to model potential FAPAR in the absence of human pressure. Ca. 3 million training points sampled from 12,500 locations across the globe were overlaid with 68 bio-physical variables representing climate, terrain, landform, and vegetation cover, as well as several variables representing human pressure, including population count, cropland intensity, nightlights and a human footprint index. The training points were used in an ensemble machine learning model that stacks three base learners (extremely randomized trees, gradient descended trees and artificial neural network) using a linear regressor as meta-learner. The potential FAPAR was then projected by removing the impact of urbanization and intensive agriculture in the covariate layers. The results of strict cross-validation show that the global distribution of FAPAR can be explained with an R² of 0.89, with the most important covariates being growing season length, forest cover indicator and annual precipitation. From this model, a global map of potential monthly FAPAR for the recent year (2021) was produced, and used to predict gaps in actual vs. potential FAPAR. The produced global maps of actual vs. potential FAPAR and long-term trends were each spatially matched with stable and transitional land cover classes. The assessment showed large negative FAPAR gaps (actual lower than potential) for the classes urban, needle-leaved deciduous trees, and flooded shrub or herbaceous cover, while strong negative FAPAR trends were found for the classes urban, sparse vegetation and rainfed cropland. On the other hand, the classes irrigated or post-flooded cropland, tree cover mixed leaf type, and broad-leaved deciduous showed largely positive trends. The framework allows land managers to assess potential land degradation from two aspects: as an actual declining trend in observed FAPAR and as a difference between actual and potential vegetation FAPAR.
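
The stacking step mentioned in the abstract (several base learners combined by a linear regressor as meta-learner) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_linear_meta_learner` and `predict_stacked` are hypothetical, the base-learner predictions are assumed to be already available (ideally as out-of-fold predictions), and ordinary least squares via NumPy stands in for whatever linear regressor was actually used.

```python
import numpy as np


def fit_linear_meta_learner(base_predictions, target):
    """Fit least-squares weights (intercept first) that combine base-learner predictions.

    base_predictions: array of shape (n_samples, n_base_learners), assumed to hold
    out-of-fold predictions of each base learner (an assumption of this sketch).
    """
    P = np.asarray(base_predictions, dtype=float)
    y = np.asarray(target, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(P)), P])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_stacked(coef, base_predictions):
    """Apply the fitted meta-learner to new base-learner predictions."""
    P = np.asarray(base_predictions, dtype=float)
    return coef[0] + P @ coef[1:]
```

In use, each column of `base_predictions` would hold the predictions of one base learner for the same set of training points, and the fitted coefficients would then weight the base learners' predictions at new locations.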


Subject(s)
Climate , Forests , Humans , Agriculture , Seasons
2.
PLoS One ; 19(2): e0297439, 2024.
Article in English | MEDLINE | ID: mdl-38306349

ABSTRACT

The impacts of the Anthropocene on climate and biodiversity pose societal and ecological problems that may only be solved by ecosystem restoration. Local to regional actions are required, which need to consider the prevailing present and future conditions of a certain landscape extent. Modeling approaches can be of help to support management efforts and to provide advice to policy making. We present stage one of the LaForeT-PLUC-BE model (Landscape Forestry in the Tropics-PCRaster Land Use Change-Biogeographic & Economic model; in short: LPB) and its thematic expansion module RAP (Restoration Areas Potentials). LPB-RAP is a high-resolution pixel-based scenario tool that relies on a range of explicit land use types (LUTs) to describe various forest types and the environment. It simulates and analyzes future landscape configurations under consideration of climate, population and land use change long-term. Simulated Land Use Land Cover Change (LULCC) builds on dynamic, probabilistic modeling incorporating climatic and anthropogenic determinants as well as restriction parameters to depict a sub-national regional smallholder-dominated forest landscape. The model delivers results for contrasting scenario settings by simulating without and with potential Forest and Landscape Restoration (FLR) measures. FLR potentials are depicted by up to five RAP-LUTs. The model builds on user-defined scenario inputs, such as the Shared Socioeconomic Pathways (SSP) and Representative Concentration Pathways (RCP). Model application is here exemplified for the SSP2-RCP4.5 scenario in the time frame 2018-2100 on the hectare scale in annual resolution using Esmeraldas province, Ecuador, as a case study area. The LPB-RAP model is a novel, heuristic Spatial Decision Support System (SDSS) tool for smallholder-dominated forest landscapes, supporting near-time top-down planning measures with long-term bottom-up modeling. 
Its application should be followed up by FLR on-site investigations and stakeholder participation across all involved scales.


Subject(s)
Conservation of Natural Resources , Ecosystem , Conservation of Natural Resources/methods , Forests , Biodiversity , Forestry/methods
3.
Parasit Vectors ; 17(1): 29, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38254168

ABSTRACT

BACKGROUND: Ticks are an important driver of veterinary health care, causing irritation and sometimes infection to their hosts. We explored epidemiological and geo-referenced data from > 7 million electronic health records (EHRs) from cats and dogs collected by the Small Animal Veterinary Surveillance Network (SAVSNET) in Great Britain (GB) between 2014 and 2021 to assess the factors affecting tick attachment in an individual and at a spatiotemporal level. METHODS: EHRs in which ticks were mentioned were identified by text mining; domain experts confirmed those with ticks on the animal. Tick presence/absence records were overlaid with a spatiotemporal series of climate, environment, anthropogenic and host distribution factors to produce a spatiotemporal regression matrix. An ensemble machine learning spatiotemporal model was used to fine-tune hyperparameters for Random Forest, Gradient-boosted Trees and Generalized Linear Model regression algorithms, which were then used to produce a final ensemble meta-learner to predict the probability of tick attachment across GB at a monthly interval and averaged long-term through 2014-2021 at a spatial resolution of 1 km. Individual host factors associated with tick attachment were also assessed by conditional logistic regression on a matched case-control dataset. RESULTS: In total, 11,741 consultations were identified in which a tick was recorded. The frequency of tick records was low (0.16% EHRs), suggesting an underestimation of risk. That said, increased odds for tick attachment in cats and dogs were associated with younger adult ages, longer coat length, crossbreeds and unclassified breeds. In cats, males and entire animals had significantly increased odds of recorded tick attachment. The key variables controlling the spatiotemporal risk for tick attachment were climatic (precipitation and temperature) and vegetation type (Enhanced Vegetation Index). 
Suitable areas for tick attachment were predicted across GB, especially in forests and grassland areas, mainly during summer, particularly in June. CONCLUSIONS: Our results can inform targeted health messages to owners and veterinary practitioners, identifying those animals, seasons and areas of higher risk for tick attachment and allowing for more tailored prophylaxis to reduce tick burden, inappropriate parasiticide treatment and potentially TBDs in companion animals and humans. Sentinel networks like SAVSNET represent a novel complementary data source to improve our understanding of tick attachment risk for companion animals and as a proxy of risk to humans.


Subject(s)
Algorithms , Pets , Adult , Humans , Male , Cats , Animals , Dogs , Female , United Kingdom/epidemiology , Risk Factors , Spatio-Temporal Analysis
4.
Data Brief ; 50: 109621, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37823063

ABSTRACT

This dataset presents global soil organic carbon stocks in mangrove forests at 30 m resolution, predicted for 2020. We used spatiotemporal ensemble machine learning to produce predictions of soil organic carbon content and bulk density (BD) to 1 m soil depth, which were then aggregated to calculate soil organic carbon stocks. This was done by using training data points of both SOC (%) and BD in mangroves from a global dataset and from recently published studies, and globally consistent predictive covariate layers. A total of 10,331 soil samples were validated to have SOC (%) measurements and were used for predictive soil mapping. We used time-series remote sensing data specific to time periods when the training data were sampled, as well as long-term (static) layers to train an ensemble of machine learning models. Ensemble models were used to improve performance, robustness and unbiasedness as opposed to just using one learner. In addition, we performed spatial cross-validation by using spatial blocking of training data points to assess model performance. We predicted SOC stocks for the 2020 time period and applied them to a 2020 mangrove extent map, presenting both mean predictions and prediction intervals to represent the uncertainty around our predictions. Predictions are available for download under CC-BY license from 10.5281/zenodo.7729491 and also as Cloud-Optimized GeoTIFFs (global mosaics).
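
The aggregation of SOC content and bulk density into a stock can be illustrated with the standard unit conversion. The abstract does not state the exact formula used, so the sketch below is an assumption: it ignores coarse fragments, treats the layer as homogeneous, and the function name `soc_stock_t_per_ha` is hypothetical.

```python
def soc_stock_t_per_ha(soc_percent, bulk_density_g_cm3, depth_cm):
    """Soil organic carbon stock of one layer in tonnes per hectare.

    Derivation: (SOC% / 100) * BD [g/cm3] * depth [cm] gives g C per cm2;
    1 ha = 1e8 cm2 and 1 t = 1e6 g, so the conversion factor is 100,
    which cancels the division by 100.
    """
    if soc_percent < 0 or bulk_density_g_cm3 < 0 or depth_cm < 0:
        raise ValueError("inputs must be non-negative")
    return soc_percent * bulk_density_g_cm3 * depth_cm


# Example: 5 % SOC, bulk density 0.8 g/cm3, 0-100 cm layer.
example_stock = soc_stock_t_per_ha(5.0, 0.8, 100.0)
```

For a multi-layer profile, the stocks of the individual layers would simply be summed down to the target depth.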

5.
Data Brief ; 50: 109482, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37636128

ABSTRACT

Here, we present and release the Global Rainfall Erosivity Database (GloREDa), a multi-source platform containing rainfall erosivity values for almost 4000 stations globally. The database was compiled through a global collaboration between a network of researchers, meteorological services and environmental organisations from 65 countries. GloREDa is the first open access database of rainfall erosivity (R-factor) based on hourly and sub-hourly rainfall records at a global scale. This database is now stored and accessible for download in the long-term European Soil Data Centre (ESDAC) repository of the European Commission's Joint Research Centre. This will ensure the further development of the database with insertions of new records, maintenance of the data and provision of a helpdesk. In addition to the annual erosivity data, this release also includes the mean monthly erosivity data for 94% of the GloREDa stations. Based on these mean monthly R-factor values, we predict the global monthly erosivity datasets at 1 km resolution using the ensemble machine learning approach (ML) as implemented in the mlr package for R. The produced monthly raster data (GeoTIFF format) may be useful for soil erosion prediction modelling, sediment distribution analysis, climate change predictions, flood, and natural disaster assessments and can be valuable inputs for Land and Earth Systems modelling.

6.
PeerJ ; 11: e15478, 2023.
Article in English | MEDLINE | ID: mdl-37304863

ABSTRACT

The article describes the production steps and accuracy assessment of an analysis-ready, open-access European data cube consisting of 2000-2020+ Landsat data, 2017-2021+ Sentinel-2 data and a 30 m resolution digital terrain model (DTM). The main purpose of the data cube is to make annual continental-scale spatiotemporal machine learning tasks accessible to a wider user base by providing a spatially and temporally consistent multidimensional feature space. This has required systematic spatiotemporal harmonization, efficient compression, and imputation of missing values. Sentinel-2 and Landsat reflectance values were aggregated into four quarterly averages approximating the four seasons common in Europe (winter, spring, summer and autumn), as well as the 25th and 75th percentile, in order to retain intra-seasonal variance. Remaining missing data in the Landsat time-series was imputed with a temporal moving window median (TMWM) approach. An accuracy assessment shows TMWM performs relatively better in Southern Europe and lower in mountainous regions such as the Scandinavian Mountains, the Alps, and the Pyrenees. We quantify the usability of the different component data sets for spatiotemporal machine learning tasks with a series of land cover classification experiments, which show that models utilizing the full feature space (30 m DTM, 30 m Landsat, 30 m and 10 m Sentinel-2) yield the highest land cover classification accuracy, with different data sets improving the results for different land cover classes. The data sets presented in the article are part of the EcoDataCube platform, which also hosts open vegetation, soil, and land use/land cover (LULC) maps created. All data sets are available under CC-BY license as Cloud-Optimized GeoTIFFs (ca. 12 TB in size) through SpatioTemporal Asset Catalog (STAC) and the EcoDataCube data portal.
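
The idea of filling gaps with a temporal moving window median can be sketched for a single pixel time-series. This is a deliberately simplified illustration, not the TMWM algorithm of the article, whose details are not given in the abstract: the window is a fixed number of neighbouring time steps, missing values are represented by `None`, and the function name `impute_moving_median` is hypothetical.

```python
from statistics import median


def impute_moving_median(series, half_window=2):
    """Fill missing values (None) with the median of observed neighbours in time.

    Only original observations are used as neighbours, so filled values never
    feed into later fills. Gaps with no observed neighbour in the window stay None.
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        neighbours = [v for v in series[lo:hi] if v is not None]
        if neighbours:
            filled[i] = median(neighbours)
    return filled
```

A wider window fills more gaps at the cost of smoothing over real temporal change, which is one reason imputation accuracy can differ between regions.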


Subject(s)
Autoimmune Lymphoproliferative Syndrome , Data Compression , Humans , Europe , Seasons , Climate
7.
PeerJ ; 11: e15593, 2023.
Article in English | MEDLINE | ID: mdl-37377791

ABSTRACT

The global potential distribution of biomes (natural vegetation) was modelled using 8,959 training points from the BIOME 6000 dataset and a stack of 72 environmental covariates representing terrain and the current climatic conditions based on historical long term averages (1979-2013). An ensemble machine learning model based on stacked regularization was used, with multinomial logistic regression as the meta-learner and spatial blocking (100 km) to deal with spatial autocorrelation of the training points. Results of spatial cross-validation for the BIOME 6000 classes show an overall accuracy of 0.67 and R² logloss of 0.61, with "tropical evergreen broadleaf forest" being the class with highest gain in predictive performances (R² logloss = 0.74) and "prostrate dwarf shrub tundra" the class with the lowest (R² logloss = -0.09) compared to the baseline. Temperature-related covariates were the most important predictors, with the mean diurnal range (BIO2) being shared by all the base-learners (i.e., random forest, gradient boosted trees and generalized linear models). The model was next used to predict the distribution of future biomes for the periods 2040-2060 and 2061-2080 under three climate change scenarios (RCP 2.6, 4.5 and 8.5). Comparisons of predictions for the three epochs (present, 2040-2060 and 2061-2080) show that increasing aridity and higher temperatures will likely result in significant shifts in natural vegetation in the tropical area (shifts from tropical forests to savannas up to 1.7 × 10⁵ km² by 2080) and around the Arctic Circle (shifts from tundra to boreal forests up to 2.4 × 10⁵ km² by 2080). Projected global maps at 1 km spatial resolution are provided as probability and hard classes maps for BIOME 6000 classes and as hard classes maps for the IUCN classes (six aggregated classes). Uncertainty maps (prediction error) are also provided and should be used for careful interpretation of the future projections.
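
Spatial blocking for cross-validation, as mentioned in the abstract (100 km blocks), can be sketched as follows. The sketch assumes point coordinates in a projected system with metre units and assigns blocks to folds in a simple round-robin order; both choices, and the function name `spatial_block_folds`, are assumptions for illustration rather than the procedure used in the article.

```python
import math


def spatial_block_folds(points, block_size_m=100_000.0, n_folds=5):
    """Assign each (x, y) point to a cross-validation fold via square spatial blocks.

    All points falling in the same block share a fold, so nearby (spatially
    autocorrelated) points are never split between training and validation.
    """
    block_ids = [
        (math.floor(x / block_size_m), math.floor(y / block_size_m))
        for x, y in points
    ]
    # Deterministic round-robin assignment of blocks to folds (an assumption;
    # a random assignment of blocks would be an equally valid choice).
    unique_blocks = sorted(set(block_ids))
    fold_of_block = {block: i % n_folds for i, block in enumerate(unique_blocks)}
    return [fold_of_block[block] for block in block_ids]
```

Holding out whole blocks typically yields lower, but more honest, accuracy estimates than simple random sub-setting when training points are spatially clustered.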


Subject(s)
Climate Change , Ecosystem , Temperature , Logistic Models , Arctic Regions
8.
PeerJ ; 10: e13728, 2022.
Article in English | MEDLINE | ID: mdl-35910765

ABSTRACT

This article describes a data-driven framework based on spatiotemporal machine learning to produce distribution maps for 16 tree species (Abies alba Mill., Castanea sativa Mill., Corylus avellana L., Fagus sylvatica L., Olea europaea L., Picea abies L. H. Karst., Pinus halepensis Mill., Pinus nigra J. F. Arnold, Pinus pinea L., Pinus sylvestris L., Prunus avium L., Quercus cerris L., Quercus ilex L., Quercus robur L., Quercus suber L. and Salix caprea L.) at high spatial resolution (30 m). Tree occurrence data for a total of three million points were used to train different algorithms: random forest, gradient-boosted trees, generalized linear models, k-nearest neighbors, CART and an artificial neural network. A stack of 305 coarse and high resolution covariates representing spectral reflectance, different biophysical conditions and biotic competition was used as predictors for realized distributions, while potential distribution was modelled with environmental predictors only. Logloss and computing time were used to select the three best algorithms to tune and train an ensemble model based on stacking with a logistic regressor as a meta-learner. An ensemble model was trained for each species: probability and model uncertainty maps of realized distribution were produced for each species using a time window of 4 years for a total of six distribution maps per species, while for potential distributions only one map per species was produced. Results of spatial cross validation show that the ensemble model consistently outperformed or performed as well as the best individual model in both potential and realized distribution tasks, with potential distribution models achieving higher predictive performances (TSS = 0.898, R² logloss = 0.857) than realized distribution ones on average (TSS = 0.874, R² logloss = 0.839). Ensemble models for Q. suber achieved the best performances in both potential (TSS = 0.968, R² logloss = 0.952) and realized (TSS = 0.959, R² logloss = 0.949) distribution, while P. sylvestris (TSS = 0.731, 0.785, R² logloss = 0.585, 0.670, respectively, for potential and realized distribution) and P. nigra (TSS = 0.658, 0.686, R² logloss = 0.623, 0.664) achieved the worst. Importance of predictor variables differed across species and models, with the green band for summer and the Normalized Difference Vegetation Index (NDVI) for fall being the most frequent and important for realized distribution, and the diffuse irradiation and precipitation of the driest quarter (BIO17) for potential distribution. On average, fine-resolution models outperformed coarse resolution models (250 m) for realized distribution (TSS = +6.5%, R² logloss = +7.5%). The framework shows how combining continuous and consistent Earth Observation time series data with state of the art machine learning can be used to derive dynamic distribution maps. The produced predictions can be used to quantify temporal trends of potential forest degradation and species composition change.


Subject(s)
Abies , Fagus , Pinus , Quercus , Europe
9.
PeerJ ; 10: e13573, 2022.
Article in English | MEDLINE | ID: mdl-35891647

ABSTRACT

A spatiotemporal machine learning framework for automated prediction and analysis of long-term Land Use/Land Cover dynamics is presented. The framework includes: (1) harmonization and preprocessing of spatial and spatiotemporal input datasets (GLAD Landsat, NPP/VIIRS) including five million harmonized LUCAS and CORINE Land Cover-derived training samples, (2) model building based on spatial k-fold cross-validation and hyper-parameter optimization, (3) prediction of the most probable class, class probabilities and model variance of predicted probabilities per pixel, (4) LULC change analysis on time-series of produced maps. The spatiotemporal ensemble model consists of a random forest, gradient boosted tree classifier, and an artificial neural network, with a logistic regressor as meta-learner. The results show that the most important variables for mapping LULC in Europe are: seasonal aggregates of Landsat green and near-infrared bands, multiple Landsat-derived spectral indices, long-term surface water probability, and elevation. Spatial cross-validation of the model indicates consistent performance across multiple years with overall accuracy (a weighted F1-score) of 0.49, 0.63, and 0.83 when predicting 43 (level-3), 14 (level-2), and five classes (level-1), respectively. Additional experiments show that spatiotemporal models generalize better to unknown years, outperforming single-year models on known-year classification by 2.7% and unknown-year classification by 3.5%. Results of the accuracy assessment using 48,365 independent test samples show an 87% match with the validation points. Results of time-series analysis (time-series of LULC probabilities and NDVI images) suggest forest loss in large parts of Sweden, the Alps, and Scotland. Positive and negative trends in NDVI in general match the land degradation and land restoration classes, with "urbanization" showing the most negative NDVI trend.
An advantage of using spatiotemporal ML is that the fitted model can be used to predict LULC in years that were not included in its training dataset, allowing generalization to past and future periods, e.g. to predict LULC for years prior to 2000 and beyond 2020. The generated LULC time-series data stack (ODSE-LULC), including the training points, is publicly available via the ODSE Viewer. Functions used to prepare data and run modeling are available via the eumap library for Python.


Subject(s)
Environmental Monitoring , Urbanization , Probability , Europe , Time Factors
10.
Sci Data ; 9(1): 444, 2022 07 25.
Article in English | MEDLINE | ID: mdl-35879368

ABSTRACT

The representation of land surface processes in hydrological and climatic models critically depends on the soil water characteristics curve (SWCC) that defines the plant availability and water storage in the vadose zone. Despite the availability of SWCC datasets in the literature, significant efforts are required to harmonize reported data before SWCC parameters can be determined and implemented in modeling applications. In this work, a total of 15,259 SWCCs from 2,702 sites were assembled from published literature, harmonized, and quality-checked. The assembled SWCC data provide a global soil hydraulic properties (GSHP) database. Parameters of the van Genuchten (vG) SWCC model were estimated from the data using the R package 'soilhypfit'. In many cases, information on the wet or dry end of the SWCC measurements was missing, and we used pedotransfer functions (PTFs) to estimate saturated and residual water contents. The new database quantifies the differences of SWCCs across climatic regions and can be used to create global maps of soil hydraulic properties.
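
For orientation, the van Genuchten model relates volumetric water content to matric head through the residual and saturated water contents named in the abstract. The sketch below evaluates the common form θ(h) = θr + (θs − θr) / [1 + (α|h|)^n]^m and assumes the Mualem constraint m = 1 − 1/n; the abstract does not say which constraint was applied, and the function name `van_genuchten_theta` is hypothetical.

```python
def van_genuchten_theta(h_cm, theta_r, theta_s, alpha, n):
    """Volumetric water content at matric head h_cm using the van Genuchten curve.

    theta_r, theta_s: residual and saturated water contents (cm3/cm3)
    alpha: scale parameter (1/cm); n: shape parameter (> 1)
    Assumes m = 1 - 1/n (Mualem constraint), an assumption of this sketch.
    """
    if n <= 1:
        raise ValueError("n must be greater than 1")
    m = 1.0 - 1.0 / n
    # The absolute value lets the caller pass suction as either sign.
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * abs(h_cm)) ** n) ** m
```

At zero head the expression returns the saturated water content, and it approaches the residual water content as suction grows, which is why missing wet-end or dry-end measurements make θs and θr hard to estimate from data alone.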

11.
Sci Rep ; 11(1): 6130, 2021 03 17.
Article in English | MEDLINE | ID: mdl-33731749

ABSTRACT

Soil property and class maps for the continent of Africa were so far only available at very generalised scales, with many countries not mapped at all. Thanks to an increasing quantity and availability of soil samples collected at field point locations by various government and/or NGO funded projects, it is now possible to produce detailed pan-African maps of soil nutrients, including micro-nutrients at fine spatial resolutions. In this paper we describe production of a 30 m resolution Soil Information System of the African continent using, to date, the most comprehensive compilation of soil samples ([Formula: see text]) and Earth Observation data. We produced predictions for soil pH, organic carbon (C) and total nitrogen (N), total carbon, effective Cation Exchange Capacity (eCEC), extractable phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg), sulfur (S), sodium (Na), iron (Fe) and zinc (Zn), as well as silt, clay and sand, stone content, bulk density and depth to bedrock, at three depths (0, 20 and 50 cm), using a 2-scale 3D Ensemble Machine Learning framework implemented in the mlr (Machine Learning in R) package. As covariate layers we used 250 m resolution (MODIS, PROBA-V and SM2RAIN products), and 30 m resolution (Sentinel-2, Landsat and DTM derivatives) images. Our fivefold spatial Cross-Validation results showed varying accuracy levels ranging from the best performing soil pH (CCC = 0.900) to more poorly predictable extractable phosphorus (CCC = 0.654) and sulphur (CCC = 0.708) and depth to bedrock. Sentinel-2 bands SWIR (B11, B12), NIR (B09, B8A), Landsat SWIR bands, and vertical depth derived from 30 m resolution DTM, were the overall most important 30 m resolution covariates. Climatic data images (SM2RAIN, bioclimatic variables and MODIS Land Surface Temperature), however, remained the overall most important variables for predicting soil chemical variables at continental scale.
This publicly available 30-m Soil Information System of Africa aims at supporting numerous applications, including soil and fertilizer policies and investments, agronomic advice to close yield gaps, environmental programs, or targeting of nutrition interventions.

12.
Nat Commun ; 11(1): 522, 2020 Jan 27.
Article in English | MEDLINE | ID: mdl-31988306

ABSTRACT

Most soil hydraulic information used in Earth System Models (ESMs) is derived from pedo-transfer functions that use easy-to-measure soil attributes to estimate hydraulic parameters. This parameterization relies heavily on soil texture, but overlooks the critical role of soil structure originating from soil biophysical activity. Soil structure omission is pervasive also in sampling and measurement methods used to train pedotransfer functions. Here we show how systematic inclusion of salient soil structural features of biophysical origin affects local and global hydrologic and climatic responses. Locally, including soil structure in models significantly alters infiltration-runoff partitioning and recharge in wet and vegetated regions. Globally, the coarse spatial resolution of ESMs and their inability to simulate intense and short rainfall events mask effects of soil structure on surface fluxes and climate. Results suggest that although soil structure affects local hydrologic response, its implications for global-scale climate remain elusive in current ESMs.

13.
PeerJ ; 6: e5518, 2018.
Article in English | MEDLINE | ID: mdl-30186691

ABSTRACT

Random forest and similar Machine Learning techniques are already used to generate spatial predictions, but spatial location of points (geography) is often ignored in the modeling process. Spatial auto-correlation, especially if still existent in the cross-validation residuals, indicates that the predictions may be biased, which is suboptimal. This paper presents a random forest for spatial predictions framework (RFsp) where buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process. The RFsp framework is illustrated with examples that use textbook datasets and apply spatial and spatio-temporal prediction to numeric, binary, categorical, multivariate and spatiotemporal variables. Performance of the RFsp framework is compared with the state-of-the-art kriging techniques using fivefold cross-validation with refitting. The results show that RFsp can obtain predictions that are as accurate and unbiased as those of different versions of kriging. Advantages of using RFsp over kriging are that it needs no rigid statistical assumptions about the distribution and stationarity of the target variable, it is more flexible towards incorporating, combining and extending covariates of different types, and it possibly yields more informative maps characterizing the prediction error. RFsp appears to be especially attractive for building multivariate spatial prediction models that can be used as "knowledge engines" in various geoscience fields. Some disadvantages of RFsp are the exponentially growing computational intensity with increase of calibration data and covariates and the high sensitivity of predictions to input data quality.
The key to the success of the RFsp framework might be the training data quality, especially the quality of spatial sampling (to minimize extrapolation problems and any type of bias in data) and the quality of model validation (to ensure that accuracy is not affected by overfitting). For many data sets, especially those with lower number of points and covariates and close-to-linear relationships, model-based geostatistics can still lead to more accurate predictions than RFsp.
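
The core idea of RFsp, using buffer distances from observation points as explanatory variables, can be sketched as a feature-construction step. The sketch assumes planar coordinates and plain Euclidean distance, and the function name `buffer_distance_features` is hypothetical; fitting the random forest itself is left out and would be done with any random forest implementation.

```python
import math


def buffer_distance_features(targets, observations):
    """Build one distance covariate per observation point.

    Returns a matrix (list of rows) in which row i holds the Euclidean
    distances from target location i to every observation point. The same
    function applied to the observation points themselves gives the
    training covariates.
    """
    return [
        [math.dist(target, obs) for obs in observations]
        for target in targets
    ]


# Example with two observation points and two prediction locations.
obs_points = [(0.0, 0.0), (3.0, 4.0)]
grid_points = [(0.0, 0.0), (3.0, 0.0)]
features = buffer_distance_features(grid_points, obs_points)
```

Because the number of covariates equals the number of observation points, the feature matrix grows with the square of the sample size when built for the training points, which is consistent with the computational cost noted in the abstract.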

14.
PeerJ ; 6: e5457, 2018.
Article in English | MEDLINE | ID: mdl-30155360

ABSTRACT

Potential natural vegetation (PNV) is the vegetation cover in equilibrium with climate that would exist at a given location if not impacted by human activities. PNV is useful for raising public awareness about land degradation and for estimating land potential. This paper presents results of assessing machine learning algorithms, namely neural networks (nnet package), random forest (ranger), gradient boosting (gbm), k-nearest neighbors (class) and Cubist, for operational mapping of PNV. Three case studies were considered: (1) global distribution of biomes based on the BIOME 6000 data set (8,057 modern pollen-based site reconstructions), (2) distribution of forest tree taxa in Europe based on detailed occurrence records (1,546,435 ground observations), and (3) global monthly fraction of absorbed photosynthetically active radiation (FAPAR) values (30,301 randomly-sampled points). A stack of 160 global maps representing biophysical conditions over land, including atmospheric, climatic, relief, and lithologic variables, was used as explanatory variables. Overall, the results indicate that random forest gives the best performance. The highest accuracy for predicting BIOME 6000 classes (20) was estimated to be between 33% (with spatial cross-validation) and 68% (simple random sub-setting), with the most important predictors being total annual precipitation, monthly temperatures, and bioclimatic layers. Predicting forest tree species (73) resulted in mapping accuracy of 25%, with the most important predictors being monthly cloud fraction, mean annual and monthly temperatures, and elevation. Regression models for FAPAR (monthly images) gave an R-square of 90% with the most important predictors being total annual precipitation, monthly cloud fraction, CHELSA bioclimatic layers, and month of the year, respectively. Further developments of PNV mapping could include using all GBIF records to map the global distribution of plant species at different taxonomic levels.
This methodology could also be extended to dynamic modeling of PNV, so that future climate scenarios can be incorporated. Global maps of biomes, FAPAR and tree species at one km spatial resolution are available for download via http://dx.doi.org/10.7910/DVN/QQHCIK.

15.
Proc Natl Acad Sci U S A ; 114(36): 9575-9580, 2017 09 05.
Article in English | MEDLINE | ID: mdl-28827323

ABSTRACT

Human appropriation of land for agriculture has greatly altered the terrestrial carbon balance, creating a large but uncertain carbon debt in soils. Estimating the size and spatial distribution of soil organic carbon (SOC) loss due to land use and land cover change has been difficult but is a critical step in understanding whether SOC sequestration can be an effective climate mitigation strategy. In this study, a machine learning-based model was fitted using a global compilation of SOC data and the History Database of the Global Environment (HYDE) land use data in combination with climatic, landform and lithology covariates. Model results compared favorably with a global compilation of paired plot studies. Projection of this model onto a world without agriculture indicated a global carbon debt due to agriculture of 133 Pg C for the top 2 m of soil, with the rate of loss increasing dramatically in the past 200 years. The HYDE classes "grazing" and "cropland" contributed nearly equally to the loss of SOC. There were higher percent SOC losses on cropland but since more than twice as much land is grazed, slightly higher total losses were found from grazing land. Important spatial patterns of SOC loss were found: Hotspots of SOC loss coincided with some major cropping regions as well as semiarid grazing regions, while other major agricultural zones showed small losses and even net gains in SOC. This analysis has demonstrated that there are identifiable regions which can be targeted for SOC restoration efforts.


Subject(s)
Carbon Sequestration , Soil/chemistry , Agriculture/history , Databases, Factual , History, 15th Century , History, 16th Century , History, 17th Century , History, 18th Century , History, 19th Century , History, 20th Century , History, 21st Century , History, Ancient , History, Medieval , Humans , Machine Learning , Natural Resources
16.
PLoS One ; 12(2): e0169748, 2017.
Article in English | MEDLINE | ID: mdl-28207752

ABSTRACT

This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250 m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and distribution of soil classes based on the World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in total). Predictions were based on ca. 150,000 soil profiles used for training and a stack of 158 remote sensing-based soil covariates (primarily derived from MODIS land products, SRTM DEM derivatives, climatic images and global landform and lithology maps), which were used to fit an ensemble of machine learning methods (random forest and gradient boosting and/or multinomial logistic regression), as implemented in the R packages ranger, xgboost, nnet and caret. The results of 10-fold cross-validation show that the ensemble models explain between 56% (coarse fragments) and 83% (pH) of variation with an overall average of 61%. Improvements in the relative accuracy considering the amount of variation explained, in comparison to the previous version of SoilGrids at 1 km spatial resolution, range from 60 to 230%. Improvements can be attributed to: (1) the use of machine learning instead of linear regression, (2) considerable investments in preparing finer resolution covariate layers, and (3) the insertion of additional soil profiles. Further development of SoilGrids could include refinement of methods to incorporate input uncertainties and derivation of posterior probability distributions (per pixel), and further automation of spatial modeling so that soil maps can be generated for potentially hundreds of soil variables. Another area of future research is the development of methods for multiscale merging of SoilGrids predictions with local and/or national gridded soil products (e.g. up to 50 m spatial resolution) so that increasingly more accurate, complete and consistent global soil information can be produced. SoilGrids are available under the Open Database License.
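
The "amount of variation explained" is commonly computed from cross-validation predictions as one minus the ratio of the residual to the total sum of squares. A minimal sketch, assuming that definition and using made-up values:

```python
def variation_explained(observed, predicted):
    """Return 1 - SSE/SST for held-out predictions (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Illustrative pH-like values, not data from the paper.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
r2 = variation_explained(observed, predicted)
```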


Subject(s)
Environmental Monitoring , Geographic Information Systems , Machine Learning , Models, Theoretical , Soil/chemistry , Algorithms , Conservation of Natural Resources , Humans
17.
GeoResJ ; 14(9): 1-19, 2017 Dec.
Article in English | MEDLINE | ID: mdl-32864337

ABSTRACT

Legacy soil data have been produced over 70 years in nearly all countries of the world. Unfortunately, data, information and knowledge are still fragmented and at risk of getting lost if they remain in paper format. To process this legacy data into consistent, spatially explicit and continuous global soil information, data are being rescued and compiled into databases. Thousands of soil survey reports and maps have been scanned and made available online. The soil profile data reported by these data sources have been captured and compiled into databases. The total number of soil profiles rescued in the selected countries is about 800,000. Currently, data for 117,000 profiles are compiled and harmonized according to GlobalSoilMap specifications in a world level database (WoSIS). The results presented at the country level are likely to be an underestimate. The majority of soil data is still not rescued and this effort should be pursued. The data have been used to produce soil property maps. We discuss the pros and cons of top-down and bottom-up approaches to produce such maps and we stress their complementarity. We give examples of success stories. The first global soil property maps using rescued data were produced by a top-down approach and were released at a limited resolution of 1 km in 2014, followed by an update at a resolution of 250 m in 2017. By the end of 2020, we aim to deliver the first worldwide product that fully meets the GlobalSoilMap specifications.

18.
Nutr Cycl Agroecosyst ; 109(1): 77-102, 2017 Aug 02.
Article in English | MEDLINE | ID: mdl-33456317

ABSTRACT

Spatial predictions of soil macro- and micro-nutrient content across Sub-Saharan Africa at 250 m spatial resolution and for the 0-30 cm depth interval are presented. Predictions were produced for 15 target nutrients: organic carbon (C) and total (organic) nitrogen (N), total phosphorus (P), and extractable phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg), sulfur (S), sodium (Na), iron (Fe), manganese (Mn), zinc (Zn), copper (Cu), aluminum (Al) and boron (B). Model training was performed using soil samples from ca. 59,000 locations (a compilation of soil samples from the AfSIS, EthioSIS, One Acre Fund, VitalSigns and legacy soil data) and an extensive stack of remote sensing covariates in addition to landform, lithologic and land cover maps. An ensemble model was then created for each nutrient from two machine learning algorithms (random forest and gradient boosting, as implemented in the R packages ranger and xgboost) and then used to generate predictions in a fully-optimized computing system. Cross-validation revealed that apart from S, P and B, significant models can be produced for most targeted nutrients (R-square between 40% and 85%). Further comparison with the OFRA field trial database shows that soil nutrients are indeed critical for agricultural development, with Mn, Zn, Al, B and Na appearing as the most important nutrients for predicting crop yield. The limiting factors for mapping nutrients using the existing point data in Africa appear to be (1) the high spatial clustering of sampling locations, and (2) the lack of more detailed parent material/geological maps. Logical steps towards improving prediction accuracies include: further collection of input (training) point samples, further harmonization of measurement methods, addition of more detailed covariates specific to Africa, and implementation of a full spatiotemporal statistical modeling framework.
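
One simple way to build an ensemble from two models is a weighted average of their predictions. The sketch below assumes weights proportional to each model's cross-validation R-square; the abstract does not state the weighting rule, so this rule, the function name and the numbers are all illustrative assumptions.

```python
def ensemble_predict(pred_rf, pred_gb, r2_rf, r2_gb):
    """Weighted average of two prediction lists.
    Weights proportional to R-square are an assumption, not the paper's rule."""
    w_rf = r2_rf / (r2_rf + r2_gb)
    w_gb = 1.0 - w_rf
    return [w_rf * a + w_gb * b for a, b in zip(pred_rf, pred_gb)]

# Made-up predictions from a random forest and a gradient boosting model.
merged = ensemble_predict([10.0, 20.0], [14.0, 24.0], r2_rf=0.6, r2_gb=0.2)
```

Here the random forest receives three times the weight of gradient boosting, so the merged values sit closer to its predictions.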

19.
PLoS One ; 10(6): e0125814, 2015.
Article in English | MEDLINE | ID: mdl-26110833

ABSTRACT

80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008-2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to agricultural management: organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15-75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data.
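
The reported decrease in RMSE can be expressed as a percentage of the baseline (linear regression) error. A minimal sketch of that arithmetic, using made-up held-out predictions rather than values from the study:

```python
import math

def rmse(observed, predicted):
    """Root Mean Squared Error between two equal-length lists."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def percent_decrease(rmse_baseline, rmse_new):
    """Relative RMSE reduction of the new model versus the baseline, in %."""
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline

observed = [1.0, 2.0, 3.0, 4.0]
pred_linear = [3.0, 4.0, 1.0, 2.0]   # hypothetical linear regression output
pred_forest = [2.0, 3.0, 2.0, 3.0]   # hypothetical random forest output
decrease = percent_decrease(rmse(observed, pred_linear),
                            rmse(observed, pred_forest))
```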


Subject(s)
Environmental Monitoring/methods , Soil/chemistry , Africa , Models, Theoretical
20.
Ecol Lett ; 18(2): 200-17, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25560682

ABSTRACT

The study of islands as model systems has played an important role in the development of evolutionary and ecological theory. The 50th anniversary of MacArthur and Wilson's (December 1963) article, 'An equilibrium theory of insular zoogeography', was a recent milestone for this theme. Since 1963, island systems have provided new insights into the formation of ecological communities. Here, building on such developments, we highlight prospects for research on islands to improve our understanding of the ecology and evolution of communities in general. Throughout, we emphasise how attributes of islands combine to provide unusual research opportunities, the implications of which stretch far beyond islands. Molecular tools and increasing data acquisition now permit re-assessment of some fundamental issues that interested MacArthur and Wilson. These include the formation of ecological networks, species abundance distributions, and the contribution of evolution to community assembly. We also extend our prospects to other fields of ecology and evolution - understanding ecosystem functioning, speciation and diversification - frequently employing assets of oceanic islands in inferring the geographic area within which evolution has occurred, and potential barriers to gene flow. Although island-based theory is continually being enriched, incorporating non-equilibrium dynamics is identified as a major challenge for the future.


Subject(s)
Biological Evolution , Islands , Models, Biological , Biodiversity , Ecology , Ecosystem , Gene Flow , Genetic Speciation , Geography , Population Dynamics , Social Isolation