Spatiotemporal modelling of airborne birch and grass pollen concentration across Switzerland: A comparison of statistical, machine learning and ensemble methods.
Valipour Shokouhi, Behzad; de Hoogh, Kees; Gehrig, Regula; Eeftens, Marloes.
Affiliation
  • Valipour Shokouhi B; Swiss Tropical and Public Health Institute, Allschwil, Switzerland; University of Basel, Basel, Switzerland.
  • de Hoogh K; Swiss Tropical and Public Health Institute, Allschwil, Switzerland; University of Basel, Basel, Switzerland.
  • Gehrig R; Federal Office of Meteorology and Climatology MeteoSwiss, Switzerland.
  • Eeftens M; Swiss Tropical and Public Health Institute, Allschwil, Switzerland; University of Basel, Basel, Switzerland. Electronic address: marloes.eeftens@swisstph.ch.
Environ Res; 263(Pt 1): 119999, 2024 Sep 20.
Article in En | MEDLINE | ID: mdl-39305973
ABSTRACT

BACKGROUND:

Statistical and machine learning models are commonly used to estimate spatial and temporal variability in exposure to environmental stressors, supporting epidemiological studies. We aimed to compare the performance, strengths and limitations of six different algorithms in the retrospective spatiotemporal modelling of daily birch and grass pollen concentrations at a spatial resolution of 1 km across Switzerland.

METHODS:

Daily birch and grass pollen concentrations were available from 14 measurement sites in Switzerland for 2000-2019. To develop the spatiotemporal models, we considered spatiotemporal, spatial and temporal predictors including meteorological factors, land use, elevation, species distribution and Normalized Difference Vegetation Index (NDVI). We used six statistical and machine learning algorithms: LASSO, Ridge, Elastic net, Random forest, XGBoost and artificial neural networks (ANNs). We optimized model structures through feature selection and grid search to obtain the best predictive performance. We used a train-test split and cross-validation to avoid overfitting and overoptimistic performance indicators. We then combined these six models through multiple linear regression to develop an ensemble hybrid model.
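The final step of the methods, combining six base models through multiple linear regression, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the six "base models" are simulated as noisy copies of the target, and the function names (`fit_stacking_weights`, `predict_stacked`, `rmse`) are hypothetical. The abstract does not state how the base predictions fed to the regression were generated (for example, whether they were out-of-fold), so this is not the authors' implementation.

```python
import numpy as np


def fit_stacking_weights(base_preds, y):
    """Fit an intercept plus one coefficient per base model by ordinary least squares."""
    design = np.column_stack([np.ones(len(y)), base_preds])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_stacked(coef, base_preds):
    """Apply the fitted linear combination to a matrix of base-model predictions."""
    design = np.column_stack([np.ones(base_preds.shape[0]), base_preds])
    return design @ coef


def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic, right-skewed "daily concentrations" (illustrative only, not study data).
rng = np.random.default_rng(0)
y = rng.gamma(shape=1.0, scale=30.0, size=500)

# Six simulated base models: the target plus noise of increasing magnitude.
noise_levels = (10, 15, 20, 25, 30, 35)
base_preds = np.column_stack(
    [y + rng.normal(0.0, s, size=y.size) for s in noise_levels]
)

# Simple hold-out split: fit the combination on the first 400 days only.
train, test = slice(0, 400), slice(400, 500)
coef = fit_stacking_weights(base_preds[train], y[train])

ensemble_train = predict_stacked(coef, base_preds[train])
ensemble_test = predict_stacked(coef, base_preds[test])

print("ensemble test RMSE:", rmse(y[test], ensemble_test))
for j, s in enumerate(noise_levels):
    print(f"base model {j} test RMSE:", rmse(y[test], base_preds[test, j]))
```

On the data used to fit it, a least-squares combination can never have a higher RMSE than any single base model, because "use only model j" is one of the combinations it can choose. That guarantee does not carry over to held-out data, which is why performance is reported on a separate test set.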

RESULTS:

The 5th-95th percentiles of birch and grass pollen concentrations were 0-151 and 0-105 grains/m³, respectively. The hybrid ensemble model achieved the best RMSE on the test dataset for both birch and grass pollen, with 94.4 and 19.7 grains/m³, respectively. Nonlinear models (Random forest, XGBoost and ANNs) achieved lower test RMSEs than linear models (LASSO, Ridge, Elastic net) for both pollen types, with RMSEs ranging from 105.9 to 140.5 grains/m³ for birch and from 20.0 to 25.4 grains/m³ for grass pollen. The Random forest algorithm yielded the best spatial and temporal performance among the six evaluated modelling methods. The ensemble hybrid model outperformed the six linear and nonlinear algorithms. Country-wide pollen concentration, land use, weather, and NDVI were important predictors.
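For readers less familiar with the metric reported above, RMSE is the square root of the mean squared difference between observed and predicted values, expressed in the same unit as the measurements. The short sketch below computes it on made-up numbers; the values and the function name `rmse` are illustrative assumptions and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up daily concentrations in grains per cubic metre (not study data).
observed = [0.0, 12.0, 150.0, 30.0]
predicted = [5.0, 10.0, 100.0, 40.0]

result = rmse(observed, predicted)
print(round(result, 1))  # about 25.6
```

In this toy example the single 50-grain miss on the peak day contributes 2500 of the 2629 total squared error, which shows how strongly RMSE is driven by errors on high-concentration days.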

CONCLUSION:

Nonlinear algorithms outperformed linear models and accurately explained complex, nonlinear relationships between environmental factors and measured concentrations.
Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Environ Res Year: 2024 Document type: Article Affiliation country: Switzerland Country of publication: Netherlands