Application of machine learning (individual vs stacking) models on MERRA-2 data to predict surface PM2.5 concentrations over India.
Dhandapani, Abisheg; Iqbal, Jawed; Kumar, R Naresh.
  • Dhandapani A; Department of Civil and Environmental Engineering, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India.
  • Iqbal J; Department of Civil and Environmental Engineering, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India.
  • Kumar RN; Department of Civil and Environmental Engineering, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India. Electronic address: rnaresh@bitmesra.ac.in.
Chemosphere; 340: 139966, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37634588
ABSTRACT
The spatial coverage of PM2.5 monitoring is non-uniform across India due to the limited number of ground monitoring stations. As an alternative, the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), an atmospheric reanalysis dataset, is used for estimating PM2.5. MERRA-2 does not measure PM2.5 directly; instead, PM2.5 is estimated with an empirical model. MERRA-2 data were spatiotemporally collocated with ground observations for validation across India. Significant underestimation of PM2.5 by MERRA-2, ranging from -20 to 60 µg m⁻³, was observed at many monitoring stations. The utility of Machine Learning (ML) models to overcome this challenge was assessed. MERRA-2 aerosol and meteorological parameters were the input features used to train and test the individual ML models and to compare them with the stacking technique. Initially, individual model performance was assessed on a randomly selected 10% of the data to identify the best model. XGBoost (XGB) was the best model (r² = 0.73) compared to Random Forest (RF) and LightGBM (LGBM). Stacking was then applied with XGB as the meta-regressor. The stacked model (r² = 0.77) outperformed XGB, the best standalone model. The stacking technique was then used to predict hourly and daily PM2.5 for different regions across India and for each monitoring station. The eastern region exhibited the best hourly prediction (r² = 0.80) and a substantial reduction in Mean Bias (MB = -0.03 µg m⁻³), followed by the northern region (r² = 0.63 and MB = -0.10 µg m⁻³), which showed better results due to the frequent observation of PM2.5 > 100 µg m⁻³. The central region showed the lowest performance (r² = 0.46 and MB = -0.60 µg m⁻³), owing to the sparse data available to train the ML models. Overall, PM2.5 prediction over India with the ML stacking technique was better on an hourly basis than on a daily basis.
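The abstract reports model skill as r2 and Mean Bias (MB) without giving formulas. The following is a minimal sketch of how such scores are commonly computed, not the authors' code. It assumes MB is the mean of (predicted - observed), so that negative values indicate underestimation, and that r2 is the squared Pearson correlation; the abstract does not state either definition. The function names and the sample values are hypothetical and made up for illustration only.

```python
from statistics import mean


def mean_bias(observed, predicted):
    """Mean of (predicted - observed); negative means underestimation.

    Assumed convention: the abstract does not define MB explicitly.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(p - o for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumed definition: the abstract does not say whether r2 is this
    quantity or the coefficient of determination.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_obs, mean_pred = mean(observed), mean(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("r2 is undefined for a constant series")
    return cov * cov / (var_obs * var_pred)


# Hypothetical hourly PM2.5 values in µg m-3 (illustrative, not from the study).
observed = [40.0, 55.0, 120.0, 80.0]
predicted = [35.0, 50.0, 110.0, 85.0]

mb = mean_bias(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MB = {mb:.2f} µg m-3, r2 = {r2:.2f}")
```

Under these assumptions, an MB close to zero (such as the -0.03 µg m-3 reported for the eastern region) means the predictions are nearly unbiased on average, while r2 describes how well the predictions track the variability in the observations.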

Full text: 1 Database: MEDLINE Main subject: Meteorology / Machine Learning Study type: Observational_studies / Prognostic_studies / Risk_factors_studies Country as subject: Asia Language: En Year: 2023 Document type: Article