Generating High Spatial Resolution Exposure Estimates from Sparse Regulatory Monitoring Data.
Ge, Yihui; Yang, Zhenchun; Lin, Yan; Hopke, Philip K; Presto, Albert A; Wang, Meng; Rich, David Q; Zhang, Junfeng.
Affiliation
  • Ge Y; Nicholas School of the Environment, Duke University, Durham, NC 27708, United States.
  • Yang Z; Nicholas School of the Environment and Global Health Institute, Duke University, Durham, NC 27708, United States.
  • Lin Y; Nicholas School of the Environment and Global Health Institute, Duke University, Durham, NC 27708, United States.
  • Hopke PK; Department of Public Health Sciences, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA.
  • Presto AA; Institute for a Sustainable Environment, Clarkson University, Potsdam, NY 13699, USA.
  • Wang M; Center for Atmospheric Particle Studies, Carnegie Mellon University, Pittsburgh, PA 15213, United States of America.
  • Rich DQ; Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, United States of America.
  • Zhang J; University at Buffalo, School of Public Health and Health Professions, Buffalo, New York 14214, United States.
Atmos Environ (1994); 313; 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-37781099
Random forest algorithms have been used extensively to estimate ambient air pollutant concentrations. However, the accuracy of model predictions can suffer from extrapolation problems when only limited measurement data are available to train the machine learning algorithms. In this study, we developed and evaluated two approaches incorporating low-cost sensor data that enhanced the extrapolating ability of random forest models in areas with sparse monitoring data. The study area, Rochester, NY, is the site of a pregnancy-cohort study. Daily PM2.5 concentrations from the NAMS/SLAMS sites were obtained and used as the response variable in the model, with satellite, meteorological, and land-use variables included as predictors. To improve the base random forest models, we used PM2.5 measurements from a pre-existing low-cost sensor network and then conducted a two-step backward selection to gradually eliminate variables with potential emission heterogeneity from the base models. We then introduced the regression-enhanced random forest method into the model development. Finally, contemporaneous urinary 1-hydroxypyrene was used to evaluate the PM2.5 predictions generated from the two approaches. The two-step approach increased the average external validation R² from 0.49 to 0.65 and decreased the RMSE from 3.56 µg/m³ to 2.96 µg/m³. For the regression-enhanced random forest models, the average external validation R² was 0.54 and the RMSE was 3.40 µg/m³. We also observed significant and comparable relationships between urinary 1-hydroxypyrene levels and PM2.5 predictions from both improved models. This PM2.5 model estimation strategy could improve the extrapolating ability of random forest models in areas with sparse monitoring data.
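
The abstract gives no implementation details for the regression-enhanced random forest, so the following is only a minimal sketch of the general idea as it is commonly described: a penalized linear model captures the global trend, and a random forest is fitted to its residuals. The data are synthetic, and the class name, the helper functions, the Lasso penalty, and all hyperparameters are assumptions for illustration, not the study's predictors or settings. The test set is deliberately drawn from outside the training range of the first predictor to mimic the extrapolation problem.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score


class RegressionEnhancedRF:
    """Hypothetical helper: Lasso for the global trend, random forest on the Lasso residuals."""

    def __init__(self, alpha=0.1, n_estimators=200, random_state=0):
        # alpha and n_estimators are illustrative values, not taken from the study.
        self.linear = Lasso(alpha=alpha)
        self.forest = RandomForestRegressor(
            n_estimators=n_estimators, random_state=random_state
        )

    def fit(self, X, y):
        self.linear.fit(X, y)
        residuals = y - self.linear.predict(X)
        self.forest.fit(X, residuals)
        return self

    def predict(self, X):
        # Final prediction = linear trend + forest-estimated residual.
        return self.linear.predict(X) + self.forest.predict(X)


def make_data(rng, n, low, high):
    """Synthetic response: linear in x0, nonlinear in x1, plus noise."""
    x0 = rng.uniform(low, high, n)
    x1 = rng.uniform(0.0, 2.0 * np.pi, n)
    y = 2.0 * x0 + np.sin(x1) + rng.normal(0.0, 0.3, n)
    return np.column_stack([x0, x1]), y


def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


rng = np.random.default_rng(0)
X_train, y_train = make_data(rng, 500, 0.0, 5.0)   # x0 observed in [0, 5]
X_test, y_test = make_data(rng, 200, 5.0, 10.0)    # x0 outside the training range

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
rerf = RegressionEnhancedRF().fit(X_train, y_train)

for name, model in [("random forest", rf), ("regression-enhanced RF", rerf)]:
    pred = model.predict(X_test)
    print(f"{name}: R2={r2_score(y_test, pred):.2f}, RMSE={rmse(y_test, pred):.2f}")
```

On this toy problem the plain forest cannot predict above the largest responses it saw in training, while the linear component lets the combined model follow the trend beyond the training range. The R² and RMSE printed here are properties of the synthetic data only and are not comparable to the values reported in the abstract.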

Full text: 1 Database: MEDLINE Study type: Observational_studies / Prognostic_studies / Risk_factors_studies Language: English Publication year: 2023 Document type: Article
