Is replacing missing values of PM2.5 constituents with estimates using machine learning better for source apportionment than exclusion or median replacement?
Kim, Youngkwon; Yi, Seung-Muk; Heo, Jongbae; Kim, Hwajin; Lee, Woojoo; Kim, Ho; Hopke, Philip K; Lee, Young Su; Shin, Hye-Jung; Park, Jungmin; Yoo, Myungsoo; Jeon, Kwonho; Park, Jieun.
Affiliation
  • Kim Y; Department of Environmental Health Sciences, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea.
  • Yi SM; Department of Environmental Health Sciences, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea; Institute of Health and Environment, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea.
  • Heo J; Busan Development Institute, Busan, 47210, Republic of Korea.
  • Kim H; Department of Environmental Health Sciences, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea.
  • Lee W; Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea.
  • Kim H; Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea.
  • Hopke PK; Center for Air Resources Engineering and Science, Clarkson University, Potsdam, NY, 13699, USA; Department of Public Health Sciences, University of Rochester, School of Medicine and Dentistry, Rochester, NY, 14642, USA.
  • Lee YS; Department of Energy and Environmental Engineering, Soonchunhyang University, Soonchunhyang-ro, Sinchang-myeon, Asan-si, Chungcheongnam-do, 31538, Republic of Korea.
  • Shin HJ; Air Quality Research Division, Department of Climate and Air Quality Research, National Institute of Environmental Research, Incheon, 22689, Republic of Korea.
  • Park J; Air Quality Research Division, Department of Climate and Air Quality Research, National Institute of Environmental Research, Incheon, 22689, Republic of Korea.
  • Yoo M; Department of Climate and Air Quality Research, National Institute of Environmental Research, Incheon, 22689, Republic of Korea.
  • Jeon K; Global Environment Research Division, Department of Climate and Air Quality Research, National Institute of Environmental Research, Incheon, 22689, Republic of Korea.
  • Park J; Department of Environmental Health, Harvard T.H. Chan School of Public Health, 401 Park Drive, Boston, MA, 02215, USA. Electronic address: jieun_park@hsph.harvard.edu.
Environ Pollut; 354: 124165, 2024 Aug 01.
Article in En | MEDLINE | ID: mdl-38759749
ABSTRACT
East Asian countries have been conducting source apportionment of fine particulate matter (PM2.5) by applying positive matrix factorization (PMF) to hourly constituent concentrations. However, some of the constituent data from the supersites in South Korea were missing due to instrument maintenance and calibration. Conventional preprocessing of missing values, such as exclusion or median replacement, causes biases in the estimated source contributions by changing the PMF input. Machine learning (ML) can estimate the missing values by training on constituent data, meteorological data, and gaseous pollutants. Complete data from the Seoul Supersite in 2018 were taken, and a random 20% was set as missing. PMF was performed by replacing missing values with estimates. Percent errors of the source contributions were calculated compared to those estimated from complete data. Missing values were estimated using a random forest analysis. Estimation accuracy (r²) was as high as 0.874 for missing carbon species and low at 0.631 when ionic species and trace elements were missing. For the seven highest contributing sources, replacing the missing values of carbon species with estimates minimized the percent errors to 2.0% on average. However, replacing the missing values of the other chemical species with estimates increased the percent errors to more than 9.7% on average. Percent errors were maximal at 37% on average when missing values of ionic species and trace elements were replaced with estimates. Missing values, except for those of carbon species, need to be excluded. This approach reduced the percent errors to 7.4% on average, which was lower than those due to median replacement. Our results show that reducing the biases in source apportionment is possible by replacing the missing values of carbon species with estimates. To reduce the biases due to missing values of the other chemical species, the estimation accuracy of the ML needs to be improved.
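The evaluation design in the abstract (start from complete data, set a random 20% as missing, preprocess, then compute percent errors against the complete-data result) can be illustrated with a minimal sketch. This is not the authors' code: it uses synthetic data, covers only the two conventional preprocessing options (exclusion and median replacement), and uses the series mean as a stand-in for PMF source contributions. Neither PMF nor the random forest estimation is implemented here, and all function names are hypothetical.

```python
import random
import statistics


def mask_random(values, fraction, seed=0):
    """Return a copy of `values` with a random `fraction` of entries set to None."""
    rng = random.Random(seed)
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def exclude_missing(values):
    """Conventional option 1: drop the missing entries."""
    return [v for v in values if v is not None]


def replace_with_median(values):
    """Conventional option 2: fill missing entries with the median of the observed ones."""
    med = statistics.median([v for v in values if v is not None])
    return [med if v is None else v for v in values]


def percent_error(estimate, reference):
    """Absolute percent error of `estimate` relative to the complete-data `reference`."""
    return abs(estimate - reference) / abs(reference) * 100.0


# Synthetic "hourly concentrations" (an assumption for illustration only).
data_rng = random.Random(42)
complete = [data_rng.lognormvariate(1.0, 0.8) for _ in range(1000)]

# Set a random 20% as missing, as in the study design.
masked = mask_random(complete, fraction=0.20, seed=1)

# Stand-in statistic: the mean. The study compares PMF source contributions instead.
reference = statistics.mean(complete)
err_exclusion = percent_error(statistics.mean(exclude_missing(masked)), reference)
err_median = percent_error(statistics.mean(replace_with_median(masked)), reference)

print(f"exclusion: {err_exclusion:.1f}%  median replacement: {err_median:.1f}%")
```

A model-based estimate would slot in as a third preprocessing function alongside `exclude_missing` and `replace_with_median`, with its output scored by the same `percent_error` against the complete-data reference.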
Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Environmental Monitoring / Air Pollutants / Particulate Matter / Machine Learning Country/Region as subject: Asia Language: En Publication year: 2024 Document type: Article