Optimizing feature selection with gradient boosting machines in PLS regression for predicting moisture and protein in multi-country corn kernels via NIR spectroscopy.
Zheng, Runyu; Jia, Yuyao; Ullagaddi, Chidanand; Allen, Cody; Rausch, Kent; Singh, Vijay; Schnable, James C; Kamruzzaman, Mohammed.
Affiliation
  • Zheng R; Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
  • Jia Y; Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
  • Ullagaddi C; Department of Agronomy and Horticulture, University of Nebraska - Lincoln, Lincoln, NE, USA.
  • Allen C; Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
  • Rausch K; Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
  • Singh V; Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
  • Schnable JC; Department of Agronomy and Horticulture, University of Nebraska - Lincoln, Lincoln, NE, USA.
  • Kamruzzaman M; Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA. Electronic address: mkamruz1@illinois.edu.
Food Chem; 456: 140062, 2024 Oct 30.
Article in English | MEDLINE | ID: mdl-38876073
ABSTRACT
Differences in moisture and protein content impact both nutritional value and processing efficiency of corn kernels. Near-infrared (NIR) spectroscopy can be used to estimate kernel composition, but models trained on a few environments may underestimate error rates and bias. We assembled corn samples from diverse international environments and used NIR with chemometrics and partial least squares regression (PLSR) to determine moisture and protein. The potential of five feature selection methods to improve prediction accuracy was assessed by extracting sensitive wavelengths. Gradient boosting machines (GBMs), particularly CatBoost and LightGBM, were found to effectively select crucial wavelengths for moisture (1409, 1900, 1908, 1932, 1953, 2174 nm) and protein (887, 1212, 1705, 1891, 2097, 2456 nm). SHAP plots highlighted significant wavelength contributions to model prediction. These results illustrate GBMs' effectiveness in feature engineering for agricultural and food sector applications, including developing multi-country global calibration models for moisture and protein in corn kernels.
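The workflow described in the abstract (rank wavelengths with a gradient boosting model, then fit PLS regression on the selected wavelengths) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes synthetic data in place of NIR spectra, swaps in a simple stump-based gradient booster for CatBoost and LightGBM, and uses a hand-written single-response PLS (NIPALS) so that it runs with the Python standard library alone. The column indices 3 and 12, the top-4 cutoff, and all function names are hypothetical.

```python
import random

rng = random.Random(0)

# Synthetic stand-in for spectra: 60 samples x 20 "wavelength" columns.
# Assumption: only columns 3 and 12 drive the response.
n_samples, n_features = 60, 20
X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
y = [2.0 * row[3] - 1.5 * row[12] + rng.gauss(0.0, 0.05) for row in X]

def fit_stump(X, r):
    """Find the single split that most reduces squared error on residuals r."""
    n, total = len(X), sum(r)
    best = (0.0, None, None, 0.0, 0.0)  # gain, feature, threshold, left, right
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue
            right_sum = total - left_sum
            gain = left_sum**2 / k + right_sum**2 / (n - k) - total**2 / n
            if gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best

def gbm_importance(X, y, n_rounds=50, lr=0.1):
    """Boost stumps on residuals; importance = summed split gain per feature."""
    n = len(X)
    pred = [sum(y) / n] * n
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        gain, j, thr, left, right = fit_stump(X, resid)
        if j is None:
            break
        importance[j] += gain
        pred = [pi + lr * (left if row[j] <= thr else right)
                for row, pi in zip(X, pred)]
    return importance

def pls1_fit(X, y, n_components=2):
    """Single-response PLS via NIPALS with deflation of X and y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [yi - y_mean for yi in y]
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w); P.append(load); Q.append(q)
    return x_mean, y_mean, W, P, Q

def pls1_predict(model, X):
    x_mean, y_mean, W, P, Q = model
    out = []
    for row in X:
        e = [v - m for v, m in zip(row, x_mean)]
        y_hat = y_mean
        for w, load, q in zip(W, P, Q):
            t = sum(a * b for a, b in zip(e, w))
            y_hat += q * t
            e = [a - t * b for a, b in zip(e, load)]
        out.append(y_hat)
    return out

def take(rows, cols):
    return [[row[j] for j in cols] for row in rows]

# Select features on the training split only, then evaluate on held-out rows.
train, test = slice(0, 40), slice(40, 60)
importance = gbm_importance(X[train], y[train])
selected = sorted(sorted(range(n_features), key=lambda j: -importance[j])[:4])
model = pls1_fit(take(X[train], selected), y[train], n_components=2)
pred = pls1_predict(model, take(X[test], selected))
rmse = (sum((a - b) ** 2 for a, b in zip(pred, y[test])) / len(pred)) ** 0.5
train_mean = sum(y[train]) / 40
baseline = (sum((train_mean - b) ** 2 for b in y[test]) / len(pred)) ** 0.5
print("selected columns:", selected)
print("PLS RMSE:", round(rmse, 3), "mean-only RMSE:", round(baseline, 3))
```

In practice one would likely replace the hand-written pieces with a library booster and a library PLS implementation, and map the selected column indices back to wavelengths in nm. Ranking features on the training split only, as done here, avoids leaking held-out samples into the selection step.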
Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Plant Proteins / Water / Spectroscopy, Near-Infrared / Zea mays | Language: En | Journal: Food Chem | Publication year: 2024 | Document type: Article | Country of affiliation: United States
