Optimizing feature selection with gradient boosting machines in PLS regression for predicting moisture and protein in multi-country corn kernels via NIR spectroscopy.
Food Chem
; 456: 140062, 2024 Oct 30.
Article
em En
| MEDLINE
| ID: mdl-38876073
ABSTRACT
Differences in moisture and protein content impact both nutritional value and processing efficiency of corn kernels. Near-infrared (NIR) spectroscopy can be used to estimate kernel composition, but models trained on a few environments may underestimate error rates and bias. We assembled corn samples from diverse international environments and used NIR with chemometrics and partial least squares regression (PLSR) to determine moisture and protein. The potential of five feature selection methods to improve prediction accuracy was assessed by extracting sensitive wavelengths. Gradient boosting machines (GBMs), particularly CatBoost and LightGBM, were found to effectively select crucial wavelengths for moisture (1409, 1900, 1908, 1932, 1953, 2174 nm) and protein (887, 1212, 1705, 1891, 2097, 2456 nm). SHAP plots highlighted significant wavelength contributions to model prediction. These results illustrate GBMs' effectiveness in feature engineering for agricultural and food sector applications, including developing multi-country global calibration models for moisture and protein in corn kernels.
Palavras-chave
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Assunto principal:
Proteínas de Plantas
/
Água
/
Espectroscopia de Luz Próxima ao Infravermelho
/
Zea mays
Idioma:
En
Revista:
Food Chem
Ano de publicação:
2024
Tipo de documento:
Article
País de afiliação:
Estados Unidos