Predicting cow milk quality traits from routinely available milk spectra using statistical machine learning methods.

Frizzarin, M; Gormley, I C; Berry, D P; Murphy, T B; Casa, A; Lynch, A; McParland, S

Affiliations

Frizzarin M; School of Mathematics and Statistics, University College Dublin, Belfield, Dublin 4, Ireland; Teagasc, Animal & Grassland Research and Innovation Centre, Moorepark, Fermoy, Co. Cork, P61 P302 Ireland.
Gormley IC; School of Mathematics and Statistics, University College Dublin, Belfield, Dublin 4, Ireland.
Berry DP; Teagasc, Animal & Grassland Research and Innovation Centre, Moorepark, Fermoy, Co. Cork, P61 P302 Ireland.
Murphy TB; School of Mathematics and Statistics, University College Dublin, Belfield, Dublin 4, Ireland.
Casa A; School of Mathematics and Statistics, University College Dublin, Belfield, Dublin 4, Ireland.
Lynch A; School of Mathematics and Statistics, University College Dublin, Belfield, Dublin 4, Ireland.
McParland S; Teagasc, Animal & Grassland Research and Innovation Centre, Moorepark, Fermoy, Co. Cork, P61 P302 Ireland. Electronic address: sinead.mcparland@teagasc.ie.

J Dairy Sci; 104(7): 7438-7447, 2021 Jul.

Article in English | MEDLINE | ID: mdl-33865578

ABSTRACT

Numerous statistical machine learning methods suited to highly correlated features, such as those found in spectral data, could potentially improve prediction performance over the commonly used partial least squares approach. Milk samples from 622 individual cows with known detailed protein composition and technological trait data, accompanied by mid-infrared spectra, were available to assess the predictive ability of different regression and classification algorithms. The regression-based approaches were partial least squares regression (PLSR), ridge regression (RR), least absolute shrinkage and selection operator (LASSO), elastic net, principal component regression, projection pursuit regression, spike and slab regression, random forests, boosting decision trees, neural networks (NN), and a post-hoc approach of model averaging (MA). Several classification methods [i.e., partial least squares discriminant analysis (PLSDA), random forests, boosting decision trees, and support vector machines (SVM)] were also used after stratifying the traits of interest into categories. In the regression analyses, MA was the best prediction method for 6 of the 14 traits investigated [curd firmness at 60 min, αS1-casein (CN), αS2-CN, κ-CN, α-lactalbumin, and β-lactoglobulin B], whereas NN and RR were the best algorithms for 3 traits each (NN for rennet coagulation time, curd-firming time, and heat stability; RR for curd firmness at 30 min, β-CN, and β-lactoglobulin A); PLSR was best for pH, and LASSO was best for CN micelle size. When traits were divided into 2 classes, SVM had the greatest accuracy for the majority of the traits investigated. Although the well-established PLSR-based method performed competitively, the application of statistical machine learning methods for regression analyses reduced the root mean square error compared with PLSR by between 0.18% (κ-CN) and 3.67% (heat stability). The use of modern statistical machine learning methods for trait prediction from mid-infrared spectroscopy may improve the prediction accuracy for some traits.
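
The abstract reports results as root mean square error (RMSE) reductions relative to PLSR and mentions post-hoc model averaging, so a small sketch of those two calculations may help. It is illustrative only: the abstract does not state how the averaging was weighted, so the equal-weight average below is an assumption, the function names are hypothetical, and the numbers are made up rather than taken from the study.

```python
from math import sqrt
from statistics import fmean


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return sqrt(fmean((o - p) ** 2 for o, p in zip(observed, predicted)))


def average_predictions(*model_predictions):
    """Equal-weight average of several models' predictions.

    Equal weighting is an assumption made for this sketch; the abstract
    does not describe the weighting used for model averaging.
    """
    return [fmean(values) for values in zip(*model_predictions)]


def rmse_reduction_percent(baseline_rmse, candidate_rmse):
    """Percentage by which candidate_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Made-up values for illustration only; not data from the study.
observed = [6.60, 6.72, 6.55, 6.81]
model_a = [6.70, 6.65, 6.60, 6.70]  # stands in for a baseline such as PLSR
model_b = [6.55, 6.80, 6.45, 6.90]  # stands in for a second method
averaged = average_predictions(model_a, model_b)

baseline = rmse(observed, model_a)
reduction = rmse_reduction_percent(baseline, rmse(observed, averaged))
print(f"RMSE reduction relative to baseline: {reduction:.2f}%")
```

Read this way, a reported reduction of 0.18% means the best method's RMSE was 0.18% lower than the PLSR RMSE for that trait.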

Subjects

Caseins; Milk; Animals; Cattle; Female; Lactoglobulins; Machine Learning; Milk Proteins; Phenotype

Keywords

Fourier-transform mid-infrared spectroscopy; milk quality; statistical machine learning

Full text: 1 Database: MEDLINE Main subject: Caseins / Milk Type of study: Prognostic_studies / Risk_factors_studies Limit: Animals Language: En Publication year: 2021 Document type: Article
