Performance comparison of machine learning models used for predicting subclinical mastitis in dairy cows: Bagging, boosting, stacking, and super-learner ensembles versus single machine learning models.
Satola, A; Satola, K.
Affiliation
  • Satola A; Department of Genetics, Animal Breeding and Ethology, Faculty of Animal Science, University of Agriculture in Krakow, 30-059 Krakow, Poland. Electronic address: alicja.satola@urk.edu.pl.
  • Satola K; Independent researcher, 31-416 Krakow, Poland.
J Dairy Sci ; 107(6): 3959-3972, 2024 Jun.
Article in En | MEDLINE | ID: mdl-38310958
ABSTRACT
Mastitis has a substantial impact on the dairy industry across the world, causing dairy producers to suffer losses due to the reduced quality and quantity of produced milk. A further problem, related to this issue, is the excessive use of antibiotics that leads to the development of resistance in different bacterial strains. The growing consumer awareness oriented toward food safety and rational use of antibiotics has promoted the search for new methods of early identification of cows that may be at risk of developing the disease. Subclinical mastitis does not cause any visible changes to the udder or milk, and therefore it is more difficult to detect than clinical mastitis. The collection of large amounts of data related to milk performance of cows allows using machine learning (ML) methods to build models that could be used for classifying cows as healthy or at risk of subclinical mastitis. The data used for the purpose of this study included information from routine milk recording procedures. The dataset consisted of 19,856 records of 2,227 Polish Holstein-Friesian cows from 3 herds. The authors decided to use the approach of building ensemble ML models, in particular bagging, boosting, stacking, and super-learner models, and comparing them for accuracy of identification of disease-affected cows against single ML models based on the support vector machines, logistic regression, Gaussian Naive Bayes, k-nearest neighbors, and decision tree algorithms. The models were trained and evaluated based on the information recorded for herd 1 and using an 80:20 train-test split ratio according to animal ID (to avoid data leakage). The information recorded for herds 2 and 3 was used only to evaluate, on unseen data, the models developed using the herd 1 dataset.
Among the single ML models, the support vector machines model was found to be the most accurate in predicting subclinical mastitis at subsequent test day when used both for the training set (mean F1-score of 0.760) and the testing sets containing data for herds 1, 2, and 3 (F1-score of 0.778, 0.790, and 0.741, respectively). The gradient boosting model was found to be the best performing model among the ensemble ML models (F1-score of 0.762, 0.779, 0.791, and 0.723 for the training set and the testing sets, respectively). The super-learner model, featuring the most advanced design and logistic regression in the meta layer, achieved the highest mean F1-score of 0.775 during the cross-validation; however, it was characterized by a slightly worse prediction accuracy on the testing sets (mean F1-score of 0.768, 0.790, and 0.693 for herds 1, 2, and 3, respectively). The study findings confirm the promising role of ensemble ML methods, which were found to be slightly superior with respect to most of the single ML models.
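The two evaluation ideas mentioned in the abstract, splitting by animal ID so that no cow contributes records to both sets, and scoring with the F1-score, can be illustrated with a minimal Python sketch. This is not the authors' code: the field names `animal_id` and `label`, the toy records, and the choice to apply the 80:20 ratio to the number of cows (rather than to the number of records) are all assumptions made for illustration.

```python
import random


def split_by_animal(records, test_fraction=0.2, seed=0):
    """Split records so that each cow's records fall on one side only.

    Assumption: the ratio is applied to the number of distinct animals.
    """
    animal_ids = sorted({r["animal_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(animal_ids)
    n_test = max(1, round(len(animal_ids) * test_fraction))
    test_ids = set(animal_ids[:n_test])
    train = [r for r in records if r["animal_id"] not in test_ids]
    test = [r for r in records if r["animal_id"] in test_ids]
    return train, test


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 10 cows with 3 test-day records each.
records = [
    {"animal_id": cow, "test_day": day, "label": (cow + day) % 2}
    for cow in range(10)
    for day in range(3)
]
train, test = split_by_animal(records, test_fraction=0.2, seed=42)

# No animal appears in both sets, which is what prevents leakage.
train_ids = {r["animal_id"] for r in train}
test_ids = {r["animal_id"] for r in test}
print(len(train), len(test), train_ids & test_ids)

# One true positive, one false positive, one false negative.
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Splitting on records instead of animals would let repeated test-day records of the same cow appear in both sets, which tends to inflate the measured score; grouping by ID avoids that.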
Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Milk / Machine Learning / Mastitis, Bovine Type of study: Prognostic_studies / Risk_factors_studies Limits: Animals Language: En Journal: J Dairy Sci Year: 2024 Document type: Article Country of publication:
