A unified Foot and Mouth Disease dataset for Uganda: evaluating machine learning predictive performance degradation under varying distributions.

Kapalaga, Geofrey; Kivunike, Florence N; Kerfua, Susan; Jjingo, Daudi; Biryomumaisho, Savino; Rutaisire, Justus; Ssajjakambwe, Paul; Mugerwa, Swidiq; Kiwala, Yusuf

Affiliation

Kapalaga G; Department of Information Technology, College of Computing and Information Sciences, Makerere University, Kampala, Uganda.
Kivunike FN; Department of Information Technology, College of Computing and Information Sciences, Makerere University, Kampala, Uganda.
Kerfua S; National Livestock Resources Research Institute, Kampala, Uganda.
Jjingo D; African Center of Excellence in Bioinformatics (ACE-B), Makerere University, Kampala, Uganda.
Biryomumaisho S; Department of Computer Science, College of Computing and Information Sciences, Makerere University, Kampala, Uganda.
Rutaisire J; College of Veterinary Medicine, Animal Resources and Bio-Security, Makerere University, Kampala, Uganda.
Ssajjakambwe P; National Livestock Resources Research Institute, Kampala, Uganda.
Mugerwa S; National Livestock Resources Research Institute, Kampala, Uganda.
Kiwala Y; National Livestock Resources Research Institute, Kampala, Uganda.

Front Artif Intell; 7: 1446368, 2024.

Article in En | MEDLINE | ID: mdl-39144542

ABSTRACT

In Uganda, the absence of a unified dataset for constructing machine learning models to predict Foot and Mouth Disease outbreaks hinders preparedness. Although machine learning models exhibit excellent predictive performance for Foot and Mouth Disease outbreaks under stationary conditions, they are susceptible to performance degradation in non-stationary environments. Rainfall and temperature are key factors influencing these outbreaks, and their variability due to climate change can significantly impact predictive performance. This study created a unified Foot and Mouth Disease dataset by integrating disparate sources and pre-processing data using mean imputation, duplicate removal, visualization, and merging techniques. To evaluate performance degradation, seven machine learning models were trained and assessed using metrics including accuracy, area under the receiver operating characteristic curve, recall, precision and F1-score. The dataset showed a significant class imbalance with more non-outbreaks than outbreaks, requiring data augmentation methods. Variability in rainfall and temperature impacted predictive performance, causing notable degradation. Random Forest with borderline SMOTE was the top-performing model in a stationary environment, achieving 92% accuracy, 0.97 area under the receiver operating characteristic curve, 0.94 recall, 0.90 precision, and 0.92 F1-score. However, under varying distributions, all models exhibited significant performance degradation, with random forest accuracy dropping to 46%, area under the receiver operating characteristic curve to 0.58, recall to 0.03, precision to 0.24, and F1-score to 0.06. This study underscores the creation of a unified Foot and Mouth Disease dataset for Uganda and reveals significant performance degradation in seven machine learning models under varying distributions. These findings highlight the need for new methods to address the impact of distribution variability on predictive performance.
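The Random Forest figures reported above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' code: it recomputes F1 as the harmonic mean of the rounded precision and recall values from the abstract, and computes a relative drop per metric. The helper names `f1_score` and `relative_drop`, and the definition of the drop as (baseline - shifted) / baseline, are assumptions made here; the abstract does not state how degradation rates were calculated.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_drop(baseline: float, shifted: float) -> float:
    """Fraction of the baseline score lost under shift (assumed definition)."""
    return (baseline - shifted) / baseline


# Random Forest scores as reported in the abstract (rounded values).
stationary = {"accuracy": 0.92, "auc": 0.97, "recall": 0.94,
              "precision": 0.90, "f1": 0.92}
shifted = {"accuracy": 0.46, "auc": 0.58, "recall": 0.03,
           "precision": 0.24, "f1": 0.06}

# Recompute F1 from the rounded precision and recall figures.
f1_stationary = f1_score(stationary["precision"], stationary["recall"])
f1_shifted = f1_score(shifted["precision"], shifted["recall"])

# Relative drop for every reported metric.
drops = {name: relative_drop(stationary[name], shifted[name])
         for name in stationary}

print(f"F1 (stationary, recomputed): {f1_stationary:.3f}")
print(f"F1 (shifted, recomputed):    {f1_shifted:.3f}")
for name, drop in drops.items():
    print(f"{name:>9}: relative drop {drop:.1%}")
```

Because the inputs are rounded to two decimals, the recomputed F1 under shift comes out slightly below the reported 0.06; that gap is within what rounding of precision and recall can explain. Under the assumed definition, accuracy loses half of its stationary value, while recall loses almost all of it.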

Key words

Foot and Mouth Disease; class imbalance; distribution shifts; machine learning; performance degradation rates

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Front Artif Intell Year: 2024 Document type: Article Affiliation country: Uganda
