ABSTRACT
In low- and middle-income countries, a large proportion of animal rabies investigations end without a conclusive diagnosis, leading to epidemiologic interpretations informed by clinical rather than laboratory data. We compared Extreme Gradient Boosting (XGB) with Logistic Regression (LR) for their ability to estimate the probability of rabies in animals investigated as part of an Integrated Bite Case Management (IBCM) program. To balance our training data, we used Random Oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE). We developed a risk stratification framework based on predicted rabies probabilities. XGB performed better at predicting rabies cases than LR. Oversampling strategies enhanced model sensitivity, making them the preferred techniques for predicting rare events such as rabies in a biting animal. XGB-ROS classified most of the confirmed rabies cases and only a small proportion of non-cases as either high (confirmed cases = 85.2%, non-cases = 0.01%) or moderate (confirmed cases = 8.4%, non-cases = 4.0%) risk. Model-based risk stratification led to a 3.2-fold increase in epidemiologically useful data compared to a routine surveillance strategy using IBCM case definitions. Our study demonstrates the application of machine learning to strengthen zoonotic disease surveillance in resource-limited settings.
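The two ideas in this abstract, balancing the training data by random oversampling and mapping predicted probabilities to risk tiers, can be sketched as follows. This is a minimal illustration using only the standard library, not the authors' implementation: the function names `random_oversample` and `risk_tier` and the probability cut-offs (0.2 and 0.7) are hypothetical, since the abstract does not report the thresholds used.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    Assumes binary labels coded 0 (non-case) and 1 (case).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    # Sample minority indices with replacement to close the class gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def risk_tier(probability, moderate=0.2, high=0.7):
    """Map a predicted probability to a risk tier.

    The cut-offs are hypothetical placeholders, not values from the study.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= high:
        return "high"
    if probability >= moderate:
        return "moderate"
    return "low"
```

Oversampling would normally be applied to the training split only, so that duplicated cases do not leak into the data used for evaluation.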
Subject(s)
Machine Learning , Rabies , Rabies/epidemiology , Rabies/veterinary , Animals , Humans , Logistic Models , Dogs , Bites and Stings/epidemiology , Bites and Stings/virology , Epidemiological Monitoring
ABSTRACT
In recent years, reports of Kyasanur forest disease (KFD) breaking endemic barriers by spreading to new regions and crossing state boundaries have been alarming. Effective disease surveillance and reporting systems are lacking for this emerging zoonosis, hindering control and prevention efforts. We compared time-series models using weather data with and without Event-Based Surveillance (EBS) information, i.e., news media reports and internet search trends, to predict monthly KFD cases in humans. We fitted Extreme Gradient Boosting (XGB) and Long Short-Term Memory models at the national and regional levels. We utilized the rich epidemiological data from endemic regions by applying Transfer Learning (TL) techniques to predict KFD cases in new outbreak regions where disease surveillance information was scarce. Overall, the inclusion of EBS data, in addition to the weather data, substantially increased the prediction performance across all models. The XGB method produced the best predictions at the national and regional levels. The TL techniques outperformed baseline models in predicting KFD in new outbreak regions. Novel sources of data and advanced machine-learning approaches, e.g., EBS and TL, show great potential for increasing disease prediction capabilities in data-scarce scenarios and/or resource-limited settings, supporting better-informed decisions in the face of emerging zoonotic threats.
Subject(s)
Kyasanur Forest Disease , Animals , Humans , Kyasanur Forest Disease/epidemiology , Resource-Limited Settings , Zoonoses/epidemiology , Disease Outbreaks , Machine Learning , India/epidemiology
ABSTRACT
The complex, unpredictable nature of pathogen occurrence has required substantial efforts to accurately predict infectious diseases (IDs). Given the rising popularity of Machine Learning (ML) and Deep Learning (DL) techniques and their ability to uncover connections between large amounts of diverse data, we conducted a PRISMA systematic review to investigate advances in ID prediction for human and animal diseases using ML and DL. This review covered the type of IDs modeled, ML and DL techniques utilized, geographical distribution, prediction tasks performed, input features utilized, spatial and temporal scales, error metrics used, computational efficiency, uncertainty quantification, and missing data handling methods. Among 237 relevant articles published between January 2001 and May 2021, highly contagious diseases in humans were most often represented, including COVID-19 (37.1%), influenza/influenza-like illnesses (9.3%), dengue (8.9%), and malaria (5.1%). Out of 37 diseases identified, 51.4% were zoonotic, 37.8% were human-only, and 8.1% were animal-only, with only 1.6% economically significant, non-zoonotic livestock diseases. Despite the number of zoonoses, 86.5% of articles modeled humans whereas only a few articles (5.1%) contained more than one host species. Eastern Asia (32.5%), North America (17.7%), and Southern Asia (13.1%) were the most represented locations. Frequent approaches included tree-based ML (38.4%) and feed-forward neural networks (26.6%). Articles predicted temporal incidence (66.7%), disease risk (38.0%), and/or spatial movement (31.2%). Fewer than 10% of studies addressed uncertainty quantification, computational efficiency, and missing data, which are essential to operational use and deployment. This study highlights trends and gaps in ML and DL for ID prediction, providing guidelines for future work to better support biopreparedness and response.
To fully utilize ML and DL for improved ID forecasting, models should include the full disease ecology in a One-Health context, important food and agricultural diseases, underrepresented hotspots, and important metrics required for operational deployment.
ABSTRACT
Q fever is a zoonotic disease of significant animal and public health concern, caused by Coxiella burnetii (C. burnetii), an obligate intracellular bacterium. This study used a Bayesian latent class analysis to evaluate the diagnostic sensitivity (DSe) and diagnostic specificity (DSp) of three methods for diagnosing C. burnetii infection in cattle and buffaloes in Punjab, India: an indirect ELISA applied to serum samples and a trans-Polymerase Chain Reaction (trans-PCR) technique applied to milk samples and to genital swabs. Conditional independence was assumed between the tests, given (i) the different biological principles of ELISA and trans-PCR and (ii) the fact that the trans-PCR was performed on different tissues. The ELISA in serum samples showed the highest DSe of 0.97 (95% Probability Intervals (PIs): 0.93; 0.99) compared to the trans-PCR applied to milk samples, 0.76 (0.63; 0.87), and genital swabs, 0.73 (0.58; 0.85). The DSps of all tests were high, with trans-PCR in genital swabs recording the highest DSp of 0.99 (0.98; 1), while the DSps of trans-PCR in milk samples and ELISA in serum samples were 0.97 (0.95; 0.99) and 0.95 (0.93; 0.97), respectively. The study results show that none of the applied tests is perfect; therefore, a testing regimen based on the diagnostic characteristics of the tests may be considered for diagnosis of C. burnetii.
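The conditional independence assumption above has a standard algebraic form, sketched here for a single population. The symbols are assumptions introduced for illustration and do not appear in the abstract: $\pi$ is the true infection prevalence, $Se_i$ and $Sp_i$ are the sensitivity and specificity of test $i$, and $t_i \in \{0, 1\}$ is the observed result of test $i$ (1 = positive). Under conditional independence, the probability of any pattern of three test results is:

```latex
P(T_1 = t_1, T_2 = t_2, T_3 = t_3)
  = \pi \prod_{i=1}^{3} Se_i^{t_i} (1 - Se_i)^{1 - t_i}
  + (1 - \pi) \prod_{i=1}^{3} (1 - Sp_i)^{t_i} \, Sp_i^{1 - t_i}
```

The first term covers truly infected animals and the second covers uninfected animals. In a Bayesian latent class analysis, priors are placed on $\pi$, $Se_i$, and $Sp_i$, and the observed counts of each result pattern are typically treated as multinomial with these cell probabilities.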
Subject(s)
Bison , Cattle Diseases , Coxiella burnetii , Q Fever , Animals , Bayes Theorem , Buffaloes , Cattle , Cattle Diseases/microbiology , Coxiella burnetii/genetics , Diagnostic Tests, Routine , Enzyme-Linked Immunosorbent Assay/veterinary , India , Latent Class Analysis , Milk/microbiology , Q Fever/diagnosis , Q Fever/microbiology , Q Fever/veterinary
ABSTRACT
Accurate infectious disease forecasting can inform efforts to prevent outbreaks and mitigate adverse impacts. This study compares the performance of statistical, machine learning (ML), and deep learning (DL) approaches in forecasting infectious disease incidence across different countries and time intervals. We forecasted three diverse diseases: campylobacteriosis, typhoid, and Q-fever, using a wide variety of features (n = 46) from public datasets, e.g., landscape, climate, and socioeconomic factors. We compared autoregressive statistical models to two tree-based ML models (extreme gradient boosted trees [XGB] and random forest [RF]) and two DL models (multi-layer perceptron and encoder-decoder model). The disease models were trained on region-level data from seven different countries between 2009 and 2017. Forecasting performance of all models was assessed using mean absolute error, root mean square error, and Poisson deviance across Australia, Israel, and the United States for January through August 2018. The overall model results were compared across diseases as well as various data splits, including country, regions with the highest and lowest case counts, and the forecast horizon (i.e., nowcasting, short-term, and long-term forecasting). Overall, the XGB models performed the best for all diseases and, in general, tree-based ML models performed the best across data splits. There were a few instances where the statistical or DL models had marginally smaller error metrics for specific subsets of typhoid, which is a disease with very low case counts. Feature importance per disease was measured using four tree-based ML models (i.e., XGB and RF with and without region name as a feature). The most important feature groups included previous case counts, region name, population counts and density, causes of mortality from the neonatal period to under 5 years of age, sanitation factors, and elevation.
This study demonstrates the power of ML approaches to incorporate a wide range of factors to forecast various diseases, regardless of location, more accurately than traditional statistical approaches.
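The three error metrics named in this abstract can be written out in a few lines. The following is a minimal standard-library sketch, not the study's code. It assumes the mean (rather than summed) form of the Poisson deviance, which the abstract does not specify, and the function names are chosen for illustration.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and forecast counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared forecast error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance: 2 * mean(t * log(t / p) - t + p).

    The term t * log(t / p) is taken as 0 when t == 0.
    Forecasts must be strictly positive.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if p <= 0:
            raise ValueError("forecasts must be strictly positive")
        log_term = t * math.log(t / p) if t > 0 else 0.0
        total += log_term - t + p
    return 2.0 * total / len(y_true)
```

Poisson deviance is suited to count data such as monthly case numbers because it penalizes errors relative to the size of the forecast, whereas mean absolute error and root mean square error treat an error of one case the same regardless of scale.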
ABSTRACT
Infectious disease surveillance is crucial for early detection and situational awareness of disease outbreaks. Digital biosurveillance monitors large volumes of open-source data to flag potential health threats. This study investigates the potential of digital surveillance in the detection of the top five priority zoonotic diseases in Kenya: Rift Valley fever (RVF), anthrax, rabies, brucellosis, and trypanosomiasis. Open-source disease events reported between August 2016 and October 2020 were collected and key event-specific information was extracted using a newly developed disease event taxonomy. A total of 424 disease reports encompassing 55 unique events belonging to anthrax (43.6%), RVF (34.6%), and rabies (21.8%) were identified. Most events were first reported by news media (78.2%), followed by international health organizations (16.4%). News media reported the events 4.1 (±4.7) days earlier than the official reports. There was a positive association between official reporting and RVF events (odds ratio (OR) 195.5, 95% confidence interval (CI): 24.01-4756.43, p < 0.001) and a negative association between official reporting and local media coverage of events (OR 0.03, 95% CI: 0.00-0.17, p = 0.030). This study highlights the usefulness of local news in the detection of potentially neglected zoonotic disease events and the importance of digital biosurveillance in resource-limited settings.
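For readers unfamiliar with the association measure reported above, the following sketch computes an odds ratio and a Wald-type 95% confidence interval from a 2x2 table. It is an illustration only: the abstract does not state how its intervals were derived (the very wide interval suggests an exact or model-based method), and the function name `odds_ratio_wald_ci` and the cell layout are assumptions.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval on the log scale.

    Cell layout (hypothetical): a = exposed with outcome, b = exposed without,
    c = unexposed with outcome, d = unexposed without.
    All cells must be positive for the Wald interval to be defined.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the reciprocal cell counts.
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper
```

With sparse cells, as is likely for 55 events, the Wald approximation is unreliable and an exact or penalized method would usually be preferred.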