Results 1 - 20 of 70
1.
J Travel Med ; 2024 Mar 17.
Article in English | MEDLINE | ID: mdl-38630887

ABSTRACT

BACKGROUND: The international flight network creates multiple routes by which pathogens can quickly spread across the globe. In the early stages of infectious disease outbreaks, analyses using flight passenger data to identify countries at risk of importing the pathogen are common and can help inform disease control efforts. A challenge faced in this modelling is that the latest aviation statistics (referred to as contemporary data) are typically not immediately available. Therefore, flight patterns from a previous year are often used (referred to as historical data). We explored the suitability of historical data for predicting the spatial spread of emerging epidemics. METHODS: We analysed monthly flight passenger data from the International Air Transport Association to assess how baseline air travel patterns were affected in outbreaks of MERS, Zika, and SARS-CoV-2 over the past decade. We then used a stochastic discrete time SEIR metapopulation model to simulate global spread of different pathogens, comparing how epidemic dynamics differed in simulations based on historical and contemporary data. RESULTS: We observed local, short-term disruptions to air travel from South Korea and Brazil for the MERS and Zika outbreaks we studied, whereas global and longer-term flight disruption occurred during the SARS-CoV-2 pandemic. For outbreak events that were accompanied by local, small, and short-term changes in air travel, epidemic models using historical flight data gave similar projections of timing and locations of disease spread as when using contemporary flight data. However, historical data were less reliable to model the spread of an atypical outbreak such as SARS-CoV-2 in which there were durable and extensive levels of global travel disruption. CONCLUSIONS: The use of historical flight data as a proxy in epidemic models is an acceptable practice except in rare, large epidemics that lead to substantial disruptions to international travel.
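
The stochastic discrete-time SEIR model named in METHODS can be illustrated with a minimal sketch. This is not the authors' implementation: it covers a single closed population only (the metapopulation coupling through flight passenger volumes is omitted), and the function names, parameter values, and the binomial-transition formulation are all assumptions chosen for illustration.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_seir(n, initial_infectious, beta, sigma, gamma, days, seed=1):
    """Stochastic discrete-time SEIR in one closed population (hypothetical helper).

    beta, sigma and gamma are daily rates: transmission, 1 / latent period
    and 1 / infectious period. All values passed in are illustrative.
    """
    rng = random.Random(seed)
    s, e, i, r = n - initial_infectious, 0, initial_infectious, 0
    history = [(s, e, i, r)]
    for _ in range(days):
        # Convert daily rates into per-day transition probabilities.
        p_infection = 1.0 - math.exp(-beta * i / n)
        p_onset = 1.0 - math.exp(-sigma)
        p_recovery = 1.0 - math.exp(-gamma)
        # Each individual in a compartment moves on independently.
        new_e = binomial(rng, s, p_infection)
        new_i = binomial(rng, e, p_onset)
        new_r = binomial(rng, i, p_recovery)
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
        history.append((s, e, i, r))
    return history


history = simulate_seir(n=1000, initial_infectious=5,
                        beta=0.6, sigma=0.25, gamma=0.2, days=60)
```

A metapopulation version would run one such update per location and, at each step, move individuals between locations in proportion to travel volumes; swapping historical for contemporary travel volumes is the comparison the abstract describes.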

3.
PLoS One ; 18(10): e0286199, 2023.
Article in English | MEDLINE | ID: mdl-37851661

ABSTRACT

Since 8th March 2020 up to the time of writing, we have been producing near real-time weekly estimates of SARS-CoV-2 transmissibility and forecasts of deaths due to COVID-19 for all countries with evidence of sustained transmission, shared online. We also developed a novel heuristic to combine weekly estimates of transmissibility to produce forecasts over a 4-week horizon. Here we present a retrospective evaluation of the forecasts produced between 8th March and 29th November 2020 for 81 countries. We evaluated the robustness of the forecasts produced in real-time using relative error, coverage probability, and comparisons with null models. During the 39-week period covered by this study, both the short- and medium-term forecasts captured well the epidemic trajectory across different waves of COVID-19 infections with small relative errors over the forecast horizon. The model was well calibrated with 56.3% and 45.6% of the observations lying in the 50% Credible Interval in 1-week and 4-week ahead forecasts respectively. The retrospective evaluation of our models shows that simple transmission models calibrated using routine disease surveillance data can reliably capture the epidemic trajectory in multiple countries. The medium-term forecasts can be used in conjunction with the short-term forecasts of COVID-19 mortality as a useful planning tool as countries continue to relax public health measures.


Subject(s)
COVID-19 , Epidemics , Humans , COVID-19/epidemiology , Retrospective Studies , SARS-CoV-2 , Time , Forecasting
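
The two evaluation metrics named in this abstract, coverage probability and relative error, can be sketched as follows. The function names and toy numbers are hypothetical, and the exact relative-error definition used in the paper may differ from the simple absolute form assumed here.

```python
def coverage_probability(observed, lower, upper):
    """Fraction of observations falling inside their forecast interval."""
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)


def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed, skipping zero observations.

    Assumed definition for illustration; not taken from the paper.
    """
    errors = [abs(p - y) / y for y, p in zip(observed, predicted) if y != 0]
    return sum(errors) / len(errors)


# Toy weekly death counts with 50% interval bounds (made-up numbers).
observed = [10, 20, 30, 40]
lower = [8, 21, 25, 30]
upper = [12, 25, 35, 39]
predicted = [11, 18, 30, 44]

coverage = coverage_probability(observed, lower, upper)
error = mean_relative_error(observed, predicted)
```

For a well-calibrated forecast, the coverage of a 50% interval should sit near 0.5, which is the sense in which the reported 56.3% and 45.6% indicate reasonable calibration.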
4.
Philos Trans R Soc Lond B Biol Sci ; 378(1887): 20220278, 2023 10 09.
Article in English | MEDLINE | ID: mdl-37598701

ABSTRACT

In 2012, the World Health Organization (WHO) set the elimination of Chagas disease intradomiciliary vectorial transmission as a goal by 2020. After a decade, some progress has been made, but the new 2021-2030 WHO roadmap has set even more ambitious targets. Innovative and robust modelling methods are required to monitor progress towards these goals. We present a modelling pipeline using local seroprevalence data to obtain national disease burden estimates by disease stage. Firstly, local seroprevalence information is used to estimate spatio-temporal trends in the Force-of-Infection (FoI). FoI estimates are then used to predict such trends across larger and fine-scale geographical areas. Finally, predicted FoI values are used to estimate disease burden based on a disease progression model. Using Colombia as a case study, we estimated that the number of infected people would reach 506 000 (95% credible interval (CrI) = 395 000-648 000) in 2020 with a 1.0% (95%CrI = 0.8-1.3%) prevalence in the general population and 2400 (95%CrI = 1900-3400) deaths (approx. 0.5% of those infected). The interplay between a decrease in infection exposure (FoI and relative proportion of acute cases) was overcompensated by a large increase in population size and gradual population ageing, leading to an increase in the absolute number of Chagas disease cases over time. This article is part of the theme issue 'Challenges and opportunities in the fight against neglected tropical diseases: a decade from the London Declaration on NTDs'.


Subject(s)
Aging , Chagas Disease , Humans , Seroepidemiologic Studies , Chagas Disease/epidemiology , Colombia , Cost of Illness , Neglected Diseases/epidemiology
5.
Animals (Basel) ; 13(15)2023 Jul 28.
Article in English | MEDLINE | ID: mdl-37570257

ABSTRACT

A questionnaire to gather evidence on the plastic entanglement of the European hedgehog (Erinaceus europaeus) was sent to 160 wildlife rehabilitation centres in Great Britain. Fifty-four responses were received, and 184 individual admissions owing to plastic entanglement were reported. Death was the outcome for 46% (n = 86) of these cases. A high proportion of Britain's hedgehogs enter rehabilitation centres annually (approximately 5% of the national population and potentially 10% of the urban population), providing a robust basis for assessing the minimum impacts at a national level. We estimate that 4000-7000 hedgehog deaths per year are attributable to plastic, with the true rate likely being higher, since many entangled hedgehogs, in contrast to those involved in road traffic accidents, will not be found. Population modelling indicates that this excess mortality is sufficient to cause population declines. Although the scale of the impact is much lower than that attributable to traffic, it is nevertheless an additional pressure on a species that is already in decline and presents a significant welfare issue to a large number of individuals.

6.
PLoS Comput Biol ; 19(8): e1011439, 2023 08.
Article in English | MEDLINE | ID: mdl-37639484

ABSTRACT

The time-varying reproduction number (Rt) is an important measure of epidemic transmissibility that directly informs policy decisions and the optimisation of control measures. EpiEstim is a widely used open-source software tool that uses case incidence and the serial interval (SI, time between symptoms in a case and their infector) to estimate Rt in real-time. The incidence and the SI distribution must be provided at the same temporal resolution, which can limit the applicability of EpiEstim and other similar methods, e.g. for contexts where the time window of incidence reporting is longer than the mean SI. In the EpiEstim R package, we implement an expectation-maximisation algorithm to reconstruct daily incidence from temporally aggregated data, from which Rt can then be estimated. We assess the validity of our method using an extensive simulation study and apply it to COVID-19 and influenza data. For all datasets, the influence of intra-weekly variability in reported data was mitigated by using aggregated weekly data. Rt estimated on weekly sliding windows using incidence reconstructed from weekly data was strongly correlated with estimates from the original daily data. The simulation study revealed that Rt was well estimated in all scenarios and regardless of the temporal aggregation of the data. In the presence of weekend effects, Rt estimates from reconstructed data were more successful at recovering the true value of Rt than those obtained from reported daily data. These results show that this novel method allows Rt to be successfully recovered from aggregated data using a simple approach with very few data requirements. Additionally, by removing administrative noise when daily incidence data are reconstructed, the accuracy of Rt estimates can be improved.


Subject(s)
COVID-19 , Humans , Incidence , Software , Computer Simulation , Reproduction
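
The relationship between daily incidence, the serial interval distribution, and Rt can be sketched with a plain renewal-equation ratio over a sliding window. This is a simplified point estimate written for illustration, not the EpiEstim API and not the expectation-maximisation reconstruction described in the abstract; the function name, the window handling, and the toy inputs are assumptions.

```python
def estimate_rt(incidence, si_weights, window=7):
    """Sliding-window renewal-equation estimate of Rt (hypothetical helper).

    incidence  : daily case counts, index 0 is the first day
    si_weights : si_weights[k] is the probability that the serial
                 interval equals k + 1 days
    Returns {day index: Rt estimate} for days with a full window.
    """
    def infectiousness(t):
        # Past cases weighted by how likely they are to infect on day t.
        return sum(w * incidence[t - lag]
                   for lag, w in enumerate(si_weights, start=1)
                   if t - lag >= 0)

    estimates = {}
    for t in range(window, len(incidence)):
        days = range(t - window + 1, t + 1)
        cases = sum(incidence[d] for d in days)
        pressure = sum(infectiousness(d) for d in days)
        if pressure > 0:
            estimates[t] = cases / pressure
    return estimates


# Incidence doubling daily with a one-day serial interval gives Rt = 2.
doubling = estimate_rt([2 ** k for k in range(10)], [1.0], window=3)
# Flat incidence gives Rt = 1 once the full serial interval is observed.
flat = estimate_rt([10] * 10, [0.5, 0.5], window=3)
```

Because the ratio needs incidence and serial interval on the same time step, weekly reports cannot be plugged in directly when the serial interval is measured in days, which is the gap the reconstruction step in the abstract addresses.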
7.
Epidemics ; 44: 100692, 2023 09.
Article in English | MEDLINE | ID: mdl-37399634

ABSTRACT

The evolution of SARS-CoV-2 has demonstrated that emerging variants can set back the global COVID-19 response. The ability to rapidly assess the threat of new variants is critical for timely optimisation of control strategies. We present a novel method to estimate the effective transmission advantage of a new variant compared to a reference variant combining information across multiple locations and over time. Through an extensive simulation study designed to mimic real-time epidemic contexts, we show that our method performs well across a range of scenarios and provide guidance on its optimal use and interpretation of results. We also provide an open-source software implementation of our method. The computational speed of our tool enables users to rapidly explore spatial and temporal variations in the estimated transmission advantage. We estimate that the SARS-CoV-2 Alpha variant is 1.46 (95% Credible Interval 1.44-1.47) and 1.29 (95% CrI 1.29-1.30) times more transmissible than the wild type, using data from England and France respectively. We further estimate that Delta is 1.77 (95% CrI 1.69-1.85) times more transmissible than Alpha (England data). Our approach can be used as an important first step towards quantifying the threat of emerging or co-circulating variants of infectious pathogens in real-time.


Subject(s)
COVID-19 , Epidemics , Humans , SARS-CoV-2 , COVID-19/epidemiology , Computer Simulation
8.
Epidemics ; 44: 100703, 2023 09.
Article in English | MEDLINE | ID: mdl-37385853

ABSTRACT

The seasonality of African swine fever (ASF) outbreaks in domestic pigs differs between temperate and subtropical/tropical regions. We hypothesise that variations in the importance of wild boar-to-farm and farm-to-farm transmission routes shape these contrasting patterns, and we emphasise the implications for effective ASF control.


Subject(s)
African Swine Fever Virus , African Swine Fever , Swine Diseases , Swine , Animals , African Swine Fever/epidemiology , Sus scrofa , Disease Outbreaks/veterinary , Disease Outbreaks/prevention & control , Farms , Swine Diseases/epidemiology
9.
Lancet Infect Dis ; 23(9): e383-e388, 2023 09.
Article in English | MEDLINE | ID: mdl-37150186

ABSTRACT

Novel data and analyses have had an important role in informing the public health response to the COVID-19 pandemic. Existing surveillance systems were scaled up, and in some instances new systems were developed to meet the challenges posed by the magnitude of the pandemic. We describe the routine and novel data that were used to address urgent public health questions during the pandemic, underscore the challenges in sustainability and equity in data generation, and highlight key lessons learnt for designing scalable data collection systems to support decision making during a public health crisis. As countries emerge from the acute phase of the pandemic, COVID-19 surveillance systems are being scaled down. However, SARS-CoV-2 resurgence remains a threat to global health security; therefore, a minimal cost-effective system needs to remain active that can be rapidly scaled up if necessary. We propose that a retrospective evaluation to identify the cost-benefit profile of the various data streams collected during the pandemic should be on the scientific research agenda.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , SARS-CoV-2 , Pandemics/prevention & control , Retrospective Studies , Data Collection
10.
Proc Biol Sci ; 290(1997): 20230183, 2023 04 26.
Article in English | MEDLINE | ID: mdl-37072038

ABSTRACT

We investigated the transmission dynamics of lyssavirus in Myotis myotis and Myotis blythii, using serological, virological, demographic and ecological data collected between 2015 and 2022 from two maternity colonies in northern Italian churches. Despite no lyssavirus detection in 556 bats sampled over 11 events by reverse transcription-polymerase chain reaction (RT-PCR), 36.3% of 837 bats sampled over 27 events showed neutralizing antibodies to European bat lyssavirus 1, with a significant increase in summers. By fitting sets of mechanistic models to seroprevalence data, we investigated factors that influenced lyssavirus transmission within and between years. Five models were selected as a group of final models: in one model, a proportion of exposed bats (median model estimate: 5.8%) became infectious and died while the other exposed bats recovered with immunity without becoming infectious; in the other four models, all exposed bats became infectious and recovered with immunity. The final models supported that the two colonies experienced seasonal outbreaks driven by: (i) immunity loss particularly during hibernation, (ii) density-dependent transmission, and (iii) a high transmission rate after synchronous birthing. These findings highlight the importance of understanding ecological factors, including colony size and synchronous birthing timing, and potential infection heterogeneities to enable more robust assessments of lyssavirus spillover risk.


Subject(s)
Chiroptera , Rhabdoviridae Infections , Humans , Pregnancy , Animals , Female , Rhabdoviridae Infections/epidemiology , Rhabdoviridae Infections/veterinary , Seroepidemiologic Studies , Antibodies, Viral , RNA, Viral/analysis
12.
Nat Commun ; 14(1): 2148, 2023 04 14.
Article in English | MEDLINE | ID: mdl-37059861

ABSTRACT

During the COVID-19 pandemic, national testing programmes were conducted worldwide on unprecedented scales. While testing behaviour is generally recognised as dynamic and complex, current literature demonstrating and quantifying such relationships is scarce, despite its importance for infectious disease surveillance and control. Here, we characterise the impacts of SARS-CoV-2 transmission, disease susceptibility/severity, risk perception, and public health measures on SARS-CoV-2 PCR testing behaviour in England over 20 months of the pandemic, by linking testing trends to underlying epidemic trends and contextual meta-data within a systematic conceptual framework. The best-fitting model describing SARS-CoV-2 PCR testing behaviour explained close to 80% of the total deviance in NHS test data. Testing behaviour showed complex associations with factors reflecting transmission level, disease susceptibility/severity (e.g. age, dominant variant, and vaccination), public health measures (e.g. testing strategies and lockdown), and associated changes in risk perception, varying throughout the pandemic and differing between infected and non-infected people.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19 Testing , Pandemics/prevention & control , Disease Susceptibility , Communicable Disease Control , England/epidemiology
13.
Epidemics ; 42: 100666, 2023 03.
Article in English | MEDLINE | ID: mdl-36689876

ABSTRACT

Reliable estimates of human mobility are important for understanding the spatial spread of infectious diseases and the effective targeting of control measures. However, when modelling infectious disease dynamics, data on human mobility at an appropriate temporal or spatial resolution are not always available, leading to the common use of model-derived mobility proxies. In this study we reviewed the different data sources and mobility models that have been used to characterise human movement in Africa. We then conducted a simulation study to better understand the implications of using human mobility proxies when predicting the spatial spread and dynamics of infectious diseases. We found major gaps in the availability of empirical measures of human mobility in Africa, leading to mobility proxies being used in place of data. Empirical data on subnational mobility were only available for 17/54 countries, and in most instances, these data characterised long-term movement patterns, which were unsuitable for modelling the spread of pathogens with short generation times (time between infection of a case and their infector). Results from our simulation study demonstrated that using mobility proxies can have a substantial impact on the predicted epidemic dynamics, with complex and non-intuitive biases. In particular, the predicted times and order of epidemic invasion, and the time of epidemic peak in different locations can be underestimated or overestimated, depending on the types of proxies used and the country of interest. Our work underscores the need for regularly updated empirical measures of population movement within and between countries to aid the prevention and control of infectious disease outbreaks. At the same time, there is a need to establish an evidence base to help understand which types of mobility data are most appropriate for describing the spread of emerging infectious diseases in different settings.


Subject(s)
Communicable Diseases , Epidemics , Humans , Computer Simulation , Disease Outbreaks , Africa , Communicable Diseases/epidemiology
14.
R Soc Open Sci ; 9(11): 220491, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36465672

ABSTRACT

Zika virus (ZIKV) is a mosquito-borne pathogen that caused a major epidemic in the Americas in 2015-2017. Although the majority of ZIKV infections are asymptomatic, the virus has been associated with congenital birth defects and neurological complications (NC) in adults. We combined multiple data sources to improve estimates of ZIKV infection attack rates (IARs), reporting rates of Zika virus disease (ZVD) and the risk of ZIKV-associated NC for 28 capital cities in Colombia. ZVD surveillance data were combined with post-epidemic seroprevalence data and a dataset on ZIKV-associated NC in a Bayesian hierarchical model. We found substantial heterogeneity in ZIKV IARs across cities. The overall estimated ZIKV IAR across the 28 cities was 0.38 (95% CrI: 0.17-0.92). The estimated ZVD reporting rate was 0.013 (95% CrI: 0.004-0.024), and 0.51 (95% CrI: 0.17-0.92) cases of ZIKV-associated NC were estimated to be reported per 10 000 ZIKV infections. When we assumed the same ZIKV IAR across sex or age group, we found important spatial heterogeneities in ZVD reporting rates and the risk of being reported as a ZVD case with NC. Our results highlight how additional data sources can be used to overcome biases in surveillance data and estimate key epidemiological parameters.

15.
Vet Res ; 53(1): 106, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36510331

ABSTRACT

The "Zero by 30" strategic plan aims to eliminate human deaths from dog-mediated rabies by 2030 and domestic dog vaccination is a vital component of this strategic plan. In areas where domestic dog vaccination has been implemented, it is important to assess the impact of this intervention. Additionally, understanding temporal and seasonal trends in the incidence of animal rabies cases may assist in optimizing such interventions. Data on the incidence of probable rabies cases in domestic and wild animals were collected between January 2011 and December 2018 in thirteen districts of south-east Tanzania where jackals comprise over 40% of reported rabies cases. Vaccination coverage was estimated over this period, as five domestic dog vaccination campaigns took place in all thirteen districts between 2011 and 2016. Negative binomial generalized linear models were used to explore the impact of domestic dog vaccination on the annual incidence of animal rabies cases, whilst generalized additive models were used to investigate the presence of temporal and/or seasonal trends. Increases in domestic dog vaccination coverage were significantly associated with a decreased incidence of rabies cases in both domestic dogs and jackals. A 35% increase in vaccination coverage was associated with a reduction in the incidence of probable dog rabies cases of between 78.0 and 85.5% (95% confidence intervals ranged from 61.2 to 92.2%) and a reduction in the incidence of probable jackal rabies cases of between 75.3 and 91.2% (95% confidence intervals ranged from 53.0 to 96.1%). A statistically significant common seasonality was identified in the monthly incidence of probable rabies cases in both domestic dogs and jackals with the highest incidence from February to August and lowest incidence from September to January. 
These results align with evidence supporting the use of domestic dog vaccination as part of control strategies aimed at reducing animal rabies cases in both domestic dogs and jackals in this region. The presence of a common seasonal trend requires further investigation but may have implications for the timing of future vaccination campaigns.


Subject(s)
Dog Diseases , Rabies Vaccines , Rabies , Animals , Dogs , Humans , Animals, Domestic , Dog Diseases/epidemiology , Dog Diseases/prevention & control , Rabies/epidemiology , Rabies/prevention & control , Rabies/veterinary , Animals, Wild , Incidence , Vaccination/veterinary
16.
Commun Med (Lond) ; 2(1): 136, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36352249

ABSTRACT

BACKGROUND: During the COVID-19 pandemic there has been a strong interest in forecasts of the short-term development of epidemiological indicators to inform decision makers. In this study we evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland for the period from January through April 2021. METHODS: We evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland. These were issued by 15 different forecasting models, run by independent research teams. Moreover, we study the performance of combined ensemble forecasts. Evaluation of probabilistic forecasts is based on proper scoring rules, along with interval coverage proportions to assess calibration. The presented work is part of a pre-registered evaluation study. RESULTS: We find that many, though not all, models outperform a simple baseline model up to four weeks ahead for the considered targets. Ensemble methods show very good relative performance. The addressed time period is characterized by rather stable non-pharmaceutical interventions in both countries, making short-term predictions more straightforward than in previous periods. However, major trend changes in reported cases, like the rebound in cases due to the rise of the B.1.1.7 (Alpha) variant in March 2021, prove challenging to predict. CONCLUSIONS: Multi-model approaches can help to improve the performance of epidemiological forecasts. However, while death numbers can be predicted with some success based on current case and hospitalization data, predictability of case numbers remains low beyond quite short time horizons. Additional data sources including sequencing and mobility data, which were not extensively used in the present study, may help to improve performance.


We compare forecasts of weekly case and death numbers for COVID-19 in Germany and Poland based on 15 different modelling approaches. These cover the period from January to April 2021 and address numbers of cases and deaths one and two weeks into the future, along with the respective uncertainties. We find that combining different forecasts into one forecast can enable better predictions. However, case numbers over longer periods were challenging to predict. Additional data sources, such as information about different versions of the SARS-CoV-2 virus present in the population, might improve forecasts in the future.

17.
PLoS Negl Trop Dis ; 16(7): e0010594, 2022 07.
Article in English | MEDLINE | ID: mdl-35853042

ABSTRACT

BACKGROUND: Chagas disease is a long-lasting disease with a prolonged asymptomatic period. Cumulative indices of infection such as prevalence do not shed light on the current epidemiological situation, as they integrate infection over long periods. Instead, metrics such as the Force-of-Infection (FoI) provide information about the rate at which susceptible people become infected and permit sharper inference about temporal changes in infection rates. FoI is estimated by fitting (catalytic) models to available age-stratified serological (ground-truth) data. Predictive FoI modelling frameworks are then used to understand spatial and temporal trends indicative of heterogeneity in transmission and changes effected by control interventions. Ideally, these frameworks should be able to propagate uncertainty and handle spatiotemporal issues. METHODOLOGY/PRINCIPAL FINDINGS: We compare three methods in their ability to propagate uncertainty and provide reliable estimates of FoI for Chagas disease in Colombia as a case study: two Machine Learning (ML) methods (Boosted Regression Trees (BRT) and Random Forest (RF)), and a Linear Model (LM) framework that we had developed previously. Our analyses show consistent results between the three modelling methods under scrutiny. The predictors (explanatory variables) selected, as well as the location of the most uncertain FoI values, were coherent across frameworks. RF was faster than BRT and LM, and provided estimates with fewer extreme values when extrapolating to areas where no ground-truth data were available. However, BRT and RF were less efficient at propagating uncertainty. CONCLUSIONS/SIGNIFICANCE: The choice of FoI predictive models will depend on the objectives of the analysis. ML methods will help characterise the mean behaviour of the estimates, while LM will provide insight into the uncertainty surrounding such estimates. 
Our approach can be extended to the modelling of FoI patterns in other Chagas disease-endemic countries and to other infectious diseases for which serosurveys are regularly conducted for surveillance.


Subject(s)
Chagas Disease , Machine Learning , Chagas Disease/epidemiology , Colombia , Humans , Linear Models , Prevalence
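
The catalytic model mentioned in the BACKGROUND links the Force-of-Infection to age-stratified seroprevalence. A minimal sketch follows, assuming lifelong seropositivity (no seroreversion) and a yearly FoI; the function names and the numerical values are hypothetical and are not taken from the study.

```python
import math


def seroprevalence(foi_by_year, birth_year, survey_year):
    """Expected seroprevalence for a birth cohort under a catalytic model.

    foi_by_year maps calendar year -> yearly FoI. Assumes no seroreversion,
    so the probability of remaining seronegative is exp(-cumulative FoI).
    """
    cumulative = sum(foi_by_year[y] for y in range(birth_year, survey_year))
    return 1.0 - math.exp(-cumulative)


def constant_foi_from_prevalence(prevalence, age):
    """Invert the constant-FoI model: prevalence = 1 - exp(-FoI * age)."""
    return -math.log(1.0 - prevalence) / age


# Illustrative values: a constant FoI of 0.01 per year from 1960 to 2009.
foi = {year: 0.01 for year in range(1960, 2010)}
prev_age_50 = seroprevalence(foi, birth_year=1960, survey_year=2010)
recovered_foi = constant_foi_from_prevalence(prev_age_50, age=50)
```

Because prevalence accumulates exposure over a lifetime, older cohorts reflect past transmission; fitting the yearly FoI instead gives the sharper view of temporal change that the abstract argues for.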
18.
BMC Med Res Methodol ; 22(1): 13, 2022 01 13.
Article in English | MEDLINE | ID: mdl-35027002

ABSTRACT

Age-stratified serosurvey data are often used to understand spatiotemporal trends in disease incidence and exposure through estimating the Force-of-Infection (FoI). Typically, median or mean FoI estimates are used as the response variable in predictive models, often overlooking the uncertainty in estimated FoI values when fitting models and evaluating their predictive ability. To assess how this uncertainty impacts predictions, we compared three approaches with three levels of uncertainty integration. We propose a performance indicator to assess how predictions reflect initial uncertainty. In Colombia, 76 serosurveys (1980-2014) conducted at municipality level provided age-stratified Chagas disease prevalence data. The yearly FoI was estimated at the serosurvey level using a time-varying catalytic model. Environmental, demographic and entomological predictors were used to fit and predict the FoI at municipality level from 1980 to 2010 across Colombia. A stratified bootstrap method was used to fit the models without temporal autocorrelation at the serosurvey level. The predictive ability of each model was evaluated to select the best-fit models within urban, rural and (Amerindian) indigenous settings. Model averaging, with the 10 best-fit models identified, was used to generate predictions. Our analysis shows a risk of overconfidence in model predictions when median estimates of FoI alone are used to fit and evaluate models, failing to account for uncertainty in FoI estimates. Our proposed methodology fully propagates uncertainty in the estimated FoI onto the generated predictions, providing realistic assessments of both central tendency and current uncertainty surrounding exposure to Chagas disease.


Subject(s)
Chagas Disease , Chagas Disease/diagnosis , Chagas Disease/epidemiology , Cities , Colombia/epidemiology , Humans , Prevalence , Uncertainty
19.
PLOS Digit Health ; 1(6): e0000052, 2022 Jun.
Article in English | MEDLINE | ID: mdl-36812522

ABSTRACT

The time-varying reproduction number (Rt) is an important measure of transmissibility during outbreaks. Estimating whether and how rapidly an outbreak is growing (Rt > 1) or declining (Rt < 1) can inform the design, monitoring and adjustment of control measures in real-time. We use a popular R package for Rt estimation, EpiEstim, as a case study to evaluate the contexts in which Rt estimation methods have been used and identify unmet needs which would enable broader applicability of these methods in real-time. A scoping review, complemented by a small EpiEstim user survey, highlights issues with the current approaches, including the quality of input incidence data, the inability to account for geographical factors, and other methodological issues. We summarise the methods and software developed to tackle the problems identified, but conclude that significant gaps remain which should be addressed to enable easier, more robust and applicable estimation of Rt during epidemics.

20.
PLoS One ; 16(9): e0257005, 2021.
Article in English | MEDLINE | ID: mdl-34525098

ABSTRACT

BACKGROUND: Machine learning (ML) algorithms are now increasingly used in infectious disease epidemiology. Epidemiologists should understand how ML algorithms behave within the context of outbreak data where missingness of data is almost ubiquitous. METHODS: Using simulated data, we use a ML algorithmic framework to evaluate data imputation performance and the resulting case fatality ratio (CFR) estimates, focusing on the scale and type of data missingness (i.e., missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR)). RESULTS: Across ML methods, dataset sizes and proportions of training data used, the area under the receiver operating characteristic curve decreased by 7% (median, range: 1%-16%) when missingness was increased from 10% to 40%. Overall reduction in CFR bias for MAR across methods, proportion of missingness, outbreak size and proportion of training data was 0.5% (median, range: 0%-11%). CONCLUSION: ML methods could reduce bias and increase the precision in CFR estimates at low levels of missingness. However, no method is robust to high percentages of missingness. Thus, a datacentric approach is recommended in outbreak settings: patient survival outcome data should be prioritised for collection and random-sample follow-ups should be implemented to ascertain missing outcomes.


Subject(s)
Disease Outbreaks , Hemorrhagic Fever, Ebola/mortality , Machine Learning , Models, Statistical , Computer Simulation , Data Interpretation, Statistical , Datasets as Topic , Hemorrhagic Fever, Ebola/epidemiology , Humans , Research Design , Survival Analysis
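
Why the type of missingness matters for the CFR can be shown with a small expected-value calculation. This sketch is not the study's ML framework; it only computes the CFR one would expect among cases with a recorded outcome, and the function name and all numbers are hypothetical.

```python
def observed_cfr(n_cases, true_cfr, p_missing_death, p_missing_survivor):
    """Expected CFR among cases whose outcome was recorded.

    p_missing_death / p_missing_survivor are the probabilities that the
    outcome is unrecorded for a case who died / survived.
    """
    deaths = n_cases * true_cfr
    survivors = n_cases - deaths
    known_deaths = deaths * (1.0 - p_missing_death)
    known_survivors = survivors * (1.0 - p_missing_survivor)
    return known_deaths / (known_deaths + known_survivors)


# Missingness unrelated to outcome (MCAR-like): the estimate is unbiased.
cfr_mcar = observed_cfr(1000, 0.4, p_missing_death=0.3, p_missing_survivor=0.3)
# Missingness that depends on the outcome itself (MNAR): biased downwards.
cfr_mnar = observed_cfr(1000, 0.4, p_missing_death=0.5, p_missing_survivor=0.0)
```

With a true CFR of 0.4, the first case returns 0.4 and the second 0.25, which illustrates why the abstract recommends prioritising collection of survival outcomes and following up a random sample of cases with missing outcomes.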