ABSTRACT
Policymakers must make management decisions despite incomplete knowledge and conflicting model projections. Little guidance exists for the rapid, representative, and unbiased collection of policy-relevant scientific input from independent modeling teams. Integrating approaches from decision analysis, expert judgment, and model aggregation, we convened multiple modeling teams to evaluate COVID-19 reopening strategies for a mid-sized United States county early in the pandemic. Projections from seventeen distinct models were inconsistent in magnitude but highly consistent in ranking interventions. The 6-mo-ahead aggregate projections were well in line with observed outbreaks in mid-sized US counties. The aggregate results showed that up to half the population could be infected with full workplace reopening, while workplace restrictions reduced median cumulative infections by 82%. Rankings of interventions were consistent across public health objectives, but there was a strong trade-off between public health outcomes and duration of workplace closures, and no win-win intermediate reopening strategies were identified. Between-model variation was high; the aggregate results thus provide valuable risk quantification for decision making. This approach can be applied to the evaluation of management interventions in any setting where models are used to inform decision making. This case study demonstrated the utility of our approach and was one of several multimodel efforts that laid the groundwork for the COVID-19 Scenario Modeling Hub, which has provided multiple rounds of real-time scenario projections for situational awareness and decision making to the Centers for Disease Control and Prevention since December 2020.
Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , COVID-19/prevention & control , Uncertainty , Disease Outbreaks/prevention & control , Public Health , Pandemics/prevention & control
ABSTRACT
BACKGROUND: Health authorities can minimize the impact of an emergent infectious disease outbreak through effective and timely risk communication, which can build trust and adherence to subsequent behavioral messaging. Monitoring the psychological impacts of an outbreak, as well as public adherence to such messaging, is also important for minimizing long-term effects of an outbreak. OBJECTIVE: We used social media data from Twitter to identify human behaviors relevant to COVID-19 transmission, as well as the perceived impacts of COVID-19 on individuals, as a first step toward real-time monitoring of public perceptions to inform public health communications. METHODS: We developed a coding schema for 6 categories and 11 subcategories, which included both a wide range of behaviors and codes focused on the impacts of the pandemic (eg, economic and mental health impacts). We used this schema to develop training data and supervised learning classifiers for classes with sufficient labels. Classifiers that performed adequately were applied to our remaining corpus, and temporal and geospatial trends were assessed. We compared the classified patterns to ground truth mobility data and actual COVID-19 confirmed cases to assess the signal achieved here. RESULTS: We applied our labeling schema to approximately 7200 tweets. The worst-performing classifiers had F1 scores of only 0.18 to 0.28 when trying to identify tweets about monitoring symptoms and testing. Classifiers about social distancing, however, were much stronger, with F1 scores of 0.64 to 0.66. We applied the social distancing classifiers to over 228 million tweets. We showed temporal patterns consistent with real-world events, and we showed correlations of up to -0.5 between social distancing signals on Twitter and ground truth mobility throughout the United States. CONCLUSIONS: Behaviors discussed on Twitter are exceptionally varied.
Twitter can provide useful information for parameterizing models that incorporate human behavior, as well as for informing public health communication strategies by describing awareness of and compliance with suggested behaviors.
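The classifier results in this abstract are reported as F1 scores. As a minimal sketch of how that metric is computed for a binary tweet classifier, the snippet below derives precision, recall, and F1 from true and predicted labels. The helper name `f1_score` and the toy labels are illustrative assumptions and are not taken from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical labels): 1 = tweet mentions social distancing.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
preds = [1, 0, 1, 1, 0, 0, 1, 0]
score = f1_score(labels, preds)
print(f"F1 = {score:.2f}")
```

Because F1 ignores true negatives, it is a common choice when the class of interest is rare in the corpus, which is plausibly the case for narrowly defined behaviors.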
Subject(s)
COVID-19 , Data Mining , Health Behavior , Health Communication , Social Media , COVID-19/epidemiology , Health Education , Humans , Mental Health , Pandemics , United States
ABSTRACT
BACKGROUND: Influenza epidemics result in a public health and economic burden worldwide. Traditional surveillance techniques, which rely on doctor visits, provide data with a delay of 1 to 2 weeks. A means of obtaining real-time data and forecasting future outbreaks is desirable to provide more timely responses to influenza epidemics. OBJECTIVE: This study aimed to present the first implementation of a novel dataset by demonstrating its ability to supplement traditional disease surveillance at multiple spatial resolutions. METHODS: We used internet traffic data from the Centers for Disease Control and Prevention (CDC) website to determine the potential usability of this data source. We tested the traffic generated by 10 influenza-related pages in 8 states and 9 census divisions within the United States and compared it against clinical surveillance data. RESULTS: Our results yielded an r² value of 0.955 in the most successful case, promising results for some cases, and unsuccessful results for other cases. In the interest of scientific transparency to further the understanding of when internet data streams are an appropriate supplemental data source, we also included negative results (ie, unsuccessful models). Models that focused on a single influenza season were more successful than those that attempted to model multiple influenza seasons. Geographic resolution appeared to play a key role, with national and regional models being more successful, overall, than models at the state level. CONCLUSIONS: These results demonstrate that internet data may be able to complement traditional influenza surveillance in some cases but not in others. Specifically, our results show that the CDC website traffic may inform national- and division-level models but not models for each individual state. In addition, our results show better agreement when the data were broken up by seasons instead of aggregated over several years.
We anticipate that this work will lead to more complex nowcasting and forecasting models using this data stream.
Subject(s)
Centers for Disease Control and Prevention, U.S./standards , Influenza, Human/epidemiology , Data Analysis , Humans , Incidence , Internet , Public Health , United States
ABSTRACT
Biosurveillance, a relatively young field, has recently increased in importance because of increasing emphasis on global health. Databases and tools describing particular subsets of disease are becoming increasingly common in the field. Here, we present an infectious disease database that includes diseases of biosurveillance relevance and an extensible framework for the easy expansion of the database.
Subject(s)
Biosurveillance/methods , Communicable Diseases , Databases, Factual , Humans
ABSTRACT
Mathematical models, such as those that forecast the spread of epidemics or predict the weather, must overcome the challenges of integrating incomplete and inaccurate data into computer simulations, estimating the probability of multiple possible scenarios, and incorporating changes in human behavior, the pathogen, and environmental factors. In the past 3 decades, the weather forecasting community has made significant advances in data collection, assimilating heterogeneous data streams into models, and communicating the uncertainty of their predictions to the general public. Epidemic modelers are struggling with these same issues in forecasting the spread of emerging diseases, such as Zika virus infection and Ebola virus disease. While weather models rely on physical systems, data from satellites, and weather stations, epidemic models rely on human interactions, multiple data sources such as clinical surveillance and Internet data, and environmental or biological factors that can change the pathogen dynamics. We describe some of the similarities and differences between these 2 fields and how the epidemic modeling community is rising to the challenges posed by forecasting to help anticipate and guide the mitigation of epidemics. We conclude that some of the fundamental differences between these 2 fields, such as human behavior, make disease forecasting more challenging than weather forecasting.
Subject(s)
Behavior , Communicable Diseases/epidemiology , Epidemics , Forecasting/methods , Computer Simulation , Humans , Information Storage and Retrieval , Internet , Models, Theoretical
ABSTRACT
Infectious diseases are one of the leading causes of morbidity and mortality around the world; thus, forecasting their impact is crucial for planning an effective response strategy. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza affects 5% to 20% of the U.S. population and causes major economic impacts resulting from hospitalization and absenteeism. Understanding influenza dynamics and forecasting its impact is fundamental for developing prevention and mitigation strategies. We combine modern data assimilation methods with Wikipedia access logs and CDC influenza-like illness (ILI) reports to create a weekly forecast for seasonal influenza. The methods are applied to the 2013-2014 influenza season but are sufficiently general to forecast any disease outbreak, given incidence or case count data. We adjust the initialization and parametrization of a disease model and show that this allows us to determine systematic model bias. In addition, we provide a way to determine where the model diverges from observation and evaluate forecast accuracy. Wikipedia article access logs are shown to be highly correlated with historical ILI records and allow for accurate prediction of ILI data several weeks before it becomes available. The results show that prior to the peak of the flu season, our forecasting method produced 50% and 95% credible intervals for the 2013-2014 ILI observations that contained the actual observations for most weeks in the forecast. However, since our model does not account for re-infection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of flu season has passed.
Subject(s)
Forecasting/methods , Influenza, Human/epidemiology , Internet , Centers for Disease Control and Prevention, U.S. , Computational Biology , Epidemiological Monitoring , History, 21st Century , Humans , Models, Statistical , Seasons , United States/epidemiology
ABSTRACT
Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data, such as social media and search queries, are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with r² up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.
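The abstract above evaluates linear models of disease incidence against article access counts, including their forecasting value at a lag. The sketch below shows one simple way such a lagged fit could be scored with the coefficient of determination. It is an illustration under stated assumptions, not the authors' pipeline: a single predictor, ordinary least squares, and the hypothetical helper `lagged_r_squared` with made-up toy series.

```python
def lagged_r_squared(views, cases, lag=0):
    """R^2 of an ordinary least-squares line predicting cases[t + lag] from views[t]."""
    x = views[:len(views) - lag]   # predictor: page views at time t
    y = cases[lag:]                # response: case counts `lag` steps later
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Toy series (hypothetical): cases follow views exactly one step later.
views = [10, 20, 15, 30, 25, 40]
cases = [7, 31, 61, 46, 91, 76]
r2_lag1 = lagged_r_squared(views, cases, lag=1)
r2_lag0 = lagged_r_squared(views, cases, lag=0)
print(f"lag 1: {r2_lag1:.3f}, lag 0: {r2_lag0:.3f}")
```

Comparing the score across several lags gives a rough picture of how far ahead the access logs carry signal.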
Subject(s)
Communicable Diseases/epidemiology , Databases, Factual , Disease Outbreaks/statistics & numerical data , Environmental Monitoring/methods , Forecasting/methods , Internet , Global Health , Humans , Models, Theoretical
ABSTRACT
Implementing realistic activity patterns for a population is crucial for modeling, for example, disease spread, supply and demand, and disaster response. Using the dynamic activity simulation engine, DASim, we generate schedules for a population that capture regular (e.g., working, eating, and sleeping) and irregular activities (e.g., shopping or going to the doctor). We use the sample entropy (SampEn) statistic to quantify a schedule's regularity for a population. We show how to tune an activity's regularity by adjusting SampEn, thereby making it possible to realistically design activities when creating a schedule. The tuning process sets up a computationally intractable high-dimensional optimization problem. To reduce the computational demand, we use Bayesian Gaussian process regression to compute global sensitivity indices and identify the parameters that have the greatest effect on the variance of SampEn. We use the harmony search (HS) global optimization algorithm to locate global optima. Our results show that HS combined with global sensitivity analysis can efficiently tune the SampEn statistic with few search iterations. We demonstrate how global sensitivity analysis can guide statistical emulation and global optimization algorithms to efficiently tune activities and generate realistic activity patterns. Though our tuning methods are applied to dynamic activity schedule generation, they are general and represent a significant step in the direction of automated tuning and optimization of high-dimensional computer simulations.
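The sample entropy (SampEn) statistic used above measures how often patterns of length m that match each other still match when extended to length m + 1; lower values indicate a more regular series. The following is a minimal sketch of the standard definition, not DASim's implementation. The function name `sample_entropy`, the absolute tolerance `r` (in practice it is often scaled by the series' standard deviation), and the toy schedules are assumptions for illustration.

```python
import math
import random


def sample_entropy(series, m=2, r=0.2):
    """SampEn = -ln(A / B), with B and A the counts of matching template pairs
    of length m and m + 1 (Chebyshev distance <= r, self-matches excluded)."""
    n = len(series)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined: no matches at one of the lengths
    return -math.log(a / b)


# A strictly alternating schedule is perfectly regular.
regular = [0, 1] * 20
# A hypothetical irregular schedule drawn from a seeded random generator.
rng = random.Random(0)
irregular = [rng.random() for _ in range(200)]

print(sample_entropy(regular), sample_entropy(irregular))
```

This direct version costs O(n²) comparisons per template length, which is fine for short schedules but is one reason emulation can help when the statistic must be evaluated many times during tuning.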
ABSTRACT
BACKGROUND: Data from surveillance networks help epidemiologists and public health officials detect emerging diseases, conduct outbreak investigations, manage epidemics, and better understand the mechanics of a particular disease. Surveillance networks are used to determine outbreak intensity (i.e., disease burden) and outbreak timing (i.e., the start, peak, and end of the epidemic), as well as outbreak location. Networks can be tuned to preferentially perform these tasks. Given that resources are limited, careful site selection can save costs while minimizing performance loss. METHODS: We study three different site placement algorithms: two algorithms based on the maximal coverage model and one based on the K-median model. The maximal coverage model chooses sites that maximize the total number of people within a specified distance of a site. The K-median model minimizes the sum of the distances from each individual to the individual's nearest site. Using a ground truth dataset consisting of two million de-identified Medicaid billing records representing eight complete influenza seasons and an evaluation function based on the Huff spatial interaction model, we empirically compare networks against the existing Iowa Department of Public Health influenza-like illness network by simulating the spread of influenza across the state of Iowa. RESULTS: We show that it is possible to design a network that achieves outbreak intensity performance identical to the status quo network using two fewer sites. We also show that if outbreak timing detection is of primary interest, it is actually possible to create a network that matches the existing network's performance using 59% fewer sites. CONCLUSIONS: By simulating the spread of influenza across the state of Iowa, we show that our methods are capable of designing networks that perform better than the status quo in terms of both outbreak intensity and timing. 
Additionally, our results suggest that network size may only play a minimal role in outbreak timing detection. Finally, we show that it may be possible to reduce the size of a surveillance system without affecting the quality of surveillance information produced.
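The two site-placement objectives named in the Methods of this abstract can be illustrated on toy data. The sketch below uses a greedy heuristic for maximal coverage (at each step, add the site that covers the most not-yet-covered people within the radius) and evaluates the K-median objective as a population-weighted sum of distances to the nearest site. The greedy strategy, the straight-line distance, the helper names, and the coordinates are all assumptions for illustration; the abstract does not specify how its algorithms solve these models.

```python
import math


def greedy_max_coverage(candidates, population, radius, k):
    """Pick up to k sites greedily by newly covered population.

    candidates: {name: (x, y)}; population: list of ((x, y), count).
    Returns (chosen site names, total population covered).
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain, best_new = None, 0, set()
        for name, site in candidates.items():
            if name in chosen:
                continue
            new = {
                i for i, (point, _) in enumerate(population)
                if i not in covered and math.dist(site, point) <= radius
            }
            gain = sum(population[i][1] for i in new)
            if gain > best_gain:
                best, best_gain, best_new = name, gain, new
        if best is None:  # no remaining site adds coverage
            break
        chosen.append(best)
        covered |= best_new
    return chosen, sum(population[i][1] for i in covered)


def k_median_cost(sites, population):
    """Population-weighted sum of distances from each point to its nearest site."""
    return sum(count * min(math.dist(s, point) for s in sites)
               for point, count in population)


# Hypothetical candidate sites and population points (arbitrary units).
candidates = {"A": (0, 0), "B": (10, 0), "C": (5, 0)}
population = [((0, 1), 100), ((1, 0), 50), ((10, 1), 120), ((5, 0.5), 30), ((9, 0), 10)]

chosen, covered = greedy_max_coverage(candidates, population, radius=2, k=2)
cost_two = k_median_cost([candidates[name] for name in chosen], population)
cost_all = k_median_cost(list(candidates.values()), population)
print(chosen, covered, round(cost_two, 2), round(cost_all, 2))
```

The two objectives can disagree: maximal coverage ignores anyone outside the radius, while K-median penalizes every person's distance, so a remote population point influences only the latter.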
Subject(s)
Disease Outbreaks , Influenza, Human/epidemiology , Internet , Sentinel Surveillance , Humans , Influenza, Human/diagnosis , Public Health/methods , United States/epidemiology
ABSTRACT
Background: Short-term forecasts of infectious disease burden can contribute to situational awareness and aid capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise the predictive performance of such forecasts if multiple models are combined into an ensemble. Here, we report on the performance of ensembles in predicting COVID-19 cases and deaths across Europe between 08 March 2021 and 07 March 2022. Methods: We used open-source tools to develop a public European COVID-19 Forecast Hub. We invited groups globally to contribute weekly forecasts for COVID-19 cases and deaths reported by a standardised source for 32 countries over the next 1-4 weeks. Teams submitted forecasts from March 2021 using standardised quantiles of the predictive distribution. Each week we created an ensemble forecast, where each predictive quantile was calculated as the equally-weighted average (initially the mean and then from 26th July the median) of all individual models' predictive quantiles. We measured the performance of each model using the relative Weighted Interval Score (WIS), comparing models' forecast accuracy relative to all other models. We retrospectively explored alternative methods for ensemble forecasts, including weighted averages based on models' past predictive performance. Results: Over 52 weeks, we collected forecasts from 48 unique models. We evaluated 29 models' forecast scores in comparison to the ensemble model. We found a weekly ensemble had a consistently strong performance across countries over time. Across all horizons and locations, the ensemble performed better on relative WIS than 83% of participating models' forecasts of incident cases (with a total N=886 predictions from 23 unique models), and 91% of participating models' forecasts of deaths (N=763 predictions from 20 models). 
Across a 1-4 week time horizon, ensemble performance declined with longer forecast periods when forecasting cases, but remained stable over 4 weeks for incident death forecasts. In every forecast across 32 countries, the ensemble outperformed most contributing models when forecasting either cases or deaths, frequently outperforming all of its individual component models. Among several choices of ensemble methods we found that the most influential and best choice was to use a median average of models instead of using the mean, regardless of methods of weighting component forecast models. Conclusions: Our results support the use of combining forecasts from individual models into an ensemble in order to improve predictive performance across epidemiological targets and populations during infectious disease epidemics. Our findings further suggest that median ensemble methods yield better predictive performance than ones based on means. Our findings also highlight that forecast consumers should place more weight on incident death forecasts than incident case forecasts at forecast horizons greater than 2 weeks. Funding: AA, BH, BL, LWa, MMa, PP, SV funded by National Institutes of Health (NIH) Grant 1R01GM109718, NSF BIG DATA Grant IIS-1633028, NSF Grant No.: OAC-1916805, NSF Expeditions in Computing Grant CCF-1918656, CCF-1917819, NSF RAPID CNS-2028004, NSF RAPID OAC-2027541, US Centers for Disease Control and Prevention 75D30119C05935, a grant from Google, University of Virginia Strategic Investment Fund award number SIF160, Defense Threat Reduction Agency (DTRA) under Contract No. HDTRA1-19-D-0007, and respectively Virginia Dept of Health Grant VDH-21-501-0141, VDH-21-501-0143, VDH-21-501-0147, VDH-21-501-0145, VDH-21-501-0146, VDH-21-501-0142, VDH-21-501-0148. AF, AMa, GL funded by SMIGE - Modelli statistici inferenziali per governare l'epidemia, FISR 2020-Covid-19 I Fase, FISR2020IP-00156, Codice Progetto: PRJ-0695.
AM, BK, FD, FR, JK, JN, JZ, KN, MG, MR, MS, RB funded by Ministry of Science and Higher Education of Poland with grant 28/WFSN/2021 to the University of Warsaw. BRe, CPe, JLAz funded by Ministerio de Sanidad/ISCIII. BT, PG funded by PERISCOPE European H2020 project, contract number 101016233. CP, DL, EA, MC, SA funded by European Commission - Directorate-General for Communications Networks, Content and Technology through the contract LC-01485746, and Ministerio de Ciencia, Innovacion y Universidades and FEDER, with the project PGC2018-095456-B-I00. DE., MGu funded by Spanish Ministry of Health / REACT-UE (FEDER). DO, GF, IMi, LC funded by Laboratory Directed Research and Development program of Los Alamos National Laboratory (LANL) under project number 20200700ER. DS, ELR, GG, NGR, NW, YW funded by National Institutes of General Medical Sciences (R35GM119582; the content is solely the responsibility of the authors and does not necessarily represent the official views of NIGMS or the National Institutes of Health). FB, FP funded by InPresa, Lombardy Region, Italy. HG, KS funded by European Centre for Disease Prevention and Control. IV funded by Agencia de Qualitat i Avaluacio Sanitaries de Catalunya (AQuAS) through contract 2021-021OE. JDe, SMo, VP funded by Netzwerk Universitatsmedizin (NUM) project egePan (01KX2021). JPB, SH, TH funded by Federal Ministry of Education and Research (BMBF; grant 05M18SIA). KH, MSc, YKh funded by Project SaxoCOV, funded by the German Free State of Saxony. Presentation of data, model results and simulations also funded by the NFDI4Health Task Force COVID-19 (https://www.nfdi4health.de/task-force-covid-19-2) within the framework of a DFG-project (LO-342/17-1). 
LP, VE funded by Mathematical and Statistical modelling project (MUNI/A/1615/2020), Online platform for real-time monitoring, analysis and management of epidemic situations (MUNI/11/02202001/2020); VE also supported by RECETOX research infrastructure (Ministry of Education, Youth and Sports of the Czech Republic: LM2018121), the CETOCOEN EXCELLENCE (CZ.02.1.01/0.0/0.0/17-043/0009632), RECETOX RI project (CZ.02.1.01/0.0/0.0/16-013/0001761). NIB funded by Health Protection Research Unit (grant code NIHR200908). SAb, SF funded by Wellcome Trust (210758/Z/18/Z).
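The ensemble described in the Methods of this abstract combines models quantile by quantile, using an equally weighted mean or median of the individual models' predictive quantiles. The following is a minimal sketch of that idea. The function name `quantile_ensemble`, the dictionary layout, and the toy forecasts are assumptions for illustration and do not reproduce the Forecast Hub's actual tooling.

```python
from statistics import mean, median


def quantile_ensemble(forecasts, method="median"):
    """Combine models quantile by quantile with equal weights.

    forecasts: {model_name: {quantile_level: predicted_value}}
    Returns {quantile_level: combined_value}.
    """
    combine = median if method == "median" else mean
    levels = sorted(next(iter(forecasts.values())))
    return {
        q: combine([model[q] for model in forecasts.values()])
        for q in levels
    }


# Hypothetical 1-week-ahead case forecasts; model_c is an outlier.
forecasts = {
    "model_a": {0.25: 90, 0.5: 100, 0.75: 110},
    "model_b": {0.25: 95, 0.5: 105, 0.75: 120},
    "model_c": {0.25: 300, 0.5: 400, 0.75: 500},
}

median_ens = quantile_ensemble(forecasts, method="median")
mean_ens = quantile_ensemble(forecasts, method="mean")
print(median_ens)
print(mean_ens)
```

In this toy case the single outlying model pulls every mean quantile upward while the median ensemble is unaffected, which is consistent with the robustness argument for the median made in the abstract.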
Subject(s)
COVID-19 , Communicable Diseases , Epidemics , Humans , COVID-19/diagnosis , COVID-19/epidemiology , Forecasting , Models, Statistical , Retrospective Studies
ABSTRACT
BACKGROUND AND PURPOSE: The current self-initiated approach by which hospitals acquire Primary Stroke Center (PSC) certification provides insufficient coverage for large areas of the United States. An alternative, directed, algorithmic approach to determine near optimal locations of PSCs would be justified if it significantly improves coverage. METHODS: Using geographic location-allocation modeling techniques, we developed a universal web-based calculator for selecting near optimal PSC locations designed to maximize the population coverage in any state. We analyzed the current PSC network population coverage in Iowa and compared it with the coverage that would exist if a maximal coverage model had instead been used to place those centers. We then estimated the expected gains in population coverage if additional PSCs follow the current self-initiated model and compared it against the more efficient coverage expected by use of a maximal coverage model to select additional locations. RESULTS: The existing 12 self-initiated PSCs in Iowa cover 37% of the population, assuming a time-distance radius of 30 minutes. The current population coverage would have been 47.5% if those 12 PSCs had been located using a maximal coverage model. With the current self-initiated approach, 54 additional PSCs on average will be needed to improve coverage to 75% of the population. Conversely, only 31 additional PSCs would be needed to achieve the same degree of population coverage if a maximal coverage model is used. CONCLUSIONS: Given the substantial gain in population access to adequate acute stroke care, it appears justified to direct the location of additional PSCs or recombinant tissue-type plasminogen activator-capable hospitals through a maximal coverage model algorithmic approach.
Subject(s)
Health Services Accessibility/organization & administration , Primary Health Care/organization & administration , Stroke/therapy , Algorithms , Emergency Medical Services , Fibrinolytic Agents/therapeutic use , Geography , Hospitals, Community , Humans , Iowa , Models, Organizational , Population , Resource Allocation , Rural Population , Thrombolytic Therapy , Time Factors , Tissue Plasminogen Activator/therapeutic use , Tomography, X-Ray Computed
ABSTRACT
INTRODUCTION: School-age children play a key role in the spread of airborne viruses like influenza due to the prolonged and close contacts they have in school settings. As a result, school closures and other non-pharmaceutical interventions were recommended as the first line of defense in response to the novel coronavirus pandemic (COVID-19). METHODS: We used an agent-based model that simulates communities across the United States including daycares, primary, and secondary schools to quantify the relative health outcomes of reopening schools for the period of August 15, 2020 to April 11, 2021. Our simulation was carried out in early September 2020 and was based on the latest (at the time) Centers for Disease Control and Prevention (CDC)'s Pandemic Planning Scenarios released in May 2020. We explored different reopening scenarios including virtual learning, in-person school, and several hybrid options that stratify the student population into cohorts in order to reduce exposure and pathogen spread. RESULTS: Scenarios where cohorts of students return to school in non-overlapping formats, which we refer to as hybrid scenarios, resulted in significant decreases in the percentage of symptomatic individuals with COVID-19, by as much as 75%. These hybrid scenarios have only slightly more negative health impacts of COVID-19 compared to implementing a 100% virtual learning scenario. Hybrid scenarios can avert a substantial number of COVID-19 cases at the national scale (approximately 28 million to 60 million, depending on the scenario) over the simulated eight-month period. We found the results of our simulations to be highly dependent on the number of workplaces assumed to be open for in-person business, as well as the initial level of COVID-19 incidence within the simulated community.
CONCLUSION: In an evolving pandemic, while a large proportion of people remain susceptible, reducing the number of students attending school leads to better health outcomes; part-time in-classroom education substantially reduces health risks.
Subject(s)
COVID-19 , Child , United States/epidemiology , Humans , COVID-19/epidemiology , Retrospective Studies , Pandemics/prevention & control , SARS-CoV-2 , Schools
ABSTRACT
BACKGROUND: During the COVID-19 pandemic there has been a strong interest in forecasts of the short-term development of epidemiological indicators to inform decision makers. In this study we evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland for the period from January through April 2021. METHODS: The predictions were issued by 15 different forecasting models, run by independent research teams. Moreover, we study the performance of combined ensemble forecasts. Evaluation of probabilistic forecasts is based on proper scoring rules, along with interval coverage proportions to assess calibration. The presented work is part of a pre-registered evaluation study. RESULTS: We find that many, though not all, models outperform a simple baseline model up to four weeks ahead for the considered targets. Ensemble methods show very good relative performance. The addressed time period is characterized by rather stable non-pharmaceutical interventions in both countries, making short-term predictions more straightforward than in previous periods. However, major trend changes in reported cases, like the rebound in cases due to the rise of the B.1.1.7 (Alpha) variant in March 2021, prove challenging to predict. CONCLUSIONS: Multi-model approaches can help to improve the performance of epidemiological forecasts. However, while death numbers can be predicted with some success based on current case and hospitalization data, predictability of case numbers remains low beyond quite short time horizons. Additional data sources including sequencing and mobility data, which were not extensively used in the present study, may help to improve performance.
We compare forecasts of weekly case and death numbers for COVID-19 in Germany and Poland based on 15 different modelling approaches. These cover the period from January to April 2021 and address numbers of cases and deaths one and two weeks into the future, along with the respective uncertainties. We find that combining different forecasts into one forecast can enable better predictions. However, case numbers over longer periods were challenging to predict. Additional data sources, such as information about different versions of the SARS-CoV-2 virus present in the population, might improve forecasts in the future.
ABSTRACT
BACKGROUND: The COVID-19 outbreak has left many people isolated within their homes; these people are turning to social media for news and social connection, which leaves them vulnerable to believing and sharing misinformation. Health-related misinformation threatens adherence to public health messaging, and monitoring its spread on social media is critical to understanding the evolution of ideas that have potentially negative public health impacts. OBJECTIVE: The aim of this study is to use Twitter data to explore methods to characterize and classify four COVID-19 conspiracy theories and to provide context for each of these conspiracy theories through the first 5 months of the pandemic. METHODS: We began with a corpus of COVID-19 tweets (approximately 120 million) spanning late January to early May 2020. We first filtered tweets using regular expressions (n=1.8 million) and used random forest classification models to identify tweets related to four conspiracy theories. Our classified data sets were then used in downstream sentiment analysis and dynamic topic modeling to characterize the linguistic features of COVID-19 conspiracy theories as they evolve over time. RESULTS: Analysis using model-labeled data was beneficial for increasing the proportion of data matching misinformation indicators. Random forest classifier metrics varied across the four conspiracy theories considered (F1 scores between 0.347 and 0.857); this performance increased as the given conspiracy theory was more narrowly defined. We showed that misinformation tweets demonstrate more negative sentiment when compared to nonmisinformation tweets and that theories evolve over time, incorporating details from unrelated conspiracy theories as well as real-world events. 
CONCLUSIONS: Although we focus here on health-related misinformation, this combination of approaches is not specific to public health and is valuable for characterizing misinformation in general, which is an important first step in creating targeted messaging to counteract its spread. Initial messaging should aim to preempt generalized misinformation before it becomes widespread, while later messaging will need to target evolving conspiracy theories and the new facets of each as they become incorporated.
Subject(s)
COVID-19/epidemiology, Communication, Information Dissemination/methods, Social Media/statistics & numerical data, Humans
ABSTRACT
Dengue virus remains a significant public health challenge in Brazil, and seasonal preparation efforts are hindered by variable intra- and interseasonal dynamics. Here, we present a framework for characterizing weekly dengue activity at the Brazilian mesoregion level from 2010 to 2016 as time series properties that are relevant to forecasting efforts, focusing on outbreak shape, seasonal timing, and pairwise correlations in magnitude and onset. In addition, we use a combination of 18 satellite remote sensing imagery, weather, clinical, mobility, and census data streams and regression methods to identify a parsimonious set of covariates that explain each time series property. The models explained 54% of the variation in outbreak shape, 38% of seasonal onset, 34% of pairwise correlation in outbreak timing, and 11% of pairwise correlation in outbreak magnitude. Regions that have experienced longer periods of drought sensitivity, as captured by the "normalized burn ratio," experienced less intense outbreaks, while regions with regular fluctuations in relative humidity had less regular seasonal outbreaks. Both the pairwise correlations in outbreak timing and outbreak trend between mesoregions were best predicted by distance. Our analysis also revealed the presence of distinct geographic clusters where dengue properties tend to be spatially correlated. Forecasting models aimed at predicting the dynamics of dengue activity need to identify the most salient variables capable of contributing to accurate predictions. Our findings show that successful models may need to leverage distinct variables in different locations and be tailored to a specific task, such as predicting outbreak magnitude or timing characteristics, to be useful. This argues for "adaptive models" rather than "one-size-fits-all" models. The results of this study can be applied to improving spatial hierarchical or target-focused forecasting models of dengue activity across Brazil.
Subject(s)
Dengue/epidemiology, Disease Outbreaks/statistics & numerical data, Forecasting/methods, Brazil/epidemiology, Humans, Statistical Models, Seasons, Weather
ABSTRACT
BACKGROUND: Currently, the identification of infectious disease re-emergence is performed without describing specific quantitative criteria that can be used to identify re-emergence events consistently. This practice may lead to ineffective mitigation. In addition, identification of factors contributing to local disease re-emergence and assessment of global disease re-emergence require access to data about disease incidence and a large number of factors at the local level for the entire world. This paper presents Re-emerging Disease Alert (RED Alert), a web-based tool designed to help public health officials detect and understand infectious disease re-emergence. OBJECTIVE: Our objective is to bring together a variety of disease-related data and analytics needed to help public health analysts answer the following 3 primary questions for detecting and understanding disease re-emergence: Is there a potential disease re-emergence at the local (country) level? What are the potential contributing factors for this re-emergence? Is there a potential for global re-emergence? METHODS: We collected and cleaned disease-related data (eg, case counts, vaccination rates, and indicators related to disease transmission) from several data sources including the World Health Organization (WHO), Pan American Health Organization (PAHO), World Bank, and Gideon. We combined these data with machine learning and visual analytics into a tool called RED Alert to detect re-emergence for the following 4 diseases: measles, cholera, dengue, and yellow fever. We evaluated the performance of the machine learning models for re-emergence detection and reviewed the output of the tool through a number of case studies. RESULTS: Our supervised learning models were able to identify 82%-90% of the local re-emergence events, although with 18%-31% (except 46% for dengue) false positives. This is consistent with our goal of identifying all possible re-emergences while allowing some false positives. 
The review of the web-based tool through case studies showed that local re-emergence detection was possible and that the tool provided actionable information about potential factors contributing to the local disease re-emergence and trends in global disease re-emergence. CONCLUSIONS: To the best of our knowledge, this is the first tool that focuses specifically on disease re-emergence and addresses the important challenges mentioned above.
Subject(s)
Emerging Communicable Diseases/epidemiology, Internet, Public Health Surveillance/methods, Humans, Reproducibility of Results
ABSTRACT
Policymakers make decisions about COVID-19 management in the face of considerable uncertainty. We convened multiple modeling teams to evaluate reopening strategies for a mid-sized county in the United States, in a novel process designed to fully express scientific uncertainty while reducing linguistic uncertainty and cognitive biases. For the scenarios considered, the consensus from 17 distinct models was that a second outbreak will occur within 6 months of reopening, unless schools and non-essential workplaces remain closed. Up to half the population could be infected with full workplace reopening; non-essential business closures reduced median cumulative infections by 82%. No win-win intermediate reopening strategies were identified; there was a trade-off between public health outcomes and duration of workplace closures. Aggregate results captured twice the uncertainty of individual models, providing a more complete expression of risk for decision-making purposes.
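A small sketch can show how projections that disagree in magnitude may still agree on the ranking of interventions, and how an aggregate can be summarized. All numbers, model names, and intervention labels below are hypothetical. Taking the median across models is an assumed aggregation rule used for illustration, since the abstract does not specify the study's method.

```python
from statistics import median

# Hypothetical cumulative-infection projections, one dict per model.
projections = {
    "model_a": {"closed": 500, "partial": 4000, "open": 9000},
    "model_b": {"closed": 50, "partial": 300, "open": 1200},
    "model_c": {"closed": 2000, "partial": 15000, "open": 40000},
}
interventions = ["closed", "partial", "open"]


def ranking(model_output):
    """Order interventions from fewest to most projected infections."""
    return sorted(model_output, key=model_output.get)


# Rankings agree if every model produces the same ordering.
rankings = {name: ranking(out) for name, out in projections.items()}
consistent = len({tuple(r) for r in rankings.values()}) == 1

# Assumed aggregation rule: median across models for each intervention.
aggregate = {
    iv: median(out[iv] for out in projections.values())
    for iv in interventions
}

# Relative reduction in aggregate infections, closed versus fully open.
reduction = 1 - aggregate["closed"] / aggregate["open"]
```

Here the three hypothetical models differ by more than an order of magnitude yet rank the interventions identically, which is the pattern the abstract reports for the real ensemble.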
ABSTRACT
Infectious diseases are changing due to the environment and altered interactions among hosts, reservoirs, vectors, and pathogens. This is particularly true for zoonotic diseases that infect humans, agricultural animals, and wildlife. Within the subset of zoonoses, vector-borne pathogens are changing more rapidly with climate change, and have a complex epidemiology, which may allow them to take advantage of a changing environment. Most mosquito-borne infectious diseases are transmitted by mosquitoes in three genera: Aedes, Anopheles, and Culex, and the expansion of these genera is well documented. There is an urgent need to study vector-borne diseases in response to climate change and to produce a generalizable approach capable of generating risk maps and forecasting outbreaks. Here, we provide a strategy for coupling climate and epidemiological models for zoonotic infectious diseases. We discuss the complexity and challenges of data and model fusion, baseline requirements for data, and animal and human population movement. Disease forecasting needs significant investment to build the infrastructure necessary to collect data about the environment, vectors, and hosts at all spatial and temporal resolutions. These investments can contribute to building a modeling community around the globe to support public health officials so as to reduce disease burden through forecasts with quantified uncertainty.
ABSTRACT
Infectious disease reemergence is an important yet ambiguous concept that lacks a quantitative definition. Currently, reemergence is identified without specific criteria describing what constitutes a reemergent event. This practice affects reproducible assessments of high-consequence public health events and disease response prioritization. This in turn can lead to misallocation of resources. More important, early recognition of reemergence facilitates effective mitigation. We used a supervised machine learning approach to detect potential disease reemergence. We demonstrate the feasibility of applying a machine learning classifier to identify reemergence events in a systematic way for 4 different infectious diseases. The algorithm is applicable to temporal trends of disease incidence and includes disease-specific features to identify potential reemergence. Through this study, we offer a structured means of identifying potential reemergence using a data-driven approach.
Subject(s)
Algorithms, Emerging Communicable Diseases, Disease Outbreaks, Supervised Machine Learning, Humans, Medical Informatics
ABSTRACT
BACKGROUND: Information from historical infectious disease outbreaks provides real-world data about outbreaks and their impacts on affected populations. These data can be used to develop a picture of an unfolding outbreak in its early stages, when incoming information is sparse and isolated, to identify effective control measures and guide their implementation. OBJECTIVE: This study aimed to develop a publicly accessible Web-based visual analytic called Analytics for the Investigation of Disease Outbreaks (AIDO) that uses historical disease outbreak information for decision support and situational awareness of an unfolding outbreak. METHODS: We developed an algorithm to allow the matching of unfolding outbreak data to a representative library of historical outbreaks. This process provides epidemiological clues that facilitate a user's understanding of an unfolding outbreak and support informed decisions about mitigation actions. Disease-specific properties to build a complete picture of the unfolding event were identified through a data-driven approach. A method-of-analogs approach was used to develop a short-term forecasting feature in the analytic. The 4 major steps involved in developing this tool were (1) collection of historic outbreak data and preparation of the representative library, (2) development of AIDO algorithms, (3) development of the user interface and associated visuals, and (4) verification and validation. RESULTS: The tool currently includes representative historical outbreaks for 39 infectious diseases with over 600 diverse outbreaks. We identified 27 different properties categorized into 3 broad domains (population, location, and disease) that were used to evaluate outbreaks across all diseases for their effect on case count and duration of an outbreak. Statistical analyses revealed disease-specific properties from this set that were included in the disease-specific similarity algorithm. 
Although there were some similarities across diseases, we found that statistically important properties tend to vary, even between similar diseases. This may be because of our emphasis on including diverse representative outbreak presentations in our libraries. AIDO algorithm evaluations (similarity algorithm and short-term forecasting) were conducted using 4 case studies; we show details for the Q fever outbreak in Bilbao, Spain (2014), using data from the early stages of the outbreak. Using data from only the initial 2 weeks, AIDO identified historical outbreaks that were very similar in terms of their epidemiological picture (case count, duration, source of exposure, and urban setting). The short-term forecasting algorithm accurately predicted case count and duration for the unfolding outbreak. CONCLUSIONS: AIDO is a decision support tool that facilitates increased situational awareness during an unfolding outbreak and enables informed decisions on mitigation strategies. AIDO analytics are available at no cost to epidemiologists across the globe with access to the internet. In this study, we presented a new approach to applying historical outbreak data to provide actionable information during the early stages of an unfolding infectious disease outbreak.
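The general idea of matching an unfolding outbreak to historical analogs and forecasting from them can be sketched as follows. This is not AIDO's algorithm: the library entries, the property names, the equal weighting in `similarity`, and the choice of averaging the top `k` analogs are all illustrative assumptions.

```python
from statistics import mean

# Hypothetical library of historical outbreaks; all values are made up.
library = [
    {"name": "outbreak_1", "setting": "urban", "source": "airborne",
     "early_cases": 10, "total_cases": 60, "duration_weeks": 8},
    {"name": "outbreak_2", "setting": "rural", "source": "foodborne",
     "early_cases": 40, "total_cases": 400, "duration_weeks": 20},
    {"name": "outbreak_3", "setting": "urban", "source": "airborne",
     "early_cases": 14, "total_cases": 80, "duration_weeks": 10},
]


def similarity(query, hist):
    """Score in [0, 1]: average of categorical matches and early-case closeness."""
    categorical = sum(query[k] == hist[k] for k in ("setting", "source")) / 2
    a, b = query["early_cases"], hist["early_cases"]
    numeric = 1 - abs(a - b) / max(a, b, 1)  # the 1 guards against division by zero
    return (categorical + numeric) / 2


def forecast(query, outbreaks, k=2):
    """Average the outcomes of the k most similar historical outbreaks."""
    analogs = sorted(outbreaks, key=lambda h: similarity(query, h), reverse=True)[:k]
    return {
        "analogs": [h["name"] for h in analogs],
        "total_cases": mean(h["total_cases"] for h in analogs),
        "duration_weeks": mean(h["duration_weeks"] for h in analogs),
    }


# An unfolding outbreak described only by its early data.
query = {"setting": "urban", "source": "airborne", "early_cases": 12}
result = forecast(query, library)
```

A weighted similarity with disease-specific properties, as the abstract describes, would replace the equal weighting assumed here.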