ABSTRACT
During the COVID-19 pandemic, forecasting COVID-19 trends to support planning and response was a priority for scientists and decision makers alike. In the United States, COVID-19 forecasting was coordinated by a large group of universities, companies, and government entities led by the Centers for Disease Control and Prevention and the US COVID-19 Forecast Hub (https://covid19forecasthub.org). We evaluated approximately 9.7 million forecasts of weekly state-level COVID-19 cases for predictions 1-4 weeks into the future submitted by 24 teams from August 2020 to December 2021. We assessed coverage of central prediction intervals and weighted interval scores (WIS), adjusting for missing forecasts relative to a baseline forecast, and used a Gaussian generalized estimating equation (GEE) model to evaluate differences in skill across epidemic phases that were defined by the effective reproduction number. Overall, we found high variation in skill across individual models, with ensemble-based forecasts outperforming other approaches. Forecast skill relative to the baseline was generally higher for larger jurisdictions (e.g., states compared to counties). Over time, forecasts generally performed worst in periods of rapid change in reported cases (in either increasing or decreasing epidemic phases), with 95% prediction interval coverage dropping below 50% during the growth phases of the winter 2020, Delta, and Omicron waves. Ideally, case forecasts could serve as a leading indicator of changes in transmission dynamics. However, while most COVID-19 case forecasts outperformed a naïve baseline model, even the most accurate case forecasts were unreliable in key phases. Further research could improve forecasts of leading indicators, like COVID-19 cases, by leveraging additional real-time data, addressing performance across phases, improving the characterization of forecast confidence, and ensuring that forecasts are coherent across spatial scales.
In the meantime, it is critical for forecast users to appreciate current limitations and use a broad set of indicators to inform pandemic-related decision making.
Subject(s)
COVID-19, Forecasting, Pandemics, SARS-CoV-2, COVID-19/epidemiology, COVID-19/transmission, Humans, Forecasting/methods, United States/epidemiology, Pandemics/statistics & numerical data, Computational Biology, Statistical Models
ABSTRACT
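The prediction interval coverage assessed in the study above can be illustrated with a short sketch. This is not the study's evaluation code; the interval bounds and observed counts below are hypothetical:

```python
def empirical_coverage(lowers, uppers, observed):
    """Fraction of observations falling inside their central prediction
    interval (illustrative sketch, not the study's evaluation code)."""
    hits = sum(l <= y <= u for l, u, y in zip(lowers, uppers, observed))
    return hits / len(observed)

# Hypothetical 95% interval bounds and observed weekly case counts
lowers = [100, 120, 150, 200]
uppers = [300, 340, 400, 500]
obs = [250, 360, 180, 480]
print(empirical_coverage(lowers, uppers, obs))  # 0.75
```

A well-calibrated 95% interval would achieve empirical coverage near 0.95; values far below that, as reported during growth phases, indicate overconfident forecasts.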
BACKGROUND: During the COVID-19 pandemic, there has been a strong interest in forecasts of the short-term development of epidemiological indicators to inform decision makers. In this study, we evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland for the period from January through April 2021. METHODS: We evaluate probabilistic real-time predictions of confirmed cases and deaths from COVID-19 in Germany and Poland. These were issued by 15 different forecasting models, run by independent research teams. Moreover, we study the performance of combined ensemble forecasts. Evaluation of probabilistic forecasts is based on proper scoring rules, along with interval coverage proportions to assess calibration. The presented work is part of a pre-registered evaluation study. RESULTS: We find that many, though not all, models outperform a simple baseline model up to four weeks ahead for the considered targets. Ensemble methods show very good relative performance. The addressed time period is characterized by rather stable non-pharmaceutical interventions in both countries, making short-term predictions more straightforward than in previous periods. However, major trend changes in reported cases, like the rebound in cases due to the rise of the B.1.1.7 (Alpha) variant in March 2021, prove challenging to predict. CONCLUSIONS: Multi-model approaches can help to improve the performance of epidemiological forecasts. However, while death numbers can be predicted with some success based on current case and hospitalization data, predictability of case numbers remains low beyond quite short time horizons. Additional data sources including sequencing and mobility data, which were not extensively used in the present study, may help to improve performance.
We compare forecasts of weekly case and death numbers for COVID-19 in Germany and Poland based on 15 different modelling approaches. These cover the period from January to April 2021 and address numbers of cases and deaths one and two weeks into the future, along with the respective uncertainties. We find that combining different forecasts into one forecast can enable better predictions. However, case numbers over longer periods were challenging to predict. Additional data sources, such as information about different versions of the SARS-CoV-2 virus present in the population, might improve forecasts in the future.
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pcbi.1008618.].
ABSTRACT
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multimodel ensemble forecast that combined predictions from dozens of groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-wk horizon three to five times larger than when predicting at a 1-wk horizon. This project underscores the role that collaboration and active coordination between governmental public-health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
Subject(s)
COVID-19, COVID-19/mortality, Data Accuracy, Forecasting, Humans, Pandemics, Probability, Public Health/trends, United States/epidemiology
ABSTRACT
For practical reasons, many forecasts of case, hospitalization, and death counts in the context of the current Coronavirus Disease 2019 (COVID-19) pandemic are issued in the form of central predictive intervals at various levels. This is also the case for the forecasts collected in the COVID-19 Forecast Hub (https://covid19forecasthub.org/). Forecast evaluation metrics like the logarithmic score, which has been applied in several infectious disease forecasting challenges, are then not available as they require full predictive distributions. This article provides an overview of how established methods for the evaluation of quantile and interval forecasts can be applied to epidemic forecasts in this format. Specifically, we discuss the computation and interpretation of the weighted interval score, which is a proper score that approximates the continuous ranked probability score. It can be interpreted as a generalization of the absolute error to probabilistic forecasts and allows for a decomposition into a measure of sharpness and penalties for over- and underprediction.
Subject(s)
COVID-19/epidemiology, Communicable Diseases/epidemiology, Pandemics, COVID-19/virology, Forecasting, Humans, Probability, SARS-CoV-2/isolation & purification
ABSTRACT
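The weighted interval score discussed in the abstract above can be sketched in a few lines. This is a minimal illustration following the standard definition (median weight 1/2, interval weights alpha_k/2), not the Forecast Hub's reference implementation; the interval levels and values in the example are hypothetical:

```python
def interval_score(alpha, lower, upper, y):
    """Interval score for a central (1 - alpha) prediction interval:
    width plus penalties for over- and underprediction."""
    return (upper - lower) \
        + (2 / alpha) * max(lower - y, 0) \
        + (2 / alpha) * max(y - upper, 0)

def weighted_interval_score(alphas, lowers, uppers, median, y):
    """WIS combining the absolute error of the median with weighted
    interval scores; weights w_k = alpha_k / 2 and w_0 = 1/2."""
    total = 0.5 * abs(y - median)
    for a, l, u in zip(alphas, lowers, uppers):
        total += (a / 2) * interval_score(a, l, u, y)
    return total / (len(alphas) + 0.5)

# Hypothetical forecast with 50% and 90% central intervals, observed y = 120
print(weighted_interval_score([0.5, 0.1], [90, 50], [110, 150], 100, 120))  # 12.0
```

As the abstract notes, the score reduces to the absolute error when only a point forecast (the median) is supplied, which makes it comparable across probabilistic and deterministic submissions.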
A probability forecast or probabilistic classifier is reliable or calibrated if the predicted probabilities are matched by ex post observed frequencies, as examined visually in reliability diagrams. The classical binning and counting approach to plotting reliability diagrams has been hampered by a lack of stability under unavoidable, ad hoc implementation decisions. Here, we introduce the CORP approach, which generates provably statistically consistent, optimally binned, and reproducible reliability diagrams in an automated way. CORP is based on nonparametric isotonic regression and implemented via the pool-adjacent-violators (PAV) algorithm; essentially, the CORP reliability diagram shows the graph of the PAV-(re)calibrated forecast probabilities. The CORP approach allows for uncertainty quantification via either resampling techniques or asymptotic theory, furnishes a numerical measure of miscalibration, and provides a CORP-based Brier-score decomposition that generalizes to any proper scoring rule. We anticipate that judicious uses of the PAV algorithm yield improved tools for diagnostics and inference for a very wide range of statistical and machine learning methods.
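The PAV recalibration at the heart of CORP can be sketched compactly. This is an illustrative implementation of the pool-adjacent-violators algorithm on binary outcomes, not the paper's CORP tooling; ties and uncertainty quantification are omitted:

```python
def pav(probs, outcomes):
    """Pool-adjacent-violators: isotonic (nondecreasing) recalibration of
    forecast probabilities against binary outcomes. Returns, for each
    forecast, the mean outcome of its pooled block."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    merged = []  # each block is [sum of outcomes, count]
    for i in order:
        merged.append([outcomes[i], 1])
        # Merge adjacent blocks while their means violate monotonicity
        while len(merged) > 1 and merged[-2][0] * merged[-1][1] > merged[-1][0] * merged[-2][1]:
            s, n = merged.pop()
            merged[-1][0] += s
            merged[-1][1] += n
    calibrated = [None] * len(probs)
    idx = 0
    for s, n in merged:
        for _ in range(n):
            calibrated[order[idx]] = s / n
            idx += 1
    return calibrated

# Raw probabilities vs. binary outcomes; the 0.4/0.6 pair violates
# monotonicity (outcome 1 then 0) and is pooled to a common mean of 0.5
print(pav([0.2, 0.4, 0.6, 0.8], [0, 1, 0, 1]))  # [0.0, 0.5, 0.5, 1.0]
```

Plotting these recalibrated values against the original probabilities yields the step function that the CORP reliability diagram displays.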
ABSTRACT
We discuss tools for the evaluation of probabilistic forecasts and the critique of statistical models for count data. Our proposals include a nonrandomized version of the probability integral transform, marginal calibration diagrams, and proper scoring rules, such as the predictive deviance. In case studies, we critique count regression models for patent data, and assess the predictive performance of Bayesian age-period-cohort models for larynx cancer counts in Germany. The toolbox applies in Bayesian or classical and parametric or nonparametric settings and to any type of ordered discrete outcomes.
Subject(s)
Biometry/methods, Statistical Models, Bayes Theorem, Cohort Studies, Germany/epidemiology, Humans, Laryngeal Neoplasms/epidemiology, Regression Analysis, Statistics, Nonparametric
ABSTRACT
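The nonrandomized probability integral transform proposed in the abstract above can be sketched for a simple count model. This is an illustrative version assuming a Poisson predictive distribution; the idea is that each observed count y spreads its PIT mass uniformly over the interval [F(y-1), F(y)] rather than drawing a random point from it:

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); k = -1 maps to 0."""
    if k < 0:
        return 0.0
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))

def nonrandomized_pit_hist(observations, lams, n_bins=10):
    """Nonrandomized PIT histogram: each count y contributes the overlap
    of [F(y-1), F(y)] with each bin, normalized by the interval width."""
    hist = [0.0] * n_bins
    for y, lam in zip(observations, lams):
        p_lo, p_hi = poisson_cdf(y - 1, lam), poisson_cdf(y, lam)
        width = p_hi - p_lo
        for b in range(n_bins):
            a, c = b / n_bins, (b + 1) / n_bins
            overlap = max(0.0, min(c, p_hi) - max(a, p_lo))
            hist[b] += overlap / width
    return [h / len(observations) for h in hist]

# Hypothetical counts with their Poisson predictive means
print(nonrandomized_pit_hist([3, 5, 2], [3.0, 4.0, 2.5], n_bins=5))
```

A roughly uniform histogram indicates probabilistic calibration; U-shapes and humps diagnose under- and overdispersed predictive distributions, respectively.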
Traditional weather forecasting has been built on a foundation of deterministic modeling: start with initial conditions, put them into a supercomputer model, and end up with a prediction about future weather. But as Gneiting and Raftery discuss in their Perspective, a new approach, ensemble forecasting, was introduced in the early 1990s. In this method, up to 100 different computer runs, each with slightly different starting conditions or model assumptions, are combined into a weather forecast. In concert with statistical techniques, ensembles can provide accurate statements about the uncertainty in daily and seasonal forecasting. The challenge now is to improve the modeling, statistical analysis, and visualization technologies for disseminating the ensemble results.