1.
Int J Forecast; 39(3): 1366-1383, 2023.
Article in English | MEDLINE | ID: mdl-35791416

ABSTRACT

The U.S. COVID-19 Forecast Hub aggregates forecasts of the short-term burden of COVID-19 in the United States from many contributing teams. We study methods for building an ensemble that combines forecasts from these teams. These experiments have informed the ensemble methods used by the Hub. To be most useful to policymakers, ensemble forecasts must have stable performance in the presence of two key characteristics of the component forecasts: (1) occasional misalignment with the reported data, and (2) instability in the relative performance of component forecasters over time. Our results indicate that in the presence of these challenges, an untrained and robust approach to ensembling using an equally weighted median of all component forecasts is a good choice to support public health decision-makers. In settings where some contributing forecasters have a stable record of good performance, trained ensembles that give those forecasters higher weight can also be helpful.
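
Entry 1's untrained ensemble can be summarized in a few lines: at every quantile level, take the median of the component models' predicted quantiles. The sketch below is a generic illustration of that rule; the array shapes, quantile levels, and function name are assumptions, not the Forecast Hub's actual code.

```python
import numpy as np

def median_ensemble(component_quantiles):
    """Combine component forecasts by taking, at each quantile level,
    the median of the component models' predicted quantiles.

    component_quantiles : array of shape (n_models, n_quantile_levels)
        Row m holds model m's predictive quantiles for one target
        (e.g., incident deaths in one location and week).
    Returns the ensemble's predictive quantiles, shape (n_quantile_levels,).
    """
    return np.median(component_quantiles, axis=0)

# Illustrative example: three models forecasting the 0.1, 0.5, 0.9 quantiles.
quantiles = np.array([
    [120.0, 180.0, 260.0],
    [100.0, 150.0, 300.0],
    [140.0, 200.0, 240.0],
])
print(median_ensemble(quantiles))  # -> [120. 180. 260.]
```

Because the median ignores how far outlying forecasts sit from the rest, a single badly misaligned component does not drag the ensemble with it, which is the robustness property highlighted in the abstract.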

2.
medRxiv; 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-38168429

ABSTRACT

Accurate forecasts can enable more effective public health responses during seasonal influenza epidemics. Forecasting teams were asked to provide national and jurisdiction-specific probabilistic predictions of weekly confirmed influenza hospital admissions for one through four weeks ahead for the 2021-22 and 2022-23 influenza seasons. Across both seasons, 26 teams submitted forecasts, with the submitting teams varying between seasons. Forecast skill was evaluated using the Weighted Interval Score (WIS), relative WIS, and coverage. Six out of 23 models outperformed the baseline model across forecast weeks and locations in 2021-22 and 12 out of 18 models in 2022-23. Averaging across all forecast targets, the FluSight ensemble was the 2nd most accurate model measured by WIS in 2021-22 and the 5th most accurate in the 2022-23 season. Forecast skill and 95% coverage for the FluSight ensemble and most component models degraded over longer forecast horizons and during periods of rapid change. Current influenza forecasting efforts help inform situational awareness, but research is needed to address limitations, including decreased performance during periods of changing epidemic dynamics.
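
The weighted interval score (WIS) used for evaluation in entry 2 can be computed from the predictive median and a set of central prediction intervals. The sketch below follows the commonly stated definition; the interval levels and numbers in the example are made up for illustration.

```python
import numpy as np

def interval_score(y, lower, upper, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    return ((upper - lower)
            + (2.0 / alpha) * max(lower - y, 0.0)
            + (2.0 / alpha) * max(y - upper, 0.0))

def weighted_interval_score(y, median, lowers, uppers, alphas):
    """WIS built from the predictive median and K central intervals.

    alphas : list of interval levels alpha_k (e.g., 0.2 for an 80% interval),
             with lowers[k]/uppers[k] the matching interval endpoints.
    """
    K = len(alphas)
    total = 0.5 * abs(y - median)
    for a, lo, hi in zip(alphas, lowers, uppers):
        total += (a / 2.0) * interval_score(y, lo, hi, a)
    return total / (K + 0.5)

# Illustrative example: one observed admission count scored against a forecast.
print(weighted_interval_score(y=130, median=120,
                              lowers=[100, 80], uppers=[150, 190],
                              alphas=[0.5, 0.2]))
```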

3.
PLoS Comput Biol; 18(12): e1010771, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36520949

ABSTRACT

Distributional forecasts are important for a wide variety of applications, including forecasting epidemics. Often, forecasts are miscalibrated, or unreliable in assigning uncertainty to future events. We present a recalibration method that can be applied to a black-box forecaster given retrospective forecasts and observations, as well as an extension that makes this method more effective in recalibrating epidemic forecasts. This method is guaranteed to improve calibration and log score performance when trained and measured in-sample. We also prove that the increase in expected log score of a recalibrated forecaster is equal to the entropy of the PIT distribution. We apply this recalibration method to the 27 influenza forecasters in the FluSight Network and show that recalibration reliably improves forecast accuracy and calibration. This method, available on GitHub, is effective, robust, and easy to use as a post-processing tool to improve epidemic forecasts.


Subject(s)
Epidemics, Human Influenza, Humans, Retrospective Studies, Uncertainty, Human Influenza/epidemiology, Forecasting
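
A toy illustration of the recalibration idea in entry 3, under simplifying assumptions (continuous forecasts, an empirical-CDF recalibration map): estimate the distribution of probability integral transform (PIT) values on retrospective forecasts, then compose its CDF with each new forecast CDF. This is a generic sketch, not the paper's exact method.

```python
import numpy as np
from scipy.stats import norm

def pit_values(forecast_cdfs, observations):
    """PIT value F_t(y_t) for each retrospective forecast CDF and its outcome."""
    return np.array([F(y) for F, y in zip(forecast_cdfs, observations)])

def recalibrate(forecast_cdf, pits):
    """Return the recalibrated CDF x -> G(F(x)), where G is the empirical CDF
    of the retrospective PIT values (a simple monotone recalibration map)."""
    pits = np.sort(pits)
    def G(u):
        return np.searchsorted(pits, u, side="right") / len(pits)
    return lambda x: G(forecast_cdf(x))

# Illustrative example: forecasts that are systematically too narrow.
rng = np.random.default_rng(0)
truth = rng.normal(0.0, 2.0, size=200)              # realized outcomes
cdfs = [norm(0.0, 1.0).cdf] * len(truth)            # overconfident forecast CDFs
pits = pit_values(cdfs, truth)                      # piles up near 0 and 1
new_cdf = recalibrate(norm(0.0, 1.0).cdf, pits)
print(new_cdf(1.0), norm.cdf(1.0))                  # recalibrated vs. original
```
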
4.
Ann Stat; 50(2): 949-986, 2022 Apr.
Article in English | MEDLINE | ID: mdl-36120512

ABSTRACT

Interpolators, estimators that achieve zero training error, have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum ℓ2-norm ("ridgeless") interpolation least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters p is of the same order as the number of samples n. We consider two different models for the feature distribution: a linear model, where the feature vectors x_i ∈ ℝ^p are obtained by applying a linear transform to a vector of i.i.d. entries, x_i = Σ^{1/2} z_i (with z_i ∈ ℝ^p); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, x_i = φ(W z_i) (with z_i ∈ ℝ^d, W ∈ ℝ^{p×d} a matrix of i.i.d. entries, and φ an activation function acting componentwise on W z_i). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
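
The "ridgeless" estimator studied in entry 4 is the minimum-norm least squares solution, which the pseudoinverse provides directly; sweeping the ratio p/n in a small simulation reproduces the double-descent shape described above. All dimensions, the signal, and the grid below are arbitrary illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_max, sigma = 200, 600, 1.0
beta_full = rng.normal(size=p_max) / np.sqrt(p_max)   # fixed signal direction

def risk_at(p, n_test=2000):
    """Test risk of the min-norm least squares fit using the first p features."""
    X = rng.normal(size=(n, p))
    y = X @ beta_full[:p] + sigma * rng.normal(size=n)
    beta_hat = np.linalg.pinv(X) @ y        # minimum l2-norm interpolator when p > n
    X_test = rng.normal(size=(n_test, p))
    y_test = X_test @ beta_full[:p] + sigma * rng.normal(size=n_test)
    return np.mean((X_test @ beta_hat - y_test) ** 2)

for p in [50, 150, 190, 210, 300, 600]:     # risk typically spikes near p = n
    print(f"p/n = {p/n:4.2f}   test MSE ~ {risk_at(p):6.2f}")
```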

6.
Proc Natl Acad Sci U S A; 118(51), 2021 Dec 21.
Article in English | MEDLINE | ID: mdl-34903655

ABSTRACT

Short-term forecasts of traditional streams from public health reporting (such as cases, hospitalizations, and deaths) are a key input to public health decision-making during a pandemic. Since early 2020, our research group has worked with data partners to collect, curate, and make publicly available numerous real-time COVID-19 indicators, providing multiple views of pandemic activity in the United States. This paper studies the utility of five such indicators (derived from deidentified medical insurance claims, self-reported symptoms from online surveys, and COVID-related Google search activity) from a forecasting perspective. For each indicator, we ask whether its inclusion in an autoregressive (AR) model leads to improved predictive accuracy relative to the same model excluding it. Such an AR model, without external features, is already competitive with many top COVID-19 forecasting models in use today. Our analysis reveals that 1) inclusion of each of these five indicators improves the overall predictive accuracy of the AR model; 2) predictive gains are in general most pronounced during times in which COVID cases are trending in "flat" or "down" directions; and 3) one indicator, based on Google searches, seems to be particularly helpful during "up" trends.


Subject(s)
COVID-19/epidemiology, Health Status Indicators, Statistical Models, Epidemiologic Methods, Forecasting, Humans, Internet/statistics & numerical data, Surveys and Questionnaires, United States/epidemiology
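
A stripped-down version of the comparison described in entry 6: fit an autoregressive model on lagged case counts alone, then the same model with lagged values of an auxiliary indicator added, and compare held-out error. The lag choices, the plain least-squares fit, the synthetic data, and the variable names are simplifying assumptions, not the paper's exact specification.

```python
import numpy as np

def lagged_design(series, lags):
    """Stack columns [x_{t-1}, ..., x_{t-lags}] aligned with target x_t."""
    T = len(series)
    cols = [series[lags - k - 1: T - k - 1] for k in range(lags)]
    return np.column_stack(cols)

def forecast_error(cases, indicator=None, lags=3, train_frac=0.8):
    """One-step-ahead mean absolute error of a least-squares AR model,
    optionally augmented with lagged values of an external indicator."""
    y = cases[lags:]
    X = lagged_design(cases, lags)
    if indicator is not None:
        X = np.column_stack([X, lagged_design(indicator, lags)])
    X = np.column_stack([np.ones(len(y)), X])
    split = int(train_frac * len(y))
    coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    return np.mean(np.abs(X[split:] @ coef - y[split:]))

# Illustrative synthetic data: the indicator leads cases by one step.
rng = np.random.default_rng(2)
latent = np.cumsum(rng.normal(size=300))
indicator = latent + rng.normal(scale=0.5, size=300)
cases = np.roll(latent, 1) + rng.normal(scale=0.5, size=300)
print("AR only:       ", forecast_error(cases))
print("AR + indicator:", forecast_error(cases, indicator))
```
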
7.
Proc Natl Acad Sci U S A; 118(51), 2021 Dec 21.
Article in English | MEDLINE | ID: mdl-34903656

ABSTRACT

The US COVID-19 Trends and Impact Survey (CTIS) is a large, cross-sectional, internet-based survey that has operated continuously since April 6, 2020. By inviting a random sample of Facebook active users each day, CTIS collects information about COVID-19 symptoms, risks, mitigating behaviors, mental health, testing, vaccination, and other key priorities. The large scale of the survey (over 20 million responses in its first year of operation) allows tracking of trends over short timescales and comparisons at fine demographic and geographic detail. The survey has been repeatedly revised to respond to emerging public health priorities. In this paper, we describe the survey methods and content and give examples of CTIS results that illuminate key patterns and trends and help answer high-priority policy questions relevant to the COVID-19 epidemic and response. These results demonstrate how large online surveys can provide continuous, real-time indicators of important outcomes that are not subject to public health reporting delays and backlogs. The CTIS offers high value as a supplement to official reporting data by supplying essential information about behaviors, attitudes toward policy and preventive measures, economic impacts, and other topics not reported in public health surveillance systems.


Subject(s)
COVID-19 Testing/statistics & numerical data, COVID-19/epidemiology, Health Status Indicators, Adult, Aged, COVID-19/diagnosis, COVID-19/prevention & control, COVID-19/transmission, COVID-19 Vaccines, Cross-Sectional Studies, Epidemiologic Methods, Female, Humans, Male, Middle Aged, Patient Acceptance of Health Care/statistics & numerical data, Social Media/statistics & numerical data, United States/epidemiology, Young Adult
9.
Biometrics; 77(3): 1037-1049, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33434289

ABSTRACT

Changepoint detection methods are used in many areas of science and engineering, for example, in the analysis of copy number variation data to detect abnormalities in copy numbers along the genome. Despite the broad array of available tools, methodology for quantifying our uncertainty in the strength (or the presence) of given changepoints post-selection is lacking. Post-selection inference offers a framework to fill this gap, but the most straightforward application of these methods results in low-powered hypothesis tests and leaves open several important questions about practical usability. In this work, we carefully tailor post-selection inference methods toward changepoint detection, focusing on copy number variation data. To accomplish this, we study commonly used changepoint algorithms: binary segmentation, two of its most popular variants (wild and circular binary segmentation), and the fused lasso. We implement some of the latest developments in post-selection inference theory, mainly auxiliary randomization. This improves power but requires Markov chain Monte Carlo algorithms (importance sampling and hit-and-run sampling) to carry out our tests. We also provide recommendations for improving practical usability, detailed simulations, and example analyses on array comparative genomic hybridization as well as sequencing data.


Subject(s)
Algorithms, DNA Copy Number Variations, Comparative Genomic Hybridization, DNA Copy Number Variations/genetics, Markov Chains, Monte Carlo Method
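
For readers unfamiliar with the changepoint algorithms studied in entry 9, here is a bare-bones binary segmentation routine on a mean-shift model. The CUSUM-style statistic and threshold are illustrative; the paper's contribution is the post-selection inference layered on top of such algorithms, which is not shown here.

```python
import numpy as np

def cusum_stat(x, s, e, b):
    """Standardized difference in means of x[s:b+1] and x[b+1:e+1]."""
    n1, n2 = b - s + 1, e - b
    m1, m2 = x[s:b + 1].mean(), x[b + 1:e + 1].mean()
    return abs(m1 - m2) * np.sqrt(n1 * n2 / (n1 + n2))

def binary_segmentation(x, s=0, e=None, thresh=4.0, found=None):
    """Recursively split [s, e] at the location maximizing the CUSUM statistic,
    keeping splits whose statistic exceeds a fixed threshold."""
    if e is None:
        e, found = len(x) - 1, []
    if e - s < 1:
        return found
    stats = [cusum_stat(x, s, e, b) for b in range(s, e)]
    b_best = s + int(np.argmax(stats))
    if stats[b_best - s] > thresh:
        found.append(b_best)
        binary_segmentation(x, s, b_best, thresh, found)
        binary_segmentation(x, b_best + 1, e, thresh, found)
    return sorted(found)

# Illustrative copy-number-like signal with two mean shifts.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 60), rng.normal(2, 1, 40), rng.normal(0.5, 1, 50)])
print(binary_segmentation(x))   # expect changepoints near indices 59 and 99
```
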
10.
Proc Natl Acad Sci U S A; 116(48): 24268-24274, 2019 Nov 26.
Article in English | MEDLINE | ID: mdl-31712420

ABSTRACT

A wide range of research has promised new tools for forecasting infectious disease dynamics, but little of that research is currently being applied in practice, because tools do not address key public health needs, do not produce probabilistic forecasts, have not been evaluated on external data, or do not provide sufficient forecast skill to be useful. We developed an open collaborative forecasting challenge to assess probabilistic forecasts for seasonal epidemics of dengue, a major global public health problem. Sixteen teams used a variety of methods and data to generate forecasts for 3 epidemiological targets (peak incidence, the week of the peak, and total incidence) over 8 dengue seasons in Iquitos, Peru, and San Juan, Puerto Rico. Forecast skill was highly variable across teams and targets. While numerous forecasts showed high skill for midseason situational awareness, early-season skill was low, and skill was generally lowest for high-incidence seasons, those for which forecasts would be most valuable. A comparison of modeling approaches revealed that average forecast skill was lower for models that included biologically meaningful data and mechanisms, and that both multimodel and multiteam ensemble forecasts consistently outperformed individual model forecasts. Leveraging these insights, data, and the forecasting framework will be critical to improve forecast skill and the application of forecasts in real time for epidemic preparedness and response. Moreover, key components of this project (integration with public health needs, a common forecasting framework, shared and standardized data, and open participation) can help advance infectious disease forecasting beyond dengue.


Subject(s)
Dengue/epidemiology, Epidemiologic Methods, Disease Outbreaks, Epidemics/prevention & control, Humans, Incidence, Statistical Models, Peru/epidemiology, Puerto Rico/epidemiology
11.
Stat Med; 38(12): 2184-2205, 2019 May 30.
Article in English | MEDLINE | ID: mdl-30701586

ABSTRACT

We study regularized estimation in high-dimensional longitudinal classification problems, using the lasso and fused lasso regularizers. The constructed coefficient estimates are piecewise constant across the time dimension in the longitudinal problem, with adaptively selected change points (break points). We present an efficient algorithm for computing such estimates, based on proximal gradient descent. We apply our proposed technique to a longitudinal data set on Alzheimer's disease from the Cardiovascular Health Study Cognition Study. Using data analysis and a simulation study, we motivate and demonstrate several practical considerations such as the selection of tuning parameters and the assessment of model stability. While race, gender, vascular and heart disease, lack of caregivers, and deterioration of learning and memory are all important predictors of dementia, we also find that these risk factors become more relevant in the later stages of life.


Subject(s)
Algorithms, Longitudinal Studies, Regression Analysis, Risk Assessment/methods, Alzheimer Disease, Computer Simulation, Disease Progression, Humans, Multilevel Analysis, Risk Factors
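
A generic proximal gradient (ISTA-style) loop for ℓ1-regularized logistic regression, to illustrate the computational template referenced in entry 11. The paper's estimator additionally includes a fused penalty across time, whose proximal step (a total-variation denoising subproblem) is omitted here; the step size, data, and names are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def logistic_grad(beta, X, y):
    """Gradient of the average logistic loss with labels y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y) / len(y)

def prox_gradient_lasso(X, y, lam, step=0.1, n_iter=500):
    """ISTA: gradient step on the smooth loss, then soft-thresholding."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * logistic_grad(beta, X, y), step * lam)
    return beta

# Illustrative sparse classification problem.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 20))
true_beta = np.zeros(20)
true_beta[:3] = [2.0, -1.5, 1.0]
y = (rng.uniform(size=300) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
print(np.round(prox_gradient_lasso(X, y, lam=0.05), 2))
```
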
12.
PLoS Comput Biol; 14(6): e1006134, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29906286

ABSTRACT

Accurate and reliable forecasts of seasonal epidemics of infectious disease can assist in the design of countermeasures and increase public awareness and preparedness. This article describes two main contributions we made recently toward this goal: a novel approach to probabilistic modeling of surveillance time series based on "delta densities", and an optimization scheme for combining output from multiple forecasting methods into an adaptively weighted ensemble. Delta densities describe the probability distribution of the change between one observation and the next, conditioned on available data; chaining together nonparametric estimates of these distributions yields a model for an entire trajectory. Corresponding distributional forecasts cover more observed events than alternatives that treat the whole season as a unit, and improve upon multiple evaluation metrics when extracting key targets of interest to public health officials. Adaptively weighted ensembles integrate the results of multiple forecasting methods, such as delta density, using weights that can change from situation to situation. We treat selection of optimal weightings across forecasting methods as a separate estimation task, and describe an estimation procedure based on optimizing cross-validation performance. We consider some details of the data generation process, including data revisions and holiday effects, both in the construction of these forecasting methods and when performing retrospective evaluation. The delta density method and an adaptively weighted ensemble of other forecasting methods each improve significantly on the next best ensemble component when applied separately, and achieve even better cross-validated performance when used in conjunction. We submitted real-time forecasts based on these contributions as part of CDC's 2015/2016 FluSight Collaborative Comparison. Among the fourteen submissions that season, this system was ranked by CDC as the most accurate.


Subject(s)
Forecasting/methods, Human Influenza/prevention & control, Centers for Disease Control and Prevention, U.S., Communicable Diseases, Epidemics/prevention & control, Humans, Biological Models, Statistical Models, Public Health, Retrospective Studies, Seasons, United States
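
One generic way to carry out the weight-estimation step described in entry 12 is to choose ensemble weights that maximize the held-out log score of the weighted mixture, which a small EM-style fixed-point iteration can do. The sketch below is that generic procedure, not necessarily the paper's exact implementation; its input is the probability each component assigned to the realized outcome on each held-out instance.

```python
import numpy as np

def fit_mixture_weights(probs, n_iter=200):
    """Weights w maximizing sum_i log(sum_m w_m * probs[i, m]) via an
    EM-style fixed point. probs[i, m] is the probability component m
    assigned to the outcome observed in held-out instance i."""
    n, m = probs.shape
    w = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        mix = probs @ w                       # mixture probability per instance
        resp = probs * w / mix[:, None]       # responsibilities
        w = resp.mean(axis=0)                 # M-step: average responsibility
    return w

# Illustrative example: component 0 is usually sharper on the truth.
rng = np.random.default_rng(5)
probs = np.column_stack([rng.beta(5, 2, 100), rng.beta(2, 5, 100), rng.beta(2, 2, 100)])
w = fit_mixture_weights(probs)
print(np.round(w, 3), "held-out log score:", np.sum(np.log(probs @ w)).round(2))
```
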
13.
Epidemics; 24: 26-33, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29506911

ABSTRACT

Accurate forecasts could enable more informed public health decisions. Since 2013, CDC has worked with external researchers to improve influenza forecasts by coordinating seasonal challenges for the United States and the 10 Health and Human Services (HHS) Regions. Forecast targets for the 2014-15 challenge were the onset week, peak week, and peak intensity of the season, and the weekly percent of outpatient visits due to influenza-like illness (ILI) 1-4 weeks in advance. We used a logarithmic scoring rule to score the weekly forecasts, averaged the scores over an evaluation period, and then exponentiated the resulting average logarithmic score. Poor forecasts had a score near 0, and perfect forecasts a score of 1. Five teams submitted forecasts from seven different models. At the national level, team scores ranged from <0.01 to 0.41 for onset week, from 0.08 to 0.49 for peak week, and from <0.01 to 0.17 for peak intensity. Scores for predictions of ILI 1-4 weeks in advance ranged from 0.02 to 0.38 and were highest 1 week ahead. Forecast skill varied by HHS region. Forecasts can predict epidemic characteristics that inform public health actions. CDC, state and local health officials, and researchers are working together to improve forecasts.


Subject(s)
Human Influenza/epidemiology, Seasons, Cooperative Behavior, Data Collection/statistics & numerical data, Data Collection/trends, Epidemics/statistics & numerical data, Forecasting, Humans, Public Health/statistics & numerical data, Public Health/trends, United States/epidemiology
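
The scoring described in entry 13 (log score per forecast, averaged, then exponentiated) reduces to the geometric mean of the probabilities assigned to the observed outcomes. A minimal illustration follows; the small floor on probabilities is a common practical safeguard and an assumption here, not necessarily CDC's exact rule.

```python
import numpy as np

def forecast_score(probs_assigned, floor=1e-10):
    """Exponentiated average log score: exp(mean(log p)), i.e. the geometric
    mean of the probabilities the forecaster assigned to what actually happened.
    Scores near 0 indicate poor forecasts; 1 would be a perfect forecast."""
    p = np.clip(np.asarray(probs_assigned, dtype=float), floor, 1.0)  # illustrative floor
    return float(np.exp(np.mean(np.log(p))))

# Illustrative example: weekly probabilities assigned to the observed ILI bins.
print(forecast_score([0.40, 0.25, 0.10, 0.30]))   # ~ 0.23
```
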
14.
PLoS Comput Biol; 13(3): e1005248, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28282375

ABSTRACT

Infectious diseases impose considerable burden on society, despite significant advances in technology and medicine over the past century. Advance warning can be helpful in mitigating and preparing for an impending or ongoing epidemic. Historically, such a capability has lagged for many reasons, including in particular the uncertainty in the current state of the system and in the understanding of the processes that drive epidemic trajectories. Presently, we have access to data, models, and computational resources that enable the development of epidemiological forecasting systems. Indeed, several recent challenges hosted by the U.S. government have fostered an open and collaborative environment for the development of these technologies. The primary focus of these challenges has been to develop statistical and computational methods for epidemiological forecasting, but here we consider a serious alternative based on collective human judgment. We created the web-based "Epicast" forecasting system, which collects and aggregates epidemic predictions made in real time by human participants, and with these forecasts we ask two questions: how accurate is human judgment, and how do these forecasts compare to their more computational, data-driven alternatives? To address the former, we assess, using a variety of metrics, how accurately humans are able to predict influenza and chikungunya trajectories. As for the latter, we show that real-time, combined human predictions of the 2014-2015 and 2015-2016 U.S. flu seasons are often more accurate than the same predictions made by several statistical systems, especially for short-term targets. We conclude that there is valuable predictive power in collective human judgment, and we discuss the benefits and drawbacks of this approach.


Subject(s)
Communicable Diseases/mortality, Disease Outbreaks/statistics & numerical data, Epidemiologic Methods, Forecasting/methods, Statistical Models, Risk Assessment/methods, Humans, Prevalence, Reproducibility of Results, Sensitivity and Specificity, United States/epidemiology
15.
PLoS Comput Biol; 11(8): e1004382, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26317693

ABSTRACT

Seasonal influenza epidemics consistently cause considerable, widespread losses each year in terms of economic burden, morbidity, and mortality. With access to accurate and reliable forecasts of a current or upcoming influenza epidemic's behavior, policy makers can design and implement more effective countermeasures. This past year, the Centers for Disease Control and Prevention hosted the "Predict the Influenza Season Challenge", with the task of predicting key epidemiological measures for the 2013-2014 U.S. influenza season with the help of digital surveillance data. We developed a semiparametric empirical Bayes framework for in-season forecasts of epidemics and applied it to predict the weekly percentage of outpatient doctor visits for influenza-like illness, as well as the season onset, duration, peak time, and peak height, with and without using Google Flu Trends data. Previous work on epidemic modeling has focused on developing mechanistic models of disease behavior and applying time series tools to explain historical data. However, tailoring these models to certain types of surveillance data can be challenging, and overly complex models with many parameters can compromise forecasting ability. Our approach instead produces possibilities for the epidemic curve of the season of interest using modified versions of data from previous seasons, allowing for reasonable variations in the timing, pace, and intensity of the seasonal epidemics, as well as noise in observations. Since the framework does not make strict domain-specific assumptions, it can easily be applied to other diseases with seasonal epidemics. This method produces a complete posterior distribution over epidemic curves, rather than, for example, solely point predictions of forecasting targets. We report prospective influenza-like-illness forecasts made for the 2013-2014 U.S. influenza season, and compare the framework's cross-validated prediction error on historical data to that of a variety of simpler baseline predictors.


Subject(s)
Computational Biology/methods, Epidemics/statistics & numerical data, Human Influenza/epidemiology, Biological Models, Statistical Models, Bayes Theorem, Centers for Disease Control and Prevention, U.S., Humans, Reproducibility of Results, United States
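
A toy version of the idea in entry 15: form candidate trajectories by perturbing past seasons' curves (shifting the timing, rescaling the intensity), then weight candidates by how well they match the weeks observed so far, yielding an approximate posterior over full-season curves. The perturbation scheme, Gaussian observation model, and grid sizes are simplifying assumptions, not the paper's framework.

```python
import numpy as np

def curve_posterior(past_curves, observed, obs_sd=0.5, n_candidates=2000, rng=None):
    """Candidate full-season curves and importance weights, given the weeks
    observed so far (a crude stand-in for the paper's framework)."""
    rng = rng or np.random.default_rng(0)
    t_obs = len(observed)
    candidates, logw = [], []
    for _ in range(n_candidates):
        base = past_curves[rng.integers(len(past_curves))]
        shifted = np.roll(base, rng.integers(-3, 4))        # perturb the timing
        curve = rng.uniform(0.7, 1.3) * shifted             # perturb the intensity
        resid = observed - curve[:t_obs]
        logw.append(-0.5 * np.sum((resid / obs_sd) ** 2))   # Gaussian observation model
        candidates.append(curve)
    logw = np.array(logw)
    w = np.exp(logw - logw.max())
    return np.array(candidates), w / w.sum()

# Illustrative example: two synthetic past seasons, 12 weeks observed so far.
weeks = np.arange(30)
past = np.array([6.0 * np.exp(-0.5 * ((weeks - m) / 4.0) ** 2) for m in (14, 17)])
obs = past[0][:12] + np.random.default_rng(6).normal(0, 0.3, 12)
curves, w = curve_posterior(past, obs)
print(np.round(w @ curves, 2))      # posterior-mean trajectory for the full season
```
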
16.
Ann Stat; 42(2): 413-468, 2014 Apr.
Article in English | MEDLINE | ID: mdl-25574062

ABSTRACT

In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ₁² distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ₁² under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties, adaptivity and shrinkage, and its null distribution is tractable and asymptotically Exp(1).
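
For reference, the statistic described above is usually written as follows, where A is the active set just before the knot λ_k at which the new predictor enters and β̃_A is the lasso solution fit on X_A alone; this is a sketch of the standard form, and the paper should be consulted for the precise assumptions.

```latex
T_k \;=\; \frac{\bigl\langle y,\, X\hat{\beta}(\lambda_{k+1})\bigr\rangle \;-\; \bigl\langle y,\, X_A \tilde{\beta}_A(\lambda_{k+1})\bigr\rangle}{\sigma^2}
\;\;\xrightarrow{\;d\;}\;\; \operatorname{Exp}(1).
```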

17.
J R Stat Soc Series B Stat Methodol; 74(2): 245-266, 2012 Mar.
Article in English | MEDLINE | ID: mdl-25506256

ABSTRACT

We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui and his colleagues have proposed 'SAFE' rules, based on univariate inner products between each predictor and the outcome, which guarantee that a coefficient will be 0 in the solution vector. This provides a reduction in the number of variables that need to be entered into the optimization. We propose strong rules that are very simple and yet screen out far more predictors than the SAFE rules. This great practical improvement comes at a price: the strong rules are not foolproof and can mistakenly discard active predictors, i.e. predictors that have non-zero coefficients in the solution. We therefore combine them with simple checks of the Karush-Kuhn-Tucker conditions to ensure that the exact solution to the convex problem is delivered. Of course, any (approximate) screening method can be combined with the Karush-Kuhn-Tucker conditions to ensure the exact solution; the strength of the strong rules lies in the fact that, in practice, they discard a very large number of the inactive predictors and almost never commit mistakes. We also derive conditions under which they are foolproof. Strong rules provide substantial savings in computational time for a variety of statistical optimization problems.
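
A small sketch of the screening logic described in entry 17, as the sequential strong rule is usually stated: with standardized predictors and a decreasing penalty grid, predictor j is discarded at λ_k if |x_j'r(λ_{k-1})| < 2λ_k − λ_{k-1}, and a Karush-Kuhn-Tucker check on the discarded set guards against mistakes. The solver choice (scikit-learn) and all data below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def strong_rule_screen(X, resid_prev, lam, lam_prev):
    """Keep predictor j unless |x_j' r(lam_prev)| < 2*lam - lam_prev,
    for the lasso objective 0.5*||y - X b||^2 + lam*||b||_1."""
    c = np.abs(X.T @ resid_prev)
    return np.where(c >= 2 * lam - lam_prev)[0]

def kkt_violations(X, y, beta, lam, discarded):
    """Discarded predictors with |x_j'(y - X beta)| > lam must be added back."""
    r = y - X @ beta
    return discarded[np.abs(X[:, discarded].T @ r) > lam + 1e-8]

# Illustrative problem: 100 observations, 500 standardized predictors, 5 signals.
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 500))
X /= np.linalg.norm(X, axis=0)
y = X[:, :5] @ np.array([3.0, -2.0, 2.0, -1.5, 1.0]) + rng.normal(scale=0.5, size=100)
lam_max = np.max(np.abs(X.T @ y))
lam_prev, lam = 0.7 * lam_max, 0.6 * lam_max            # adjacent penalty grid points

beta_prev = Lasso(alpha=lam_prev / len(y), fit_intercept=False).fit(X, y).coef_
keep = strong_rule_screen(X, y - X @ beta_prev, lam, lam_prev)
discarded = np.setdiff1d(np.arange(X.shape[1]), keep)

beta = np.zeros(X.shape[1])
beta[keep] = Lasso(alpha=lam / len(y), fit_intercept=False).fit(X[:, keep], y).coef_
print("kept", len(keep), "of", X.shape[1],
      "| KKT violations among discarded:", len(kkt_violations(X, y, beta, lam, discarded)))
```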
