Results 1 - 20 of 67
1.
Biostatistics ; 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-38058013

ABSTRACT

Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and nontractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modeling multiple outcomes affected by the intervention, and easily provide uncertainty quantification for all causal estimands of interest. Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England's Test and Trace programme for COVID-19.
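For intuition, here is a deliberately simplified sketch of the core computational building block: a Gibbs sampler for a one-factor model with continuous outcomes only. The paper's model additionally handles mixed outcome types, multiple units, and intervention effects; the priors and dimensions below are illustrative assumptions, not the authors' implementation.

```r
# Toy Gibbs sampler for a one-factor model Y[i, t] = lambda[i] * f[t] + e
# (continuous outcomes only; illustrative priors, not the paper's model).
set.seed(1)
N <- 10; Tt <- 50
f_true <- rnorm(Tt); lam_true <- runif(N, 0.5, 1.5)
Y <- lam_true %o% f_true + matrix(rnorm(N * Tt, sd = 0.5), N, Tt)

lam <- rep(1, N); s2 <- 1; draws <- matrix(NA, 200, N)
for (it in 1:200) {
  prec_f <- sum(lam^2) / s2 + 1                       # f_t prior: N(0, 1)
  f <- rnorm(Tt, as.vector(crossprod(lam, Y)) / s2 / prec_f, sqrt(1 / prec_f))
  prec_l <- sum(f^2) / s2                             # flat prior on loadings
  lam <- rnorm(N, as.vector(Y %*% f) / s2 / prec_l, sqrt(1 / prec_l))
  resid <- Y - lam %o% f                              # error variance: IG(1, 1) prior
  s2 <- 1 / rgamma(1, 1 + N * Tt / 2, 1 + sum(resid^2) / 2)
  draws[it, ] <- lam
}
colMeans(draws[101:200, ])  # posterior mean loadings (sign/scale only weakly identified)
```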

2.
BMC Med Res Methodol ; 24(1): 146, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987715

ABSTRACT

BACKGROUND: Risk prediction models are routinely used to assist in clinical decision making. A small sample size for model development can compromise model performance when the model is applied to new patients. For binary outcomes, the calibration slope (CS) and the mean absolute prediction error (MAPE) are two key measures on which sample size calculations for the development of risk models have been based. CS quantifies the degree of model overfitting while MAPE assesses the accuracy of individual predictions. METHODS: Recently, two formulae were proposed to calculate the sample size required, given anticipated features of the development data such as the outcome prevalence and c-statistic, to ensure that the expectation of the CS and MAPE (over repeated samples) in models fitted using maximum likelihood estimation (MLE) will meet prespecified target values. In this article, we use a simulation study to evaluate the performance of these formulae. RESULTS: We found that both formulae work reasonably well when the anticipated model strength is not too high (c-statistic < 0.8), regardless of the outcome prevalence. However, for higher model strengths the CS formula underestimates the sample size substantially. For example, for c-statistic = 0.85 and 0.9, the sample size needed to be increased by at least 50% and 100%, respectively, to meet the target expected CS. On the other hand, the MAPE formula tends to overestimate the sample size for high model strengths. These conclusions were more pronounced for higher than for lower prevalence. Similar results were drawn when the outcome was time to event with censoring. Given these findings, we propose a simulation-based approach, implemented in the new R package 'samplesizedev', to correctly estimate the sample size even for high model strengths. The software can also calculate the variability in CS and MAPE, thus allowing for assessment of model stability. CONCLUSIONS: The CS and MAPE formulae suggest sample sizes that are generally appropriate for use when the model strength is not too high. However, they tend to be biased for higher model strengths, which are not uncommon in clinical risk prediction studies. On those occasions, our proposed adjustments to the sample size calculations will be relevant.
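The simulation-based idea can be sketched in a few lines of base R. This is a minimal illustration of the principle, not the samplesizedev API; the data-generating mechanism (standard normal predictors with equal coefficients) is an assumption for illustration.

```r
# Estimate the expected calibration slope (CS) of an MLE-fitted logistic
# model at a candidate sample size n, by repeated simulation plus
# validation on a large independent sample.
set.seed(2)
expected_cs <- function(n, beta = c(-2, rep(0.5, 4)), n_val = 50000, reps = 200) {
  p <- length(beta) - 1
  mean(replicate(reps, {
    X <- matrix(rnorm(n * p), n, p)                   # development data
    y <- rbinom(n, 1, plogis(cbind(1, X) %*% beta))
    fit <- glm(y ~ X, family = binomial)
    Xv <- matrix(rnorm(n_val * p), n_val, p)          # validation data
    yv <- rbinom(n_val, 1, plogis(cbind(1, Xv) %*% beta))
    lp <- cbind(1, Xv) %*% coef(fit)                  # predicted linear predictor
    coef(glm(yv ~ lp, family = binomial))[2]          # calibration slope
  }))
}
expected_cs(500)  # increase n until the expected CS reaches the target (e.g. 0.9)
```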


Subjects
Models, Statistical; Humans; Sample Size; Risk Assessment/methods; Risk Assessment/statistics & numerical data; Computer Simulation; Algorithms
3.
Lancet ; 399(10332): 1303-1312, 2022 04 02.
Article in English | MEDLINE | ID: mdl-35305296

ABSTRACT

BACKGROUND: The omicron variant (B.1.1.529) of SARS-CoV-2 has demonstrated partial vaccine escape and high transmissibility, with early studies indicating lower severity of infection than that of the delta variant (B.1.617.2). We aimed to better characterise omicron severity relative to delta by assessing the relative risk of hospital attendance, hospital admission, or death in a large national cohort. METHODS: Individual-level data on laboratory-confirmed COVID-19 cases resident in England between Nov 29, 2021, and Jan 9, 2022, were linked to routine datasets on vaccination status, hospital attendance and admission, and mortality. The relative risk of hospital attendance or admission within 14 days, or death within 28 days after confirmed infection, was estimated using proportional hazards regression. Analyses were stratified by test date, 10-year age band, ethnicity, residential region, and vaccination status, and were further adjusted for sex, index of multiple deprivation decile, evidence of a previous infection, and year of age within each age band. A secondary analysis estimated variant-specific and vaccine-specific vaccine effectiveness and the intrinsic relative severity of omicron infection compared with delta (ie, the relative risk in unvaccinated cases). FINDINGS: The adjusted hazard ratio (HR) of hospital attendance (not necessarily resulting in admission) with omicron compared with delta was 0·56 (95% CI 0·54-0·58); for hospital admission and death, HR estimates were 0·41 (0·39-0·43) and 0·31 (0·26-0·37), respectively. Omicron versus delta HR estimates varied with age for all endpoints examined. The adjusted HR for hospital admission was 1·10 (0·85-1·42) in those younger than 10 years, decreasing to 0·25 (0·21-0·30) in 60-69-year-olds, and then increasing to 0·47 (0·40-0·56) in those aged at least 80 years. For both variants, past infection gave some protection against death both in vaccinated (HR 0·47 [0·32-0·68]) and unvaccinated (0·18 [0·06-0·57]) cases. In vaccinated cases, past infection offered no additional protection against hospital admission beyond that provided by vaccination (HR 0·96 [0·88-1·04]); however, for unvaccinated cases, past infection gave moderate protection (HR 0·55 [0·48-0·63]). Omicron versus delta HR estimates were lower for hospital admission (0·30 [0·28-0·32]) in unvaccinated cases than the corresponding HR estimated for all cases in the primary analysis. Booster vaccination with an mRNA vaccine was highly protective against hospitalisation and death in omicron cases (HR for hospital admission 8-11 weeks post-booster vs unvaccinated: 0·22 [0·20-0·24]), with the protection afforded after a booster not being affected by the vaccine used for doses 1 and 2. INTERPRETATION: The risk of severe outcomes following SARS-CoV-2 infection is substantially lower for omicron than for delta, with higher reductions for more severe endpoints and significant variation with age. Underlying the observed risks is a larger reduction in intrinsic severity (in unvaccinated individuals) counterbalanced by a reduction in vaccine effectiveness. Documented previous SARS-CoV-2 infection offered some protection against hospitalisation and high protection against death in unvaccinated individuals, but only offered additional protection in vaccinated individuals for the death endpoint. Booster vaccination with mRNA vaccines maintains over 70% protection against hospitalisation and death in breakthrough confirmed omicron infections. 
FUNDING: Medical Research Council, UK Research and Innovation, Department of Health and Social Care, National Institute for Health Research, Community Jameel, and Engineering and Physical Sciences Research Council.
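A stratified Cox analysis of this kind can be sketched with the survival package. The toy data below stand in for the linked case data; all variable names and effect sizes are illustrative assumptions, not the study's.

```r
# Hedged sketch of a stratified proportional hazards analysis of variant
# severity, on simulated data (illustrative variables, not the study data).
library(survival)
set.seed(3)
n <- 5000
dat <- data.frame(variant  = factor(sample(c("delta", "omicron"), n, TRUE)),
                  age_band = factor(sample(1:8, n, TRUE)),
                  sex      = factor(sample(c("F", "M"), n, TRUE)))
lp <- -4 + log(0.4) * (dat$variant == "omicron") + 0.2 * as.numeric(dat$age_band)
t  <- rexp(n, rate = exp(lp))
dat$time     <- pmin(t, 14)                     # follow-up ends at 14 days
dat$admitted <- as.integer(t <= 14)
fit <- coxph(Surv(time, admitted) ~ variant + sex + strata(age_band), data = dat)
exp(coef(fit))["variantomicron"]                # adjusted HR, omicron vs delta
```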


Subjects
COVID-19; SARS-CoV-2; COVID-19/epidemiology; COVID-19/prevention & control; Cohort Studies; England/epidemiology; Hospitalization; Humans; Vaccines, Synthetic; mRNA Vaccines
4.
Stat Med ; 42(13): 2191-2225, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37086186

ABSTRACT

Longitudinal observational data on patients can be used to investigate causal effects of time-varying treatments on time-to-event outcomes. Several methods have been developed for estimating such effects by controlling for the time-dependent confounding that typically occurs. The most commonly used is marginal structural models (MSM) estimated using inverse probability of treatment weights (IPTW) (MSM-IPTW). An alternative, the sequential trials approach, is increasingly popular, and involves creating a sequence of "trials" from new time origins and comparing treatment initiators and non-initiators. Individuals are censored when they deviate from their treatment assignment at the start of each "trial" (initiator or noninitiator), which is accounted for using inverse probability of censoring weights. The analysis uses data combined across trials. We show that the sequential trials approach can estimate the parameters of a particular MSM. The causal estimand that we focus on is the marginal risk difference between the sustained treatment strategies of "always treat" vs "never treat." We compare how the sequential trials approach and MSM-IPTW estimate this estimand, and discuss their assumptions and how data are used differently. The performance of the two approaches is compared in a simulation study. The sequential trials approach, which tends to involve less extreme weights than MSM-IPTW, results in greater efficiency for estimating the marginal risk difference at most follow-up times, but this can, in certain scenarios, be reversed at later time points and relies on modelling assumptions. We apply the methods to longitudinal observational data from the UK Cystic Fibrosis Registry to estimate the effect of dornase alfa on survival.
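As a point of reference for the MSM-IPTW side of the comparison, here is a two-visit toy sketch of stabilised inverse probability of treatment weighting with a weighted outcome model. It is a minimal illustration under assumed data-generating models, not the sequential trials implementation.

```r
# Two-visit MSM-IPTW toy: stabilised weights, then a weighted outcome model
# for the effect of cumulative treatment (illustrative models throughout).
set.seed(4)
n  <- 2000
L0 <- rnorm(n)                                   # baseline confounder
A0 <- rbinom(n, 1, plogis(0.5 * L0))             # treatment at visit 0
L1 <- rnorm(n, 0.5 * L0 + 0.5 * A0)              # time-dependent confounder
A1 <- rbinom(n, 1, plogis(0.5 * L1 + A0))        # treatment at visit 1
Y  <- rbinom(n, 1, plogis(-2 - 0.5 * (A0 + A1) + 0.7 * L1))

pr <- function(p, a) ifelse(a == 1, p, 1 - p)    # P(A = a) from fitted P(A = 1)
sw <- pr(fitted(glm(A0 ~ 1,       binomial)), A0) /
      pr(fitted(glm(A0 ~ L0,      binomial)), A0) *
      pr(fitted(glm(A1 ~ A0,      binomial)), A1) /
      pr(fitted(glm(A1 ~ A0 + L1, binomial)), A1)

msm <- glm(Y ~ I(A0 + A1), family = quasibinomial, weights = sw)
summary(msm)$coef                                # weighted estimate of the MSM parameters
```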


Subjects
Models, Statistical; Humans; Causality; Models, Structural; Probability; Survival Analysis; Treatment Outcome; Longitudinal Studies
5.
J Infect Dis ; 226(5): 808-811, 2022 09 13.
Article in English | MEDLINE | ID: mdl-35184201

ABSTRACT

To investigate if the AY.4.2 sublineage of the SARS-CoV-2 delta variant is associated with hospitalization and mortality risks that differ from non-AY.4.2 delta risks, we performed a retrospective cohort study of sequencing-confirmed COVID-19 cases in England based on linkage of routine health care datasets. Using stratified Cox regression, we estimated adjusted hazard ratios (aHR) of hospital admission (aHR = 0.85; 95% confidence interval [CI], .77-.94), hospital admission or emergency care attendance (aHR = 0.87; 95% CI, .81-.94), and COVID-19 mortality (aHR = 0.85; 95% CI, .71-1.03). The results indicate that the risks of hospitalization and mortality are similar or lower for AY.4.2 compared to cases with other delta sublineages.


Subjects
COVID-19; SARS-CoV-2; Hospitalization; Humans; Retrospective Studies
6.
Stat Med ; 40(16): 3779-3790, 2021 07 20.
Article in English | MEDLINE | ID: mdl-33942919

ABSTRACT

Using data from observational studies to estimate the causal effect of a time-varying exposure, repeatedly measured over time, on an outcome of interest requires careful adjustment for confounding. Standard regression adjustment for observed time-varying confounders is unsuitable, as it can eliminate part of the causal effect and induce bias. Inverse probability weighting, g-computation, and g-estimation have been proposed as being more suitable methods. G-estimation has some advantages over the other two methods, but until recently there has been a lack of flexible g-estimation methods for a survival time outcome. The recently proposed Structural Nested Cumulative Survival Time Model (SNCSTM) is such a method. Efficient estimation of the parameters of this model required bespoke software. In this article we show how the SNCSTM can be fitted efficiently via g-estimation using standard software for fitting generalised linear models. The ability to implement g-estimation for a survival outcome using standard statistical software greatly increases the potential uptake of this method. We illustrate the use of this method of fitting the SNCSTM by reanalyzing data from the UK Cystic Fibrosis Registry, and provide example R code to facilitate the use of this approach by other researchers.


Subjects
Models, Statistical; Bias; Causality; Humans; Linear Models; Probability
7.
Biom J ; 63(7): 1526-1541, 2021 10.
Article in English | MEDLINE | ID: mdl-33983641

ABSTRACT

Observational longitudinal data on treatments and covariates are increasingly used to investigate treatment effects, but are often subject to time-dependent confounding. Marginal structural models (MSMs), estimated using inverse probability of treatment weighting or the g-formula, are popular for handling this problem. With increasing development of advanced causal inference methods, it is important to be able to assess their performance in different scenarios to guide their application. Simulation studies are a key tool for this, but their use to evaluate causal inference methods has been limited. This paper focuses on the use of simulations for evaluations involving MSMs in studies with a time-to-event outcome. In a simulation, it is important to be able to generate the data in such a way that the correct forms of any models to be fitted to those data are known. However, this is not straightforward in the longitudinal setting because it is natural for data to be generated in a sequential conditional manner, whereas MSMs involve fitting marginal rather than conditional hazard models. We provide general results that enable the form of the correctly specified MSM to be derived based on a conditional data generating procedure, and show how the results can be applied when the conditional hazard model is an Aalen additive hazard or Cox model. Using conditional additive hazard models is advantageous because they imply additive MSMs that can be fitted using standard software. We describe and illustrate a simulation algorithm. Our results will help researchers to effectively evaluate causal inference methods via simulation.
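The convenience of conditional additive hazard models can be illustrated in a small sketch: generate survival times from a conditional additive hazard and fit the implied marginal additive model with standard software. The point-treatment setting and all coefficients below are simplifying assumptions; the paper covers the longitudinal case.

```r
# Generate from a conditional additive hazard h(t | A, L) = b0 + b1*A + b2*L
# (constant here for simplicity), then fit the marginal Aalen model in A.
library(survival)
set.seed(5)
n <- 5000
A <- rbinom(n, 1, 0.5)                    # randomised point treatment (toy case)
L <- rbinom(n, 1, 0.3 + 0.2 * A)          # covariate affected by treatment
h <- 0.05 + 0.03 * A + 0.04 * L           # conditional additive hazard
t <- rexp(n, rate = h)
time <- pmin(t, 10); status <- as.integer(t <= 10)
aareg(Surv(time, status) ~ A)             # marginal additive hazard model for A
```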


Subjects
Models, Statistical; Computer Simulation; Models, Structural; Proportional Hazards Models
8.
Biometrics ; 76(2): 472-483, 2020 06.
Article in English | MEDLINE | ID: mdl-31562652

ABSTRACT

Accounting for time-varying confounding when assessing the causal effects of time-varying exposures on survival time is challenging. Standard survival methods that incorporate time-varying confounders as covariates generally yield biased effect estimates. Estimators using weighting by inverse probability of exposure can be unstable when confounders are highly predictive of exposure or the exposure is continuous. Structural nested accelerated failure time models (AFTMs) require artificial recensoring, which can cause estimation difficulties. Here, we introduce the structural nested cumulative survival time model (SNCSTM). This model assumes that intervening to set exposure at time t to zero has an additive effect on the subsequent conditional hazard given exposure and confounder histories when all subsequent exposures have already been set to zero. We show how to fit it using standard software for generalized linear models and describe two more efficient, double robust, closed-form estimators. All three estimators avoid the artificial recensoring of AFTMs and the instability of estimators that use weighting by the inverse probability of exposure. We examine the performance of our estimators using a simulation study and illustrate their use on data from the UK Cystic Fibrosis Registry. The SNCSTM is compared with a recently proposed structural nested cumulative failure time model, and several advantages of the former are identified.


Subjects
Models, Statistical; Survival Analysis; Biometry; Computer Simulation; Confidence Intervals; Confounding Factors, Epidemiologic; Cystic Fibrosis/drug therapy; Cystic Fibrosis/mortality; Deoxyribonucleases/therapeutic use; Humans; Linear Models; Proportional Hazards Models; Registries/statistics & numerical data; Time Factors; United Kingdom/epidemiology
9.
Stat Med ; 39(22): 2921-2935, 2020 09 30.
Article in English | MEDLINE | ID: mdl-32677726

ABSTRACT

We develop and demonstrate methods to perform sensitivity analyses assessing sensitivity to plausible departures from missing at random (MAR) in incomplete repeated binary outcome data. We use multiple imputation in the not-at-random fully conditional specification framework, which includes one or more sensitivity parameters (SPs) for each incomplete variable. The use of an online elicitation questionnaire is demonstrated to obtain expert opinion on the SPs, and highest prior density regions are used alongside opinion pooling methods to display credible regions for SPs. We demonstrate that substantive conclusions can be far more sensitive to departures from MAR when control and intervention nonresponders depart from MAR differently, and show that the correlation of arm-specific SPs in expert opinion is particularly important. We illustrate these methods on the iQuit in Practice smoking cessation trial, which compared the impact of a tailored text messaging system versus standard care on smoking cessation. We show that conclusions about the effect of intervention on smoking cessation outcomes at 8 weeks and 6 months are broadly insensitive to departures from MAR, with conclusions significantly affected only when the differences in behavior between nonresponders in the two trial arms are larger than expert opinion judges to be realistic.
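A minimal version of the delta-adjusted ("not at random") imputation idea for a single incomplete binary outcome is sketched below; the sensitivity parameter delta shifts the imputation model's log-odds, with delta = 0 recovering MAR. The data-generating model, estimand, and use of MASS::mvrnorm for approximate parameter draws are illustrative assumptions, not the framework's implementation in the paper.

```r
# Delta-adjusted multiple imputation for one incomplete binary outcome.
set.seed(6)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
r <- rbinom(n, 1, plogis(1 - 0.5 * x))          # 1 = observed
y[r == 0] <- NA

impute_delta <- function(y, x, delta, M = 20) {
  fit <- glm(y ~ x, family = binomial)           # fitted to complete cases
  sapply(1:M, function(m) {
    b  <- MASS::mvrnorm(1, coef(fit), vcov(fit)) # approximate parameter draw
    p  <- plogis(cbind(1, x) %*% b + delta)      # shift log-odds by delta
    yi <- y
    yi[is.na(y)] <- rbinom(sum(is.na(y)), 1, p[is.na(y)])
    mean(yi)                                     # estimand: outcome prevalence
  })
}
c(MAR  = mean(impute_delta(y, x, delta = 0)),    # combine draws by averaging
  MNAR = mean(impute_delta(y, x, delta = -1)))
```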


Subjects
Research Design; Smoking Cessation; Data Interpretation, Statistical; Humans; Surveys and Questionnaires
10.
Stat Med ; 39(11): 1641-1657, 2020 05 20.
Article in English | MEDLINE | ID: mdl-32103533

ABSTRACT

Electronic health records are a valuable data source for investigating health-related questions, and propensity score analysis has become an increasingly popular approach to address confounding bias in such investigations. However, because electronic health records are typically routinely recorded as part of standard clinical care, there are often missing values, particularly for potential confounders. In our motivating study-using electronic health records to investigate the effect of renin-angiotensin system blockers on the risk of acute kidney injury-two key confounders, ethnicity and chronic kidney disease stage, have 59% and 53% missing data, respectively. The missingness pattern approach (MPA), a variant of the missing indicator approach, has been proposed as a method for handling partially observed confounders in propensity score analysis. In the MPA, propensity scores are estimated separately for each missingness pattern present in the data. Although the assumptions underlying the validity of the MPA are stated in the literature, it can be difficult in practice to assess their plausibility. In this article, we explore the MPA's underlying assumptions by using causal diagrams to assess their plausibility in a range of simple scenarios, drawing general conclusions about situations in which they are likely to be violated. We present a framework providing practical guidance for assessing whether the MPA's assumptions are plausible in a particular setting and thus deciding when the MPA is appropriate. We apply our framework to our motivating study, showing that the MPA's underlying assumptions appear reasonable, and we demonstrate the application of MPA to this study.
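The core of the MPA can be shown in a few lines: estimate a separate propensity score model within each missingness pattern of the partially observed confounder. The toy below has one confounder and two patterns; everything is illustrative.

```r
# Missingness pattern approach: pattern-specific propensity score models.
set.seed(7)
n <- 2000
z <- rnorm(n)                                   # confounder
obs <- rbinom(n, 1, 0.6) == 1                   # TRUE where z is observed
a <- rbinom(n, 1, plogis(0.8 * z))              # treatment depends on z
z[!obs] <- NA
ps <- numeric(n)
ps[obs]  <- fitted(glm(a[obs] ~ z[obs], family = binomial))  # pattern: z observed
ps[!obs] <- fitted(glm(a[!obs] ~ 1,     family = binomial))  # pattern: z missing
w <- ifelse(a == 1, 1 / ps, 1 / (1 - ps))       # IPT weights from the pattern-specific PS
summary(w)
```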


Subjects
Models, Statistical; Research Design; Bias; Causality; Propensity Score
11.
Biostatistics ; 19(4): 407-425, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29028922

ABSTRACT

Cohort data are often incomplete because some subjects drop out of the study, and inverse probability weighting (IPW), multiple imputation (MI), and linear increments (LI) are methods that deal with such missing data. In cohort studies of ageing, missing data can arise from dropout or death. Methods that do not distinguish between these reasons for missingness typically provide inference about a hypothetical cohort where no one can die (immortal cohort). It has been suggested that inference about the cohort composed of those who are still alive at any time point (partly conditional inference) may be more meaningful. MI, LI, and IPW can all be adapted to provide partly conditional inference. In this article, we clarify and compare the assumptions required by these MI, LI, and IPW methods for partly conditional inference on continuous outcomes. We also propose augmented IPW estimators for making partly conditional inference. These are more efficient than IPW estimators and more robust to model misspecification. Our simulation studies show that the methods give approximately unbiased estimates of partly conditional estimands when their assumptions are met, but may be biased otherwise. We illustrate the application of the missing data methods using data from the 'Origins of Variance in the Old-old' Twin study.


Subjects
Biomedical Research/methods; Biostatistics/methods; Cohort Studies; Data Interpretation, Statistical; Models, Statistical; Research Design; Humans
12.
Epidemiology ; 30(1): 29-37, 2019 01.
Article in English | MEDLINE | ID: mdl-30234550

ABSTRACT

BACKGROUND: Cystic fibrosis (CF) is an inherited, chronic, progressive condition affecting around 10,000 individuals in the United Kingdom and over 70,000 worldwide. Survival in CF has improved considerably over recent decades, and it is important to provide up-to-date information on patient prognosis. METHODS: The UK Cystic Fibrosis Registry is a secure centralized database, which collects annual data on almost all CF patients in the United Kingdom. Data from 43,592 annual records from 2005 to 2015 on 6181 individuals were used to develop a dynamic survival prediction model that provides personalized estimates of survival probabilities given a patient's current health status using 16 predictors. We developed the model using the landmarking approach, giving predicted survival curves up to 10 years from 18 to 50 years of age. We compared several models using cross-validation. RESULTS: The final model has good discrimination (C-indexes: 0.873, 0.843, and 0.804 for 2-, 5-, and 10-year survival prediction) and low prediction error (Brier scores: 0.036, 0.076, and 0.133). It identifies individuals at low and high risk of short- and long-term mortality based on their current status. For patients 20 years of age during 2013-2015, for example, over 80% had a greater than 95% probability of 2-year survival and 40% were predicted to survive 10 years or more. CONCLUSIONS: Dynamic personalized prediction models can guide treatment decisions and provide personalized information for patients. Our application illustrates the utility of the landmarking approach for making the best use of longitudinal and survival data and shows how models can be defined and compared in terms of predictive performance.
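The landmarking construction can be sketched as follows: stack a dataset of subjects at risk at each landmark, administratively censor at the end of the prediction window, and fit a stratified Cox model with robust standard errors to account for subjects appearing at several landmarks. For brevity the sketch assumes a single fixed predictor and a common effect across landmarks; the paper's model uses 16 time-updated predictors.

```r
# Minimal landmark "supermodel" sketch on simulated data.
library(survival)
set.seed(8)
n  <- 3000
x0 <- rnorm(n)                                  # predictor (time-updated in practice)
t  <- rexp(n, rate = exp(-3 + 0.5 * x0))        # survival time
landmarks <- c(0, 2, 4); w <- 5                 # landmark times, prediction window
stacked <- do.call(rbind, lapply(landmarks, function(s) {
  keep <- which(t > s)                          # subjects still alive at landmark s
  data.frame(id = keep, lm = s, x = x0[keep],
             time   = pmin(t[keep] - s, w),     # censor at the window's end
             status = as.integer(t[keep] - s <= w))
}))
fit <- coxph(Surv(time, status) ~ x + strata(lm) + cluster(id), data = stacked)
summary(fit)$coefficients                       # robust SEs from cluster(id)
```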


Subjects
Cystic Fibrosis/mortality; Models, Statistical; Adult; Cohort Studies; Female; Humans; Male; Middle Aged; Probability; Prognosis; Registries; United Kingdom/epidemiology
13.
Stat Sci ; 33(2): 184-197, 2018.
Article in English | MEDLINE | ID: mdl-29731541

ABSTRACT

Most methods for handling incomplete data can be broadly classified as inverse probability weighting (IPW) strategies or imputation strategies. The former model the occurrence of incomplete data; the latter, the distribution of the missing variables given observed variables in each missingness pattern. Imputation strategies are typically more efficient, but they can involve extrapolation, which is difficult to diagnose and can lead to large bias. Double robust (DR) methods combine the two approaches. They are typically more efficient than IPW and more robust to model misspecification than imputation. We give a formal introduction to DR estimation of the mean of a partially observed variable, before moving to more general incomplete-data scenarios. We review strategies to improve the performance of DR estimators under model misspecification, reveal connections between DR estimators for incomplete data and 'design-consistent' estimators used in sample surveys, and explain the value of double robustness when using flexible data-adaptive methods for IPW or imputation.
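For the simplest case discussed above — the mean of a partially observed variable under missingness at random — the DR (augmented IPW) estimator can be written directly; the data-generating models below are assumptions for illustration.

```r
# Double robust (AIPW) estimator of E(Y) with Y missing at random given x.
set.seed(9)
n <- 2000
x <- rnorm(n)
y <- rnorm(n, 1 + x)                             # truth: E(Y) = 1
r <- rbinom(n, 1, plogis(0.5 + x))               # 1 = Y observed
pi_hat <- fitted(glm(r ~ x, family = binomial))  # model for observation
m_hat  <- predict(lm(y ~ x, subset = r == 1),    # model for imputation
                  newdata = data.frame(x = x))
# Multiplying by r means only observed y values enter the first term
# (in real data the unobserved y would be NA).
dr <- mean(r * y / pi_hat - (r - pi_hat) / pi_hat * m_hat)
c(complete_case = mean(y[r == 1]), dr = dr)      # DR is consistent if either model holds
```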

14.
Biometrics ; 74(4): 1427-1437, 2018 12.
Article in English | MEDLINE | ID: mdl-29772074

ABSTRACT

We propose semi-parametric methods to model cohort data where repeated outcomes may be missing due to death and non-ignorable dropout. Our focus is to obtain inference about the cohort composed of those who are still alive at any time point (partly conditional inference). We propose: i) an inverse probability weighted method that upweights observed subjects to represent subjects who are still alive but are not observed; ii) an outcome regression method that replaces missing outcomes of subjects who are alive with their conditional mean outcomes given past observed data; and iii) an augmented inverse probability method that combines the previous two methods and is double robust against model misspecification. These methods are described for both monotone and non-monotone missing data patterns, and are applied to a cohort of elderly adults from the Health and Retirement Study. In the data application, we use a sensitivity analysis to assess departures from the assumption that missingness at some visit t is independent of the outcome at visit t, given past observed data and time of death.


Subjects
Biometry/methods; Computer Simulation/statistics & numerical data; Regression Analysis; Aged; Aged, 80 and over; Bias; Cohort Studies; Death; Humans; Longitudinal Studies; Probability
15.
Biometrics ; 74(4): 1438-1449, 2018 12.
Article in English | MEDLINE | ID: mdl-29870056

ABSTRACT

The nested case-control and case-cohort designs are two main approaches for carrying out a substudy within a prospective cohort. This article adapts multiple imputation (MI) methods for handling missing covariates in full-cohort studies for nested case-control and case-cohort studies. We consider data missing by design and data missing by chance. MI analyses that make use of full-cohort data and MI analyses based on substudy data only are described, alongside an intermediate approach in which the imputation uses full-cohort data but the analysis uses only the substudy. We describe adaptations to two imputation methods: the approximate method (MI-approx) of White and Royston (2009) and the "substantive model compatible" (MI-SMC) method of Bartlett et al. (2015). We also apply the "MI matched set" approach of Seaman and Keogh (2015) to nested case-control studies, which does not require any full-cohort information. The methods are investigated using simulation studies and all perform well when their assumptions hold. Substantial gains in efficiency can be made by imputing data missing by design using the full-cohort approach or by imputing data missing by chance in analyses using the substudy only. The intermediate approach brings greater gains in efficiency relative to the substudy approach and is more robust to imputation model misspecification than the full-cohort approach. The methods are illustrated using the ARIC Study cohort. Supplementary Materials provide R and Stata code.


Subjects
Biometry/methods; Case-Control Studies; Cohort Studies; Computer Simulation/statistics & numerical data; Data Interpretation, Statistical; Humans
16.
Stat Med ; 35(9): 1423-40, 2016 Apr 30.
Article in English | MEDLINE | ID: mdl-26576494

ABSTRACT

In randomised controlled trials of treatments for late-stage cancer, it is common for control arm patients to receive the experimental treatment around the point of disease progression. This treatment switching can dilute the estimated treatment effect on overall survival and impact the assessment of a treatment's benefit on health economic evaluations. The rank-preserving structural failure time model of Robins and Tsiatis (Comm. Stat., 20:2609-2631) offers a potential solution to this problem and is typically implemented using the logrank test. However, in the presence of substantial switching, this test can have low power because the hazard ratio is not constant over time. Schoenfeld (Biometrika, 68:316-319) showed that when the hazard ratio is not constant, weighted versions of the logrank test become optimal. We present a weighted logrank test statistic for the late stage cancer trial context given the treatment switching pattern and working assumptions about the underlying hazard function in the population. Simulations suggest that the weighted approach can lead to large efficiency gains in either an intention-to-treat or a causal rank-preserving structural failure time model analysis compared with the unweighted approach. Furthermore, violation of the working assumptions used in the derivation of the weights only affects the efficiency of the estimates and does not induce bias or inflate the type I error rate. The weighted logrank test statistic should therefore be considered for use as part of a careful secondary, exploratory analysis of trial data affected by substantial treatment switching.
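For reference, standard software already provides a family of weighted logrank tests. The sketch below is a generic illustration of weighting a logrank test (here towards early differences), not the bespoke switching-based weights derived in the paper.

```r
# Weighted logrank test via survdiff's rho argument (G^rho family):
# rho = 0 is the ordinary logrank; rho = 1 weights by the survival
# function, emphasising early differences.
library(survival)
set.seed(10)
n <- 400
arm <- rep(0:1, each = n / 2)
t <- rexp(n, rate = ifelse(arm == 1, 0.08, 0.10))
time <- pmin(t, 24); status <- as.integer(t <= 24)
survdiff(Surv(time, status) ~ arm, rho = 1)
```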


Subjects
Data Interpretation, Statistical; Models, Statistical; Neoplasms/therapy; Humans; Intention to Treat Analysis; Proportional Hazards Models; Randomized Controlled Trials as Topic/methods; Treatment Outcome
17.
Stat Med ; 35(7): 1159-77, 2016 Mar 30.
Article in English | MEDLINE | ID: mdl-26514699

ABSTRACT

Risk prediction models are used to predict a clinical outcome for patients using a set of predictors. We focus on predicting low-dimensional binary outcomes typically arising in epidemiology, health services and public health research where logistic regression is commonly used. When the number of events is small compared with the number of regression coefficients, model overfitting can be a serious problem. An overfitted model tends to demonstrate poor predictive accuracy when applied to new data. We review frequentist and Bayesian shrinkage methods that may alleviate overfitting by shrinking the regression coefficients towards zero (some methods can also provide more parsimonious models by omitting some predictors). We evaluated their predictive performance in comparison with maximum likelihood estimation using real and simulated data. The simulation study showed that maximum likelihood estimation tends to produce overfitted models with poor predictive performance in scenarios with few events, and penalised methods can offer improvement. Ridge regression performed well, except in scenarios with many noise predictors. Lasso performed better than ridge in scenarios with many noise predictors and worse in the presence of correlated predictors. Elastic net, a hybrid of the two, performed well in all scenarios. Adaptive lasso and smoothly clipped absolute deviation performed best in scenarios with many noise predictors; in other scenarios, their performance was inferior to that of ridge and lasso. Bayesian approaches performed well when the hyperparameters for the priors were chosen carefully. Their use may aid variable selection, and they can be easily extended to clustered-data settings and to incorporate external information.
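The frequentist penalised methods compared above are all available through glmnet; a minimal sketch on simulated data follows, with alpha controlling the ridge/lasso/elastic net trade-off.

```r
# Ridge (alpha = 0), lasso (alpha = 1), and elastic net (0 < alpha < 1)
# for a binary outcome, with the penalty chosen by cross-validation.
library(glmnet)
set.seed(11)
n <- 300; p <- 20
X <- matrix(rnorm(n * p), n, p)                  # includes many noise predictors
y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))
fits <- lapply(c(ridge = 0, enet = 0.5, lasso = 1), function(a)
  cv.glmnet(X, y, family = "binomial", alpha = a))
sapply(fits, function(f) f$lambda.min)           # CV-selected penalty per method
```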


Subjects
Models, Statistical; Regression Analysis; Bayes Theorem; Bias; Biostatistics; Computer Simulation; Data Interpretation, Statistical; Humans; Likelihood Functions; Logistic Models; Male; Penile Neoplasms/mortality; Prognosis; Risk Factors
18.
Biometrics ; 71(4): 1150-9, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26237003

ABSTRACT

Analysis of matched case-control studies is often complicated by missing data on covariates. Analysis can be restricted to individuals with complete data, but this is inefficient and may be biased. Multiple imputation (MI) is an efficient and flexible alternative. We describe two MI approaches. The first uses a model for the data on an individual and includes matching variables; the second uses a model for the data on a whole matched set and avoids the need to model the matching variables. Within each approach, we consider three methods: full-conditional specification (FCS), joint model MI using a normal model, and joint model MI using a latent normal model. We show that FCS MI is asymptotically equivalent to joint model MI using a restricted general location model that is compatible with the conditional logistic regression analysis model. The normal and latent normal imputation models are not compatible with this analysis model. All methods allow for multiple partially-observed covariates, non-monotone missingness, and multiple controls per case. They can be easily applied in standard statistical software and valid variance estimates obtained using Rubin's Rules. We compare the methods in a simulation study. The approach of including the matching variables is most efficient. Within each approach, the FCS MI method generally yields the least-biased odds ratio estimates, but normal or latent normal joint model MI is sometimes more efficient. All methods have good confidence interval coverage. Data on colorectal cancer and fibre intake from the EPIC-Norfolk study are used to illustrate the methods, in particular showing how efficiency is gained relative to just using individuals with complete data.


Subjects
Case-Control Studies; Data Interpretation, Statistical; Biometry/methods; Colorectal Neoplasms/etiology; Computer Simulation; Confidence Intervals; Dietary Fiber/administration & dosage; Disease/etiology; Humans; Models, Statistical; Odds Ratio; Risk Factors
19.
BMC Med Res Methodol ; 15: 59, 2015 Aug 05.
Article in English | MEDLINE | ID: mdl-26242875

ABSTRACT

BACKGROUND: Clustered data with binary outcomes are often analysed using random intercepts models or generalised estimating equations (GEE) resulting in cluster-specific or 'population-average' inference, respectively. METHODS: When a random effects model is fitted to clustered data, predictions may be produced for a member of an existing cluster by using estimates of the fixed effects (regression coefficients) and the random effect for the cluster (conditional risk calculation), or for a member of a new cluster (marginal risk calculation). We focus on the second. Marginal risk calculation from a random effects model is obtained by integrating over the distribution of random effects. However, in practice marginal risks are often obtained, incorrectly, using only estimates of the fixed effects (i.e. by effectively setting the random effects to zero). We compare these two approaches to marginal risk calculation in terms of model calibration. RESULTS: In simulation studies, it has been seen that use of the incorrect marginal risk calculation from random effects models results in poorly calibrated overall marginal predictions (calibration slope <1 and calibration in the large ≠ 0) with mis-calibration becoming worse with higher degrees of clustering. We clarify that this was due to the incorrect calculation of marginal predictions from a random intercepts model and explain intuitively why this approach is incorrect. We show via simulation that the correct calculation of marginal risks from a random intercepts model results in predictions with excellent calibration. CONCLUSION: The logistic random intercepts model can be used to obtain valid marginal predictions by integrating over the distribution of random effects.
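The correct marginal risk calculation amounts to a one-dimensional integral over the random intercept distribution; a short sketch is below, comparing it with the incorrect shortcut of setting the random effect to zero. The linear predictor and variance are illustrative.

```r
# Marginal risk from a logistic random intercepts model: integrate the
# conditional risk over u ~ N(0, sigma_u^2) instead of setting u = 0.
marginal_risk <- function(lp_fixed, sigma_u) {
  sapply(lp_fixed, function(lp)
    integrate(function(u) plogis(lp + u) * dnorm(u, sd = sigma_u),
              -Inf, Inf)$value)
}
lp <- -1.5                          # fixed-effects linear predictor, new cluster
c(incorrect = plogis(lp),           # random intercept wrongly set to zero
  correct   = marginal_risk(lp, sigma_u = 1.5))
```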


Subjects
Algorithms; Data Interpretation, Statistical; Logistic Models; Outcome Assessment, Health Care/methods; Biomedical Research/methods; Calibration; Cluster Analysis; Computer Simulation; Humans; Reproducibility of Results
20.
Paediatr Perinat Epidemiol ; 29(6): 567-75, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26332368

ABSTRACT

BACKGROUND: Informative birth size occurs when the average outcome depends on the number of infants per birth. Although analysis methods have been proposed for handling informative birth size, their performance is not well understood. Our aim was to evaluate the performance of these methods and to provide recommendations for their application in randomised trials including infants from single and multiple births. METHODS: Three generalised estimating equation (GEE) approaches were considered for estimating the effect of treatment on a continuous or binary outcome: cluster weighted GEEs, which produce treatment effects with a mother-level interpretation when birth size is informative; standard GEEs with an independence working correlation structure, which produce treatment effects with an infant-level interpretation when birth size is informative; and standard GEEs with an exchangeable working correlation structure, which do not account for informative birth size. The methods were compared through simulation and analysis of an example dataset. RESULTS: Treatment effect estimates were affected by informative birth size in the simulation study when the effect of treatment in singletons differed from that in multiples (i.e. in the presence of a treatment group by multiple birth interaction). The strength of evidence supporting the effectiveness of treatment varied between methods in the example dataset. CONCLUSIONS: Informative birth size is always a possibility in randomised trials including infants from both single and multiple births, and analysis methods should be pre-specified with this in mind. We recommend estimating treatment effects using standard GEEs with an independence working correlation structure to give an infant-level interpretation.
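The recommended analysis — a standard GEE with an independence working correlation, clustering infants within mothers — can be sketched with geepack; the data below are simulated and all parameters are illustrative.

```r
# Infant-level treatment effect via GEE with independence working
# correlation, with infants clustered within mothers.
library(geepack)
set.seed(12)
n_mothers <- 300
births <- sample(1:2, n_mothers, TRUE, prob = c(0.9, 0.1))   # ~10% twin births
dat <- data.frame(mother = rep(seq_len(n_mothers), births))
dat$treat <- rep(rbinom(n_mothers, 1, 0.5), births)          # randomised per mother
dat$y <- rbinom(nrow(dat), 1, plogis(-0.5 + 0.4 * dat$treat))
fit <- geeglm(y ~ treat, id = mother, family = binomial,
              corstr = "independence", data = dat)
summary(fit)                                                 # robust (sandwich) SEs
```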


Subjects
Fetal Growth Retardation/epidemiology; Infant, Low Birth Weight; Infant, Premature; Pregnancy, Multiple/statistics & numerical data; Premature Birth/epidemiology; Adult; Female; Humans; Infant, Newborn; Male; Population Surveillance; Pregnancy; Randomized Controlled Trials as Topic; Reference Standards