ABSTRACT
Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and intractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modeling multiple outcomes affected by the intervention, and easily provide uncertainty quantification for all causal estimands of interest. Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England's Test and Trace programme for COVID-19.
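As a rough illustration of the counterfactual logic shared by such factor-analysis approaches (not the paper's Bayesian model or its MCMC sampler), the sketch below fits a low-rank factor model to untreated control series by SVD, learns the treated unit's loadings from the pre-intervention period, and reads the post-period gap between observed and predicted outcomes as the effect. All names, dimensions, and coefficients are hypothetical.

```python
import numpy as np

def factor_effect(Y_ctrl, y_treat, T0, r=2):
    """Post-period effect estimates for one treated unit.
    Y_ctrl : (n_ctrl, T) outcomes of never-treated control units
    y_treat: (T,) outcome series of the treated unit
    T0     : number of pre-intervention periods
    r      : assumed number of latent factors"""
    # Common time-varying factors estimated from the controls via SVD.
    _, _, Vt = np.linalg.svd(Y_ctrl, full_matrices=False)
    F = Vt[:r].T                                   # (T, r) factor series
    # Treated unit's factor loadings, fitted on the pre-period only.
    lam, *_ = np.linalg.lstsq(F[:T0], y_treat[:T0], rcond=None)
    y0_hat = F @ lam                               # counterfactual "no intervention"
    return (y_treat - y0_hat)[T0:]                 # post-period effect estimates

# Toy check: a constant effect of 1.5 added after T0 is recovered.
rng = np.random.default_rng(1)
T, T0, n_ctrl = 40, 30, 25
F_true = rng.normal(size=(T, 2))
Y_ctrl = rng.normal(size=(n_ctrl, 2)) @ F_true.T + rng.normal(0, 0.3, (n_ctrl, T))
y_treat = rng.normal(size=2) @ F_true.T + rng.normal(0, 0.3, T)
y_treat[T0:] += 1.5
print(factor_effect(Y_ctrl, y_treat, T0).mean())   # approximately 1.5
```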
ABSTRACT
BACKGROUND: Risk prediction models are routinely used to assist in clinical decision making. A small sample size for model development can compromise model performance when the model is applied to new patients. For binary outcomes, the calibration slope (CS) and the mean absolute prediction error (MAPE) are two key measures on which sample size calculations for the development of risk models have been based. CS quantifies the degree of model overfitting, while MAPE assesses the accuracy of individual predictions. METHODS: Recently, two formulae were proposed to calculate the sample size required, given anticipated features of the development data such as the outcome prevalence and c-statistic, to ensure that the expectation of the CS and MAPE (over repeated samples) in models fitted using maximum likelihood estimation (MLE) will meet prespecified target values. In this article, we use a simulation study to evaluate the performance of these formulae. RESULTS: We found that both formulae work reasonably well when the anticipated model strength is not too high (c-statistic < 0.8), regardless of the outcome prevalence. However, for higher model strengths the CS formula underestimates the sample size substantially. For example, for c-statistic = 0.85 and 0.9, the sample size needed to be increased by at least 50% and 100%, respectively, to meet the target expected CS. On the other hand, the MAPE formula tends to overestimate the sample size for high model strengths. These discrepancies were more pronounced for higher prevalence than for lower prevalence. Similar results were obtained when the outcome was time to event with censoring. Given these findings, we propose a simulation-based approach, implemented in the new R package 'samplesizedev', to correctly estimate the sample size even for high model strengths. The software can also calculate the variability in CS and MAPE, thus allowing for assessment of model stability. CONCLUSIONS: The CS and MAPE formulae suggest sample sizes that are generally appropriate for use when the model strength is not too high. However, they tend to be biased for higher model strengths, which are not uncommon in clinical risk prediction studies. On those occasions, our proposed adjustments to the sample size calculations will be relevant.
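A minimal sketch of the simulation-based idea (not the 'samplesizedev' implementation itself): simulate development datasets of size n from an assumed true logistic model, fit by MLE, and evaluate the expected CS and MAPE against a large validation sample. Here beta0 and sigma are illustrative knobs controlling prevalence and model strength, and a single predictor stands in for the full anticipated model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
expit = lambda x: 1 / (1 + np.exp(-x))

def expected_cs_mape(n, beta0=-1.5, sigma=1.2, n_val=50_000, nsim=100):
    """Monte-Carlo estimates of E[calibration slope] and E[MAPE] for a
    logistic model developed by MLE on samples of size n.  A single
    standard-normal predictor is used for illustration; real
    calculations would use the anticipated number of predictors."""
    z_val = rng.normal(size=n_val)                 # large validation set
    p_val = expit(beta0 + sigma * z_val)           # true risks
    y_val = rng.binomial(1, p_val)
    cs, mape = [], []
    for _ in range(nsim):
        z = rng.normal(size=n)
        y = rng.binomial(1, expit(beta0 + sigma * z))
        fit = sm.Logit(y, sm.add_constant(z)).fit(disp=0)
        lp = fit.params[0] + fit.params[1] * z_val
        # Calibration slope: slope from regressing y_val on the model's
        # linear predictor; values below 1 indicate overfitting.
        cs.append(sm.Logit(y_val, sm.add_constant(lp)).fit(disp=0).params[1])
        mape.append(np.abs(expit(lp) - p_val).mean())
    return np.mean(cs), np.mean(mape)

print(expected_cs_mape(n=300))
```

Searching over n until the expected CS and MAPE reach their prespecified targets, for the anticipated prevalence and c-statistic, is the simulation-based calculation the abstract describes.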
Subjects
Statistical Models, Humans, Sample Size, Risk Assessment/methods, Risk Assessment/statistics & numerical data, Computer Simulation, Algorithms
ABSTRACT
Importance: Previous studies on the comparative effectiveness between buprenorphine and methadone provided limited evidence on differences in treatment effects across key subgroups and were drawn from populations who use primarily heroin or prescription opioids, although fentanyl use is increasing across North America. Objective: To assess the risk of treatment discontinuation and mortality among individuals receiving buprenorphine/naloxone vs methadone for the treatment of opioid use disorder. Design, Setting, and Participants: Population-based retrospective cohort study using linked health administrative databases in British Columbia, Canada. The study included treatment recipients between January 1, 2010, and March 17, 2020, who were 18 years or older and not incarcerated, pregnant, or receiving palliative cancer care at initiation. Exposures: Receipt of buprenorphine/naloxone or methadone among incident (first-time) users and prevalent new users (including first and subsequent treatment attempts). Main Outcomes and Measures: Hazard ratios (HRs) with 95% compatibility (confidence) intervals were estimated for treatment discontinuation (lasting ≥5 days for methadone and ≥6 days for buprenorphine/naloxone) and all-cause mortality within 24 months using discrete-time survival models for comparisons of medications as assigned at initiation regardless of treatment adherence ("initiator") and received according to dosing guidelines (approximating per-protocol analysis). Results: A total of 30,891 incident users (39% receiving buprenorphine/naloxone; 66% male; median age, 33 [25th-75th, 26-43] years) were included in the initiator analysis and 25,614 in the per-protocol analysis. Incident users of buprenorphine/naloxone had a higher risk of treatment discontinuation compared with methadone in initiator analyses (88.8% vs 81.5% discontinued at 24 months; adjusted HR, 1.58 [95% CI, 1.53-1.63]), with limited change in estimates when evaluated at optimal dose in per-protocol analysis (42.1% vs 30.7%; adjusted HR, 1.67 [95% CI, 1.58-1.76]). Per-protocol analyses of mortality while receiving treatment exhibited ambiguous results among incident users (0.08% vs 0.13% mortality at 24 months; adjusted HR, 0.57 [95% CI, 0.24-1.35]) and among prevalent users (0.08% vs 0.09%; adjusted HR, 0.97 [95% CI, 0.54-1.73]). Results were consistent after the introduction of fentanyl and across patient subgroups and sensitivity analyses. Conclusions and Relevance: Receipt of methadone was associated with a lower risk of treatment discontinuation compared with buprenorphine/naloxone. The risk of mortality while receiving treatment was similar for buprenorphine/naloxone and methadone, although the CI for the hazard ratio was wide.
ABSTRACT
BACKGROUND: The omicron variant (B.1.1.529) of SARS-CoV-2 has demonstrated partial vaccine escape and high transmissibility, with early studies indicating lower severity of infection than that of the delta variant (B.1.617.2). We aimed to better characterise omicron severity relative to delta by assessing the relative risk of hospital attendance, hospital admission, or death in a large national cohort. METHODS: Individual-level data on laboratory-confirmed COVID-19 cases resident in England between Nov 29, 2021, and Jan 9, 2022, were linked to routine datasets on vaccination status, hospital attendance and admission, and mortality. The relative risk of hospital attendance or admission within 14 days, or death within 28 days after confirmed infection, was estimated using proportional hazards regression. Analyses were stratified by test date, 10-year age band, ethnicity, residential region, and vaccination status, and were further adjusted for sex, index of multiple deprivation decile, evidence of a previous infection, and year of age within each age band. A secondary analysis estimated variant-specific and vaccine-specific vaccine effectiveness and the intrinsic relative severity of omicron infection compared with delta (ie, the relative risk in unvaccinated cases). FINDINGS: The adjusted hazard ratio (HR) of hospital attendance (not necessarily resulting in admission) with omicron compared with delta was 0·56 (95% CI 0·54-0·58); for hospital admission and death, HR estimates were 0·41 (0·39-0·43) and 0·31 (0·26-0·37), respectively. Omicron versus delta HR estimates varied with age for all endpoints examined. The adjusted HR for hospital admission was 1·10 (0·85-1·42) in those younger than 10 years, decreasing to 0·25 (0·21-0·30) in 60-69-year-olds, and then increasing to 0·47 (0·40-0·56) in those aged at least 80 years. For both variants, past infection gave some protection against death both in vaccinated (HR 0·47 [0·32-0·68]) and unvaccinated (0·18 [0·06-0·57]) cases. In vaccinated cases, past infection offered no additional protection against hospital admission beyond that provided by vaccination (HR 0·96 [0·88-1·04]); however, for unvaccinated cases, past infection gave moderate protection (HR 0·55 [0·48-0·63]). Omicron versus delta HR estimates were lower for hospital admission (0·30 [0·28-0·32]) in unvaccinated cases than the corresponding HR estimated for all cases in the primary analysis. Booster vaccination with an mRNA vaccine was highly protective against hospitalisation and death in omicron cases (HR for hospital admission 8-11 weeks post-booster vs unvaccinated: 0·22 [0·20-0·24]), with the protection afforded after a booster not being affected by the vaccine used for doses 1 and 2. INTERPRETATION: The risk of severe outcomes following SARS-CoV-2 infection is substantially lower for omicron than for delta, with higher reductions for more severe endpoints and significant variation with age. Underlying the observed risks is a larger reduction in intrinsic severity (in unvaccinated individuals) counterbalanced by a reduction in vaccine effectiveness. Documented previous SARS-CoV-2 infection offered some protection against hospitalisation and high protection against death in unvaccinated individuals, but only offered additional protection in vaccinated individuals for the death endpoint. Booster vaccination with mRNA vaccines maintains over 70% protection against hospitalisation and death in breakthrough confirmed omicron infections. 
FUNDING: Medical Research Council, UK Research and Innovation, Department of Health and Social Care, National Institute for Health Research, Community Jameel, and Engineering and Physical Sciences Research Council.
Subjects
COVID-19, SARS-CoV-2, COVID-19/epidemiology, COVID-19/prevention & control, Cohort Studies, England/epidemiology, Hospitalization, Humans, Synthetic Vaccines, mRNA Vaccines
ABSTRACT
Longitudinal observational data on patients can be used to investigate causal effects of time-varying treatments on time-to-event outcomes. Several methods have been developed for estimating such effects by controlling for the time-dependent confounding that typically occurs. The most commonly used is marginal structural models (MSM) estimated using inverse probability of treatment weights (IPTW) (MSM-IPTW). An alternative, the sequential trials approach, is increasingly popular, and involves creating a sequence of "trials" from new time origins and comparing treatment initiators and non-initiators. Individuals are censored when they deviate from their treatment assignment at the start of each "trial" (initiator or noninitiator), which is accounted for using inverse probability of censoring weights. The analysis uses data combined across trials. We show that the sequential trials approach can estimate the parameters of a particular MSM. The causal estimand that we focus on is the marginal risk difference between the sustained treatment strategies of "always treat" vs "never treat." We compare how the sequential trials approach and MSM-IPTW estimate this estimand, and discuss their assumptions and how data are used differently. The performance of the two approaches is compared in a simulation study. The sequential trials approach, which tends to involve less extreme weights than MSM-IPTW, results in greater efficiency for estimating the marginal risk difference at most follow-up times, but this can, in certain scenarios, be reversed at later time points and relies on modelling assumptions. We apply the methods to longitudinal observational data from the UK Cystic Fibrosis Registry to estimate the effect of dornase alfa on survival.
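A hedged pandas sketch of the dataset expansion behind the sequential trials approach, with assumed column names; the inverse probability of censoring weights and the pooled outcome model that complete the analysis are omitted.

```python
import numpy as np
import pandas as pd

def build_sequential_trials(df, K):
    """Pool a sequence of emulated trials from long-format data with one
    row per subject-period.  Assumed (hypothetical) columns: 'id',
    'time' (0, 1, 2, ...), 'treated' (0/1), 'event' (0/1), covariates.
    Trial k enrols subjects still under follow-up and untreated before
    time k; the arm is treatment status at time k, and follow-up is
    artificially censored at the first deviation from that arm."""
    df = df.sort_values(['id', 'time'])
    # time of first treatment for each subject (inf if never treated)
    first_trt = (df[df.treated == 1].groupby('id')['time'].min()
                   .reindex(df['id'].unique(), fill_value=np.inf))
    trials = []
    for k in range(K):
        base = df[df['time'] == k]
        elig = base[first_trt.loc[base['id']].values >= k]   # untreated before k
        for arm in (0, 1):
            ids = elig.loc[elig['treated'] == arm, 'id']
            fup = df[df['id'].isin(ids) & (df['time'] >= k)].copy()
            # drop person-periods from the first deviation onward; in a
            # full analysis a censoring indicator would be kept to fit
            # the inverse probability of censoring weight models
            deviated = (fup['treated'] != arm).groupby(fup['id']).cummax()
            fup = fup[~deviated.values]
            fup['trial'], fup['arm'] = k, arm
            fup['tstart'] = fup['time'] - k       # time since trial origin
            trials.append(fup)
    return pd.concat(trials, ignore_index=True)
```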
Subjects
Statistical Models, Humans, Causality, Structural Models, Probability, Survival Analysis, Treatment Outcome, Longitudinal Studies
ABSTRACT
To investigate if the AY.4.2 sublineage of the SARS-CoV-2 delta variant is associated with hospitalization and mortality risks that differ from non-AY.4.2 delta risks, we performed a retrospective cohort study of sequencing-confirmed COVID-19 cases in England based on linkage of routine health care datasets. Using stratified Cox regression, we estimated adjusted hazard ratios (aHR) of hospital admission (aHR = 0.85; 95% confidence interval [CI], .77-.94), hospital admission or emergency care attendance (aHR = 0.87; 95% CI, .81-.94), and COVID-19 mortality (aHR = 0.85; 95% CI, .71-1.03). The results indicate that the risks of hospitalization and mortality are similar or lower for AY.4.2 compared to cases with other delta sublineages.
Subjects
COVID-19, SARS-CoV-2, Hospitalization, Humans, Retrospective Studies
ABSTRACT
Using data from observational studies to estimate the causal effect of a time-varying exposure, repeatedly measured over time, on an outcome of interest requires careful adjustment for confounding. Standard regression adjustment for observed time-varying confounders is unsuitable, as it can eliminate part of the causal effect and induce bias. Inverse probability weighting, g-computation, and g-estimation have been proposed as being more suitable methods. G-estimation has some advantages over the other two methods, but until recently there has been a lack of flexible g-estimation methods for a survival time outcome. The recently proposed Structural Nested Cumulative Survival Time Model (SNCSTM) is such a method. Efficient estimation of the parameters of this model required bespoke software. In this article we show how the SNCSTM can be fitted efficiently via g-estimation using standard software for fitting generalised linear models. The ability to implement g-estimation for a survival outcome using standard statistical software greatly increases the potential uptake of this method. We illustrate the use of this method of fitting the SNCSTM by reanalyzing data from the UK Cystic Fibrosis Registry, and provide example R code to facilitate the use of this approach by other researchers.
Subjects
Statistical Models, Bias, Causality, Humans, Linear Models, Probability
ABSTRACT
Observational longitudinal data on treatments and covariates are increasingly used to investigate treatment effects, but are often subject to time-dependent confounding. Marginal structural models (MSMs), estimated using inverse probability of treatment weighting or the g-formula, are popular for handling this problem. With increasing development of advanced causal inference methods, it is important to be able to assess their performance in different scenarios to guide their application. Simulation studies are a key tool for this, but their use to evaluate causal inference methods has been limited. This paper focuses on the use of simulations for evaluations involving MSMs in studies with a time-to-event outcome. In a simulation, it is important to be able to generate the data in such a way that the correct forms of any models to be fitted to those data are known. However, this is not straightforward in the longitudinal setting because it is natural for data to be generated in a sequential conditional manner, whereas MSMs involve fitting marginal rather than conditional hazard models. We provide general results that enable the form of the correctly specified MSM to be derived based on a conditional data generating procedure, and show how the results can be applied when the conditional hazard model is an Aalen additive hazard or Cox model. Using conditional additive hazard models is advantageous because they imply additive MSMs that can be fitted using standard software. We describe and illustrate a simulation algorithm. Our results will help researchers to effectively evaluate causal inference methods via simulation.
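A minimal sketch of the sequential conditional data-generating procedure the abstract describes, using a discrete-time approximation to a conditional Aalen additive hazard; the coefficients are hypothetical and chosen so that each interval's event probability stays in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(n, K=5, dt=1.0):
    """Longitudinal data with time-dependent confounding.  Hazard in
    interval k follows a conditional additive model:
        h_k = a0 + aL * (L_k > 0) + aA * A_k
    with the confounder L affected by past treatment and treatment A
    depending on L, so standard regression adjustment would be biased."""
    a0, aL, aA = 0.05, 0.04, -0.03          # illustrative coefficients
    rows = []
    for i in range(n):
        A_prev, alive = 0, True
        for k in range(K):
            if not alive:
                break
            L = rng.normal(0.5 * A_prev, 1.0)                 # confounder
            pA = 1 / (1 + np.exp(-(-1.0 + 0.8 * L + 1.5 * A_prev)))
            A = rng.binomial(1, pA)                           # treatment
            h = a0 + aL * (L > 0) + aA * A                    # additive hazard
            event = rng.binomial(1, np.clip(h * dt, 0, 1))
            rows.append((i, k, L, A, event))
            A_prev, alive = A, event == 0
    return np.array(rows)          # columns: id, interval, L, A, event

data = simulate(1000)
```

Because the conditional hazard is additive, the paper's results imply that the marginal (MSM) hazard is also additive in the treatment history, so a correctly specified additive MSM can be fitted to data generated this way with standard software.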
Subjects
Statistical Models, Computer Simulation, Structural Models, Proportional Hazards Models
ABSTRACT
We develop and demonstrate methods for performing sensitivity analyses to assess plausible departures from missing at random (MAR) in incomplete repeated binary outcome data. We use multiple imputation in the not at random fully conditional specification framework, which includes one or more sensitivity parameters (SPs) for each incomplete variable. The use of an online elicitation questionnaire is demonstrated to obtain expert opinion on the SPs, and highest prior density regions are used alongside opinion pooling methods to display credible regions for SPs. We demonstrate that substantive conclusions can be far more sensitive to departures from the MAR assumption when control and intervention nonresponders depart from MAR differently, and show that the correlation of arm-specific SPs in expert opinion is particularly important. We illustrate these methods on the iQuit in Practice smoking cessation trial, which compared the impact of a tailored text messaging system versus standard care on smoking cessation. We show that conclusions about the effect of intervention on smoking cessation outcomes at 8 weeks and 6 months are broadly insensitive to departures from MAR, with conclusions significantly affected only when the difference in behavior between the nonresponders in the two trial arms is larger than expert opinion judges to be realistic.
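A minimal sketch of one delta-adjusted ("not at random") imputation for a single incomplete binary outcome, with arm-specific sensitivity parameters shifting the imputation log-odds away from MAR; the full fully-conditional-specification procedure iterates steps like this over several incomplete variables and many imputations. All names and values are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def impute_delta_adjusted(y, X, arm, delta, rng):
    """One stochastic imputation of a binary outcome y (np.nan where
    missing).  The imputation model is fitted to responders, and the
    log-odds for nonresponders are shifted by the arm-specific
    sensitivity parameter delta[arm]; delta = (0, 0) recovers MAR."""
    obs = ~np.isnan(y)
    design = sm.add_constant(np.column_stack([X, arm]))
    fit = sm.Logit(y[obs], design[obs]).fit(disp=0)
    # Draw parameters from their approximate posterior (proper MI).
    beta = rng.multivariate_normal(fit.params, fit.cov_params())
    lp = design[~obs] @ beta + np.asarray(delta)[arm[~obs].astype(int)]
    y_imp = y.copy()
    y_imp[~obs] = rng.binomial(1, 1 / (1 + np.exp(-lp)))
    return y_imp

# Toy use: assume quitting is less likely among nonresponders, and more
# so in the control arm (delta on the log-odds scale, expert-elicited).
rng = np.random.default_rng(2)
n = 1000
arm = rng.binomial(1, 0.5, n)
X = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.5 * arm + X)))).astype(float)
y[rng.random(n) < 0.3] = np.nan                  # ~30% nonresponse
y_complete = impute_delta_adjusted(y, X, arm, delta=(-1.0, -0.5), rng=rng)
```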
Subjects
Research Design, Smoking Cessation, Statistical Data Interpretation, Humans, Surveys and Questionnaires
ABSTRACT
Cohort data are often incomplete because some subjects drop out of the study, and inverse probability weighting (IPW), multiple imputation (MI), and linear increments (LI) are methods that deal with such missing data. In cohort studies of ageing, missing data can arise from dropout or death. Methods that do not distinguish between these reasons for missingness typically provide inference about a hypothetical cohort where no one can die (immortal cohort). It has been suggested that inference about the cohort composed of those who are still alive at any time point (partly conditional inference) may be more meaningful. MI, LI, and IPW can all be adapted to provide partly conditional inference. In this article, we clarify and compare the assumptions required by these MI, LI, and IPW methods for partly conditional inference on continuous outcomes. We also propose augmented IPW estimators for making partly conditional inference. These are more efficient than IPW estimators and more robust to model misspecification. Our simulation studies show that the methods give approximately unbiased estimates of partly conditional estimands when their assumptions are met, but may be biased otherwise. We illustrate the application of the missing data methods using data from the 'Origins of Variance in the Old-old' Twin study.
Subjects
Biomedical Research/methods, Biostatistics/methods, Cohort Studies, Statistical Data Interpretation, Statistical Models, Research Design, Humans
ABSTRACT
BACKGROUND: Cystic fibrosis (CF) is an inherited, chronic, progressive condition affecting around 10,000 individuals in the United Kingdom and over 70,000 worldwide. Survival in CF has improved considerably over recent decades, and it is important to provide up-to-date information on patient prognosis. METHODS: The UK Cystic Fibrosis Registry is a secure centralized database, which collects annual data on almost all CF patients in the United Kingdom. Data from 43,592 annual records from 2005 to 2015 on 6181 individuals were used to develop a dynamic survival prediction model that provides personalized estimates of survival probabilities given a patient's current health status using 16 predictors. We developed the model using the landmarking approach, giving predicted survival curves up to 10 years from 18 to 50 years of age. We compared several models using cross-validation. RESULTS: The final model has good discrimination (C-indexes: 0.873, 0.843, and 0.804 for 2-, 5-, and 10-year survival prediction) and low prediction error (Brier scores: 0.036, 0.076, and 0.133). It identifies individuals at low and high risk of short- and long-term mortality based on their current status. For patients 20 years of age during 2013-2015, for example, over 80% had a greater than 95% probability of 2-year survival and 40% were predicted to survive 10 years or more. CONCLUSIONS: Dynamic personalized prediction models can guide treatment decisions and provide personalized information for patients. Our application illustrates the utility of the landmarking approach for making the best use of longitudinal and survival data and shows how models can be defined and compared in terms of predictive performance.
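A hedged sketch of the landmarking construction: at each landmark age, keep patients alive and under follow-up, carry forward their latest covariate values, and administratively censor at the prediction horizon. Column names and the commented lifelines call are illustrative, not the paper's code.

```python
import pandas as pd
from lifelines import CoxPHFitter

def landmark_dataset(df, landmark_ages, window=10):
    """Stacked landmark dataset from annual-review data.  Assumed
    (hypothetical) columns: 'id', 'age_at_visit', covariate columns,
    'age_end' (age at death or censoring), 'died' (0/1)."""
    stacked = []
    for s in landmark_ages:
        # patients alive and under follow-up at landmark age s
        at_risk = df[(df.age_at_visit <= s) & (df.age_end > s)]
        # most recent annual review at or before s
        lm = (at_risk.sort_values(['id', 'age_at_visit'])
                     .groupby('id').tail(1).copy())
        # administrative censoring at the prediction horizon s + window
        lm['time'] = lm['age_end'].clip(upper=s + window) - s
        lm['event'] = ((lm['died'] == 1) &
                       (lm['age_end'] <= s + window)).astype(int)
        lm['landmark'] = s
        stacked.append(lm)
    return pd.concat(stacked, ignore_index=True)

# Illustrative stacked ("supermodel") fit with a landmark term:
# CoxPHFitter().fit(landmark_dataset(registry, range(18, 51)),
#                   duration_col='time', event_col='event',
#                   formula='fev1 + bmi + landmark')
```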
Subjects
Cystic Fibrosis/mortality, Statistical Models, Adult, Cohort Studies, Female, Humans, Male, Middle Aged, Probability, Prognosis, Registries, United Kingdom/epidemiology
ABSTRACT
Most methods for handling incomplete data can be broadly classified as inverse probability weighting (IPW) strategies or imputation strategies. The former model the occurrence of incomplete data; the latter, the distribution of the missing variables given observed variables in each missingness pattern. Imputation strategies are typically more efficient, but they can involve extrapolation, which is difficult to diagnose and can lead to large bias. Double robust (DR) methods combine the two approaches. They are typically more efficient than IPW and more robust to model misspecification than imputation. We give a formal introduction to DR estimation of the mean of a partially observed variable, before moving to more general incomplete-data scenarios. We review strategies to improve the performance of DR estimators under model misspecification, reveal connections between DR estimators for incomplete data and 'design-consistent' estimators used in sample surveys, and explain the value of double robustness when using flexible data-adaptive methods for IPW or imputation.
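For the mean of a partially observed variable under missingness at random given covariates X, the canonical augmented IPW (doubly robust) estimator can be sketched as below; it is consistent if either the missingness model or the outcome regression is correctly specified. The toy data are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

def dr_mean(y, X, observed):
    """Augmented IPW (doubly robust) estimate of E[Y] when y is missing
    for some subjects, assuming missingness at random given X."""
    Xc = sm.add_constant(X)
    # working model for P(observed | X)
    pi = sm.Logit(observed.astype(float), Xc).fit(disp=0).predict(Xc)
    # working outcome regression E[Y | X], fitted to complete cases
    m = sm.OLS(y[observed], Xc[observed]).fit().predict(Xc)
    r = observed.astype(float)
    y_filled = np.where(observed, y, 0.0)     # avoid nan * 0
    return np.mean(r * y_filled / pi - (r - pi) / pi * m)

# Toy check: the complete-case mean is biased, the DR estimate is not.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 2 + x + rng.normal(size=5000)             # true E[Y] = 2
observed = rng.random(5000) < 1 / (1 + np.exp(-x))
y = np.where(observed, y, np.nan)
print(y[observed].mean(), dr_mean(y, x, observed))   # biased vs close to 2
```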
ABSTRACT
We propose semi-parametric methods to model cohort data where repeated outcomes may be missing due to death and non-ignorable dropout. Our focus is to obtain inference about the cohort composed of those who are still alive at any time point (partly conditional inference). We propose: i) an inverse probability weighted method that upweights observed subjects to represent subjects who are still alive but are not observed; ii) an outcome regression method that replaces missing outcomes of subjects who are alive with their conditional mean outcomes given past observed data; and iii) an augmented inverse probability method that combines the previous two methods and is double robust against model misspecification. These methods are described for both monotone and non-monotone missing data patterns, and are applied to a cohort of elderly adults from the Health and Retirement Study. Sensitivity analysis to departures from the assumption that missingness at some visit t is independent of the outcome at visit t given past observed data and time of death is used in the data application.
Subjects
Biometry/methods, Computer Simulation/statistics & numerical data, Regression Analysis, Aged, Aged 80 and over, Bias, Cohort Studies, Death, Humans, Longitudinal Studies, Probability
ABSTRACT
The nested case-control and case-cohort designs are two main approaches for carrying out a substudy within a prospective cohort. This article adapts multiple imputation (MI) methods for handling missing covariates in full-cohort studies for nested case-control and case-cohort studies. We consider data missing by design and data missing by chance. MI analyses that make use of full-cohort data and MI analyses based on substudy data only are described, alongside an intermediate approach in which the imputation uses full-cohort data but the analysis uses only the substudy. We describe adaptations to two imputation methods: the approximate method (MI-approx) of White and Royston (2009) and the "substantive model compatible" (MI-SMC) method of Bartlett et al. (2015). We also apply the "MI matched set" approach of Seaman and Keogh (2015) to nested case-control studies, which does not require any full-cohort information. The methods are investigated using simulation studies and all perform well when their assumptions hold. Substantial gains in efficiency can be made by imputing data missing by design using the full-cohort approach or by imputing data missing by chance in analyses using the substudy only. The intermediate approach brings greater gains in efficiency relative to the substudy approach and is more robust to imputation model misspecification than the full-cohort approach. The methods are illustrated using the ARIC Study cohort. Supplementary Materials provide R and Stata code.
Subjects
Biometry/methods, Case-Control Studies, Cohort Studies, Computer Simulation/statistics & numerical data, Statistical Data Interpretation, Humans
ABSTRACT
Analysis of matched case-control studies is often complicated by missing data on covariates. Analysis can be restricted to individuals with complete data, but this is inefficient and may be biased. Multiple imputation (MI) is an efficient and flexible alternative. We describe two MI approaches. The first uses a model for the data on an individual and includes matching variables; the second uses a model for the data on a whole matched set and avoids the need to model the matching variables. Within each approach, we consider three methods: full-conditional specification (FCS), joint model MI using a normal model, and joint model MI using a latent normal model. We show that FCS MI is asymptotically equivalent to joint model MI using a restricted general location model that is compatible with the conditional logistic regression analysis model. The normal and latent normal imputation models are not compatible with this analysis model. All methods allow for multiple partially-observed covariates, non-monotone missingness, and multiple controls per case. They can be easily applied in standard statistical software and valid variance estimates obtained using Rubin's Rules. We compare the methods in a simulation study. The approach of including the matching variables is most efficient. Within each approach, the FCS MI method generally yields the least-biased odds ratio estimates, but normal or latent normal joint model MI is sometimes more efficient. All methods have good confidence interval coverage. Data on colorectal cancer and fibre intake from the EPIC-Norfolk study are used to illustrate the methods, in particular showing how efficiency is gained relative to just using individuals with complete data.
Subjects
Case-Control Studies, Statistical Data Interpretation, Biometry/methods, Colorectal Neoplasms/etiology, Computer Simulation, Confidence Intervals, Dietary Fiber/administration & dosage, Disease/etiology, Humans, Statistical Models, Odds Ratio, Risk Factors
ABSTRACT
BACKGROUND: Informative birth size occurs when the average outcome depends on the number of infants per birth. Although analysis methods have been proposed for handling informative birth size, their performance is not well understood. Our aim was to evaluate the performance of these methods and to provide recommendations for their application in randomised trials including infants from single and multiple births. METHODS: Three generalised estimating equation (GEE) approaches were considered for estimating the effect of treatment on a continuous or binary outcome: cluster weighted GEEs, which produce treatment effects with a mother-level interpretation when birth size is informative; standard GEEs with an independence working correlation structure, which produce treatment effects with an infant-level interpretation when birth size is informative; and standard GEEs with an exchangeable working correlation structure, which do not account for informative birth size. The methods were compared through simulation and analysis of an example dataset. RESULTS: Treatment effect estimates were affected by informative birth size in the simulation study when the effect of treatment in singletons differed from that in multiples (i.e. in the presence of a treatment group by multiple birth interaction). The strength of evidence supporting the effectiveness of treatment varied between methods in the example dataset. CONCLUSIONS: Informative birth size is always a possibility in randomised trials including infants from both single and multiple births, and analysis methods should be pre-specified with this in mind. We recommend estimating treatment effects using standard GEEs with an independence working correlation structure to give an infant-level interpretation.
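A hedged statsmodels sketch of the three GEE analyses on invented data (all column names hypothetical): independence and exchangeable working correlations, plus a cluster-weighted fit whose `weights` argument support may depend on the statsmodels version.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic trial: mothers randomised, some with twins (invented data).
rng = np.random.default_rng(3)
n_mothers = 500
sizes = rng.choice([1, 2], n_mothers, p=[0.8, 0.2])
trial = pd.DataFrame({
    'mother_id': np.repeat(np.arange(n_mothers), sizes),
    'treatment': np.repeat(rng.binomial(1, 0.5, n_mothers), sizes),
})
trial['outcome'] = rng.binomial(1, 0.2 + 0.1 * trial['treatment'])

# Standard GEE, independence working correlation: infant-level effect.
gee_ind = smf.gee('outcome ~ treatment', groups='mother_id', data=trial,
                  family=sm.families.Binomial(),
                  cov_struct=sm.cov_struct.Independence()).fit()

# Standard GEE, exchangeable working correlation (does not account for
# informative birth size).
gee_exc = smf.gee('outcome ~ treatment', groups='mother_id', data=trial,
                  family=sm.families.Binomial(),
                  cov_struct=sm.cov_struct.Exchangeable()).fit()

# Cluster-weighted GEE, mother-level effect: weight each infant by the
# inverse of its cluster size.
w = 1.0 / trial.groupby('mother_id')['mother_id'].transform('size')
gee_cw = smf.gee('outcome ~ treatment', groups='mother_id', data=trial,
                 family=sm.families.Binomial(),
                 cov_struct=sm.cov_struct.Independence(), weights=w).fit()
print(gee_ind.params['treatment'], gee_cw.params['treatment'])
```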
Subjects
Fetal Growth Retardation/epidemiology, Low Birth Weight Infant, Premature Infant, Multiple Pregnancy/statistics & numerical data, Premature Birth/epidemiology, Adult, Female, Humans, Newborn Infant, Male, Population Surveillance, Pregnancy, Randomized Controlled Trials as Topic, Reference Standards
ABSTRACT
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991-2003), Edinburgh (1999-2003), and Cambridge (1990-2006), as well as Scottish national pregnancy discharge data (2004-2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation.
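A sketch of Harrell-style bootstrap optimism correction for the C statistic: the apparent C is reduced by the average gap between each bootstrap model's C on its own bootstrap sample and its C on the original data. The `logit_fit` helper and the toy data are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

def c_statistic(y, score):
    """Rank-based C statistic (area under the ROC curve, no ties)."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

def logit_fit(X, y):
    """Fit a logistic model; return a function giving risk scores."""
    res = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    return lambda Xnew: sm.add_constant(Xnew) @ res.params

def optimism_corrected_c(y, X, fit_fn, B=200, seed=0):
    """Bootstrap optimism correction: apparent C minus the average of
    (C of the bootstrap model on its own sample) minus (its C on the
    original data), over B bootstrap refits."""
    rng = np.random.default_rng(seed)
    apparent = c_statistic(y, fit_fn(X, y)(X))
    optimism = []
    for _ in range(B):
        idx = rng.integers(0, len(y), len(y))
        model = fit_fn(X[idx], y[idx])
        optimism.append(c_statistic(y[idx], model(X[idx])) -
                        c_statistic(y, model(X)))
    return apparent - np.mean(optimism)

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 5))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
print(optimism_corrected_c(y, X, logit_fit))
```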
Subjects
Statistical Data Interpretation, Factual Databases, Statistical Models, Down Syndrome, Epidemiologic Methods, Humans, Logistic Models, Multivariate Analysis, ROC Curve
ABSTRACT
Clustered data commonly arise in epidemiology. We assume each cluster member has an outcome Y and covariates X. When there are missing data in Y, the distribution of Y given X in all cluster members ("complete clusters") may be different from the distribution just in members with observed Y ("observed clusters"). Often the former is of interest, but when data are missing because in a fundamental sense Y does not exist (e.g., quality of life for a person who has died), the latter may be more meaningful (quality of life conditional on being alive). Weighted and doubly weighted generalized estimating equations and shared random-effects models have been proposed for observed-cluster inference when cluster size is informative, that is, when the distribution of Y given X in observed clusters depends on observed cluster size. We show that these methods can be seen as actually giving inference for complete clusters and may fail to give observed-cluster inference. This is true even if observed clusters are complete in themselves rather than being the observed part of larger complete clusters: here the methods may describe imaginary complete clusters rather than the observed clusters. We show under which conditions shared random-effects models proposed for observed-cluster inference do actually describe members with observed Y. A psoriatic arthritis dataset is used to illustrate the danger of misinterpreting estimates from shared random-effects models.
Subjects
Biometry/methods, Cluster Analysis, Epidemiologic Methods, Psoriatic Arthritis/epidemiology, Female, Humans, Male, Statistical Models
ABSTRACT
We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable.
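The role of the denominator's coefficient of variation can be seen from the nonlinearity of the ratio: for independent N and D, a second-order expansion gives E[N/D] ≈ (E[N]/E[D])(1 + CV_D^2). The small Monte-Carlo check below (assumed distributions, not the paper's simulation) shows the distortion becoming material once CV_D exceeds roughly 0.1.

```python
import numpy as np

rng = np.random.default_rng(11)

for cv in (0.05, 0.1, 0.2, 0.3):
    D = rng.normal(1.0, cv, 1_000_000)     # denominator: mean 1, CV = cv
    N = rng.normal(2.0, 0.2, 1_000_000)    # independent numerator
    print(f"CV_D={cv:.2f}  E[N/D]={np.mean(N / D):.4f}  "
          f"E[N]/E[D]={N.mean() / D.mean():.4f}")
# E[N/D] pulls away from E[N]/E[D] = 2.0 as CV_D grows, which is why
# imputing N and D on the raw scale and then dividing can be biased.
```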