ABSTRACT
Measurement error is common in environmental epidemiologic studies, but methods for correcting measurement error in regression models with multiple environmental exposures as covariates have not been well investigated. We consider a multiple imputation approach, combining external or internal calibration samples that contain information on both true and error-prone exposures with the main study data of multiple exposures measured with error. We propose a constrained chained equations multiple imputation (CEMI) algorithm that places constraints on the imputation model parameters in the chained equations imputation based on the assumptions of strong nondifferential measurement error. We also extend the constrained CEMI method to accommodate nondetects in the error-prone exposures in the main study data. We estimate the variance of the regression coefficients using the bootstrap with two imputations of each bootstrapped sample. The constrained CEMI method is shown by simulations to outperform existing methods, namely the method that ignores measurement error, classical calibration, and regression prediction, yielding estimated regression coefficients with smaller bias and confidence intervals with coverage close to the nominal level. We apply the proposed method to the Neighborhood Asthma and Allergy Study to investigate the associations between the concentrations of multiple indoor allergens and the fractional exhaled nitric oxide level among asthmatic children in New York City. The constrained CEMI method can be implemented by imposing constraints on the imputation matrix using the mice and bootImpute packages in R.
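The bootstrap-with-two-imputations variance strategy described above can be sketched in a toy setting. The sketch below is illustrative only: simulated data, a deliberately simple X-given-W imputation model that, unlike the proposed constrained CEMI, ignores Y, and names of our own choosing. It is not the authors' R implementation with mice and bootImpute.

```python
import random
import statistics

random.seed(1)

# Toy rows (x, w, y): x is the true exposure, observed only in an internal
# calibration subsample; w measures x with error; y is the outcome.
def make_row(calibrated):
    x = random.gauss(0, 1)
    w = x + random.gauss(0, 0.5)      # error-prone measurement of x
    y = 2 * x + random.gauss(0, 1)    # outcome generated from the true x
    return (x if calibrated else None, w, y)

data = [make_row(i < 50) for i in range(200)]

def ols(u, v):
    """Slope and intercept of v regressed on u."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    sxx = sum((a - mu) ** 2 for a in u)
    slope = sum((a - mu) * (c - mv) for a, c in zip(u, v)) / sxx
    return slope, mv - slope * mu

def one_imputation(rows):
    """Impute missing x from the calibration regression of X on W,
    then return the slope of Y on the completed x."""
    cal = [(x, w) for x, w, _ in rows if x is not None]
    b, a = ols([w for _, w in cal], [x for x, _ in cal])
    sd = statistics.stdev([x - (a + b * w) for x, w in cal])
    xs = [x if x is not None else a + b * w + random.gauss(0, sd)
          for x, w, _ in rows]
    slope, _ = ols(xs, [y for _, _, y in rows])
    return slope

# Bootstrap, then two imputations per bootstrapped sample; the SE comes from
# the spread of the imputation-averaged estimates across bootstrap replicates.
B, M = 50, 2
boot = []
for _ in range(B):
    rows = [random.choice(data) for _ in range(len(data))]
    boot.append(statistics.mean(one_imputation(rows) for _ in range(M)))

est, se = statistics.mean(boot), statistics.stdev(boot)
print(f"slope estimate {est:.2f} (true 2.0), bootstrap SE {se:.2f}")
```

Because this naive imputation model conditions on W alone, some attenuation remains; that residual bias is precisely why imputation models that condition on all observed variables, as in the constrained CEMI approach, matter.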
Subjects
Algorithms, Environmental Exposure, Child, Humans, Animals, Mice, Environmental Exposure/adverse effects, Epidemiologic Studies, Calibration, Bias
ABSTRACT
Methods for handling missing data in clinical psychology studies are reviewed. Missing data are defined, and a taxonomy of main approaches to analysis is presented, including complete-case and available-case analysis, weighting, maximum likelihood, Bayes, single and multiple imputation, and augmented inverse probability weighting. Missingness mechanisms, which play a key role in the performance of alternative methods, are defined. Approaches to robust inference, and to inference when the mechanism is potentially missing not at random, are discussed.
Subjects
Clinical Psychology, Humans, Statistical Data Interpretation, Clinical Psychology/methods, Research Design/standards, Bayes Theorem
ABSTRACT
Randomized clinical trials with outcomes measured longitudinally are frequently analyzed using either random effects models or generalized estimating equations. Both approaches assume that the dropout mechanism is missing at random (MAR) or missing completely at random (MCAR). We propose a Bayesian pattern-mixture model to incorporate missingness mechanisms that might be missing not at random (MNAR), where the distribution of the outcome measure at the follow-up time t_k, conditional on the prior history, differs across the patterns of missing data. We then perform sensitivity analysis on estimates of the parameters of interest. The sensitivity parameters relate the distribution of the outcome of interest between subjects from a missing-data pattern at time t_k with that of the observed subjects at time t_k. The large number of sensitivity parameters is reduced by treating them as random with a prior distribution having some pre-specified mean and variance, which are varied to explore the sensitivity of inferences. The MAR mechanism is a special case of the proposed model, allowing a sensitivity analysis of deviations from MAR. The proposed approach is applied to data from the Trial of Preventing Hypertension.
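A delta-style version of this sensitivity analysis can be illustrated with a minimal two-time-point sketch. This is our own toy construction, not the paper's Bayesian machinery with priors on the sensitivity parameters: a fixed shift delta is applied to treated dropouts' imputations, with delta = 0 recovering MAR.

```python
import random
import statistics

random.seed(2)

# Toy two-arm trial: baseline y1 always observed, follow-up y2 missing for
# dropouts.  delta shifts the treated dropouts' imputations; delta = 0 is MAR.
n = 300
arm = [i % 2 for i in range(n)]                     # 0 = control, 1 = treated
y1 = [random.gauss(0, 1) for _ in range(n)]
y2 = [0.8 * y1[i] - 0.5 * arm[i] + random.gauss(0, 1) for i in range(n)]
drop = [random.random() < 0.3 for _ in range(n)]    # ~30% dropout

def ols(u, v):
    mu, mv = statistics.mean(u), statistics.mean(v)
    sxx = sum((a - mu) ** 2 for a in u)
    b = sum((a - mu) * (c - mv) for a, c in zip(u, v)) / sxx
    return b, mv - b * mu

def fit(g):
    """Regression of y2 on y1 among completers in arm g."""
    idx = [i for i in range(n) if not drop[i] and arm[i] == g]
    b, a = ols([y1[i] for i in idx], [y2[i] for i in idx])
    sd = statistics.stdev([y2[i] - (a + b * y1[i]) for i in idx])
    return a, b, sd

def effect(delta, m=20):
    """Treatment effect on y2, averaging m imputed data sets."""
    fits = {g: fit(g) for g in (0, 1)}
    out = []
    for _ in range(m):
        yfull = []
        for i in range(n):
            if not drop[i]:
                yfull.append(y2[i])
            else:
                a, b, sd = fits[arm[i]]
                shift = delta if arm[i] == 1 else 0.0
                yfull.append(a + b * y1[i] + shift + random.gauss(0, sd))
        t = statistics.mean(yfull[i] for i in range(n) if arm[i] == 1)
        c = statistics.mean(yfull[i] for i in range(n) if arm[i] == 0)
        out.append(t - c)
    return statistics.mean(out)

results = {d: effect(d) for d in (0.0, -0.5, -1.0)}
for d, e in results.items():
    print(f"delta {d:+.1f}: estimated effect {e:.2f}")
```

Sweeping delta over a grid, or placing a prior on it as the paper does, shows how far the MAR conclusion can be pushed before it changes.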
Subjects
Statistical Models, Health Care Outcome Assessment, Bayes Theorem, Data Collection, Humans, Longitudinal Studies, Patient Dropouts, Randomized Controlled Trials as Topic
ABSTRACT
We consider comparative effectiveness research (CER) from observational data with two or more treatments. In observational studies, the estimation of causal effects is prone to bias due to confounders related to both treatment and outcome. Methods based on propensity scores are routinely used to correct for such confounding biases. A large fraction of propensity score methods in the current literature consider the case of either two treatments or continuous outcome. There is extensive literature on multiple treatments and on binary outcomes separately, but interest often lies in their intersection, for which the literature is still evolving. The contribution of this article is to focus on this intersection and compare across methods, some of which are fairly recent. We describe propensity-based methods when more than two treatments are being compared, and the outcome is binary. We assess the relative performance of these methods through a set of simulation studies. The methods are applied to assess the effect of four common therapies for castration-resistant advanced-stage prostate cancer. The data consist of medical and pharmacy claims from a large national private health insurance network, with the adverse outcome being admission to the emergency room within a short time window of treatment initiation.
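One of the simplest propensity-based estimators for more than two treatments is inverse-probability weighting with generalized propensity scores. The toy sketch below is ours, with the true propensities used for clarity (in practice they would be estimated, e.g. by multinomial logistic regression) and a continuous outcome for brevity; the weighting logic is identical for a binary outcome.

```python
import math
import random
import statistics

random.seed(3)

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    tot = sum(exps)
    return [e / tot for e in exps]

# Toy data: one confounder x drives both the choice among three treatments
# and the outcome; the true arm effects are 0.0, 0.5 and 1.0.
n = 5000
rows = []
for _ in range(n):
    x = random.gauss(0, 1)
    p = softmax([0.0, 0.8 * x, -0.8 * x])   # generalized propensity scores
    u, cum, a = random.random(), 0.0, 2
    for k, pk in enumerate(p):
        cum += pk
        if u < cum:
            a = k
            break
    y = x + [0.0, 0.5, 1.0][a] + random.gauss(0, 1)
    rows.append((a, y, p[a]))

# Inverse-probability weighting: each arm is reweighted by 1/p so that it
# stands in for the whole population, removing the confounding by x.
naive, iptw = [], []
for k in range(3):
    sub = [(y, 1.0 / pa) for a, y, pa in rows if a == k]
    naive.append(statistics.mean(y for y, _ in sub))
    iptw.append(sum(y * w for y, w in sub) / sum(w for _, w in sub))
    print(f"arm {k}: naive mean {naive[k]:.2f}, IPTW mean {iptw[k]:.2f}")
```

The naive arm means are distorted by the confounder, while the weighted means recover the true arm effects up to simulation noise.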
Subjects
Comparative Effectiveness Research, Statistical Models, Bias, Causality, Computer Simulation, Humans, Male, Propensity Score
ABSTRACT
Accidents are a leading cause of death among U.S. active duty personnel. Understanding accident deaths during wartime could facilitate future operational planning and inform risk prevention efforts. This study expands prior research by identifying health risk factors associated with U.S. Army accident deaths during the Afghanistan and Iraq wars. Military records for 2004-2009 enlisted, active duty, Regular Army soldiers were analyzed using logistic regression modeling to identify mental health, injury, and polypharmacy (multiple narcotic and/or psychotropic medications) predictors of accident deaths for currently, previously, and never deployed groups. Deployed soldiers with anxiety diagnoses showed higher risk for accident deaths. Over half had anxiety diagnoses prior to being deployed, suggesting that anticipatory anxiety or symptom recurrence may contribute to high risk. For previously deployed soldiers, traumatic brain injury (TBI) indicated higher risk. Two-thirds of these soldiers had their first TBI medical encounter while non-deployed, but mild, combat-related TBIs may have gone undetected during deployments. Post-Traumatic Stress Disorder (PTSD) predicted higher risk for never deployed soldiers, as did polypharmacy, which may relate to reasons for deployment ineligibility. Health risk predictors for Army accident deaths are identified, and potential practice and policy implications are discussed. Further research could test for replicability and expand models to include unobserved factors or modifiable mechanisms related to high risk. PTSD predicted high risk among those never deployed, suggesting the importance of identification, treatment, and prevention of non-combat traumatic events. Finally, risk predictors overlapped with those identified for suicides, suggesting that effective intervention might reduce both types of deaths.
Subjects
Occupational Accidents/mortality, Mental Disorders/diagnosis, Military Personnel/statistics & numerical data, Polypharmacy, Wounds and Injuries, Occupational Accidents/prevention & control, Adult, Female, Humans, Male, Risk Assessment, Risk Factors, United States/epidemiology
ABSTRACT
A case study is presented assessing the impact of missing data on the analysis of daily diary data from a study evaluating the effect of a drug for the treatment of insomnia. The primary analysis averaged daily diary values for each patient into a weekly variable. Following the commonly used approach, missing daily values within a week were ignored provided there was a minimum number of diary reports (i.e., at least 4). A longitudinal model was then fit with treatment, time, and patient-specific effects. A treatment effect at a pre-specified landmark time was obtained from the model. Weekly values following dropout were regarded as missing, but intermittent daily missing values were obscured. Graphical summaries and tables are presented to characterize the complex missing data patterns. We use multiple imputation for daily diary data to create completed data sets so that exactly 7 daily diary values contribute to each weekly patient average. Standard analysis methods are then applied for landmark analysis of the completed data sets, and the resulting estimates are combined using the standard multiple imputation approach. The observed data are subject to digit heaping and patterned responses (e.g., identical values for several consecutive days), which makes accurate modeling of the response data difficult. Sensitivity analyses under different modeling assumptions for the data were performed, along with pattern mixture models assessing the sensitivity to the missing at random assumption. The emphasis is on graphical displays and computational methods that can be implemented with general-purpose software. Copyright © 2016 John Wiley & Sons, Ltd.
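The "standard multiple imputation approach" for combining the landmark estimates from the completed data sets is Rubin's rules. A minimal sketch follows; the numbers are made up for illustration and are not from the insomnia study.

```python
import statistics

def rubin_pool(estimates, variances):
    """Combine point estimates and within-imputation variances
    from M completed data sets using Rubin's rules."""
    m = len(estimates)
    qbar = statistics.mean(estimates)       # pooled point estimate
    ubar = statistics.mean(variances)       # within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance
    total = ubar + (1 + 1 / m) * b          # total variance
    return qbar, total

# Hypothetical landmark treatment effects from M = 5 completed data sets.
est = [-1.10, -1.25, -0.98, -1.18, -1.05]
var = [0.040, 0.038, 0.042, 0.041, 0.039]
q, t = rubin_pool(est, var)
print(f"pooled effect {q:.3f}, total variance {t:.4f}")
```

The between-imputation term (1 + 1/M)B is what carries the uncertainty due to the missing daily diary values into the final standard error.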
Subjects
Clinical Trials as Topic, Data Accuracy, Statistical Data Interpretation, Self Report, Humans, Sleep Initiation and Maintenance Disorders/therapy, Software
ABSTRACT
BACKGROUND: The potential impact of missing data on the results of clinical trials has received heightened attention recently. A National Research Council study provides recommendations for limiting missing data in clinical trial design and conduct, and principles for analysis, including the need for sensitivity analyses to assess robustness of findings to alternative assumptions about the missing data. A Food and Drug Administration advisory committee raised missing data as a serious concern in their review of results from the ATLAS ACS 2 TIMI 51 study, a large clinical trial that assessed rivaroxaban for its ability to reduce the risk of cardiovascular death, myocardial infarction or stroke in patients with acute coronary syndrome. This case study describes a variety of measures that were taken to address concerns about the missing data. METHODS: A range of analyses are described to assess the potential impact of missing data on conclusions. In particular, measures of the amount of missing data are discussed, and the fraction of missing information from multiple imputation is proposed as an alternative measure. The sensitivity analysis in the National Research Council study is modified in the context of survival analysis where some individuals are lost to follow-up. The impact of deviations from ignorable censoring is assessed by differentially increasing the hazard of the primary outcome in the treatment groups and multiply imputing events between dropout and the end of the study. Tipping-point analyses are described, where the deviation from ignorable censoring that results in a reversal of significance of the treatment effect is determined. A study to determine the vital status of participants lost to follow-up was also conducted, and the results of including this additional information are assessed. 
RESULTS: Sensitivity analyses suggest that findings of the ATLAS ACS 2 TIMI 51 study are robust to missing data; this robustness is reinforced by the follow-up study, since inclusion of data from this study had little impact on the study conclusions. CONCLUSION: Missing data are a serious problem in clinical trials. The methods presented here, namely, the sensitivity analyses, the follow-up study to determine survival of missing cases, and the proposed measurement of missing data via the fraction of missing information, have potential application in other studies involving survival analysis where missing data are a concern.
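A tipping-point search of the kind described can be sketched in a simplified binary-outcome analogue. All counts below are hypothetical, not ATLAS ACS 2 TIMI 51 data, and the actual analyses worked on the survival scale by inflating the hazard for multiply imputed events; here the inflation factor gamma scales the imputed event rate for treatment-arm dropouts.

```python
import math

# Hypothetical counts: events / completers, plus dropouts of unknown status.
ctrl_events, ctrl_n, ctrl_drop = 120, 1000, 60
trt_events, trt_n, trt_drop = 90, 1000, 60

def z_test(e1, n1, e0, n0):
    """Two-proportion z statistic (treatment minus control)."""
    p1, p0 = e1 / n1, e0 / n0
    p = (e1 + e0) / (n1 + n0)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n0))
    return (p1 - p0) / se

def z_at(gamma):
    """Impute dropout events at each arm's observed rate, inflating the
    treatment-arm rate by gamma (gamma = 1 is the ignorable case)."""
    e0 = ctrl_events + ctrl_drop * (ctrl_events / ctrl_n)
    e1 = trt_events + trt_drop * gamma * (trt_events / trt_n)
    return z_test(e1, trt_n + trt_drop, e0, ctrl_n + ctrl_drop)

# Increase gamma until significance (|z| >= 1.96) is lost: the tipping point.
gamma = 1.0
while abs(z_at(gamma)) >= 1.96 and gamma < 10:
    gamma += 0.1
print(f"tipping point near gamma = {gamma:.1f}, z = {z_at(gamma):.2f}")
```

If the tipping-point gamma is implausibly large, the conclusion is judged robust to departures from ignorable censoring, which is the logic the case study applies on the hazard scale.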
Subjects
Acute Coronary Syndrome/drug therapy, Factor Xa Inhibitors/therapeutic use, Lost to Follow-Up, Patient Dropouts, Rivaroxaban/therapeutic use, Cardiovascular Diseases/mortality, Double-Blind Method, Humans, Multicenter Studies as Topic, Myocardial Infarction/epidemiology, Randomized Controlled Trials as Topic, Stroke/epidemiology, Survival Analysis
Subjects
Statistical Data Interpretation, Statistical Models, Type 2 Diabetes Mellitus/drug therapy, Glucagon-Like Peptides/administration & dosage, Humans, Hypoglycemic Agents/administration & dosage, Randomized Controlled Trials as Topic, Sitagliptin Phosphate/administration & dosage
ABSTRACT
BACKGROUND: Missing data are an unavoidable problem in clinical trials. Most existing missing data approaches assume the missing data are missing at random. However, the missing at random assumption is often questionable when the real causes of missing data are not well known and cannot be tested from observed data. METHODS: We propose a specific missing not at random assumption, which we call masked missing not at random, which may be more plausible than missing at random for masked clinical trials. We formulate models for categorical and continuous outcomes under this assumption. Simulations are conducted to examine the finite sample performance of our methods and compare them with other methods. R code for the proposed methods is provided in supplementary materials. RESULTS: Simulation studies confirm that maximum likelihood methods assuming masked missing not at random outperform complete case analysis and maximum likelihood assuming missing at random when masked missing not at random is true. For the particular missing at random model where both missing at random and masked missing not at random are satisfied, theory suggests that maximum likelihood assuming missing at random is at least as efficient as maximum likelihood assuming masked missing not at random. However, maximum likelihood assuming masked missing not at random is nearly as efficient as maximum likelihood assuming missing at random in our simulated settings. We also applied our methods to the TRial Of Preventing HYpertension study. The missing at random estimated treatment effect and its 95% confidence interval are robust to deviations from missing at random of the form implied by masked missing not at random. CONCLUSION: Methods based on the masked missing not at random assumption are useful for masked clinical trials, either in their own right or to provide a form of sensitivity analysis for deviations from missing at random.
Missing at random analysis might be favored on grounds of efficiency if the estimates based on masked missing not at random and missing at random are similar, but if the estimates are substantially different, the masked missing not at random estimates might be preferred because the mechanism is more plausible.
Subjects
Statistical Data Interpretation, Randomized Controlled Trials as Topic/methods, Research Design, Computer Simulation, Double-Blind Method, Humans, Likelihood Functions, Statistical Models, Probability, Single-Blind Method
ABSTRACT
Missing values in predictors are a common problem in survival analysis. In this paper, we review estimation methods for accelerated failure time models with missing predictors, and apply a new method called subsample ignorable likelihood (IL), proposed by Little and Zhang (J R Stat Soc 60:591-605, 2011), to this class of models. The approach applies a likelihood-based method to a subsample of observations that are complete on a subset of the covariates, chosen based on assumptions about the missing data mechanism. We give conditions on the missing data mechanism under which the subsample IL method is consistent, while both complete-case analysis and ignorable maximum likelihood are inconsistent. We illustrate the properties of the proposed method by simulation and apply the method to a real dataset.
Subjects
Likelihood Functions, Survival Analysis, Biostatistics, Computer Simulation, Humans, Statistical Models, Mortality, Socioeconomic Factors
Subjects
Clinical Trials as Topic/statistics & numerical data, Statistical Data Interpretation, Guidelines as Topic, Congresses as Topic, Drug Industry, Humans, Statistical Models, Proportional Hazards Models, Randomized Controlled Trials as Topic/statistics & numerical data, Research Design, Statistics as Topic
ABSTRACT
Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for use in massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled using traditional hot deck methods because of the simple implementation; however, the univariate hot deck results in large random wealth fluctuations. MI is effective but faced with operational challenges. We use a sequential regression/chained-equation approach, using the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and compare analyses of the resulting imputed data with those from the current hot deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate the imputation quality and validity with internal diagnostics and external benchmarking data. MI produces improvements over the existing hot deck approach by helping preserve correlation structures, such as the associations between PSID wealth components and the relationships between household net worth and sociodemographic factors, and it facilitates general-purpose completed-data analyses. MI incorporates highly predictive covariates into imputation models and increases efficiency. We recommend the practical implementation of MI and expect greater gains when the fraction of missing information is large.
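The traditional hot deck that the survey currently uses fills each missing value with an observed donor value drawn from the same adjustment cell. A minimal sketch with made-up records (our own toy cells and values, not PSID variables) shows why it is simple to implement but univariate, ignoring relationships among wealth components:

```python
import random
from collections import defaultdict

random.seed(4)

# Toy records: (age_group, wealth) with some wealth values missing (None).
records = [("young", 10.0), ("young", None), ("young", 14.0),
           ("old", 80.0), ("old", None), ("old", 95.0), ("old", None)]

# Build a donor pool of observed values per adjustment cell (here, age group).
donors = defaultdict(list)
for g, w in records:
    if w is not None:
        donors[g].append(w)

# Simple random hot deck: a missing value borrows a random donor value
# from its own cell.
completed = [(g, w if w is not None else random.choice(donors[g]))
             for g, w in records]
print(completed)
```

Because each variable is filled independently, the hot deck cannot preserve cross-variable structure; the chained-equation MI in the paper imputes each variable conditionally on the others, which is what preserves the correlation structure.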
ABSTRACT
BACKGROUND: Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. METHODS: We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. RESULTS: The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. CONCLUSIONS: Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.
Subjects
Statistical Data Interpretation, Research Design, Analysis of Variance, Bone Density, Bone and Bones/metabolism, Confidence Intervals, Female, Humans, Linear Models, Michigan/epidemiology, Middle Aged, Multivariate Analysis, Regression Analysis, Research Design/statistics & numerical data, Sex Hormone-Binding Globulin/analysis
ABSTRACT
We consider the linear regression of outcome Y on regressors W and Z with some values of W missing, when our main interest is the effect of Z on Y, controlling for W. Three common approaches to regression with missing covariates are (i) complete-case analysis (CC), which discards the incomplete cases; (ii) ignorable likelihood methods, which base inference on the likelihood of the observed data, assuming the missing data are missing at random (Rubin, 1976b); and (iii) nonignorable modeling, which posits a joint distribution of the variables and missing-data indicators. Another simple practical approach that has not received much theoretical attention is to drop the regressor variables containing missing values from the regression modeling (DV, for drop variables). DV does not lead to bias when either (i) the regression coefficient of W is zero or (ii) W and Z are uncorrelated. We propose a pseudo-Bayesian approach for regression with missing covariates that compromises between the CC and DV estimates, exploiting information in the incomplete cases when the data support DV assumptions. We illustrate favorable properties of the method by simulation, and apply the proposed method to a liver cancer study. Extension of the method to more than one missing covariate is also discussed.
Subjects
Statistical Models, Regression Analysis, Bayes Theorem, Bias, Biometry, Clinical Trials as Topic/statistics & numerical data, Humans, Likelihood Functions, Linear Models, Liver Neoplasms/diagnosis, Multivariate Analysis
ABSTRACT
In this paper, the authors describe a simple method for making longitudinal comparisons of alternative markers of a subsequent event. The method is based on the aggregate prediction gain from knowing whether or not a marker has occurred at any particular age. An attractive feature of the method is the exact decomposition of the measure into 2 components: 1) discriminatory ability, which is the difference in the mean time to the subsequent event for individuals for whom the marker has and has not occurred, and 2) prevalence factor, which is related to the proportion of individuals who are positive for the marker at a particular age. Development of the method was motivated by a study that evaluated proposed markers of the menopausal transition, where the markers are measures based on successive menstrual cycles and the subsequent event is the final menstrual period. Here, results from application of the method to 4 alternative proposed markers of the menopausal transition are compared with previous findings.
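The two components can be made concrete on a toy cohort. If the prediction gain is measured as the between-group variance in mean time to the event (one natural instantiation chosen for this sketch; the paper's exact measure may differ), then for a binary marker the decomposition gain = p(1 - p) * delta^2 is an algebraic identity, with delta the difference in mean times (discriminatory ability) and p(1 - p) the prevalence factor.

```python
# Toy cohort: marker status (1 = marker has occurred at the given age) and
# time in years to the final menstrual period.  All values are invented.
data = [(1, 2.0), (1, 3.0), (1, 1.0), (0, 7.0),
        (0, 9.0), (0, 8.0), (0, 8.0), (1, 2.0)]

p = sum(m for m, _ in data) / len(data)                      # marker prevalence
m1 = sum(t for m, t in data if m == 1) / sum(m for m, _ in data)
m0 = sum(t for m, t in data if m == 0) / sum(1 - m for m, _ in data)

discrim = m0 - m1              # discriminatory ability: difference in mean times
prev_factor = p * (1 - p)      # prevalence factor
gain = prev_factor * discrim ** 2   # between-group variance in mean time
print(f"prevalence {p}, discrimination {discrim}, gain {gain}")
```

The identity holds because the conditional mean time takes only two values, so its variance is exactly p(1 - p) times the squared difference between them.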
Subjects
Menopause, Reproductive History, Adult, Age Factors, Biomarkers, Female, Humans, Longitudinal Studies, Menstrual Cycle, Middle Aged, Predictive Value of Tests, Prevalence
ABSTRACT
In longitudinal studies of developmental and disease processes, participants are followed prospectively with intermediate milestones identified as they occur. Frequently, studies enroll participants over a range of ages including ages at which some participants' milestones have already passed. Ages at milestones that occur prior to study entry are left censored if individuals are enrolled in the study or left truncated if they are not. The authors examined the bias incurred by ignoring these issues when estimating the distribution of age at milestones or the time between 2 milestones. Methods that account for left truncation and censoring are considered. Data on the menopausal transition are used to illustrate the problem. Simulations show that bias can be substantial and that standard errors can be severely underestimated in naïve analyses that ignore left truncation. Bias can be reduced when analyses account for left truncation, although the results are unstable when the fraction truncated is high. Simulations suggest that a better solution, when possible, is to modify the study design so that information on current status (i.e., whether or not a milestone has passed) is collected on all potential participants, analyzing those who are past the milestone at the time of recruitment as left censored rather than excluding such individuals from the analysis.
Subjects
Bias, Biomedical Research, Longitudinal Studies, Research Design, Disease Progression, Female, Human Development, Humans, Menopause, Middle Aged
ABSTRACT
In clinical trials, a biomarker (S) that is measured after randomization and is strongly associated with the true endpoint (T) can often provide information about T and hence the effect of a treatment (Z) on T. A useful biomarker can be measured earlier than T and cost less than T. In this article, we consider the use of S as an auxiliary variable and examine the information recovery from using S for estimating the treatment effect on T, when S is completely observed and T is partially observed. In an ideal but often unrealistic setting, when S satisfies Prentice's definition for perfect surrogacy, there is the potential for substantial gain in precision by using data from S to estimate the treatment effect on T. When S is not close to a perfect surrogate, it can provide substantial information only under particular circumstances. We propose to use a targeted shrinkage regression approach that data-adaptively takes advantage of the potential efficiency gain yet avoids the need to make a strong surrogacy assumption. Simulations show that this approach strikes a balance between bias and efficiency gain. Compared with competing methods, it has better mean squared error properties and can achieve substantial efficiency gain, particularly in a common practical setting when S captures much but not all of the treatment effect and the sample size is relatively small. We apply the proposed method to a glaucoma data example.
Subjects
Statistical Data Interpretation, Endpoint Determination/methods, Glaucoma/epidemiology, Glaucoma/therapy, Health Care Outcome Assessment/methods, Randomized Controlled Trials as Topic/methods, Biomarkers/analysis, Glaucoma/diagnosis, Humans, Prevalence, Prognosis, Randomized Controlled Trials as Topic/statistics & numerical data, Treatment Outcome
ABSTRACT
We consider the estimation of the regression of an outcome Y on a covariate X, where X is unobserved, but a variable W that measures X with error is observed. A calibration sample that measures pairs of values of X and W is also available; we consider calibration samples where Y is measured (internal calibration) and not measured (external calibration). One common approach for measurement error correction is Regression Calibration (RC), which substitutes the unknown values of X by predictions from the regression of X on W estimated from the calibration sample. An alternative approach is to multiply impute the missing values of X given Y and W based on an imputation model, and then use multiple imputation (MI) combining rules for inferences. Most of current work assumes that the measurement error of W has a constant variance, whereas in many situations, the variance varies as a function of X. We consider extensions of the RC and MI methods that allow for heteroscedastic measurement error, and compare them by simulation. The MI method is shown to provide better inferences in this setting. We also illustrate the proposed methods using a data set from the BioCycle study.
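The regression calibration competitor is easy to sketch: substitute the calibration-sample prediction of X given W for the unknown X, then regress Y on that prediction. The toy below is ours, with heteroscedastic error and a linear approximation to E[X | W]; in this simple setting RC repairs the attenuated point estimate, while the abstract's point is that MI propagates the imputation uncertainty into the inference better.

```python
import random
import statistics

random.seed(5)

def ols(u, v):
    """Slope and intercept of v regressed on u."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    sxx = sum((a - mu) ** 2 for a in u)
    slope = sum((a - mu) * (c - mv) for a, c in zip(u, v)) / sxx
    return slope, mv - slope * mu

def draw(has_y):
    """One unit with heteroscedastic measurement error sd = 0.2 + 0.3|x|."""
    x = random.gauss(0, 1)
    w = x + random.gauss(0, 0.2 + 0.3 * abs(x))
    y = 1.5 * x + random.gauss(0, 1) if has_y else None
    return x, w, y

# External calibration sample: (x, w) pairs only.
cal = [draw(False) for _ in range(500)]
b, a = ols([w for _, w, _ in cal], [x for x, _, _ in cal])   # X ~ W

# Main study: (w, y) only; x is never observed.
main = [draw(True) for _ in range(1000)]
naive, _ = ols([w for _, w, _ in main], [y for _, _, y in main])
xhat = [a + b * w for _, w, _ in main]          # regression calibration step
rc, _ = ols(xhat, [y for _, _, y in main])
print(f"naive slope {naive:.2f}, RC slope {rc:.2f} (true 1.5)")
```

The naive regression of Y on W is attenuated toward zero by the measurement error, while the RC slope is approximately unbiased here; it is the standard errors, not the point estimate, where RC falls short of MI.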
Subjects
Bayes Theorem, Statistical Data Interpretation, Statistical Models, Regression Analysis, Computer Simulation, Female, Humans, Menstrual Cycle/physiology, Oxidative Stress/physiology, Progesterone/blood, beta Carotene/blood
ABSTRACT
A non-probability sampling mechanism arising from non-response or non-selection is likely to bias estimates of parameters with respect to a target population of interest. This bias poses a unique challenge when selection is 'non-ignorable', i.e. dependent upon the unobserved outcome of interest, since it is then undetectable and thus cannot be ameliorated. We extend a simulation study by Nishimura et al. [International Statistical Review, 84, 43-62 (2016)], adding two recently published statistics: the so-called 'standardized measure of unadjusted bias (SMUB)' and 'standardized measure of adjusted bias (SMAB)', which explicitly quantify the extent of bias (in the case of SMUB) or non-ignorable bias (in the case of SMAB) under the assumption that a specified amount of non-ignorable selection exists. Our findings suggest that this new sensitivity diagnostic is more correlated with, and more predictive of, the true, unknown extent of selection bias than other diagnostics, even when the underlying assumed level of non-ignorability is incorrect.
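The SMUB index has a closed form under the normal pattern-mixture model. The sketch below follows our reading of that form and should be checked against the original derivation: the leading factor interpolates between the ignorable case (phi = 0, selection depends only on the observed proxy) and the fully non-ignorable case (phi = 1, selection depends only on the unobserved outcome). All numeric inputs are hypothetical.

```python
def smub(phi, rho, mean_sel, mean_pop, sd_sel):
    """Standardized measure of unadjusted bias for the mean of an outcome Y,
    given its proxy (a regression prediction of Y): phi is the assumed degree
    of non-ignorable selection, rho the proxy-outcome correlation in the
    selected sample, and the remaining arguments are the proxy mean in the
    selected sample, the proxy mean in the population, and the proxy sd."""
    factor = (phi + (1 - phi) * rho) / (phi * rho + (1 - phi))
    return factor * (mean_sel - mean_pop) / sd_sel

# Hypothetical inputs: proxy mean 0.30 among the selected vs 0.10 in the
# population, proxy sd 1.0, proxy-outcome correlation 0.5.
for phi in (0.0, 0.5, 1.0):
    print(f"phi = {phi}: SMUB = {smub(phi, 0.5, 0.30, 0.10, 1.0):.3f}")
```

Reporting SMUB over a grid of phi values, as the simulation study does, shows how fast the implied bias grows as the assumed non-ignorability increases.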
ABSTRACT
Selection bias is a serious potential problem for inference about relationships of scientific interest based on samples without well-defined probability sampling mechanisms. Motivated by the potential for selection bias in: (a) estimated relationships of polygenic scores (PGSs) with phenotypes in genetic studies of volunteers and (b) estimated differences in subgroup means in surveys of smartphone users, we derive novel measures of selection bias for estimates of the coefficients in linear and probit regression models fitted to nonprobability samples, when aggregate-level auxiliary data are available for the selected sample and the target population. The measures arise from normal pattern-mixture models that allow analysts to examine the sensitivity of their inferences to assumptions about nonignorable selection in these samples. We examine the effectiveness of the proposed measures in a simulation study and then use them to quantify the selection bias in: (a) estimated PGS-phenotype relationships in a large study of volunteers recruited via Facebook and (b) estimated subgroup differences in mean past-year employment duration in a nonprobability sample of low-educated smartphone users. We evaluate the performance of the measures in these applications using benchmark estimates from large probability samples.