ABSTRACT
Missing data are often overcome using imputation, which leverages the entire dataset to replace missing values with informed placeholders. This method can be modified for censored data by also incorporating partial information from censored values. One such modification proposed by Atem et al. (2017, 2019a, 2019b) is conditional mean imputation where censored covariates are replaced by their conditional means given other fully observed information. These methods are robust to additional parametric assumptions on the censored covariate and utilize all available data, which is appealing. However, in implementing these methods, we discovered that these three articles provide nonequivalent formulas and, in fact, none is the correct formula for the conditional mean. Herein, we derive the correct form of the conditional mean and discuss the bias incurred when using the incorrect formulas. Furthermore, we note that even the correct formula can perform poorly for log hazard ratios far from $\mathbf{0}$. We also provide user-friendly R software, the imputeCensoRd package, to enable future researchers to tackle censored covariates correctly.
Subjects
Statistical Models, Bias, Computer Simulation, Proportional Hazards Models
ABSTRACT
The landscape of survival analysis is constantly being revolutionized to answer biomedical challenges, most recently the statistical challenge of censored covariates rather than outcomes. There are many promising strategies to tackle censored covariates, including weighting, imputation, maximum likelihood, and Bayesian methods. Still, this is a relatively fresh area of research, different from the areas of censored outcomes (i.e., survival analysis) or missing covariates. In this review, we discuss the unique statistical challenges encountered when handling censored covariates and provide an in-depth review of existing methods designed to address those challenges. We emphasize each method's relative strengths and weaknesses, providing recommendations to help investigators pinpoint the best approach to handling censored covariates in their data.