Imputation-based Q-learning for optimizing dynamic treatment regimes with right-censored survival outcome.
Biometrics; 79(4): 3676-3689, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37129942
Q-learning is one of the most commonly used methods for optimizing dynamic treatment regimes (DTRs) in multistage decision-making. Right-censored survival outcomes pose a significant challenge to Q-learning because it relies on parametric models for counterfactual estimation, which are subject to misspecification and sensitive to missing covariates. In this paper, we propose an imputation-based Q-learning (IQ-learning) in which flexible nonparametric or semiparametric models are employed to estimate optimal treatment rules for each stage, and weighted hot-deck multiple imputation (MI) and direct-draw MI are then used to predict optimal potential survival times. Missing data are handled using inverse probability weighting and MI, and the nonrandom treatment assignment among the observed is accounted for using a propensity-score approach. We investigate the performance of IQ-learning via extensive simulations and show that it is more robust to model misspecification than existing Q-learning methods, imputes only plausible potential survival times, unlike parametric models, and provides more flexibility in terms of baseline hazard shape. Using IQ-learning, we developed an optimal DTR for leukemia treatment based on a randomized trial with observational follow-up that motivated this study.
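As a rough illustration of the weighted hot-deck idea mentioned in the abstract, the sketch below draws imputed survival times for a censored subject from donors with observed event times. It is a generic illustration, not the authors' algorithm: the function names, the Gaussian kernel on a scalar risk score, the `bandwidth` parameter, and the rule that eligible donors must have event times beyond the censoring time are all assumptions made for this example.

```python
import math
import random


def weighted_hot_deck_draw(censor_time, score, donors, bandwidth=1.0, rng=None):
    """Draw one imputed survival time for a censored subject (illustrative only).

    donors: list of (event_time, score) pairs from subjects with observed events.
    Only donors whose event time exceeds censor_time are eligible, so an
    imputed time is never shorter than the follow-up already observed.
    """
    rng = rng or random.Random()
    eligible = [(t, s) for t, s in donors if t > censor_time]
    if not eligible:
        return None  # no eligible donor; the caller must handle this case
    # Assumed weighting: Gaussian kernel on the score distance, so donors
    # with similar scores are drawn more often.
    weights = [math.exp(-0.5 * ((s - score) / bandwidth) ** 2) for _, s in eligible]
    if sum(weights) == 0.0:
        weights = [1.0] * len(eligible)  # fall back to a uniform draw
    times = [t for t, _ in eligible]
    return rng.choices(times, weights=weights, k=1)[0]


def multiply_impute(censor_time, score, donors, m=5, seed=0):
    """Return m independent hot-deck draws (one per imputed data set)."""
    rng = random.Random(seed)
    return [weighted_hot_deck_draw(censor_time, score, donors, rng=rng)
            for _ in range(m)]
```

Because every draw is an observed event time from an eligible donor, the imputed values stay within the range of the data, which is one way an imputation scheme can avoid implausible survival times.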
Full text: 1
Database: MEDLINE
Main subject: Computer Simulation
Language: En
Publication year: 2023
Document type: Article