Multiple imputation and test-wise deletion for causal discovery with incomplete cohort data.
Stat Med; 41(23): 4716-4743, 2022 Oct 15.
Article in English | MEDLINE | ID: mdl-35908775
Causal discovery algorithms estimate causal graphs from observational data. This can provide a valuable complement to analyses focusing on the causal relation between individual treatment-outcome pairs. Constraint-based causal discovery algorithms rely on conditional independence testing when building the graph. Until recently, these algorithms have been unable to handle missing values. In this article, we investigate two alternative solutions: test-wise deletion and multiple imputation. We establish necessary and sufficient conditions for the recoverability of causal structures under test-wise deletion, and argue that multiple imputation is more challenging in the context of causal discovery than for estimation. We conduct an extensive comparison by simulating from benchmark causal graphs: as one might expect, we find that test-wise deletion and multiple imputation both clearly outperform list-wise deletion and single imputation. Crucially, our results further suggest that multiple imputation is especially useful in settings with a small number of either Gaussian or discrete variables, but that when the dataset contains a mix of both, neither method is uniformly best. The methods we compare include random forest imputation and a hybrid procedure combining test-wise deletion and multiple imputation. An application to data from the IDEFICS cohort study on diet- and lifestyle-related diseases in European children serves as an illustrative example.
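The difference between test-wise and list-wise deletion can be illustrated with a small sketch. It is not the authors' implementation: it assumes Gaussian variables, a Fisher z test of partial correlation as the conditional independence test, and missing values encoded as `None`. All function names (`complete_rows`, `partial_corr`, `ci_test`) and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def complete_rows(rows, columns):
    """Keep only rows with no missing value (None) in the given columns."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


def pearson(rows, a, b):
    """Sample Pearson correlation between columns a and b."""
    n = len(rows)
    mean_a = sum(r[a] for r in rows) / n
    mean_b = sum(r[b] for r in rows) / n
    s_ab = sum((r[a] - mean_a) * (r[b] - mean_b) for r in rows)
    s_aa = sum((r[a] - mean_a) ** 2 for r in rows)
    s_bb = sum((r[b] - mean_b) ** 2 for r in rows)
    return s_ab / math.sqrt(s_aa * s_bb)


def partial_corr(rows, x, y, cond):
    """Partial correlation of x and y given cond, via the recursive formula."""
    if not cond:
        return pearson(rows, x, y)
    z, rest = cond[0], cond[1:]
    r_xy = partial_corr(rows, x, y, rest)
    r_xz = partial_corr(rows, x, z, rest)
    r_yz = partial_corr(rows, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation is zero (k conditioning variables)."""
    if n - k - 3 <= 0:
        raise ValueError("too few complete rows for this test")
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def ci_test(rows, x, y, cond=(), restrict_to=None):
    """Test x independent of y given cond.

    Test-wise deletion (default): drop rows missing x, y, or any variable in cond.
    List-wise deletion: pass restrict_to=all columns to drop rows missing anything.
    Returns (p_value, number_of_rows_used).
    """
    needed = restrict_to if restrict_to is not None else (x, y, *cond)
    used = complete_rows(rows, needed)
    r = partial_corr(used, x, y, tuple(cond))
    return fisher_z_pvalue(r, len(used), len(cond)), len(used)


# Toy data (assumption): y depends on x; w is unrelated and missing in every second row.
random.seed(0)
rows = []
for i in range(200):
    x = random.gauss(0, 1)
    y = x + random.gauss(0, 1)
    w = None if i % 2 == 0 else random.gauss(0, 1)
    rows.append({"x": x, "y": y, "w": w})

p_test, n_test = ci_test(rows, "x", "y")                                # test-wise
p_list, n_list = ci_test(rows, "x", "y", restrict_to=("x", "y", "w"))  # list-wise
p_cond, n_cond = ci_test(rows, "x", "y", cond=("w",))                  # test-wise, conditioning on w
print(n_test, n_list, n_cond)
```

In this toy setting the test of x against y uses all 200 rows under test-wise deletion, whereas list-wise deletion keeps only the 100 rows in which w is also observed. Test-wise deletion discards rows only when a test actually involves the incomplete variable, as in the test that conditions on w.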
Full text: 1
Database: MEDLINE
Main subject: Research Design / Algorithms
Type of study: Etiology_studies / Incidence_studies / Observational_studies / Risk_factors_studies
Limits: Child / Humans
Language: En
Year of publication: 2022
Document type: Article