Search | VHL Regional Portal

Multiple imputation methods for handling missing values in a longitudinal categorical variable with restrictions on transitions over time: a simulation study.

De Silva, Anurika Priyanjali; Moreno-Betancur, Margarita; De Livera, Alysha Madhu; Lee, Katherine Jane; Simpson, Julie Anne.

BMC Med Res Methodol ; 19(1): 14, 2019 Jan 10.

Article in English | MEDLINE | ID: mdl-30630434

ABSTRACT

BACKGROUND: Longitudinal categorical variables are sometimes restricted in terms of how individuals transition between categories over time. For example, with a time-dependent measure of smoking categorised as never-smoker, ex-smoker, and current-smoker, current-smokers or ex-smokers cannot transition to a never-smoker at a subsequent wave. These longitudinal variables often contain missing values; however, there is little guidance on whether these restrictions need to be accommodated when using multiple imputation methods. Multiply imputing such missing values while ignoring the restrictions could lead to implausible transitions.

METHODS: We designed a simulation study based on the Longitudinal Study of Australian Children, where the target analysis was the association between (incomplete) maternal smoking and childhood obesity. We set varying proportions of data on maternal smoking to missing completely at random or missing at random. We compared the performance of fully conditional specification with multinomial logistic imputation, ordinal logistic imputation, and predictive mean matching; two-fold fully conditional specification; indicator-based imputation under multivariate normal imputation with projected distance-based rounding; and continuous imputation under multivariate normal imputation with calibration. Each of these multiple imputation methods was also applied accounting for the restrictions using a semi-deterministic imputation procedure.

RESULTS: Overall, we observed reduced bias when applying multiple imputation methods with restrictions, and fully conditional specification with predictive mean matching performed best. Applying fully conditional specification and two-fold fully conditional specification for imputing nominal variables based on multinomial logistic regression had severe convergence issues. Both imputation methods under multivariate normal imputation produced biased estimates when restrictions were not accommodated; however, we observed substantial reductions in bias when restrictions were applied with continuous imputation under multivariate normal imputation with calibration.

CONCLUSION: In a similar longitudinal setting, we recommend the use of fully conditional specification with predictive mean matching, with restrictions applied during the imputation stage.
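The transition restriction described in the abstract (a current-smoker or ex-smoker cannot later be a never-smoker) can be illustrated with a small sketch. This is not the authors' semi-deterministic procedure, whose details are not given here; the category labels and the function names `is_plausible` and `allowed_values` are hypothetical. The sketch only shows how observed waves narrow the categories a missing wave may take, which is the kind of information a restricted imputation step could use.

```python
from typing import List, Optional, Set

# Hypothetical category labels for the smoking variable.
NEVER, EX, CURRENT = "never", "ex", "current"
ALL_CATEGORIES = {NEVER, EX, CURRENT}


def is_plausible(seq: List[Optional[str]]) -> bool:
    """Return True if no 'never' appears after an observed ex/current wave.

    Missing waves are represented by None and are skipped.
    """
    seen_smoker = False
    for status in seq:
        if status is None:
            continue
        if status == NEVER and seen_smoker:
            return False
        if status != NEVER:
            seen_smoker = True
    return True


def allowed_values(seq: List[Optional[str]]) -> List[Set[str]]:
    """For each wave, list the categories consistent with the observed waves.

    Assumes the observed part of seq is itself plausible.
    """
    n = len(seq)
    # Every wave before the last observed 'never' must also be 'never'.
    last_never = max((i for i, s in enumerate(seq) if s == NEVER), default=-1)
    # Every wave after the first observed ex/current cannot be 'never'.
    first_smoker = min(
        (i for i, s in enumerate(seq) if s in (EX, CURRENT)), default=n
    )
    result = []
    for i, status in enumerate(seq):
        if status is not None:
            result.append({status})          # observed: keep as is
        elif i < last_never:
            result.append({NEVER})           # fully determined by restriction
        elif i > first_smoker:
            result.append({EX, CURRENT})     # 'never' is ruled out
        else:
            result.append(set(ALL_CATEGORIES))  # unrestricted
    return result


waves = [None, NEVER, None, EX, None]
options = allowed_values(waves)
```

In this example the first wave is determined (it must be never-smoker), the third wave is unrestricted, and the last wave can only be ex-smoker or current-smoker. Missing values with a single allowed category could be filled deterministically, with the remainder drawn from an imputation model limited to the allowed set.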

Subject(s)

Data Accuracy , Maternal Exposure/adverse effects , Models, Statistical , Pediatric Obesity/etiology , Smoking/adverse effects , Algorithms , Australia , Computer Simulation , Data Collection , Data Interpretation, Statistical , Female , Humans , Longitudinal Studies , Prospective Studies , Research Design

A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

De Silva, Anurika Priyanjali; Moreno-Betancur, Margarita; De Livera, Alysha Madhu; Lee, Katherine Jane; Simpson, Julie Anne.

BMC Med Res Methodol ; 17(1): 114, 2017 Jul 25.

Article in English | MEDLINE | ID: mdl-28743256

ABSTRACT

BACKGROUND: Missing data is a common problem in epidemiological studies, and is particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods (fully conditional specification (FCS) and multivariate normal imputation (MVNI)) treat repeated measurements of the same time-dependent variable as just another 'distinct' variable for imputation and therefore do not make the most of the longitudinal structure of the data. Only a few studies have explored extensions to the standard approaches to account for the temporal structure of longitudinal data. One suggestion is the two-fold fully conditional specification (two-fold FCS) algorithm, which restricts the imputation of a time-dependent variable to time blocks where the imputation model includes measurements taken at the specified and adjacent times. To date, no study has investigated the performance of two-fold FCS and standard MI methods for handling missing data in a time-varying covariate with a non-linear trajectory over time, a commonly encountered scenario in epidemiological studies.

METHODS: We simulated 1000 datasets of 5000 individuals based on the Longitudinal Study of Australian Children (LSAC). Three missing data mechanisms (missing completely at random (MCAR), and weak and strong missing at random (MAR) scenarios) were used to impose missingness on body mass index (BMI)-for-age z-scores, a continuous time-varying exposure variable with a non-linear trajectory over time. We evaluated the performance of FCS, MVNI, and two-fold FCS for handling up to 50% of missing data when assessing the association between childhood obesity and sleep problems.

RESULTS: The standard two-fold FCS produced slightly more biased and less precise estimates than FCS and MVNI. We observed slight improvements in bias and precision when using a time window width of two for the two-fold FCS algorithm compared to the standard width of one.

CONCLUSION: We recommend the use of FCS or MVNI in a similar longitudinal setting and, when convergence issues arise due to a large number of time points or variables with missing values, the two-fold FCS with exploration of a suitable time window.
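The time window idea behind two-fold FCS can be made concrete with a short sketch. It shows only which waves would contribute measurements to the imputation model for a given wave under a chosen window width; it is not an implementation of the algorithm itself. The function name `window_waves`, the 0-based wave indexing, and the truncation of the window at the first and last wave are assumptions made for illustration.

```python
from typing import List


def window_waves(t: int, n_waves: int, width: int = 1) -> List[int]:
    """Return the 0-indexed waves inside the time window centred on wave t.

    width=1 corresponds to the wave itself plus its immediate neighbours;
    width=2 extends the window by one further wave on each side.
    The window is truncated at the first and last wave (an assumption).
    """
    if not 0 <= t < n_waves:
        raise ValueError("t must be a valid wave index")
    if width < 0:
        raise ValueError("width must be non-negative")
    lo = max(0, t - width)
    hi = min(n_waves - 1, t + width)
    return list(range(lo, hi + 1))


# Window for the middle wave of five, at the two widths compared in the study.
narrow = window_waves(2, 5, width=1)
wide = window_waves(2, 5, width=2)
```

With five waves, the window for the middle wave covers three waves at width one and all five at width two, which illustrates why a wider window uses more of the longitudinal information at the cost of a larger imputation model.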

Subject(s)

Data Collection/statistics & numerical data , Pediatric Obesity/epidemiology , Research Design , Sleep Wake Disorders/epidemiology , Algorithms , Australia/epidemiology , Child , Comorbidity , Computer Simulation , Data Collection/methods , Female , Humans , Longitudinal Studies , Male , Models, Statistical
