ABSTRACT
Measurement error is common in environmental epidemiologic studies, but methods for correcting it in regression models with multiple environmental exposures as covariates have not been well investigated. We consider a multiple imputation approach that combines external or internal calibration samples, which contain information on both true and error-prone exposures, with main study data in which multiple exposures are measured with error. We propose a constrained chained equations multiple imputation (CEMI) algorithm that constrains the imputation model parameters in the chained equations based on the assumption of strong nondifferential measurement error. We also extend the constrained CEMI method to accommodate nondetects in the error-prone exposures in the main study data. We estimate the variance of the regression coefficients using the bootstrap, with two imputations of each bootstrapped sample. Simulations show that the constrained CEMI method outperforms existing methods, namely the method that ignores measurement error, classical calibration, and regression prediction, yielding estimated regression coefficients with smaller bias and confidence intervals with coverage close to the nominal level. We apply the proposed method to the Neighborhood Asthma and Allergy Study to investigate the associations between the concentrations of multiple indoor allergens and the fractional exhaled nitric oxide level among asthmatic children in New York City. The constrained CEMI method can be implemented by imposing constraints on the imputation matrix using the mice and bootImpute packages in R.
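A minimal sketch of such an implementation follows, assuming a stacked data frame dat containing the main study (error-prone exposures w1 and w2 and outcome y observed; true exposures x1 and x2 missing) and an external calibration sample (x1, x2, w1, w2 observed; y missing). The variable names and the particular constraint pattern are illustrative, not the authors' exact specification; nondifferential measurement error is encoded here by zeroing entries of the mice predictor matrix.

library(mice)
library(bootImpute)

pred <- make.predictorMatrix(dat)
# Nondifferential measurement error: given the true exposures, the
# error-prone measures carry no additional information about y, so w1 and
# w2 are excluded as predictors in the imputation model for y.
pred["y", c("w1", "w2")] <- 0

# Bootstrap with two imputations per bootstrapped sample; bootMice
# forwards its ... arguments to mice().
imps <- bootMice(dat, nBoot = 200, nImp = 2,
                 predictorMatrix = pred, printFlag = FALSE)
res <- bootImputeAnalyse(imps, function(d) coef(lm(y ~ x1 + x2, data = d)))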
Subjects
Algorithms; Environmental Exposure; Child; Humans; Animals; Mice; Environmental Exposure/adverse effects; Epidemiologic Studies; Calibration; Bias
ABSTRACT
Temporal proteomics data sets are often confounded by the challenge of missing values. In a time-series context, these missing data points can distort measurements or omit critical events, hindering full comprehension of the underlying biomedical processes. We introduce a Data Multiple Imputation (DMI) pipeline designed to address this challenge in turnover rate quantification from temporal data sets, enabling robust downstream analysis and novel discoveries. To demonstrate its utility and generalizability, we applied this pipeline to two use cases: a murine cardiac temporal proteomics data set and a human plasma temporal proteomics data set, both aimed at examining protein turnover rates. The DMI pipeline significantly enhanced the detection of protein turnover rates in both data sets; furthermore, the imputed data sets captured additional proteins, yielding an augmented view of biological pathways, protein complex dynamics, and biomarker-disease associations. Importantly, DMI exhibited superior performance on benchmark data sets compared to data single imputation (DSI) methods. In summary, we have demonstrated that the DMI pipeline is effective at overcoming challenges introduced by missing values in temporal proteome dynamics studies.
Subjects
Proteome; Proteomics; Humans; Proteome/analysis; Proteome/metabolism; Proteomics/methods; Animals; Mice; Longitudinal Studies; Data Interpretation, Statistical
ABSTRACT
Multiple imputation (MI) is commonly implemented to mitigate potential selection bias due to missing data. The accompanying article by Nguyen and Stuart (Am J Epidemiol. 2024;193(10):1470-1476) examines the statistical consistency of several ways of integrating MI with propensity scores. As Nguyen and Stuart noted, variance estimation for these different approaches remains to be developed. One common option is the nonparametric bootstrap, which can provide valid inference when closed-form variance estimators are not available. However, there is no consensus on how to combine MI with the nonparametric bootstrap in analyses. To complement Nguyen and Stuart's article on MI and propensity score analyses, we review currently available approaches to variance estimation with MI and the nonparametric bootstrap.
Subjects
Propensity Score; Humans; Data Interpretation, Statistical; Selection Bias; Models, Statistical; Statistics, Nonparametric
ABSTRACT
In epidemiology and the social sciences, propensity score methods are popular for estimating treatment effects using observational data, and multiple imputation is popular for handling covariate missingness. However, how to appropriately use multiple imputation for propensity score analysis is not completely clear. This paper aims to bring clarity to the consistency (or lack thereof) of methods that have been proposed, focusing on the "within" approach (where the effect is estimated separately in each imputed dataset and the multiple estimates are then combined) and the "across" approach (where typically propensity scores are averaged across imputed datasets before being used for effect estimation). We show that the within method is valid and can be used with any causal effect estimator that is consistent in the full-data setting. Existing across methods are inconsistent, but a different across method that averages the inverse probability weights across imputed datasets is consistent for propensity score weighting. We also comment on methods that rely on imputing a function of the missing covariate rather than the covariate itself, including imputation of the propensity score and of the probability weight. Based on the consistency results and practical flexibility, we recommend generally using the standard within method. Throughout, we provide intuition to make the results meaningful to the broad audience of applied researchers.
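As an illustration, here is a minimal sketch of the recommended within approach with inverse probability weighting, assuming a data frame dat with outcome y, a numeric 0/1 treatment a, and incompletely observed covariates x1 and x2 (all names hypothetical); the robust standard errors shown ignore the uncertainty from estimating the propensity score.

library(mice)

imp <- mice(dat, m = 20, printFlag = FALSE)
est <- se <- numeric(imp$m)
for (i in seq_len(imp$m)) {
  d  <- complete(imp, i)
  ps <- fitted(glm(a ~ x1 + x2, family = binomial, data = d))  # propensity score
  w  <- ifelse(d$a == 1, 1 / ps, 1 / (1 - ps))                 # IPW weights
  fit <- lm(y ~ a, data = d, weights = w)                      # weighted effect estimate
  est[i] <- coef(fit)["a"]
  se[i]  <- sqrt(sandwich::vcovHC(fit, type = "HC1")["a", "a"])
}
# Combine the m within-dataset estimates with Rubin's rules.
qbar <- mean(est)
tvar <- mean(se^2) + (1 + 1 / imp$m) * var(est)
c(estimate = qbar, se = sqrt(tvar))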
Subjects
Propensity Score; Humans; Data Interpretation, Statistical; Models, Statistical; Causality
ABSTRACT
Auxiliary variables are used in multiple imputation (MI) to reduce bias and increase efficiency. These variables may often themselves be incomplete. We explored how missing data in auxiliary variables influenced estimates obtained from MI. We implemented a simulation study with three different missing data mechanisms for the outcome. We then examined the impact of increasing proportions of missing data, and of different missingness mechanisms, in the auxiliary variable on the bias of an unadjusted linear regression coefficient and on the fraction of missing information. We illustrate our findings with an applied example from the Avon Longitudinal Study of Parents and Children. We found that where complete records analyses were biased, increasing proportions of missing data in auxiliary variables, under any missing data mechanism, reduced the ability of MI with the auxiliary variable included to mitigate this bias. Where there was no bias in the complete records analysis, inclusion of a missing not at random auxiliary variable in MI introduced bias of potentially important magnitude (up to 17% of the effect size in our simulation). The quantity and nature of missing data in auxiliary variables therefore need careful consideration when selecting them for use in MI models.
ABSTRACT
Routinely collected testing data have been a vital resource for public health response during the COVID-19 pandemic and have revealed the extent to which Black and Hispanic persons have borne a disproportionate burden of SARS-CoV-2 infections and hospitalizations in the United States. However, missing race and ethnicity data and missed infections due to testing disparities limit the interpretation of testing data and obscure the true toll of the pandemic. We investigated potential bias arising from these 2 types of missing data through a case study carried out in Holyoke, Massachusetts, during the prevaccination phase of the pandemic. First, we estimated SARS-CoV-2 testing and case rates by race and ethnicity, imputing missing data using a joint modeling approach. We then investigated disparities in SARS-CoV-2 reported case rates and missed infections by comparing case rate estimates with estimates derived from a COVID-19 seroprevalence survey. Compared with the non-Hispanic White population, we found that the Hispanic population had similar testing rates (476 tested per 1000 vs 480 per 1000) but twice the case rate (8.1% vs 3.7%). We found evidence of inequitable testing, with a higher rate of missed infections in the Hispanic population than in the non-Hispanic White population (79 infections missed per 1000 vs 60 missed per 1000).
Subjects
COVID-19 Testing; COVID-19; Hispanic or Latino; SARS-CoV-2; Adult; Aged; Female; Humans; Male; Middle Aged; Black or African American/statistics & numerical data; COVID-19/ethnology; COVID-19/epidemiology; COVID-19/diagnosis; COVID-19 Testing/statistics & numerical data; Ethnicity/statistics & numerical data; Health Status Disparities; Healthcare Disparities/ethnology; Healthcare Disparities/statistics & numerical data; Hispanic or Latino/statistics & numerical data; Massachusetts/epidemiology; Missed Diagnosis/statistics & numerical data; White People
ABSTRACT
Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing-indicator method for missing covariates, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether the outcome influenced missingness in other variables, and presence of interaction/nonlinear terms in the missingness models). Complete-case analysis and extended TMLE had small biases when the outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when the exposure/outcome generation models included interactions. Parametric MI including interactions performed best in terms of bias and variance across all settings, except when the missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.
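A sketch of the MI-then-TMLE ("within") combination using the mice and tmle R packages follows, assuming a binary outcome y, exposure a, and confounders w1-w3 (names hypothetical). Interaction and nonlinear terms would be added to the imputation models (e.g., via mice's formulas argument) to maintain compatibility with the analysis; this minimal version omits them.

library(mice)
library(tmle)

imp <- mice(dat, m = 10, printFlag = FALSE)
psi <- vpsi <- numeric(imp$m)
for (i in seq_len(imp$m)) {
  d   <- complete(imp, i)
  fit <- tmle(Y = d$y, A = d$a, W = d[, c("w1", "w2", "w3")],
              family = "binomial")
  psi[i]  <- fit$estimates$ATE$psi      # targeted ATE estimate
  vpsi[i] <- fit$estimates$ATE$var.psi  # its influence-curve variance
}
# Pool across imputations with Rubin's rules.
c(ATE = mean(psi),
  se = sqrt(mean(vpsi) + (1 + 1 / imp$m) * var(psi)))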
Subjects
Causality; Humans; Likelihood Functions; Adolescent; Data Interpretation, Statistical; Bias; Models, Statistical; Computer Simulation
ABSTRACT
It is unclear how the risk of post-COVID symptoms evolved during the pandemic, especially before the spread of SARS-CoV-2 variants and the availability of vaccines. We used modified Poisson regressions to compare the risk of six-month post-COVID symptoms and their associated risk factors according to the period of the first acute COVID-19 episode: during the French first (March-May 2020) or second (September-November 2020) wave. Non-response weights and multiple imputation were used to handle missing data. Among participants aged 15 or over in a national population-based cohort, the risk of post-COVID symptoms was 14.6% (95% CI: 13.9%, 15.3%) in March-May 2020, versus 7.0% (95% CI: 6.3%, 7.7%) in September-November 2020 (adjusted RR: 1.36, 95% CI: 1.20, 1.55). For both periods, the risk was higher in the presence of baseline physical condition(s), and it increased with the number of acute symptoms. During the first wave, the risk was also higher for women and in the presence of baseline mental condition(s), and it varied with educational level. In France in 2020, the risk of six-month post-COVID symptoms was higher during the first than the second wave. This difference was observed before the spread of variants and the availability of vaccines.
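For illustration, the modified Poisson (robust Poisson) regression within each imputed dataset can be sketched as below, assuming a numeric 0/1 indicator wave for the second wave, a nonresponse weight nrw, and illustrative covariates; quasi-Poisson avoids non-integer-weight warnings while leaving point estimates unchanged, and the sandwich estimator supplies the robust standard errors.

library(mice)
library(sandwich)
library(lmtest)

imp <- mice(dat, m = 20, printFlag = FALSE)
est <- se <- numeric(imp$m)
for (i in seq_len(imp$m)) {
  d   <- complete(imp, i)
  fit <- glm(postcovid ~ wave + age + sex, family = quasipoisson(link = "log"),
             weights = nrw, data = d)                     # modified Poisson for RRs
  ct  <- coeftest(fit, vcov = vcovHC(fit, type = "HC0"))  # robust (sandwich) SEs
  est[i] <- ct["wave", "Estimate"]
  se[i]  <- ct["wave", "Std. Error"]
}
qbar <- mean(est)
tvar <- mean(se^2) + (1 + 1 / imp$m) * var(est)
exp(qbar + c(lower = -1.96, RR = 0, upper = 1.96) * sqrt(tvar))  # adjusted RR, 95% CI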
ABSTRACT
Understanding associations between injury severity and postacute care recovery for patients with traumatic brain injury (TBI) is crucial to improving care. Estimating these associations requires information on patients' injury, demographics, and healthcare utilization, which is dispersed across multiple data sets. Because of privacy regulations, unique identifiers are not available to link records across these data sets. Record linkage methods identify records that represent the same patient across data sets in the absence of unique identifiers. With a large number of records, these methods may produce many false links. Health providers offer a natural grouping scheme for patients, because only records associated with the same provider can represent the same patient. In some cases, providers are defined within each data set but are not uniquely identified across data sets. We propose a Bayesian record linkage procedure that simultaneously links providers and patients. The procedure improves the accuracy of the estimated links compared to current methods. We use this procedure to merge a trauma registry with Medicare claims to estimate the association between TBI patients' injury severity and postacute care recovery.
Subjects
Brain Injuries, Traumatic; Subacute Care; Aged; Humans; United States; Medicare; Bayes Theorem; Registries; Brain Injuries, Traumatic/therapy
ABSTRACT
Missing data are a common problem in medical research and are commonly addressed using multiple imputation. Although traditional imputation methods allow for valid statistical inference when data are missing at random (MAR), their implementation is problematic when the presence of missingness depends on unobserved variables, that is, when data are missing not at random (MNAR). Unfortunately, the MNAR situation is rather common in observational studies, registries, and other sources of real-world data. While several imputation methods have been proposed for individual studies with MNAR data, their application and validity in large datasets with a multilevel structure remain unclear. We therefore explored in depth the consequences of MNAR data in hierarchical data and proposed a novel multilevel imputation method for common missing patterns in clustered datasets. This method is based on the principles of Heckman selection models and adopts a two-stage meta-analysis approach to impute binary and continuous variables that may be outcomes or predictors and that are systematically or sporadically missing. After evaluating the proposed imputation model in simulated scenarios, we illustrate its use in a cross-sectional community survey to estimate the prevalence of malaria parasitemia in children aged 2-10 years in five regions of Uganda.
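For reference, the single-level Heckman selection model on which the method builds can be written as follows; this is the standard formulation, not the authors' two-stage multilevel extension, which pools cluster-specific estimates via meta-analysis:

$$ y_i = \mathbf{x}_i^\top \beta + \varepsilon_i, \qquad s_i^\ast = \mathbf{z}_i^\top \gamma + u_i, \qquad s_i = 1\{ s_i^\ast > 0 \}, $$
$$ \begin{pmatrix} \varepsilon_i \\ u_i \end{pmatrix} \sim N\!\left( \mathbf{0}, \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right), \qquad E[\, y_i \mid \mathbf{x}_i, s_i = 1 \,] = \mathbf{x}_i^\top \beta + \rho\sigma\, \lambda(\mathbf{z}_i^\top \gamma), $$

where y_i is observed only when the selection indicator s_i equals 1, λ = φ/Φ is the inverse Mills ratio, and ρ ≠ 0 captures MNAR missingness (ρ = 0 reduces to MAR).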
Subjects
Biomedical Research; Child; Humans; Cross-Sectional Studies; Uganda/epidemiology
ABSTRACT
Policymakers often require information on programs' long-term impacts that is not available when decisions are made. For example, while rigorous evidence from the Oregon Health Insurance Experiment (OHIE) shows that having health insurance influences short-term health and financial measures, the impact on long-term outcomes, such as mortality, will not be known for many years following the program's implementation. We demonstrate how data fusion methods may be used to address the problem of missing final outcomes and to predict the long-run impacts of interventions before the requisite data are available. We implement this method by concatenating data on an intervention (such as the OHIE) with auxiliary long-term data and then imputing missing long-term outcomes using short-term surrogate outcomes, while approximating uncertainty with replication methods. We use simulations to examine the performance of the methodology and apply the method in a case study. Specifically, we fuse data on the OHIE with data from the National Longitudinal Mortality Study and estimate that being eligible to apply for subsidized health insurance will lead to a statistically significant improvement in long-term mortality.
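A minimal sketch of the fusion-and-imputation step in R, assuming a trial data frame (treatment z and short-term surrogates s1, s2 observed; long-term mortality mort10 missing) concatenated with an auxiliary cohort nlms (surrogates and mort10 observed, z missing); all names are hypothetical, surrogacy assumptions are taken as given, and the replication-based uncertainty step is omitted.

library(mice)

vars <- c("z", "s1", "s2", "age", "mort10")
trial$mort10 <- NA   # long-term outcome not yet observed in the trial
nlms$z       <- NA   # no intervention in the auxiliary data
fused <- rbind(trial[, vars], nlms[, vars])

imp <- mice(fused, m = 20, printFlag = FALSE)  # imputes mort10 from surrogates

# Estimate the long-run effect on the trial rows of each completed dataset
# and pool with Rubin's rules.
fits <- lapply(seq_len(imp$m), function(i) {
  d <- complete(imp, i)[seq_len(nrow(trial)), ]
  glm(mort10 ~ z, family = binomial, data = d)
})
summary(pool(as.mira(fits)))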
Subjects
Insurance, Health; Humans; Oregon; Insurance, Health/statistics & numerical data; Computer Simulation; Mortality; Longitudinal Studies; United States; Models, Statistical
ABSTRACT
In clinical studies, multi-state model (MSM) analysis is often used to describe the sequence of events that patients experience, enabling better understanding of disease progression. A complicating factor in many MSM studies is that the exact event times may not be known. Motivated by a real dataset of patients who received stem cell transplants, we considered the setting in which some event times were exactly observed and some were missing. In our setting, there was little information about the time intervals in which the missing event times occurred, and missingness depended on the event type, given the analysis model covariates. These additional challenges limited the usefulness of some missing data methods (maximum likelihood, complete case analysis, and inverse probability weighting). We show that multiple imputation (MI) of event times can perform well in this setting. MI is a flexible method that can be used with any complete-data analysis model. Through an extensive simulation study, we show that MI by predictive mean matching (PMM), in which sampling is from a set of observed times without reliance on a specific parametric distribution, has little bias when event times are missing at random, conditional on the observed data. Applying PMM separately to each subgroup of patients with a different pathway through the MSM tends to further reduce bias and improve precision. We recommend MI using PMM methods when performing MSM analysis with Markov models and partially observed event times.
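For instance, subgroup-wise PMM imputation of the partially observed event times can be sketched with mice as follows, where pathway labels each patient's route through the multi-state model (names hypothetical):

library(mice)

m <- 10
# Impute event times by predictive mean matching separately within each
# pathway subgroup, so donors share the same sequence of events.
imp_list <- lapply(split(dat, dat$pathway), function(d)
  mice(d, m = m, method = "pmm", printFlag = FALSE))

# Reassemble m completed datasets for the multi-state (Markov) analysis.
completed <- lapply(seq_len(m), function(i)
  do.call(rbind, lapply(imp_list, complete, i)))
# Fit the MSM to each element of `completed` and pool with Rubin's rules.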
Subjects
Research Design; Humans; Data Interpretation, Statistical; Computer Simulation; Probability; Bias
ABSTRACT
Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and direct application of standard approaches for combining validation data into analyses may lead to inefficient estimators since the information available from intermediate validation steps is only partially considered or even completely ignored. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by and illustrated in a study of contraceptive effectiveness among 83 671 women living with HIV, whose data were originally extracted from electronic medical records, of whom 4732 had their charts reviewed, and a subsequent 1210 also had a telephone interview to validate key study variables.
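For the single-validation-phase case, generalized raking can be sketched with the survey package as below, calibrating the validated subsample against the error-prone variables available on all subjects (variable names hypothetical); the paper's extensions additionally exploit the intermediate chart-review phase, which this standard two-phase sketch does not capture.

library(survey)

# Phase 1: everyone with error-prone EHR variables; phase 2: validated subset.
des <- twophase(id = list(~1, ~1), subset = ~validated, data = dat)

# Generalized raking against auxiliaries computed from the error-prone data
# (here simply the error-prone exposure and outcome themselves).
cal <- calibrate(des, formula = ~ exposure_ep + outcome_ep,
                 phase = 2, calfun = "raking")

fit <- svyglm(outcome_val ~ exposure_val, design = cal,
              family = quasibinomial())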
Subjects
Data Accuracy; Electronic Health Records; Female; Humans; HIV Infections
ABSTRACT
Biomarkers are often measured in bulk to diagnose patients, monitor patient conditions, and research novel drug pathways. The measurement of these biomarkers often suffers from detection limits that result in missing and untrustworthy measurements. Frequently, missing biomarkers are imputed so that downstream analysis can be conducted with modern statistical methods that cannot normally handle data subject to informative censoring. This work develops an empirical Bayes g-modeling method for imputing and denoising biomarker measurements. We establish superior estimation properties compared to popular methods in simulations and with real data, providing useful biomarker estimates for downstream analysis.
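The underlying computation can be illustrated with a posterior-mean calculation for a nondetect under a discretized prior g. In g-modeling the prior is estimated from the observed marginal distribution rather than fixed as it is here, so the numbers below are placeholders, not the paper's estimator.

theta <- seq(-4, 8, length.out = 400)       # grid of latent true levels
g     <- dnorm(theta, mean = 2, sd = 1.5)   # placeholder prior (estimated in g-modeling)
g     <- g / sum(g)
sd_e  <- 0.5                                # assay noise SD (assumed known)
dl    <- 0.8                                # lower detection limit

# A nondetect contributes the censoring likelihood P(X < dl | theta).
lik  <- pnorm(dl, mean = theta, sd = sd_e)
post <- g * lik / sum(g * lik)
sum(theta * post)   # posterior-mean imputation for the nondetect

# For an observed value x, use dnorm(x, theta, sd_e) as the likelihood
# to obtain a denoised posterior-mean estimate instead.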
Subjects
Bayes Theorem; Biomarkers; Computer Simulation; Humans; Biomarkers/analysis; Models, Statistical; Statistics, Nonparametric; Data Interpretation, Statistical
ABSTRACT
This research introduces a multivariate τ-inflated beta regression (τ-IBR) modeling approach for the analysis of censored recurrent event data that is particularly useful when there is a mixture of (a) individuals who are generally less susceptible to recurrent events and (b) heterogeneity in the duration of event-free periods among those who experience events. The modeling approach is applied to a restructured version of the recurrent event data that consists of censored longitudinal times-to-first-event in τ-length follow-up windows that potentially overlap. Multiple imputation (MI) and expectation-solution (ES) approaches appropriate for censored data are developed as part of the model-fitting process. The τ-IBR model provides a suite of useful analysis outputs, including parameter estimates that help interpret the (a) and (b) mixture of event times in the data, estimates of mean τ-restricted event-free duration in a τ-length follow-up window based on a patient's covariate profile, and heat maps of raw τ-restricted event-free durations observed in the data, with censored observations augmented via averages across MI datasets. Simulations indicate good statistical performance of the proposed τ-IBR approach to modeling censored recurrent event data. An example is given based on the Azithromycin for Prevention of COPD Exacerbations Trial.
Subjects
Azithromycin; Pulmonary Disease, Chronic Obstructive; Humans
ABSTRACT
BACKGROUND: In computer-aided diagnosis (CAD) studies utilizing multireader multicase (MRMC) designs, missing data can arise from reader misinterpretation or oversight, or from problems with measurement techniques. Improper handling of these missing data can lead to bias. However, little research has been conducted on addressing the missing data issue within the MRMC framework. METHODS: We introduced a novel approach that integrates multiple imputation with MRMC analysis (MI-MRMC). An extensive simulation study was conducted to compare the efficacy of our proposed approach with that of the traditional complete case analysis strategy within the MRMC design. Furthermore, we applied both approaches to a real MRMC-design CAD study on aneurysm detection via head and neck CT angiograms to further validate their practicality. RESULTS: Compared with traditional complete case analysis, the simulation study demonstrated that the MI-MRMC approach provides an almost unbiased estimate of diagnostic capability, alongside satisfactory performance in terms of statistical power and the type I error rate within the MRMC framework, even in small-sample scenarios. In the real CAD study, the proposed MI-MRMC method further demonstrated strong performance in terms of both point estimates and confidence intervals compared with traditional complete case analysis. CONCLUSION: Within MRMC design settings, the adoption of an MI-MRMC approach in the face of missing data can facilitate the attainment of unbiased and robust estimates of diagnostic capability.
Subjects
Computer Simulation; Humans; Research Design; Algorithms; Data Interpretation, Statistical
ABSTRACT
BACKGROUND: Epidemiological and clinical studies often have missing data, frequently analysed using multiple imputation (MI). In general, MI estimates will be biased if data are missing not at random (MNAR). Bias due to data MNAR can be reduced by including other variables ("auxiliary variables") in imputation models, in addition to those required for the substantive analysis. Common advice is to take an inclusive approach to auxiliary variable selection (i.e. include all variables thought to be predictive of missingness and/or the missing values). There are no clear guidelines about the impact of this strategy when data may be MNAR. METHODS: We explore the impact of including an auxiliary variable predictive of missingness but, in truth, unrelated to the partially observed variable, when data are MNAR. We quantify, algebraically and by simulation, the magnitude of the additional bias of the MI estimator for the exposure coefficient (fitting either a linear or logistic regression model), when the (continuous or binary) partially observed variable is either the analysis outcome or the exposure. Here, "additional bias" refers to the difference in magnitude of the MI estimator when the imputation model includes (i) the auxiliary variable and the other analysis model variables, versus (ii) just the other analysis model variables, noting that both will be biased due to data MNAR. We illustrate the extent of this additional bias by re-analysing data from a birth cohort study. RESULTS: The additional bias can be relatively large when the outcome is partially observed and missingness is caused by the outcome itself, and even larger if missingness is caused by both the outcome and the exposure (when either the outcome or exposure is partially observed). CONCLUSIONS: When using MI, the naïve and commonly used strategy of including all available auxiliary variables should be avoided. We recommend including as auxiliary variables those most predictive of the partially observed variable, where these can be identified through consideration of plausible causal diagrams and missingness mechanisms, as well as data exploration (noting that associations with the partially observed variable in the complete records may be distorted due to selection bias).
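In mice, this selective strategy amounts to editing the predictor matrix, as in this sketch with a partially observed outcome y, exposure x, an auxiliary aux_y that predicts y, and an auxiliary aux_m that predicts only missingness (all names hypothetical):

library(mice)

pred <- make.predictorMatrix(dat)
pred["y", "aux_m"] <- 0   # drop the missingness-only predictor when imputing y
pred["y", "aux_y"] <- 1   # keep the auxiliary that predicts y itself

imp <- mice(dat, m = 20, predictorMatrix = pred, printFlag = FALSE)
fit <- with(imp, lm(y ~ x))
summary(pool(fit))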
Subjects
Bias; Humans; Data Interpretation, Statistical; Models, Statistical; Computer Simulation; Algorithms; Logistic Models; Research Design/statistics & numerical data
ABSTRACT
BACKGROUND: Early identification of children at high risk of developing myopia is essential to prevent myopia progression by introducing timely interventions. However, missing data and measurement error (ME) are common challenges in risk prediction modelling that can introduce bias in myopia prediction. METHODS: We explore four imputation methods to address missing data and ME: single imputation (SI), multiple imputation under missing at random (MI-MAR), multiple imputation with a calibration procedure (MI-ME), and multiple imputation under missing not at random (MI-MNAR). We compare four machine-learning models (Decision Tree, Naive Bayes, Random Forest, and Xgboost) and three statistical models (logistic regression, stepwise logistic regression, and least absolute shrinkage and selection operator logistic regression) in myopia risk prediction. We apply these models to the Shanghai Jinshan Myopia Cohort Study and also conduct a simulation study to investigate the impact of missing mechanisms, the degree of ME, and the importance of predictors on model performance. Model performance is evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). RESULTS: Our findings indicate that in scenarios with missing data and ME, using MI-ME in combination with logistic regression yields the best prediction results. In scenarios without ME, employing MI-MAR to handle missing data outperforms SI regardless of the missing mechanism. When ME has a greater impact on prediction than missing data, the relative advantage of MI-MAR diminishes and MI-ME becomes superior. Furthermore, our results demonstrate that statistical models exhibit better prediction performance than machine-learning models. CONCLUSION: MI-ME emerges as a reliable method for handling missing data and ME in important predictors for early-onset myopia risk prediction.
Subjects
Machine Learning; Myopia; Humans; Myopia/diagnosis; Myopia/epidemiology; Female; Child; Male; Logistic Models; Models, Statistical; Risk Assessment/methods; Risk Assessment/statistics & numerical data; Risk Factors; ROC Curve; Bayes Theorem; China/epidemiology; Cohort Studies; Age of Onset
ABSTRACT
BACKGROUND: The ICH E9 (R1) addendum on Estimands and Sensitivity Analysis in Clinical Trials proposes a framework for the design and analysis of clinical trials aimed at improving clarity around the definition of the targeted treatment effect (the estimand) of a study. METHODS: We adopt the estimand framework in the context of a study using "trial emulation" to estimate the risk of pneumocystis pneumonia (PcP), an opportunistic disease that affects people living with HIV and AIDS whose immune systems are weakened, when comparing two antibiotic treatment regimes for stopping antibiotic prophylaxis against this disease. A "while on treatment" strategy was implemented for post-randomisation (intercurrent) events. We then perform a sensitivity analysis using reference-based multiple imputation to model a scenario in which patients lost to follow-up stop taking prophylaxis. RESULTS: The primary analysis indicated a protective effect for the new regime, which used viral suppression as the prophylaxis-stopping criterion (hazard ratio (HR) 0.78, 95% confidence interval [0.69, 0.89], p < 0.001). For the sensitivity analysis, when we apply the "jump to off prophylaxis" approach, the hazard ratio is almost unchanged from the primary analysis (HR 0.80 [0.69, 0.95], p = 0.009). The sensitivity analysis confirmed that the new regime exhibits a clear improvement over the existing guidelines for PcP prophylaxis when those lost to follow-up "jump to off prophylaxis". CONCLUSIONS: Our application of reference-based multiple imputation demonstrates the method's flexibility and simplicity for sensitivity analyses in the context of the estimand framework for (emulated) trials.
Subjects
HIV Infections; Pneumonia, Pneumocystis; Humans; HIV Infections/drug therapy; Pneumonia, Pneumocystis/prevention & control; Research Design/standards; Antibiotic Prophylaxis/methods; Data Interpretation, Statistical; Proportional Hazards Models; AIDS-Related Opportunistic Infections/prevention & control; AIDS-Related Opportunistic Infections/drug therapy; Clinical Trials as Topic/methods; Anti-Bacterial Agents/therapeutic use
ABSTRACT
BACKGROUND: When studying the association between treatment and a clinical outcome, a parametric multivariable model of the conditional outcome expectation is often used to adjust for covariates. The treatment coefficient of the outcome model targets a conditional treatment effect. Model-based standardization is typically applied to average the model predictions over the target covariate distribution and generate a covariate-adjusted estimate of the marginal treatment effect. METHODS: The standard approach to model-based standardization involves maximum-likelihood estimation and use of the non-parametric bootstrap. We introduce a novel, general-purpose, model-based standardization method based on multiple imputation that is easily applicable when the outcome model is a generalized linear model. We term our proposed approach multiple imputation marginalization (MIM). MIM consists of two main stages: the generation of synthetic datasets and their analysis. MIM accommodates a Bayesian statistical framework, which naturally allows for the principled propagation of uncertainty, integrates the analysis into a probabilistic framework, and allows for the incorporation of prior evidence. RESULTS: We conduct a simulation study to benchmark the finite-sample performance of MIM in conjunction with a parametric outcome model. The simulations provide proof-of-principle in scenarios with binary outcomes, continuous-valued covariates, a logistic outcome model, and the marginal log odds ratio as the target effect measure. When parametric modeling assumptions hold, MIM yields unbiased estimation in the target covariate distribution, valid coverage rates, and precision and efficiency similar to those of the standard approach to model-based standardization. CONCLUSION: We demonstrate that multiple imputation can be used to marginalize over a target covariate distribution, providing appropriate inference with a correctly specified parametric outcome model and offering statistical performance comparable to that of the standard approach to model-based standardization.
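A compact sketch of the two MIM stages for a logistic outcome model follows, using a normal approximation to the posterior in place of full Bayesian sampling and illustrative names (ipd for the index data, target for the target covariate distribution); the ordinary Rubin's rules shown at the end stand in for the synthetic-data combining rules, which differ slightly.

library(MASS)

fit  <- glm(y ~ trt + x1 + x2, family = binomial, data = ipd)  # outcome model
M    <- 50
beta <- mvrnorm(M, mu = coef(fit), Sigma = vcov(fit))  # approximate posterior draws

Xt <- model.matrix(~ x1 + x2, target)   # (Intercept, x1, x2) for the target rows
logor <- se <- numeric(M)
for (m in seq_len(M)) {
  b  <- beta[m, ]
  # Stage 1: generate synthetic outcomes in the target distribution under
  # trt = 1 and trt = 0 (columns reordered to match coef(fit)).
  p1 <- plogis(as.vector(cbind(Xt[, 1], 1, Xt[, -1]) %*% b))
  p0 <- plogis(as.vector(cbind(Xt[, 1], 0, Xt[, -1]) %*% b))
  d  <- data.frame(y   = c(rbinom(nrow(Xt), 1, p0), rbinom(nrow(Xt), 1, p1)),
                   trt = rep(0:1, each = nrow(Xt)))
  # Stage 2: analyze each synthetic dataset for the marginal log odds ratio.
  f        <- glm(y ~ trt, family = binomial, data = d)
  logor[m] <- coef(f)["trt"]
  se[m]    <- sqrt(vcov(f)["trt", "trt"])
}
c(logOR = mean(logor),
  se = sqrt(mean(se^2) + (1 + 1 / M) * var(logor)))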