Pesquisa | BVS IEC

1.

Multiple imputation of more than one environmental exposure with nondifferential measurement error.

Yu, Yuanzhi; Little, Roderick J; Perzanowski, Matthew; Chen, Qixuan.

Biostatistics ; 25(2): 306-322, 2024 Apr 15.

Artigo em Inglês | MEDLINE | ID: mdl-37230469

RESUMO

Measurement error is common in environmental epidemiologic studies, but methods for correcting measurement error in regression models with multiple environmental exposures as covariates have not been well investigated. We consider a multiple imputation approach, combining external or internal calibration samples that contain information on both true and error-prone exposures with the main study data of multiple exposures measured with error. We propose a constrained chained equations multiple imputation (CEMI) algorithm that places constraints on the imputation model parameters in the chained equations imputation based on the assumptions of strong nondifferential measurement error. We also extend the constrained CEMI method to accommodate nondetects in the error-prone exposures in the main study data. We estimate the variance of the regression coefficients using the bootstrap with two imputations of each bootstrapped sample. The constrained CEMI method is shown by simulations to outperform existing methods, namely the method that ignores measurement error, classical calibration, and regression prediction, yielding estimated regression coefficients with smaller bias and confidence intervals with coverage close to the nominal level. We apply the proposed method to the Neighborhood Asthma and Allergy Study to investigate the associations between the concentrations of multiple indoor allergens and the fractional exhaled nitric oxide level among asthmatic children in New York City. The constrained CEMI method can be implemented by imposing constraints on the imputation matrix using the mice and bootImpute packages in R.

Assuntos

Algoritmos , Exposição Ambiental , Criança , Humanos , Animais , Camundongos , Exposição Ambiental/efeitos adversos , Estudos Epidemiológicos , Calibragem , Viés

2.

Missing Values in Longitudinal Proteome Dynamics Studies: Making a Case for Data Multiple Imputation.

Yan, Yu; Sankar, Baradwaj Simha; Mirza, Bilal; Ng, Dominic C M; Pelletier, Alexander R; Huang, Sarah D; Wang, Wei; Watson, Karol; Wang, Ding; Ping, Peipei.

J Proteome Res ; 23(9): 4151-4162, 2024 Sep 06.

Artigo em Inglês | MEDLINE | ID: mdl-39189460

RESUMO

Temporal proteomics data sets are often confounded by the challenges of missing values. These missing data points, in a time-series context, can lead to fluctuations in measurements or the omission of critical events, thus hindering the ability to fully comprehend the underlying biomedical processes. We introduce a Data Multiple Imputation (DMI) pipeline designed to address this challenge in temporal data set turnover rate quantifications, enabling robust downstream analysis to gain novel discoveries. To demonstrate its utility and generalizability, we applied this pipeline to two use cases: a murine cardiac temporal proteomics data set and a human plasma temporal proteomics data set, both aimed at examining protein turnover rates. This DMI pipeline significantly enhanced the detection of protein turnover rate in both data sets, and furthermore, the imputed data sets captured new representation of proteins, leading to an augmented view of biological pathways, protein complex dynamics, as well as biomarker-disease associations. Importantly, DMI exhibited superior performance in benchmark data sets compared to single imputation methods (DSI). In summary, we have demonstrated that this DMI pipeline is effective at overcoming challenges introduced by missing values in temporal proteome dynamics studies.

Assuntos

Proteoma , Proteômica , Humanos , Proteoma/análise , Proteoma/metabolismo , Proteômica/métodos , Animais , Camundongos , Estudos Longitudinais , Interpretação Estatística de Dados

3.

Invited Commentary: Mixing multiple imputation and bootstrapping for variance estimation.

Li, Catherine X; Zivich, Paul N.

Am J Epidemiol ; 2024 May 16.

Artigo em Inglês | MEDLINE | ID: mdl-38751323

RESUMO

In 2023, Martinez et al. examined trends in the inclusion, conceptualization, operationalization and analysis of race and ethnicity among studies published in US epidemiology journals. Based on a random sample of papers (N=1,050) published from 1995-2018, the authors describe the treatment of race, ethnicity, and ethnorace in the analytic sample (N=414, 39% of baseline sample) over time. Between 32% and 19% of studies in each time stratum lacked race data; 61% to 34% lacked ethnicity data. The review supplies stark evidence of the routine omission and variability of measures of race and ethnicity in epidemiologic research. Informed by public health critical race praxis (PHCRP), this commentary discusses the implications of four problems the findings suggest pervade epidemiology: 1) a general lack of clarity about what race and ethnicity are; 2) the limited use of critical race or other theory; 3) an ironic lack of rigor in measuring race and ethnicity; and, 4) the ordinariness of racism and white supremacy in epidemiology. The identified practices reflect neither current publication guidelines nor the state of the knowledge on race, ethnicity and racism; therefore, we conclude by offering recommendations to move epidemiology toward more rigorous research in an increasingly diverse society.

4.

Multiple imputation for propensity score analysis with covariates missing at random: some clarity on within and across methods.

Nguyen, Trang Quynh; Stuart, Elizabeth A.

Am J Epidemiol ; 2024 Jun 11.

Artigo em Inglês | MEDLINE | ID: mdl-38863120

RESUMO

In epidemiology and social sciences, propensity score methods are popular for estimating treatment effects using observational data, and multiple imputation is popular for handling covariate missingness. However, how to appropriately use multiple imputation for propensity score analysis is not completely clear. This paper aims to bring clarity on the consistency (or lack thereof) of methods that have been proposed, focusing on the within approach (where the effect is estimated separately in each imputed dataset and then the multiple estimates are combined) and the across approach (where typically propensity scores are averaged across imputed datasets before being used for effect estimation). We show that the within method is valid and can be used with any causal effect estimator that is consistent in the full-data setting. Existing across methods are inconsistent, but a different across method that averages the inverse probability weights across imputed datasets is consistent for propensity score weighting. We also comment on methods that rely on imputing a function of the missing covariate rather than the covariate itself, including imputation of the propensity score and of the probability weight. Based on consistency results and practical flexibility, we recommend generally using the standard within method. Throughout, we provide intuition to make the results meaningful to the broad audience of applied researchers.

5.

Missing data and missed infections: investigating racial and ethnic disparities in SARS-CoV-2 testing and infection rates in Holyoke, Massachusetts.

Sauer, Sara M; Fulcher, Isabel R; Matias, Wilfredo R; Paxton, Ryan; Elnaiem, Ahmed; Gonsalves, Sean; Zhu, Jack; Guillaume, Yodeline; Franke, Molly; Ivers, Louise C.

Am J Epidemiol ; 193(6): 908-916, 2024 06 03.

Artigo em Inglês | MEDLINE | ID: mdl-38422371

RESUMO

Routinely collected testing data have been a vital resource for public health response during the COVID-19 pandemic and have revealed the extent to which Black and Hispanic persons have borne a disproportionate burden of SARS-CoV-2 infections and hospitalizations in the United States. However, missing race and ethnicity data and missed infections due to testing disparities limit the interpretation of testing data and obscure the true toll of the pandemic. We investigated potential bias arising from these 2 types of missing data through a case study carried out in Holyoke, Massachusetts, during the prevaccination phase of the pandemic. First, we estimated SARS-CoV-2 testing and case rates by race and ethnicity, imputing missing data using a joint modeling approach. We then investigated disparities in SARS-CoV-2 reported case rates and missed infections by comparing case rate estimates with estimates derived from a COVID-19 seroprevalence survey. Compared with the non-Hispanic White population, we found that the Hispanic population had similar testing rates (476 tested per 1000 vs 480 per 1000) but twice the case rate (8.1% vs 3.7%). We found evidence of inequitable testing, with a higher rate of missed infections in the Hispanic population than in the non-Hispanic White population (79 infections missed per 1000 vs 60 missed per 1000).

Assuntos

Teste para COVID-19 , COVID-19 , Hispânico ou Latino , SARS-CoV-2 , Humanos , COVID-19/etnologia , COVID-19/epidemiologia , COVID-19/diagnóstico , Massachusetts/epidemiologia , Teste para COVID-19/estatística & dados numéricos , Hispânico ou Latino/estatística & dados numéricos , Masculino , Feminino , Pessoa de Meia-Idade , Disparidades em Assistência à Saúde/etnologia , Disparidades em Assistência à Saúde/estatística & dados numéricos , Adulto , Disparidades nos Níveis de Saúde , Negro ou Afro-Americano/estatística & dados numéricos , Etnicidade/estatística & dados numéricos , Idoso , Diagnóstico Ausente/estatística & dados numéricos

6.

Analyses Using Multiple Imputation Need to Consider Missing Data in Auxiliary Variables.

Madley-Dowd, Paul; Curnow, Elinor; Hughes, Rachael A; Cornish, Rosie; Tilling, Kate; Heron, Jon.

Am J Epidemiol ; 2024 Aug 27.

Artigo em Inglês | MEDLINE | ID: mdl-39191658

RESUMO

Auxiliary variables are used in multiple imputation (MI) to reduce bias and increase efficiency. These variables may often themselves be incomplete. We explored how missing data in auxiliary variables influenced estimates obtained from MI. We implemented a simulation study with three different missing data mechanisms for the outcome. We then examined the impact of increasing proportions of missing data and different missingness mechanisms for the auxiliary variable on bias of an unadjusted linear regression coefficient and the fraction of missing information. We illustrate our findings with an applied example in the Avon Longitudinal Study of Parents and Children. We found that where complete records analyses were biased, increasing proportions of missing data in auxiliary variables, under any missing data mechanism, reduced the ability of MI including the auxiliary variable to mitigate this bias. Where there was no bias in the complete records analysis, inclusion of a missing not at random auxiliary variable in MI introduced bias of potentially important magnitude (up to 17% of the effect size in our simulation). Careful consideration of the quantity and nature of missing data in auxiliary variables needs to be made when selecting them for use in MI models.

7.

Handling missing data when estimating causal effects with targeted maximum likelihood estimation.

Dashti, S Ghazaleh; Lee, Katherine J; Simpson, Julie A; White, Ian R; Carlin, John B; Moreno-Betancur, Margarita.

Am J Epidemiol ; 193(7): 1019-1030, 2024 07 08.

Artigo em Inglês | MEDLINE | ID: mdl-38400653

RESUMO

Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether outcome influenced missingness in other variables and presence of interaction/nonlinear terms in missingness models). Complete-case analysis and extended TMLE had small biases when outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when exposure/outcome generation models included interactions. Parametric MI including interactions performed best in bias and variance reduction across all settings, except when missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.

Assuntos

Causalidade , Humanos , Funções Verossimilhança , Adolescente , Interpretação Estatística de Dados , Viés , Modelos Estatísticos , Simulação por Computador

8.

Temporal changes in the risk of six-month post-covid symptoms: a national population-based cohort study.

Pastorello, Anne; Meyer, Laurence; Coste, Joël; Davisse-Paturet, Camille; de Lamballerie, Xavier; Melchior, Maria; Novelli, Sophie; Rahib, Delphine; Bajos, Nathalie; Vuillermoz, Cécile; Franck, Jeanna-Eve; Manto, Carmelite; Rouquette, Alexandra; Warszawski, Josiane; EpiCov Study Group, For The.

Am J Epidemiol ; 2024 Jul 03.

Artigo em Inglês | MEDLINE | ID: mdl-38960664

RESUMO

It is unclear how the risk of post-covid symptoms evolved during the pandemic, especially before the spread of Severe Acute Respiratory Syndrome Coronavirus 2 variants and the availability of vaccines. We used modified Poisson regressions to compare the risk of six-month post-covid symptoms and their associated risk factors according to the period of first acute covid: during the French first (March-May 2020) or second (September-November 2020) wave. Non-response weights and multiple imputation were used to handle missing data. Among participants aged 15 or more in a national population-based cohort, the risk of post-covid symptoms was 14.6% (95% CI: 13.9%, 15.3%) in March-May 2020, versus 7.0% (95% CI: 6.3%, 7.7%) in September-November 2020 (adjusted RR: 1.36, 95% CI: 1.20, 1.55). For both periods, the risk was higher in the presence of baseline physical condition(s), and it increased with the number of acute symptoms. During the first wave, the risk was also higher for women, in the presence of baseline mental condition(s), and it varied with educational level. In France in 2020, the risk of six-month post-covid symptoms was higher during the first than the second wave. This difference was observed before the spread of variants and the availability of vaccines.

9.

A Bayesian MultiLayer Record Linkage Procedure to Analyze Post-Acute Care Recovery of Patients with Traumatic Brain Injury.

Shan, Mingyang; Thomas, Kali S; Gutman, Roee.

Biostatistics ; 24(3): 743-759, 2023 Jul 14.

Artigo em Inglês | MEDLINE | ID: mdl-35579386

RESUMO

Understanding associations between injury severity and postacute care recovery for patients with traumatic brain injury (TBI) is crucial to improving care. Estimating these associations requires information on patients' injury, demographics, and healthcare utilization, which are dispersed across multiple data sets. Because of privacy regulations, unique identifiers are not available to link records across these data sets. Record linkage methods identify records that represent the same patient across data sets in the absence of unique identifiers. With a large number of records, these methods may result in many false links. Health providers are a natural grouping scheme for patients, because only records that receive care from the same provider can represent the same patient. In some cases, providers are defined within each data set, but they are not uniquely identified across data sets. We propose a Bayesian record linkage procedure that simultaneously links providers and patients. The procedure improves the accuracy of the estimated links compared to current methods. We use this procedure to merge a trauma registry with Medicare claims to estimate the association between TBI patients' injury severity and postacute care recovery.

Assuntos

Lesões Encefálicas Traumáticas , Cuidados Semi-Intensivos , Idoso , Humanos , Estados Unidos , Medicare , Teorema de Bayes , Sistema de Registros , Lesões Encefálicas Traumáticas/terapia

10.

Multiple imputation of incomplete multilevel data using Heckman selection models.

Muñoz, Johanna; Efthimiou, Orestis; Audigier, Vincent; de Jong, Valentijn M T; Debray, Thomas P A.

Stat Med ; 43(3): 514-533, 2024 02 10.

Artigo em Inglês | MEDLINE | ID: mdl-38073512

RESUMO

Missing data is a common problem in medical research, and is commonly addressed using multiple imputation. Although traditional imputation methods allow for valid statistical inference when data are missing at random (MAR), their implementation is problematic when the presence of missingness depends on unobserved variables, that is, the data are missing not at random (MNAR). Unfortunately, this MNAR situation is rather common, in observational studies, registries and other sources of real-world data. While several imputation methods have been proposed for addressing individual studies when data are MNAR, their application and validity in large datasets with multilevel structure remains unclear. We therefore explored the consequence of MNAR data in hierarchical data in-depth, and proposed a novel multilevel imputation method for common missing patterns in clustered datasets. This method is based on the principles of Heckman selection models and adopts a two-stage meta-analysis approach to impute binary and continuous variables that may be outcomes or predictors and that are systematically or sporadically missing. After evaluating the proposed imputation model in simulated scenarios, we illustrate it use in a cross-sectional community survey to estimate the prevalence of malaria parasitemia in children aged 2-10 years in five regions in Uganda.

Assuntos

Pesquisa Biomédica , Criança , Humanos , Estudos Transversais , Uganda/epidemiologia

11.

Multiple imputation strategies for missing event times in a multi-state model analysis.

Curnow, Elinor; Hughes, Rachael A; Birnie, Kate; Tilling, Kate; Crowther, Michael J.

Stat Med ; 43(6): 1238-1255, 2024 Mar 15.

Artigo em Inglês | MEDLINE | ID: mdl-38258282

RESUMO

In clinical studies, multi-state model (MSM) analysis is often used to describe the sequence of events that patients experience, enabling better understanding of disease progression. A complicating factor in many MSM studies is that the exact event times may not be known. Motivated by a real dataset of patients who received stem cell transplants, we considered the setting in which some event times were exactly observed and some were missing. In our setting, there was little information about the time intervals in which the missing event times occurred and missingness depended on the event type, given the analysis model covariates. These additional challenges limited the usefulness of some missing data methods (maximum likelihood, complete case analysis, and inverse probability weighting). We show that multiple imputation (MI) of event times can perform well in this setting. MI is a flexible method that can be used with any complete data analysis model. Through an extensive simulation study, we show that MI by predictive mean matching (PMM), in which sampling is from a set of observed times without reliance on a specific parametric distribution, has little bias when event times are missing at random, conditional on the observed data. Applying PMM separately for each sub-group of patients with a different pathway through the MSM tends to further reduce bias and improve precision. We recommend MI using PMM methods when performing MSM analysis with Markov models and partially observed event times.

Assuntos

Projetos de Pesquisa , Humanos , Interpretação Estatística de Dados , Simulação por Computador , Probabilidade , Viés

12.

Data fusion for predicting long-term program impacts.

Robbins, Michael W; Bauhoff, Sebastian; Burgette, Lane.

Stat Med ; 43(19): 3702-3722, 2024 Aug 30.

Artigo em Inglês | MEDLINE | ID: mdl-38890124

RESUMO

Policymakers often require information on programs' long-term impacts that is not available when decisions are made. For example, while rigorous evidence from the Oregon Health Insurance Experiment (OHIE) shows that having health insurance influences short-term health and financial measures, the impact on long-term outcomes, such as mortality, will not be known for many years following the program's implementation. We demonstrate how data fusion methods may be used address the problem of missing final outcomes and predict long-run impacts of interventions before the requisite data are available. We implement this method by concatenating data on an intervention (such as the OHIE) with auxiliary long-term data and then imputing missing long-term outcomes using short-term surrogate outcomes while approximating uncertainty with replication methods. We use simulations to examine the performance of the methodology and apply the method in a case study. Specifically, we fuse data on the OHIE with data from the National Longitudinal Mortality Study and estimate that being eligible to apply for subsidized health insurance will lead to a statistically significant improvement in long-term mortality.

Assuntos

Seguro Saúde , Humanos , Oregon , Seguro Saúde/estatística & dados numéricos , Simulação por Computador , Mortalidade , Estudos Longitudinais , Estados Unidos , Modelos Estatísticos

13.

Three-phase generalized raking and multiple imputation estimators to address error-prone data.

Amorim, Gustavo; Tao, Ran; Lotspeich, Sarah; Shaw, Pamela A; Lumley, Thomas; Patel, Rena C; Shepherd, Bryan E.

Stat Med ; 43(2): 379-394, 2024 01 30.

Artigo em Inglês | MEDLINE | ID: mdl-37987515

RESUMO

Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and direct application of standard approaches for combining validation data into analyses may lead to inefficient estimators since the information available from intermediate validation steps is only partially considered or even completely ignored. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by and illustrated in a study of contraceptive effectiveness among 83 671 women living with HIV, whose data were originally extracted from electronic medical records, of whom 4732 had their charts reviewed, and a subsequent 1210 also had a telephone interview to validate key study variables.

Assuntos

Confiabilidade dos Dados , Registros Eletrônicos de Saúde , Feminino , Humanos , Infecções por HIV

14.

τ $$ \tau $$ -Inflated beta regression model for censored recurrent events.

Wang, Yizhuo; Murray, Susan.

Stat Med ; 43(6): 1170-1193, 2024 Mar 15.

Artigo em Inglês | MEDLINE | ID: mdl-38386367

RESUMO

This research introduces a multivariate τ $$ \tau $$ -inflated beta regression ( τ $$ \tau $$ -IBR) modeling approach for the analysis of censored recurrent event data that is particularly useful when there is a mixture of (a) individuals who are generally less susceptible to recurrent events and (b) heterogeneity in duration of event-free periods amongst those who experience events. The modeling approach is applied to a restructured version of the recurrent event data that consists of censored longitudinal times-to-first-event in τ $$ \tau $$ length follow-up windows that potentially overlap. Multiple imputation (MI) and expectation-solution (ES) approaches appropriate for censored data are developed as part of the model fitting process. A suite of useful analysis outputs are provided from the τ $$ \tau $$ -IBR model that include parameter estimates to help interpret the (a) and (b) mixture of event times in the data, estimates of mean τ $$ \tau $$ -restricted event-free duration in a τ $$ \tau $$ -length follow-up window based on a patient's covariate profile, and heat maps of raw τ $$ \tau $$ -restricted event-free durations observed in the data with censored observations augmented via averages across MI datasets. Simulations indicate good statistical performance of the proposed τ $$ \tau $$ -IBR approach to modeling censored recurrent event data. An example is given based on the Azithromycin for Prevention of COPD Exacerbations Trial.

Assuntos

Azitromicina , Doença Pulmonar Obstrutiva Crônica , Humanos

15.

Nonparametric empirical Bayes biomarker imputation and estimation.

Barbehenn, Alton; Zhao, Sihai Dave.

Stat Med ; 43(19): 3742-3758, 2024 Aug 30.

Artigo em Inglês | MEDLINE | ID: mdl-38897921

RESUMO

Biomarkers are often measured in bulk to diagnose patients, monitor patient conditions, and research novel drug pathways. The measurement of these biomarkers often suffers from detection limits that result in missing and untrustworthy measurements. Frequently, missing biomarkers are imputed so that down-stream analysis can be conducted with modern statistical methods that cannot normally handle data subject to informative censoring. This work develops an empirical Bayes g $$ g $$ -modeling method for imputing and denoising biomarker measurements. We establish superior estimation properties compared to popular methods in simulations and with real data, providing the useful biomarker measurement estimations for down-stream analysis.

Assuntos

Teorema de Bayes , Biomarcadores , Simulação por Computador , Humanos , Biomarcadores/análise , Modelos Estatísticos , Estatísticas não Paramétricas , Interpretação Estatística de Dados

16.

Model-based standardization using multiple imputation.

Remiro-Azócar, Antonio; Heath, Anna; Baio, Gianluca.

BMC Med Res Methodol ; 24(1): 32, 2024 Feb 10.

Artigo em Inglês | MEDLINE | ID: mdl-38341552

RESUMO

BACKGROUND: When studying the association between treatment and a clinical outcome, a parametric multivariable model of the conditional outcome expectation is often used to adjust for covariates. The treatment coefficient of the outcome model targets a conditional treatment effect. Model-based standardization is typically applied to average the model predictions over the target covariate distribution, and generate a covariate-adjusted estimate of the marginal treatment effect. METHODS: The standard approach to model-based standardization involves maximum-likelihood estimation and use of the non-parametric bootstrap. We introduce a novel, general-purpose, model-based standardization method based on multiple imputation that is easily applicable when the outcome model is a generalized linear model. We term our proposed approach multiple imputation marginalization (MIM). MIM consists of two main stages: the generation of synthetic datasets and their analysis. MIM accommodates a Bayesian statistical framework, which naturally allows for the principled propagation of uncertainty, integrates the analysis into a probabilistic framework, and allows for the incorporation of prior evidence. RESULTS: We conduct a simulation study to benchmark the finite-sample performance of MIM in conjunction with a parametric outcome model. The simulations provide proof-of-principle in scenarios with binary outcomes, continuous-valued covariates, a logistic outcome model and the marginal log odds ratio as the target effect measure. When parametric modeling assumptions hold, MIM yields unbiased estimation in the target covariate distribution, valid coverage rates, and similar precision and efficiency than the standard approach to model-based standardization. CONCLUSION: We demonstrate that multiple imputation can be used to marginalize over a target covariate distribution, providing appropriate inference with a correctly specified parametric outcome model and offering statistical performance comparable to that of the standard approach to model-based standardization.

Assuntos

Modelos Estatísticos , Humanos , Teorema de Bayes , Modelos Lineares , Simulação por Computador , Modelos Logísticos , Padrões de Referência

17.

Handling missing data and measurement error for early-onset myopia risk prediction models.

Lai, Hongyu; Gao, Kaiye; Li, Meiyan; Li, Tao; Zhou, Xiaodong; Zhou, Xingtao; Guo, Hui; Fu, Bo.

BMC Med Res Methodol ; 24(1): 194, 2024 Sep 06.

Artigo em Inglês | MEDLINE | ID: mdl-39243025

RESUMO

BACKGROUND: Early identification of children at high risk of developing myopia is essential to prevent myopia progression by introducing timely interventions. However, missing data and measurement error (ME) are common challenges in risk prediction modelling that can introduce bias in myopia prediction. METHODS: We explore four imputation methods to address missing data and ME: single imputation (SI), multiple imputation under missing at random (MI-MAR), multiple imputation with calibration procedure (MI-ME), and multiple imputation under missing not at random (MI-MNAR). We compare four machine-learning models (Decision Tree, Naive Bayes, Random Forest, and Xgboost) and three statistical models (logistic regression, stepwise logistic regression, and least absolute shrinkage and selection operator logistic regression) in myopia risk prediction. We apply these models to the Shanghai Jinshan Myopia Cohort Study and also conduct a simulation study to investigate the impact of missing mechanisms, the degree of ME, and the importance of predictors on model performance. Model performance is evaluated using the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). RESULTS: Our findings indicate that in scenarios with missing data and ME, using MI-ME in combination with logistic regression yields the best prediction results. In scenarios without ME, employing MI-MAR to handle missing data outperforms SI regardless of the missing mechanisms. When ME has a greater impact on prediction than missing data, the relative advantage of MI-MAR diminishes, and MI-ME becomes more superior. Furthermore, our results demonstrate that statistical models exhibit better prediction performance than machine-learning models. CONCLUSION: MI-ME emerges as a reliable method for handling missing data and ME in important predictors for early-onset myopia risk prediction.

Assuntos

Aprendizado de Máquina , Miopia , Humanos , Miopia/diagnóstico , Miopia/epidemiologia , Feminino , Criança , Masculino , Modelos Logísticos , Modelos Estatísticos , Medição de Risco/métodos , Medição de Risco/estatística & dados numéricos , Fatores de Risco , Curva ROC , Teorema de Bayes , China/epidemiologia , Estudos de Coortes , Idade de Início

18.

Multi-metric comparison of machine learning imputation methods with application to breast cancer survival.

El Badisy, Imad; Graffeo, Nathalie; Khalis, Mohamed; Giorgi, Roch.

BMC Med Res Methodol ; 24(1): 191, 2024 Aug 30.

Artigo em Inglês | MEDLINE | ID: mdl-39215245

RESUMO

Handling missing data in clinical prognostic studies is an essential yet challenging task. This study aimed to provide a comprehensive assessment of the effectiveness and reliability of different machine learning (ML) imputation methods across various analytical perspectives. Specifically, it focused on three distinct classes of performance metrics used to evaluate ML imputation methods: post-imputation bias of regression estimates, post-imputation predictive accuracy, and substantive model-free metrics. As an illustration, we applied data from a real-world breast cancer survival study. This comprehensive approach aimed to provide a thorough assessment of the effectiveness and reliability of ML imputation methods across various analytical perspectives. A simulated dataset with 30% Missing At Random (MAR) values was used. A number of single imputation (SI) methods - specifically KNN, missMDA, CART, missForest, missRanger, missCforest - and multiple imputation (MI) methods - specifically miceCART and miceRF - were evaluated. The performance metrics used were Gower's distance, estimation bias, empirical standard error, coverage rate, length of confidence interval, predictive accuracy, proportion of falsely classified (PFC), normalized root mean squared error (NRMSE), AUC, and C-index scores. The analysis revealed that in terms of Gower's distance, CART and missForest were the most accurate, while missMDA and CART excelled for binary covariates; missForest and miceCART were superior for continuous covariates. When assessing bias and accuracy in regression estimates, miceCART and miceRF exhibited the least bias. Overall, the various imputation methods demonstrated greater efficiency than complete-case analysis (CCA), with MICE methods providing optimal confidence interval coverage. In terms of predictive accuracy for Cox models, missMDA and missForest had superior AUC and C-index scores. Despite offering better predictive accuracy, the study found that SI methods introduced more bias into the regression coefficients compared to MI methods. This study underlines the importance of selecting appropriate imputation methods based on study goals and data types in time-to-event research. The varying effectiveness of methods across the different performance metrics studied highlights the value of using advanced machine learning algorithms within a multiple imputation framework to enhance research integrity and the robustness of findings.

Assuntos

Neoplasias da Mama , Aprendizado de Máquina , Humanos , Neoplasias da Mama/mortalidade , Feminino , Reprodutibilidade dos Testes , Algoritmos , Prognóstico , Interpretação Estatística de Dados , Análise de Sobrevida

19.

Propensity Score Weighting with Missing Data on Covariates and Clustered Data Structure.

Liu, Xiao.

Multivariate Behav Res ; 59(3): 411-433, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38379305

RESUMO

Propensity score (PS) analyses are increasingly popular in behavioral sciences. Two issues often add complexities to PS analyses, including missing data in observed covariates and clustered data structure. In previous research, methods for conducting PS analyses with considering either issue alone were examined. In practice, the two issues often co-occur; but the performance of methods for PS analyses in the presence of both issues has not been evaluated previously. In this study, we consider PS weighting analysis when data are clustered and observed covariates have missing values. A simulation study is conducted to evaluate the performance of different missing data handling methods (complete-case, single-level imputation, or multilevel imputation) combined with different multilevel PS weighting methods (fixed- or random-effects PS models, inverse-propensity-weighting or the clustered weighting, weighted single-level or multilevel outcome models). The results suggest that the bias in average treatment effect estimation can be reduced, by better accounting for clustering in both the missing data handling stage (such as with the multilevel imputation) and the PS analysis stage (such as with the fixed-effects PS model, clustered weighting, and weighted multilevel outcome model). A real-data example is provided for illustration.

Assuntos

Simulação por Computador , Pontuação de Propensão , Humanos , Análise por Conglomerados , Interpretação Estatística de Dados , Simulação por Computador/estatística & dados numéricos , Modelos Estatísticos , Análise Multinível/métodos , Viés

20.

Multiple Imputation with Factor Scores: A Practical Approach for Handling Simultaneous Missingness Across Items in Longitudinal Designs.

Li, Yanling; Oravecz, Zita; Ji, Linying; Chow, Sy-Miin.

Multivariate Behav Res ; : 1-29, 2024 Jul 12.

Artigo em Inglês | MEDLINE | ID: mdl-38997153

RESUMO

Missingness in intensive longitudinal data triggered by latent factors constitute one type of nonignorable missingness that can generate simultaneous missingness across multiple items on each measurement occasion. To address this issue, we propose a multiple imputation (MI) strategy called MI-FS, which incorporates factor scores, lag/lead variables, and missing data indicators into the imputation model. In the context of process factor analysis (PFA), we conducted a Monte Carlo simulation study to compare the performance of MI-FS to listwise deletion (LD), MI with manifest variables (MI-MV, which implements MI on both dependent variables and covariates), and partial MI with MVs (PMI-MV, which implements MI on covariates and handles missing dependent variables via full-information maximum likelihood) under different conditions. Across conditions, we found MI-based methods overall outperformed the LD; the MI-FS approach yielded lower root mean square errors (RMSEs) and higher coverage rates for auto-regression (AR) parameters compared to MI-MV; and the PMI-MV and MI-MV approaches yielded higher coverage rates for most parameters except AR parameters compared to MI-FS. These approaches were also compared using an empirical example investigating the relationships between negative affect and perceived stress over time. Recommendations on when and how to incorporate factor scores into MI processes were discussed.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA