Results 1 - 20 of 25
1.
Pharmacoepidemiol Drug Saf ; 32(3): 330-340, 2023 03.
Article in English | MEDLINE | ID: mdl-36380400

ABSTRACT

PURPOSE: In distributed research network (DRN) settings, multiple imputation cannot be directly implemented because pooling individual-level data is often not feasible. The performance of multiple imputation in combination with meta-analysis is not well understood within DRNs. METHODS: To evaluate the performance of imputation for missing baseline covariate data in combination with meta-analysis for time-to-event analysis within DRNs, we compared two parametric algorithms including one approximated linear imputation model (Approx), and one nonlinear substantive model compatible imputation model (SMC), as well as two non-parametric machine learning algorithms including random forest (RF), and classification and regression trees (CART), through simulation studies motivated by a real-world data set. RESULTS: Under the setting with small effect sizes (i.e., log-Hazard ratios [logHR]) and homogeneous missingness mechanisms across sites, all imputation methods produced unbiased and more efficient estimates while the complete-case analysis could be biased and inefficient; and under heterogeneous missingness mechanisms, estimates with RF method could have higher efficiency. Estimates from the distributed imputation combined by meta-analysis were similar to those from the imputation using pooled data. When logHRs were large, the SMC imputation algorithm generally performed better than others. CONCLUSIONS: These findings suggest the validity and feasibility of imputation within DRNs in the presence of missing covariate data in time-to-event analysis under various settings. The performance of the four imputation algorithms varies with the effect sizes and level of missingness.


Subjects
Algorithms , Humans , Computer Simulation , Proportional Hazards Models , Linear Models
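As an illustration of the "imputation at each site, then meta-analysis" idea in the abstract above, the following is a minimal sketch of the combining step only. It assumes a fixed-effect, inverse-variance meta-analysis, which the abstract does not specify; the function name `pool_fixed_effect` and the per-site numbers are hypothetical and are not taken from the paper.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-site estimates.

    Assumption: each site has already imputed its own missing covariates
    and reports one logHR estimate with its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical per-site logHR estimates and standard errors.
site_log_hr = [0.20, 0.30, 0.25]
site_se = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_fixed_effect(site_log_hr, site_se)
print(f"pooled logHR = {pooled:.4f}, SE = {pooled_se:.4f}")
```

Only site-level summaries cross site boundaries in this sketch, which is the constraint that motivates the distributed approach; a random-effects model would be an alternative when sites are heterogeneous.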
2.
J Biopharm Stat ; 32(5): 717-739, 2022 09 03.
Article in English | MEDLINE | ID: mdl-35041565

ABSTRACT

The literature on dealing with missing covariates in nonrandomized studies advocates the use of sophisticated methods like multiple imputation (MI) and maximum likelihood (ML)-based approaches over simple methods. However, these methods are not necessarily optimal in terms of bias and efficiency of treatment effect estimation in randomized studies, where the covariate of interest (treatment group) is independent of all baseline (pre-randomization) covariates due to randomization. This has been shown in the literature, but only for missingness on a single baseline covariate. Here, we extend the situation to multiple baseline covariates with missingness and evaluate the performance of MI and ML compared with simple alternative methods under various missingness scenarios in RCTs with a quantitative outcome. We first derive asymptotic relative efficiencies of the simple methods under the missing completely at random (MCAR) scenario and then perform a simulation study for non-MCAR scenarios. Finally, a trial on chronic low back pain is used to illustrate the implementation of the methods. The results show that all simple methods give unbiased treatment effect estimation but with increased mean squared residual. It also turns out that mean imputation and the missing-indicator method are most efficient under all covariate missingness scenarios and perform at least as well as MI and ML in each scenario.


Subjects
Research Design , Bias , Computer Simulation , Data Interpretation, Statistical , Humans , Randomized Controlled Trials as Topic
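The two simple methods named in the abstract above, mean imputation and the missing-indicator method, can be sketched as a data-preparation step. This is a minimal illustration under assumptions: the helper `mean_impute_with_indicator` and the baseline values are hypothetical, and the paper's actual implementation may differ.

```python
def mean_impute_with_indicator(values):
    """Replace missing entries (None) with the observed mean and
    return a 0/1 indicator flagging which entries were missing."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    imputed = [mean if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


# Hypothetical baseline covariate with two missing values.
baseline = [4.0, None, 6.0, 8.0, None]

imputed, missing = mean_impute_with_indicator(baseline)
print(imputed)  # covariate column for mean imputation
print(missing)  # extra column used by the missing-indicator method
```

For mean imputation alone, only the `imputed` column enters the outcome regression alongside treatment; the missing-indicator method adds the `missing` column as a further covariate.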
3.
Entropy (Basel) ; 24(3)2022 Mar 09.
Article in English | MEDLINE | ID: mdl-35327897

ABSTRACT

Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research shows an increasing trend towards the use of modern Machine-Learning algorithms for imputation. This stems from their favorable prediction accuracy in different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine-Learning-based methods for both imputation and prediction are used. We see that even a slight decrease in imputation accuracy can seriously affect the prediction accuracy. In addition, we explore imputation performance when using statistical inference procedures in prediction settings, such as the coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets provided by the UCI Machine Learning repository and an extensive simulation study.

4.
Biometrics ; 76(1): 270-280, 2020 03.
Article in English | MEDLINE | ID: mdl-31393001

ABSTRACT

For regression with covariates missing not at random where the missingness depends on the missing covariate values, complete-case (CC) analysis leads to consistent estimation when the missingness is independent of the response given all covariates, but it may not have the desired level of efficiency. We propose a general empirical likelihood framework to improve estimation efficiency over the CC analysis. We expand on methods in Bartlett et al. (2014, Biostatistics 15, 719-730) and Xie and Zhang (2017, Int J Biostat 13, 1-20) that improve efficiency by modeling the missingness probability conditional on the response and fully observed covariates by allowing the possibility of modeling other data distribution-related quantities. We also give guidelines on what quantities to model and demonstrate that our proposal has the potential to yield smaller biases than existing methods when the missingness probability model is incorrect. Simulation studies are presented, as well as an application to data collected from the US National Health and Nutrition Examination Survey.


Subjects
Biometry/methods , Regression Analysis , Analysis of Variance , Bias , Computer Simulation , Data Interpretation, Statistical , Humans , Likelihood Functions , Models, Statistical , Nutrition Surveys/statistics & numerical data , Probability , United States
5.
Stat Med ; 2020 Feb 26.
Article in English | MEDLINE | ID: mdl-32101332

ABSTRACT

This study develops a two-part hidden Markov model (HMM) for analyzing semicontinuous longitudinal data in the presence of missing covariates. The proposed model manages a semicontinuous variable by splitting it into two random variables: a binary indicator for determining the occurrence of excess zeros at all occasions and a continuous random variable for examining its actual level. For the continuous longitudinal response, an HMM is proposed to describe the relationship between the observation and unobservable finite-state transition processes. The HMM consists of two major components. The first component is a transition model for investigating how potential covariates influence the probabilities of transitioning from one hidden state to another. The second component is a conditional regression model for examining the state-specific effects of covariates on the response. A shared random effect is introduced to each part of the model to accommodate possible unobservable heterogeneity among observation processes and the nonignorability of missing covariates. A Bayesian adaptive least absolute shrinkage and selection operator (lasso) procedure is developed to conduct simultaneous variable selection and estimation. The proposed methodology is applied to a study on the Alzheimer's Disease Neuroimaging Initiative dataset. New insights into the pathology of Alzheimer's disease and its potential risk factors are obtained.

6.
Biom J ; 62(4): 1025-1037, 2020 07.
Article in English | MEDLINE | ID: mdl-31957905

ABSTRACT

Data with missing covariate values but fully observed binary outcomes are an important subset of the missing data challenge. Common approaches are complete case analysis (CCA) and multiple imputation (MI). While CCA relies on missing completely at random (MCAR), MI usually relies on a missing at random (MAR) assumption to produce unbiased results. For MI involving logistic regression models, it is also important to consider several missing not at random (MNAR) conditions under which CCA is asymptotically unbiased and, as we show, MI is also valid in some cases. We use a data application and simulation study to compare the performance of several machine learning and parametric MI methods under a fully conditional specification framework (MI-FCS). Our simulation includes five scenarios involving MCAR, MAR, and MNAR under predictable and nonpredictable conditions, where "predictable" indicates missingness is not associated with the outcome. We build on previous results in the literature to show MI and CCA can both produce unbiased results under more conditions than some analysts may realize. When both approaches were valid, we found that MI-FCS was at least as good as CCA in terms of estimated bias and coverage, and was superior when missingness involved a categorical covariate. We also demonstrate how MNAR sensitivity analysis can build confidence that unbiased results were obtained, including under MNAR-predictable, when CCA and MI are both valid. Since the missingness mechanism cannot be identified from observed data, investigators should compare results from MI and CCA when both are plausibly valid, followed by MNAR sensitivity analysis.


Subjects
Biometry/methods , Bias , Logistic Models , Machine Learning , Multivariate Analysis
7.
Stat Med ; 38(3): 452-479, 2019 02 10.
Article in English | MEDLINE | ID: mdl-30311246

ABSTRACT

Missing covariates in regression analysis are a pervasive problem in medical, social, and economic research. We study empirical-likelihood confidence regions for unconstrained and constrained regression parameters in a nonignorable covariate-missing data problem. For an assumed conditional mean regression model, we assume that some covariates are fully observed but other covariates are missing for some subjects. By exploitation of a probability model of missingness and a working conditional score model from a semiparametric perspective, we build a system of unbiased estimating equations, where the number of equations exceeds the number of unknown parameters. Based on the proposed estimating equations, we introduce unconstrained and constrained empirical-likelihood ratio statistics to construct empirical-likelihood confidence regions for the underlying regression parameters without and with constraints. We establish the asymptotic distributions of the proposed empirical-likelihood ratio statistics. Simulation results show that the proposed empirical-likelihood methods have a better finite-sample performance than other competitors in terms of coverage probability and interval length. Finally, we apply the proposed empirical-likelihood methods to the analysis of a data set from the US National Health and Nutrition Examination Survey.


Subjects
Confidence Intervals , Data Interpretation, Statistical , Likelihood Functions , Bias , Humans , Models, Statistical , Nutrition Surveys/statistics & numerical data , Probability , Regression Analysis
8.
Ann Inst Stat Math ; 71(2): 365-387, 2019 Apr.
Article in English | MEDLINE | ID: mdl-31530958

ABSTRACT

This paper presents simple weighted and fully augmented weighted estimators for the additive hazards model with missing covariates when they are missing at random. The additive hazards model estimates the difference in hazards and has an intuitive biological interpretation. The proposed weighted estimators for the additive hazards model use incomplete data nonparametrically and have closed-form expressions. We show that they are consistent and asymptotically normal, and are more efficient than the simple weighted estimator which only uses the complete data. We illustrate their finite-sample performance through simulation studies and an application to study the progression from mild cognitive impairment to dementia using data from the Alzheimer's Disease Neuroimaging Initiative as well as an application to the mouse leukemia study.

9.
Stat Med ; 37(8): 1325-1342, 2018 04 15.
Article in English | MEDLINE | ID: mdl-29318652

ABSTRACT

Missing covariate values are prevalent in regression applications. While an array of methods have been developed for estimating parameters in regression models with missing covariate data for a variety of response types, minimal focus has been given to validation of the response model and influence diagnostics. Previous research has mainly focused on estimating residuals for observations with missing covariates using expected values, after which specialized techniques are needed to conduct proper inference. We suggest a multiple imputation strategy that allows for the use of standard methods for residual analyses on the imputed data sets or a stacked data set. We demonstrate the suggested multiple imputation method by analyzing the Sleep in Mammals data in the context of a linear regression model and the New York Social Indicators Status data with a logistic regression model.


Subjects
Bias , Data Interpretation, Statistical , Linear Models , Logistic Models , Reproducibility of Results , Animals , Computer Simulation , Demography , Humans , Mice , New York , Regression Analysis , Sleep
10.
Biostatistics ; 17(4): 751-63, 2016 10.
Article in English | MEDLINE | ID: mdl-27179002

ABSTRACT

Studies often follow individuals until they fail from one of a number of competing failure types. One approach to analyzing such competing risks data involves modeling the cause-specific hazards as functions of baseline covariates. A common issue that arises in this context is missing values in covariates. In this setting, we first establish conditions under which complete case analysis (CCA) is valid. We then consider application of multiple imputation to handle missing covariate values, and extend the recently proposed substantive model compatible version of fully conditional specification (SMC-FCS) imputation to the competing risks setting. Through simulations and an illustrative data analysis, we compare CCA, SMC-FCS, and a recent proposal for imputing missing covariates in the competing risks setting.


Subjects
Biostatistics/methods , Data Interpretation, Statistical , Models, Statistical , Nutrition Surveys/statistics & numerical data , Risk Assessment/methods , Humans , Proportional Hazards Models
11.
Biometrics ; 73(1): 271-282, 2017 03.
Article in English | MEDLINE | ID: mdl-27378229

ABSTRACT

In this article, we propose an association model to estimate the penetrance (risk) of successive cancers in the presence of competing risks. The association between the successive events is modeled via a copula and a proportional hazards model is specified for each competing event. This work is motivated by the analysis of successive cancers for people with Lynch Syndrome in the presence of competing risks. The proposed inference procedure is adapted to handle missing genetic covariates and selection bias, induced by the data collection protocol of the data at hand. The performance of the proposed estimation procedure is evaluated by simulations and its use is illustrated with data from the Colon Cancer Family Registry (Colon CFR).


Subjects
Colorectal Neoplasms, Hereditary Nonpolyposis/pathology , Data Interpretation, Statistical , Proportional Hazards Models , Analysis of Variance , Bias , Colonic Neoplasms , Computer Simulation , Genetics , Humans , Registries , Risk
12.
Biostatistics ; 15(4): 719-30, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24907708

ABSTRACT

Missing values in covariates of regression models are a pervasive problem in empirical research. Popular approaches for analyzing partially observed datasets include complete case analysis (CCA), multiple imputation (MI), and inverse probability weighting (IPW). In the case of missing covariate values, these methods (as typically implemented) are valid under different missingness assumptions. In particular, CCA is valid under missing not at random (MNAR) mechanisms in which missingness in a covariate depends on the value of that covariate, but is conditionally independent of outcome. In this paper, we argue that in some settings such an assumption is more plausible than the missing at random assumption underpinning most implementations of MI and IPW. When the former assumption holds, although CCA gives consistent estimates, it does not make use of all observed information. We therefore propose an augmented CCA approach which makes the same conditional independence assumption for missingness as CCA, but which improves efficiency through specification of an additional model for the probability of missingness, given the fully observed variables. The new method is evaluated using simulations and illustrated through application to data on reported alcohol consumption and blood pressure from the US National Health and Nutrition Examination Survey, in which data are likely MNAR independent of outcome.


Subjects
Data Interpretation, Statistical , Models, Statistical , Alcohol Drinking/epidemiology , Blood Pressure , Humans
13.
Stat Sin ; 23(3): 1155-1180, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-24489449

ABSTRACT

Case-cohort design, an outcome-dependent sampling design for censored survival data, is increasingly used in biomedical research. The development of asymptotic theory for a case-cohort design in the current literature primarily relies on counting process stochastic integrals. Such an approach, however, is rather limited and lacks theoretical justification for outcome-dependent weighted methods due to non-predictability. Instead of stochastic integrals, we derive asymptotic properties for case-cohort studies based on a general Z-estimation theory for semi-parametric models with bundled parameters using empirical process theory. Both the Cox model and the additive hazards model with time-dependent covariates are considered.

14.
Best Pract Res Clin Haematol ; 36(2): 101477, 2023 06.
Article in English | MEDLINE | ID: mdl-37353284

ABSTRACT

Missing data are frequently encountered across studies in clinical haematology. Failure to handle these missing values in an appropriate manner can complicate the interpretation of a study's findings, as estimates presented may be biased and/or imprecise. In the present work, we first provide an overview of current methods for handling missing covariate data, along with their advantages and disadvantages. Furthermore, a systematic review is presented, exploring both contemporary reporting of missing values in major haematological journals, and the methods used for handling them. A principal finding was that the method of handling missing data was explicitly specified in a minority of articles (in 76 out of 195 articles reporting missing values, 39%). Among these, complete case analysis and the missing indicator method were the most common approaches to dealing with missing values, with more complex methods such as multiple imputation being extremely rare (in 7 out of 195 articles). An example analysis (with associated code) is also provided using hematopoietic stem cell transplantation data, illustrating the different approaches to handling missing values. We conclude with various recommendations regarding the reporting and handling of missing values for future studies in clinical haematology.


Subjects
Hematology , Humans , Data Interpretation, Statistical , Research Design
15.
J Appl Stat ; 50(9): 2014-2035, 2023.
Article in English | MEDLINE | ID: mdl-37378269

ABSTRACT

Predicting the annual frequency of tropical storms is of interest because it can provide basic information towards improved preparation against these storms. Sea surface temperatures (SSTs) averaged over the hurricane season can predict annual tropical cyclone activity well. But predictions need to be made before the hurricane season when the predictors are not yet observed. Several climate models issue forecasts of the SSTs, which can be used instead. Such models use the forecasts of SSTs as surrogates for the true SSTs. We develop a Bayesian negative binomial regression model, which makes a distinction between the true SSTs and their forecasts, both of which are included in the model. For prediction, the true SSTs may be regarded as unobserved predictors and sampled from their posterior predictive distribution. We also have a small fraction of missing data for the SST forecasts from the climate models. Thus, we propose a model that can simultaneously handle missing predictors and variable selection uncertainty. If the main goal is prediction, an interesting question is whether we should include predictors in the model that are missing at the time of prediction. We attempt to answer this question and demonstrate that our model can provide gains in prediction.

16.
J Stat Plan Inference ; 142(10): 2819-2831, 2012 Oct.
Article in English | MEDLINE | ID: mdl-23805025

ABSTRACT

Many analyses for incomplete longitudinal data are directed to examining the impact of covariates on the marginal mean responses. We consider the setting in which longitudinal responses are collected from individuals nested within clusters. We discuss methods for assessing covariate effects on the mean and association parameters when covariates are incompletely observed. Weighted first and second order estimating equations are constructed to obtain consistent estimates of mean and association parameters when covariates are missing at random. Empirical studies demonstrate that estimators from the proposed method have negligible finite sample biases in moderate samples. An application to the National Alzheimer's Coordinating Center (NACC) Uniform Data Set (UDS) demonstrates the utility of the proposed method.

17.
Stat (Int Stat Inst) ; 11(1)2022 Dec.
Article in English | MEDLINE | ID: mdl-37854542

ABSTRACT

High-dimensional data with censored outcomes of interest are prevalent in medical research. To analyze such data, the regularized Buckley-James estimator has been successfully applied to build accurate predictive models and conduct variable selection. In this paper, we consider the problem of parameter estimation and variable selection for the semiparametric accelerated failure time model for high-dimensional block-missing multimodal neuroimaging data with censored outcomes. We propose a penalized Buckley-James method that can simultaneously handle block-wise missing covariates and censored outcomes. This method can also perform variable selection. The proposed method is evaluated by simulations and applied to a multimodal neuroimaging dataset and obtains meaningful results.

18.
Stat Methods Med Res ; 31(10): 1860-1880, 2022 10.
Article in English | MEDLINE | ID: mdl-35658734

ABSTRACT

In studies analyzing competing time-to-event outcomes, interest often lies in both estimating the effects of baseline covariates on the cause-specific hazards and predicting cumulative incidence functions. When missing values occur in these baseline covariates, they may be discarded as part of a complete-case analysis or multiply imputed. In the latter case, the imputations may be performed either compatibly with a substantive model pre-specified as a cause-specific Cox model [substantive model compatible fully conditional specification (SMC-FCS)], or approximately so [multivariate imputation by chained equations (MICE)]. In a large simulation study, we assessed the performance of these three different methods in terms of estimating cause-specific regression coefficients and predicting cumulative incidence functions. Concerning regression coefficients, results provide further support for use of SMC-FCS over MICE, particularly when covariate effects are large and the baseline hazards of the competing events are substantially different. Complete-case analysis also shows adequate performance in settings where missingness is not outcome dependent. With regard to cumulative incidence prediction, SMC-FCS and MICE perform more similarly, as also evidenced in the illustrative analysis of competing outcomes following a hematopoietic stem cell transplantation. The findings are discussed alongside recommendations for practising statisticians.


Subjects
Models, Statistical , Computer Simulation , Data Interpretation, Statistical , Proportional Hazards Models
19.
Bayesian Anal ; 15(3): 759-780, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33692872

ABSTRACT

For many biomedical, environmental, and economic studies, the single index model provides a practical dimension reduction as well as a good physical interpretation of the unknown nonlinear relationship between the response and its multiple predictors. However, widespread uses of existing Bayesian analysis for such models are lacking in practice due to some major impediments, including slow mixing of the Markov Chain Monte Carlo (MCMC), the inability to deal with missing covariates and a lack of theoretical justification of the rate of convergence of Bayesian estimates. We present a new Bayesian single index model with an associated MCMC algorithm that incorporates an efficient Metropolis-Hastings (MH) step for the conditional distribution of the index vector. Our method leads to a model with good interpretations and prediction, implementable Bayesian inference, fast convergence of the MCMC and a first-time extension to accommodate missing covariates. We also obtain, for the first time, the set of sufficient conditions for obtaining the optimal rate of posterior convergence of the overall regression function. We illustrate the practical advantages of our method and computational tool via reanalysis of an environmental study.

20.
AAPS J ; 21(4): 68, 2019 05 28.
Article in English | MEDLINE | ID: mdl-31140019

ABSTRACT

Body weight is the primary covariate in pharmacokinetics of many drugs and dramatically changes during the first weeks of life of neonates. The objective of this study is to determine if missing body weights in preterm and term neonates affect estimates of model parameters and which methods can be used to improve performance of a population pharmacokinetic model of paracetamol. Data for our analysis were obtained from previously published studies on the pharmacokinetics of intravenous paracetamol in neonates. We adopted a population model of body weight change in neonates to implement three previously introduced methods of handling missing covariates based on data imputation, likelihood function modification, and full random effects modeling. All models were implemented in NONMEM 7.4, and population parameters were estimated using the FOCE method. Our major finding was that missing body weights minimally affect population estimates of pharmacokinetic parameters but do affect the covariate relationship parameters, particularly the one describing dependence of clearance on body weight. None of the tested methods changed estimates of between-subject variability nor impacted the predictive performance of the model. Our analysis shows that a modeling approach towards handling missing covariates allows borrowing information gathered in various studies as long as they target the same population. This approach is particularly useful for handling time-dependent missing covariates.


Subjects
Acetaminophen/pharmacokinetics , Analgesics, Non-Narcotic/pharmacokinetics , Body Weight , Models, Biological , Acetaminophen/administration & dosage , Acetaminophen/blood , Analgesics, Non-Narcotic/administration & dosage , Analgesics, Non-Narcotic/blood , Drug Dosage Calculations , Humans , Infant, Newborn , Infant, Premature , Injections, Intravenous , Likelihood Functions , Nonlinear Dynamics , Time Factors