Pesquisa | BVS Integralidade em Saúde

Using Super Learner Prediction Modeling to Improve High-dimensional Propensity Score Estimation.

Wyss, Richard; Schneeweiss, Sebastian; van der Laan, Mark; Lendle, Samuel D; Ju, Cheng; Franklin, Jessica M.

Epidemiology ; 29(1): 96-106, 2018 01.

Artigo em Inglês | MEDLINE | ID: mdl-28991001

RESUMO

The high-dimensional propensity score is a semiautomated variable selection algorithm that can supplement expert knowledge to improve confounding control in nonexperimental medical studies utilizing electronic healthcare databases. Although the algorithm can be used to generate hundreds of patient-level variables and rank them by their potential confounding impact, it remains unclear how to select the optimal number of variables for adjustment. We used plasmode simulations based on empirical data to discuss and evaluate data-adaptive approaches for variable selection and prediction modeling that can be combined with the high-dimensional propensity score to improve confounding control in large healthcare databases. We considered approaches that combine the high-dimensional propensity score with Super Learner prediction modeling, a scalable version of collaborative targeted maximum-likelihood estimation, and penalized regression. We evaluated performance using bias and mean squared error (MSE) in effect estimates. Results showed that the high-dimensional propensity score can be sensitive to the number of variables included for adjustment and that severe overfitting of the propensity score model can negatively impact the properties of effect estimates. Combining the high-dimensional propensity score with Super Learner was the most consistent strategy, in terms of reducing bias and MSE in the effect estimates, and may be promising for semiautomated data-adaptive propensity score estimation in high-dimensional covariate datasets.

Assuntos

Algoritmos , Modelos Estatísticos , Pontuação de Propensão , Simulação por Computador , Fatores de Confusão Epidemiológicos , Bases de Dados Factuais , Humanos , Funções Verossimilhança , Modelos Logísticos , Análise de Regressão

Identification and efficient estimation of the natural direct effect among the untreated.

Lendle, Samuel D; Subbaraman, Meenakshi S; van der Laan, Mark J.

Biometrics ; 69(2): 310-7, 2013 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-23607645

RESUMO

The natural direct effect (NDE), or the effect of an exposure on an outcome if an intermediate variable was set to the level it would have been in the absence of the exposure, is often of interest to investigators. In general, the statistical parameter associated with the NDE is difficult to estimate in the non-parametric model, particularly when the intermediate variable is continuous or high dimensional. In this article, we introduce a new causal parameter called the natural direct effect among the untreated, discuss identifiability assumptions, propose a sensitivity analysis for some of the assumptions, and show that this new parameter is equivalent to the NDE in a randomized controlled trial. We also present a targeted minimum loss estimator (TMLE), a locally efficient, double robust substitution estimator for the statistical parameter associated with this causal parameter. The TMLE can be applied to problems with continuous and high dimensional intermediate variables, and can be used to estimate the NDE in a randomized controlled trial with such data. Additionally, we define and discuss the estimation of three related causal parameters: the natural direct effect among the treated, the indirect effect among the untreated and the indirect effect among the treated.

Assuntos

Biometria/métodos , Alcoolismo/psicologia , Alcoolismo/terapia , Causalidade , Humanos , Modelos Estatísticos , Ensaios Clínicos Controlados Aleatórios como Assunto/estatística & dados numéricos , Projetos de Pesquisa , Estatísticas não Paramétricas , Resultado do Tratamento

Group testing for case identification with correlated responses.

Lendle, Samuel D; Hudgens, Michael G; Qaqish, Bahjat F.

Biometrics ; 68(2): 532-40, 2012 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-21950447

RESUMO

This article examines group testing procedures where units within a group (or pool) may be correlated. The expected number of tests per unit (i.e., efficiency) of hierarchical- and matrix-based procedures is derived based on a class of models of exchangeable binary random variables. The effect on efficiency of the arrangement of correlated units within pools is then examined. In general, when correlated units are arranged in the same pool, the expected number of tests per unit decreases, sometimes substantially, relative to arrangements that ignore information about correlation.

Assuntos

Biometria/métodos , Testes Diagnósticos de Rotina/estatística & dados numéricos , Vacinas contra a AIDS/imunologia , Algoritmos , Mapeamento de Epitopos/estatística & dados numéricos , Antígenos HIV/imunologia , Humanos , Programas de Rastreamento/estatística & dados numéricos , Modelos Estatísticos , Método de Monte Carlo , Linfócitos T/imunologia

Propensity score prediction for electronic healthcare databases using Super Learner and High-dimensional Propensity Score Methods.

Ju, Cheng; Combs, Mary; Lendle, Samuel D; Franklin, Jessica M; Wyss, Richard; Schneeweiss, Sebastian; van der Laan, Mark J.

J Appl Stat ; 46(12): 2216-2236, 2019.

Artigo em Inglês | MEDLINE | ID: mdl-32843815

RESUMO

The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a "library" of candidate prediction models. While SL has been widely studied in a number of settings, it has not been thoroughly evaluated in large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied and evaluated the performance of SL in its ability to predict the propensity score (PS), the conditional probability of treatment assignment given baseline covariates, using three electronic healthcare databases. We considered a library of algorithms that consisted of both nonparametric and parametric models. We also proposed a novel strategy for prediction modeling that combines SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Predictive performance was assessed using three metrics: the negative log-likelihood, area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. The SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.

Scalable collaborative targeted learning for high-dimensional data.

Ju, Cheng; Gruber, Susan; Lendle, Samuel D; Chambaz, Antoine; Franklin, Jessica M; Wyss, Richard; Schneeweiss, Sebastian; van der Laan, Mark J.

Stat Methods Med Res ; 28(2): 532-554, 2019 02.

Artigo em Inglês | MEDLINE | ID: mdl-28936917

RESUMO

Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well-behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them for the sake of achieving a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the general template of the collaborative targeted minimum loss-based estimation procedure. The original instantiation of the collaborative targeted minimum loss-based estimation template can be presented as a greedy forward stepwise collaborative targeted minimum loss-based estimation algorithm. It does not scale well when the number p of covariates increases drastically. This motivates the introduction of a novel instantiation of the collaborative targeted minimum loss-based estimation template where the covariates are pre-ordered. Its time complexity is O(p) as opposed to the original O(p2) , a remarkable gain. We propose two pre-ordering strategies and suggest a rule of thumb to develop other meaningful strategies. Because it is usually unclear a priori which pre-ordering strategy to choose, we also introduce another instantiation called SL-C-TMLE algorithm that enables the data-driven choice of the better pre-ordering strategy given the problem at hand. Its time complexity is O(p) as well. The computational burden and relative performance of these algorithms were compared in simulation studies involving fully synthetic data or partially synthetic data based on a real world large electronic health database; and in analyses of three real, large electronic health databases. In all analyses involving electronic health databases, the greedy collaborative targeted minimum loss-based estimation algorithm is unacceptably slow. Simulation studies seem to indicate that our scalable collaborative targeted minimum loss-based estimation and SL-C-TMLE algorithms work well. All C-TMLEs are publicly available in a Julia software package.

Assuntos

Modelos Estatísticos , Idoso , Algoritmos , Anti-Inflamatórios não Esteroides/efeitos adversos , Simulação por Computador , Hemorragia Gastrointestinal/induzido quimicamente , Humanos , Estudos Observacionais como Assunto , Úlcera Péptica/induzido quimicamente , Úlcera Péptica Perfurada/induzido quimicamente , Pontuação de Propensão

Targeted maximum likelihood estimation in safety analysis.

Lendle, Samuel D; Fireman, Bruce; van der Laan, Mark J.

J Clin Epidemiol ; 66(8 Suppl): S91-8, 2013 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-23849159

RESUMO

OBJECTIVES: To compare the performance of a targeted maximum likelihood estimator (TMLE) and a collaborative TMLE (CTMLE) to other estimators in a drug safety analysis, including a regression-based estimator, propensity score (PS)-based estimators, and an alternate doubly robust (DR) estimator in a real example and simulations. STUDY DESIGN AND SETTING: The real data set is a subset of observational data from Kaiser Permanente Northern California formatted for use in active drug safety surveillance. Both the real and simulated data sets include potential confounders, a treatment variable indicating use of one of two antidiabetic treatments and an outcome variable indicating occurrence of an acute myocardial infarction (AMI). RESULTS: In the real data example, there is no difference in AMI rates between treatments. In simulations, the double robustness property is demonstrated: DR estimators are consistent if either the initial outcome regression or PS estimator is consistent, whereas other estimators are inconsistent if the initial estimator is not consistent. In simulations with near-positivity violations, CTMLE performs well relative to other estimators by adaptively estimating the PS. CONCLUSION: Each of the DR estimators was consistent, and TMLE and CTMLE had the smallest mean squared error in simulations.

Assuntos

Causalidade , Hipoglicemiantes/uso terapêutico , Funções Verossimilhança , Modelos Estatísticos , Vigilância de Produtos Comercializados/estatística & dados numéricos , Algoritmos , Simulação por Computador , Fatores de Confusão Epidemiológicos , Interpretação Estatística de Dados , Diabetes Mellitus/tratamento farmacológico , Humanos , Infarto do Miocárdio/epidemiologia , Pontuação de Propensão , Projetos de Pesquisa , Resultado do Tratamento

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa