Results 1 - 6 of 6
1.
Stat Methods Med Res; 28(2): 532-554, 2019 Feb.
Article in English | MEDLINE | ID: mdl-28936917

ABSTRACT

Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well-behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them to achieve a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the general template of the collaborative targeted minimum loss-based estimation (C-TMLE) procedure. The original instantiation of the C-TMLE template can be presented as a greedy forward stepwise algorithm. It does not scale well when the number p of covariates increases drastically. This motivates the introduction of a novel instantiation of the C-TMLE template in which the covariates are pre-ordered. Its time complexity is O(p), as opposed to the original O(p²), a remarkable gain. We propose two pre-ordering strategies and suggest a rule of thumb for developing other meaningful strategies. Because it is usually unclear a priori which pre-ordering strategy to choose, we also introduce another instantiation, called the SL-C-TMLE algorithm, which enables a data-driven choice of the better pre-ordering strategy for the problem at hand. Its time complexity is also O(p). The computational burden and relative performance of these algorithms were compared in simulation studies involving fully synthetic data or partially synthetic data based on a real-world large electronic health database, and in analyses of three real, large electronic health databases. In all analyses involving electronic health databases, the greedy C-TMLE algorithm was unacceptably slow. Simulation studies indicate that our scalable C-TMLE and SL-C-TMLE algorithms work well. All C-TMLEs are publicly available in a Julia software package.
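The complexity gain claimed above can be pictured with a small counting sketch (a simplification, not the authors' Julia implementation): greedy forward stepwise C-TMLE scores every remaining candidate covariate at each step, while a pre-ordered C-TMLE ranks the covariates once and then scores each exactly once in that fixed order.

```python
# Counting candidate evaluations only; the actual C-TMLE scoring step
# (cross-validated loss after targeting) is abstracted away.

def greedy_evaluations(p):
    # Step k of the greedy search scores all p - k remaining covariates.
    return sum(p - k for k in range(p))  # p(p+1)/2, i.e. O(p^2)

def preordered_evaluations(p):
    # Covariates are ranked once, then scored one at a time in that order.
    return p  # O(p)

for p in (10, 100, 1000):
    print(p, greedy_evaluations(p), preordered_evaluations(p))
```

At p = 1000 the greedy search already needs over 500,000 candidate evaluations versus 1000 for the pre-ordered pass, which matches the reported slowness on large electronic health databases.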


Subject(s)
Models, Statistical; Aged; Algorithms; Anti-Inflammatory Agents, Non-Steroidal/adverse effects; Computer Simulation; Gastrointestinal Hemorrhage/chemically induced; Humans; Observational Studies as Topic; Peptic Ulcer/chemically induced; Peptic Ulcer Perforation/chemically induced; Propensity Score
2.
J Appl Stat; 46(12): 2216-2236, 2019.
Article in English | MEDLINE | ID: mdl-32843815

ABSTRACT

The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a "library" of candidate prediction models. While SL has been widely studied in a number of settings, it has not been thoroughly evaluated in the large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied SL and evaluated its ability to predict the propensity score (PS), the conditional probability of treatment assignment given baseline covariates, using three electronic healthcare databases. We considered a library of algorithms consisting of both nonparametric and parametric models. We also proposed a novel strategy for prediction modeling that combines SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Predictive performance was assessed using three metrics: the negative log-likelihood, the area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.
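The cross-validated selection step at the heart of SL can be sketched in miniature. The following is an assumed simplification (a "discrete" Super Learner with a toy two-learner library; the full SL also builds a weighted combination of the candidates, which is omitted here), scored by the negative log-likelihood as in the study:

```python
# Minimal discrete-Super-Learner sketch: K-fold cross-validation selects,
# from a small library of candidate learners, the one minimizing the
# cross-validated negative log-likelihood for a binary outcome.
import numpy as np

def neg_log_lik(y, p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_mean(X, y):
    m = y.mean()                        # marginal-mean rule, ignores X
    return lambda Xn: np.full(len(Xn), m)

def fit_threshold(X, y):
    m0 = y[X[:, 0] <= 0].mean()         # single-covariate split rule
    m1 = y[X[:, 0] > 0].mean()
    return lambda Xn: np.where(Xn[:, 0] > 0, m1, m0)

def cv_select(X, y, library, k=5, seed=0):
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    losses = {}
    for name, fit in library.items():
        losses[name] = np.mean([
            neg_log_lik(y[folds == j],
                        fit(X[folds != j], y[folds != j])(X[folds == j]))
            for j in range(k)
        ])
    return min(losses, key=losses.get), losses

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = (rng.uniform(size=400) < np.where(X[:, 0] > 0, 0.8, 0.2)).astype(float)
best, losses = cv_select(X, y, {"mean": fit_mean, "threshold": fit_threshold})
print(best)  # the split rule should win, since the outcome depends on X[:, 0]
```

All names here (`cv_select`, the toy learners) are illustrative, not the study's library; the point is only the mechanism of data-adaptive selection among candidates.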

3.
Epidemiology; 29(1): 96-106, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28991001

ABSTRACT

The high-dimensional propensity score is a semiautomated variable selection algorithm that can supplement expert knowledge to improve confounding control in nonexperimental medical studies utilizing electronic healthcare databases. Although the algorithm can be used to generate hundreds of patient-level variables and rank them by their potential confounding impact, it remains unclear how to select the optimal number of variables for adjustment. We used plasmode simulations based on empirical data to discuss and evaluate data-adaptive approaches for variable selection and prediction modeling that can be combined with the high-dimensional propensity score to improve confounding control in large healthcare databases. We considered approaches that combine the high-dimensional propensity score with Super Learner prediction modeling, a scalable version of collaborative targeted maximum-likelihood estimation, and penalized regression. We evaluated performance using bias and mean squared error (MSE) in effect estimates. Results showed that the high-dimensional propensity score can be sensitive to the number of variables included for adjustment and that severe overfitting of the propensity score model can negatively impact the properties of effect estimates. Combining the high-dimensional propensity score with Super Learner was the most consistent strategy, in terms of reducing bias and MSE in the effect estimates, and may be promising for semiautomated data-adaptive propensity score estimation in high-dimensional covariate datasets.
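The ranking step that produces the "hundreds of patient-level variables ranked by potential confounding impact" can be pictured with a simplified stand-in. The real hdPS scores candidate variables with the Bross bias formula on code prevalences; the scoring below (product of absolute correlations with treatment and outcome) is an assumption made only to keep the sketch short:

```python
# Illustrative stand-in for hdPS-style variable ranking, not the actual
# algorithm: each binary covariate is scored by the product of its absolute
# correlations with treatment and with outcome; the strongest candidates
# would then be kept for propensity score adjustment.
import numpy as np

def rank_covariates(X, a, y):
    scores = [abs(np.corrcoef(X[:, j], a)[0, 1]) *
              abs(np.corrcoef(X[:, j], y)[0, 1])
              for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]   # strongest potential confounders first

rng = np.random.default_rng(0)
n = 1000
confounder = rng.binomial(1, 0.5, n)          # drives both treatment and outcome
noise = rng.binomial(1, 0.5, (n, 4))          # unrelated binary covariates
a = rng.binomial(1, 0.3 + 0.4 * confounder)   # treatment
y = rng.binomial(1, 0.2 + 0.3 * confounder)   # outcome
X = np.column_stack([noise[:, :2], confounder, noise[:, 2:]])

order = rank_covariates(X, a, y)
print(order[0])  # the true confounder sits in column 2 and should rank first
```

The abstract's finding, that performance is sensitive to how many top-ranked variables are adjusted for, corresponds to choosing a cut-off in `order`.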


Subject(s)
Algorithms; Models, Statistical; Propensity Score; Computer Simulation; Confounding Factors, Epidemiologic; Databases, Factual; Humans; Likelihood Functions; Logistic Models; Regression Analysis
4.
J Clin Epidemiol; 66(8 Suppl): S91-8, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23849159

ABSTRACT

OBJECTIVES: To compare the performance of a targeted maximum likelihood estimator (TMLE) and a collaborative TMLE (CTMLE) to other estimators in a drug safety analysis, including a regression-based estimator, propensity score (PS)-based estimators, and an alternate doubly robust (DR) estimator in a real example and simulations. STUDY DESIGN AND SETTING: The real data set is a subset of observational data from Kaiser Permanente Northern California formatted for use in active drug safety surveillance. Both the real and simulated data sets include potential confounders, a treatment variable indicating use of one of two antidiabetic treatments and an outcome variable indicating occurrence of an acute myocardial infarction (AMI). RESULTS: In the real data example, there is no difference in AMI rates between treatments. In simulations, the double robustness property is demonstrated: DR estimators are consistent if either the initial outcome regression or PS estimator is consistent, whereas other estimators are inconsistent if the initial estimator is not consistent. In simulations with near-positivity violations, CTMLE performs well relative to other estimators by adaptively estimating the PS. CONCLUSION: Each of the DR estimators was consistent, and TMLE and CTMLE had the smallest mean squared error in simulations.
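The double robustness property in the results can be made concrete with a compact DR estimator. The sketch below uses the augmented inverse-probability-weighted (AIPW) estimator rather than TMLE or CTMLE (a deliberate substitution: AIPW is a different DR estimator, chosen here only because it fits in a few lines), and checks that with correctly specified nuisance models it recovers the true average treatment effect:

```python
# Hedged sketch of one doubly robust estimator (AIPW) of the average
# treatment effect. TMLE/CTMLE are different DR estimators, but they share
# the property that correct specification of either the outcome regression
# or the propensity score yields a consistent estimate.
import numpy as np

def aipw_ate(y, a, ps, m1, m0):
    """y: outcome; a: 0/1 treatment; ps: P(A=1|W);
    m1, m0: outcome regressions E[Y|A=1,W] and E[Y|A=0,W]."""
    term1 = a * (y - m1) / ps + m1
    term0 = (1 - a) * (y - m0) / (1 - ps) + m0
    return np.mean(term1 - term0)

rng = np.random.default_rng(0)
n = 20_000
w = rng.binomial(1, 0.5, n)                # a single binary confounder
ps = 0.3 + 0.4 * w                         # true propensity score
a = rng.binomial(1, ps)
m1, m0 = 0.4 + 0.2 * w, 0.2 + 0.2 * w      # true regressions; true ATE = 0.2
y = rng.binomial(1, np.where(a == 1, m1, m0))

print(aipw_ate(y, a, ps, m1, m0))  # close to 0.2, up to sampling error
```

CTMLE's advantage under near-positivity violations, noted in the abstract, comes from adaptively estimating `ps` rather than plugging in a fixed estimate as done here.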


Subject(s)
Causality; Hypoglycemic Agents/therapeutic use; Likelihood Functions; Models, Statistical; Product Surveillance, Postmarketing/statistics & numerical data; Algorithms; Computer Simulation; Confounding Factors, Epidemiologic; Data Interpretation, Statistical; Diabetes Mellitus/drug therapy; Humans; Myocardial Infarction/epidemiology; Propensity Score; Research Design; Treatment Outcome
5.
Biometrics; 69(2): 310-7, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23607645

ABSTRACT

The natural direct effect (NDE), or the effect of an exposure on an outcome if an intermediate variable were set to the level it would have taken in the absence of the exposure, is often of interest to investigators. In general, the statistical parameter associated with the NDE is difficult to estimate in the non-parametric model, particularly when the intermediate variable is continuous or high dimensional. In this article, we introduce a new causal parameter called the natural direct effect among the untreated, discuss identifiability assumptions, propose a sensitivity analysis for some of the assumptions, and show that this new parameter is equivalent to the NDE in a randomized controlled trial. We also present a targeted minimum loss estimator (TMLE), a locally efficient, doubly robust substitution estimator for the statistical parameter associated with this causal parameter. The TMLE can be applied to problems with continuous and high-dimensional intermediate variables, and can be used to estimate the NDE in a randomized controlled trial with such data. Additionally, we define and discuss the estimation of three related causal parameters: the natural direct effect among the treated, the indirect effect among the untreated, and the indirect effect among the treated.
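In standard counterfactual notation (assumed here; the abstract states the definitions in words), with A the exposure, M the intermediate variable, and Y(a, m) the counterfactual outcome, the two parameters can be written as:

```latex
\mathrm{NDE} = \mathbb{E}\bigl[\, Y(1, M(0)) - Y(0, M(0)) \,\bigr],
\qquad
\mathrm{NDE}_{A=0} = \mathbb{E}\bigl[\, Y(1, M(0)) - Y(0, M(0)) \,\big|\, A = 0 \,\bigr].
```

Conditioning on the untreated (A = 0) is what weakens the identifiability assumptions for continuous or high-dimensional M, and in a randomized trial A is independent of the counterfactuals, which is why the two parameters coincide there.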


Subject(s)
Biometry/methods; Alcoholism/psychology; Alcoholism/therapy; Causality; Humans; Models, Statistical; Randomized Controlled Trials as Topic/statistics & numerical data; Research Design; Statistics, Nonparametric; Treatment Outcome
6.
Biometrics; 68(2): 532-40, 2012 Jun.
Article in English | MEDLINE | ID: mdl-21950447

ABSTRACT

This article examines group testing procedures where units within a group (or pool) may be correlated. The expected number of tests per unit (i.e., efficiency) of hierarchical- and matrix-based procedures is derived based on a class of models of exchangeable binary random variables. The effect on efficiency of the arrangement of correlated units within pools is then examined. In general, when correlated units are arranged in the same pool, the expected number of tests per unit decreases, sometimes substantially, relative to arrangements that ignore information about correlation.


Subject(s)
Biometry/methods , Diagnostic Tests, Routine/statistics & numerical data , AIDS Vaccines/immunology , Algorithms , Epitope Mapping/statistics & numerical data , HIV Antigens/immunology , Humans , Mass Screening/statistics & numerical data , Models, Statistical , Monte Carlo Method , T-Lymphocytes/immunology