Search | VHL Search Portal

1.

Child wasting and concurrent stunting in low- and middle-income countries.

Mertens, Andrew; Benjamin-Chung, Jade; Colford, John M; Hubbard, Alan E; van der Laan, Mark J; Coyle, Jeremy; Sofrygin, Oleg; Cai, Wilson; Jilek, Wendy; Rosete, Sonali; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jung, Esther; Chung, Esther O; Malenica, Ivana; Hejazi, Nima; Li, Haodong; Hafen, Ryan; Subramoney, Vishak; Häggström, Jonas; Norman, Thea; Christian, Parul; Brown, Kenneth H; Arnold, Benjamin F.

Nature ; 621(7979): 558-567, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37704720

ABSTRACT

Sustainable Development Goal 2.2-to end malnutrition by 2030-includes the elimination of child wasting, defined as a weight-for-length z-score that is more than two standard deviations below the median of the World Health Organization standards for child growth1. Prevailing methods to measure wasting rely on cross-sectional surveys that cannot measure onset, recovery and persistence-key features that inform preventive interventions and estimates of disease burden. Here we analyse 21 longitudinal cohorts and show that wasting is a highly dynamic process of onset and recovery, with incidence peaking between birth and 3 months. Many more children experience an episode of wasting at some point during their first 24 months than prevalent cases at a single point in time suggest. For example, at the age of 24 months, 5.6% of children were wasted, but by the same age (24 months), 29.2% of children had experienced at least one wasting episode and 10.0% had experienced two or more episodes. Children who were wasted before the age of 6 months had a faster recovery and shorter episodes than did children who were wasted at older ages; however, early wasting increased the risk of later growth faltering, including concurrent wasting and stunting (low length-for-age z-score), and thus increased the risk of mortality. In diverse populations with high seasonal rainfall, the population average weight-for-length z-score varied substantially (more than 0.5 z in some cohorts), with the lowest mean z-scores occurring during the rainiest months; this indicates that seasonally targeted interventions could be considered. Our results show the importance of establishing interventions to prevent wasting from birth to the age of 6 months, probably through improved maternal nutrition, to complement current programmes that focus on children aged 6-59 months.

Subject(s)

Cachexia , Developing Countries , Growth Disorders , Malnutrition , Child, Preschool , Humans , Infant , Infant, Newborn , Cachexia/epidemiology , Cachexia/mortality , Cachexia/prevention & control , Cross-Sectional Studies , Growth Disorders/epidemiology , Growth Disorders/mortality , Growth Disorders/prevention & control , Incidence , Longitudinal Studies , Malnutrition/epidemiology , Malnutrition/mortality , Malnutrition/prevention & control , Rain , Seasons

2.

Early-childhood linear growth faltering in low- and middle-income countries.

Benjamin-Chung, Jade; Mertens, Andrew; Colford, John M; Hubbard, Alan E; van der Laan, Mark J; Coyle, Jeremy; Sofrygin, Oleg; Cai, Wilson; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jilek, Wendy; Jung, Esther; Chung, Esther O; Rosete, Sonali; Hejazi, Nima; Malenica, Ivana; Li, Haodong; Hafen, Ryan; Subramoney, Vishak; Häggström, Jonas; Norman, Thea; Brown, Kenneth H; Christian, Parul; Arnold, Benjamin F.

Nature ; 621(7979): 550-557, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37704719

ABSTRACT

Globally, 149 million children under 5 years of age are estimated to be stunted (length more than 2 standard deviations below international growth standards)1,2. Stunting, a form of linear growth faltering, increases the risk of illness, impaired cognitive development and mortality. Global stunting estimates rely on cross-sectional surveys, which cannot provide direct information about the timing of onset or persistence of growth faltering-a key consideration for defining critical windows to deliver preventive interventions. Here we completed a pooled analysis of longitudinal studies in low- and middle-income countries (n = 32 cohorts, 52,640 children, ages 0-24 months), allowing us to identify the typical age of onset of linear growth faltering and to investigate recurrent faltering in early life. The highest incidence of stunting onset occurred from birth to the age of 3 months, with substantially higher stunting at birth in South Asia. From 0 to 15 months, stunting reversal was rare; children who reversed their stunting status frequently relapsed, and relapse rates were substantially higher among children born stunted. Early onset and low reversal rates suggest that improving children's linear growth will require life course interventions for women of childbearing age and a greater emphasis on interventions for children under 6 months of age.

Subject(s)

Developing Countries , Growth Disorders , Adult , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Asia, Southern/epidemiology , Cognition , Cross-Sectional Studies , Developing Countries/statistics & numerical data , Developmental Disabilities/epidemiology , Developmental Disabilities/mortality , Developmental Disabilities/prevention & control , Growth Disorders/epidemiology , Growth Disorders/mortality , Growth Disorders/prevention & control , Longitudinal Studies , Mothers

3.

Causes and consequences of child growth faltering in low-resource settings.

Mertens, Andrew; Benjamin-Chung, Jade; Colford, John M; Coyle, Jeremy; van der Laan, Mark J; Hubbard, Alan E; Rosete, Sonali; Malenica, Ivana; Hejazi, Nima; Sofrygin, Oleg; Cai, Wilson; Li, Haodong; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jung, Esther; Chung, Esther O; Jilek, Wendy; Subramoney, Vishak; Hafen, Ryan; Häggström, Jonas; Norman, Thea; Brown, Kenneth H; Christian, Parul; Arnold, Benjamin F.

Nature ; 621(7979): 568-576, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37704722

ABSTRACT

Growth faltering in children (low length for age or low weight for length) during the first 1,000 days of life (from conception to 2 years of age) influences short-term and long-term health and survival1,2. Interventions such as nutritional supplementation during pregnancy and the postnatal period could help prevent growth faltering, but programmatic action has been insufficient to eliminate the high burden of stunting and wasting in low- and middle-income countries. Identification of age windows and population subgroups on which to focus will benefit future preventive efforts. Here we use a population intervention effects analysis of 33 longitudinal cohorts (83,671 children, 662,763 measurements) and 30 separate exposures to show that improving maternal anthropometry and child condition at birth accounted for population increases in length-for-age z-scores of up to 0.40 and weight-for-length z-scores of up to 0.15 by 24 months of age. Boys had consistently higher risk of all forms of growth faltering than girls. Early postnatal growth faltering predisposed children to subsequent and persistent growth faltering. Children with multiple growth deficits exhibited higher mortality rates from birth to 2 years of age than children without growth deficits (hazard ratios 1.9 to 8.7). The importance of prenatal causes and severe consequences for children who experienced early growth faltering support a focus on pre-conception and pregnancy as a key opportunity for new preventive interventions.

Subject(s)

Cachexia , Developing Countries , Growth Disorders , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Pregnancy , Cachexia/economics , Cachexia/epidemiology , Cachexia/etiology , Cachexia/prevention & control , Cohort Studies , Developing Countries/economics , Developing Countries/statistics & numerical data , Dietary Supplements , Growth Disorders/epidemiology , Growth Disorders/prevention & control , Longitudinal Studies , Mothers , Sex Factors , Malnutrition/economics , Malnutrition/epidemiology , Malnutrition/etiology , Malnutrition/prevention & control , Anthropometry

4.

Nonparametric causal mediation analysis for stochastic interventional (in)direct effects.

Hejazi, Nima S; Rudolph, Kara E; Van Der Laan, Mark J; Díaz, Iván.

Biostatistics ; 24(3): 686-707, 2023 Jul 14.

Article in English | MEDLINE | ID: mdl-35102366

ABSTRACT

Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary exposures and static interventions and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by exposure. We present a theoretical study of an (in)direct effect decomposition of the population intervention effect, defined by stochastic interventions jointly applied to the exposure and mediators. In contrast to existing proposals, our causal effects can be evaluated regardless of whether an exposure is categorical or continuous and remain well-defined even in the presence of intermediate confounders affected by exposure. Our (in)direct effects are identifiable without a restrictive assumption on cross-world counterfactual independencies, allowing for substantive conclusions drawn from them to be validated in randomized controlled trials. Beyond the novel effects introduced, we provide a careful study of nonparametric efficiency theory relevant for the construction of flexible, multiply robust estimators of our (in)direct effects, while avoiding undue restrictions induced by assuming parametric models of nuisance parameter functionals. To complement our nonparametric estimation strategy, we introduce inferential techniques for constructing confidence intervals and hypothesis tests, and discuss open-source software, the $\texttt{medshift}$$\texttt{R}$ package, implementing the proposed methodology. Application of our (in)direct effects and their nonparametric estimators is illustrated using data from a comparative effectiveness trial examining the direct and indirect effects of pharmacological therapeutics on relapse to opioid use disorder.

Subject(s)

Mediation Analysis , Models, Statistical , Humans , Models, Theoretical , Causality

5.

A flexible approach for predictive biomarker discovery.

Boileau, Philippe; Qi, Nina Ting; van der Laan, Mark J; Dudoit, Sandrine; Leng, Ning.

Biostatistics ; 24(4): 1085-1105, 2023 10 18.

Article in English | MEDLINE | ID: mdl-35861622

ABSTRACT

An endeavor central to precision medicine is predictive biomarker discovery; they define patient subpopulations which stand to benefit most, or least, from a given treatment. The identification of these biomarkers is often the byproduct of the related but fundamentally different task of treatment rule estimation. Using treatment rule estimation methods to identify predictive biomarkers in clinical trials where the number of covariates exceeds the number of participants often results in high false discovery rates. The higher than expected number of false positives translates to wasted resources when conducting follow-up experiments for drug target identification and diagnostic assay development. Patient outcomes are in turn negatively affected. We propose a variable importance parameter for directly assessing the importance of potentially predictive biomarkers and develop a flexible nonparametric inference procedure for this estimand. We prove that our estimator is double robust and asymptotically linear under loose conditions in the data-generating process, permitting valid inference about the importance metric. The statistical guarantees of the method are verified in a thorough simulation study representative of randomized control trials with moderate and high-dimensional covariate vectors. Our procedure is then used to discover predictive biomarkers from among the tumor gene expression data of metastatic renal cell carcinoma patients enrolled in recently completed clinical trials. We find that our approach more readily discerns predictive from nonpredictive biomarkers than procedures whose primary purpose is treatment rule estimation. An open-source software implementation of the methodology, the uniCATE R package, is briefly introduced.

Subject(s)

Biomedical Research , Carcinoma, Renal Cell , Kidney Neoplasms , Humans , Carcinoma, Renal Cell/diagnosis , Carcinoma, Renal Cell/genetics , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Biomarkers , Computer Simulation

6.

Adaptive sequential surveillance with network and temporal dependence.

Malenica, Ivana; Coyle, Jeremy R; van der Laan, Mark J; Petersen, Maya L.

Biometrics ; 80(1)2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38281772

ABSTRACT

Strategic test allocation is important for control of both emerging and existing pandemics (eg, COVID-19, HIV). It supports effective epidemic control by (1) reducing transmission via identifying cases and (2) tracking outbreak dynamics to inform targeted interventions. However, infectious disease surveillance presents unique statistical challenges. For instance, the true outcome of interest (positive infection status) is often a latent variable. In addition, presence of both network and temporal dependence reduces data to a single observation. In this work, we study an adaptive sequential design, which allows for unspecified dependence among individuals and across time. Our causal parameter is the mean latent outcome we would have obtained, if, starting at time t given the observed past, we had carried out a stochastic intervention that maximizes the outcome under a resource constraint. The key strength of the method is that we do not have to model network and time dependence: a short-term performance Online Super Learner is used to select among dependence models and randomization schemes. The proposed strategy learns the optimal choice of testing over time while adapting to the current state of the outbreak and learning across samples, through time, or both. We demonstrate the superior performance of the proposed strategy in an agent-based simulation modeling a residential university environment during the COVID-19 pandemic.

Subject(s)

COVID-19 , Communicable Diseases , Humans , Pandemics/prevention & control , COVID-19/epidemiology , Computer Simulation , Disease Outbreaks

7.

Author Correction: Child wasting and concurrent stunting in low- and middle-income countries.

Mertens, Andrew; Benjamin-Chung, Jade; Colford, John M; Hubbard, Alan E; van der Laan, Mark J; Coyle, Jeremy; Sofrygin, Oleg; Cai, Wilson; Jilek, Wendy; Rosete, Sonali; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jung, Esther; Chung, Esther O; Malenica, Ivana; Hejazi, Nima; Li, Haodong; Hafen, Ryan; Subramoney, Vishak; Häggström, Jonas; Norman, Thea; Christian, Parul; Brown, Kenneth H; Arnold, Benjamin F.

Nature ; 623(7985): E1, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37833391

8.

Author Correction: Early-childhood linear growth faltering in low- and middle-income countries.

Benjamin-Chung, Jade; Mertens, Andrew; Colford, John M; Hubbard, Alan E; van der Laan, Mark J; Coyle, Jeremy; Sofrygin, Oleg; Cai, Wilson; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jilek, Wendy; Jung, Esther; Chung, Esther O; Rosete, Sonali; Hejazi, Nima; Malenica, Ivana; Li, Haodong; Hafen, Ryan; Subramoney, Vishak; Häggström, Jonas; Norman, Thea; Brown, Kenneth H; Christian, Parul; Arnold, Benjamin F.

Nature ; 623(7985): E2, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37833392

9.

Estimation of time-specific intervention effects on continuously distributed time-to-event outcomes by targeted maximum likelihood estimation.

Rytgaard, Helene C W; Eriksson, Frank; van der Laan, Mark J.

Biometrics ; 79(4): 3038-3049, 2023 12.

Article in English | MEDLINE | ID: mdl-36988158

ABSTRACT

This work considers targeted maximum likelihood estimation (TMLE) of treatment effects on absolute risk and survival probabilities in classical time-to-event settings characterized by right-censoring and competing risks. TMLE is a general methodology combining flexible ensemble learning and semiparametric efficiency theory in a two-step procedure for substitution estimation of causal parameters. We specialize and extend the continuous-time TMLE methods for competing risks settings, proposing a targeting algorithm that iteratively updates cause-specific hazards to solve the efficient influence curve equation for the target parameter. As part of the work, we further detail and implement the recently proposed highly adaptive lasso estimator for continuous-time conditional hazards with L1 -penalized Poisson regression. The resulting estimation procedure benefits from relying solely on very mild nonparametric restrictions on the statistical model, thus providing a novel tool for machine-learning-based semiparametric causal inference for continuous-time time-to-event data. We apply the methods to a publicly available dataset on follicular cell lymphoma where subjects are followed over time until disease relapse or death without relapse. The data display important time-varying effects that can be captured by the highly adaptive lasso. In our simulations that are designed to imitate the data, we compare our methods to a similar approach based on random survival forests and to the discrete-time TMLE.

Subject(s)

Algorithms , Models, Statistical , Humans , Likelihood Functions , Machine Learning , Recurrence

10.

Nonparametric inverse-probability-weighted estimators based on the highly adaptive lasso.

Ertefaie, Ashkan; Hejazi, Nima S; van der Laan, Mark J.

Biometrics ; 79(2): 1029-1041, 2023 06.

Article in English | MEDLINE | ID: mdl-35839293

ABSTRACT

Inverse-probability-weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudopopulation in which selection biases are eliminated. Despite their ease of use, these estimators require the correct specification of a model for the weighting mechanism, are known to be inefficient, and suffer from the curse of dimensionality. We propose a class of nonparametric inverse-probability-weighted estimators in which the weighting mechanism is estimated via undersmoothing of the highly adaptive lasso, a nonparametric regression function proven to converge at nearly n - 1 / 3 $ n^{-1/3}$ -rate to the true weighting mechanism. We demonstrate that our estimators are asymptotically linear with variance converging to the nonparametric efficiency bound. Unlike doubly robust estimators, our procedures require neither derivation of the efficient influence function nor specification of the conditional outcome model. Our theoretical developments have broad implications for the construction of efficient inverse-probability-weighted estimators in large statistical models and a variety of problem settings. We assess the practical performance of our estimators in simulation studies and demonstrate use of our proposed methodology with data from a large-scale epidemiologic study.

Subject(s)

Models, Statistical , Probability , Computer Simulation , Selection Bias , Causality

11.

Efficient targeted learning of heterogeneous treatment effects for multiple subgroups.

Wei, Waverly; Petersen, Maya; van der Laan, Mark J; Zheng, Zeyu; Wu, Chong; Wang, Jingshen.

Biometrics ; 79(3): 1934-1946, 2023 09.

Article in English | MEDLINE | ID: mdl-36416173

ABSTRACT

In biomedical science, analyzing treatment effect heterogeneity plays an essential role in assisting personalized medicine. The main goals of analyzing treatment effect heterogeneity include estimating treatment effects in clinically relevant subgroups and predicting whether a patient subpopulation might benefit from a particular treatment. Conventional approaches often evaluate the subgroup treatment effects via parametric modeling and can thus be susceptible to model mis-specifications. In this paper, we take a model-free semiparametric perspective and aim to efficiently evaluate the heterogeneous treatment effects of multiple subgroups simultaneously under the one-step targeted maximum-likelihood estimation (TMLE) framework. When the number of subgroups is large, we further expand this path of research by looking at a variation of the one-step TMLE that is robust to the presence of small estimated propensity scores in finite samples. From our simulations, our method demonstrates substantial finite sample improvements compared to conventional methods. In a case study, our method unveils the potential treatment effect heterogeneity of rs12916-T allele (a proxy for statin usage) in decreasing Alzheimer's disease risk.

Subject(s)

Machine Learning , Precision Medicine , Humans , Likelihood Functions , Computer Simulation , Propensity Score

12.

Personalized online ensemble machine learning with applications for dynamic data streams.

Malenica, Ivana; Phillips, Rachael V; Chambaz, Antoine; Hubbard, Alan E; Pirracchio, Romain; van der Laan, Mark J.

Stat Med ; 42(7): 1013-1044, 2023 03 30.

Article in English | MEDLINE | ID: mdl-36897184

ABSTRACT

In this work we introduce the personalized online super learner (POSL), an online personalizable ensemble machine learning algorithm for streaming data. POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized, that is, optimization with respect to subject ID, to many individuals, that is, optimization with respect to common baseline covariates. As an online algorithm, POSL learns in real time. As a super learner, POSL is grounded in statistical optimality theory and can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed/offline algorithms that are not updated during POSL's fitting procedure, pooled algorithms that learn from many individuals' time series, and individualized algorithms that learn from within a single time series. POSL's ensembling of the candidates can depend on the amount of data collected, the stationarity of the time series, and the mutual characteristics of a group of time series. Depending on the underlying data-generating process and the information available in the data, POSL is able to adapt to learning across samples, through time, or both. For a range of simulations that reflect realistic forecasting scenarios and in a medical application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL is able to provide reliable predictions for both short and long time series, and it's able to adjust to changing data-generating environments. We further cultivate POSL's practicality by extending it to settings where time series dynamically enter and exit.

Subject(s)

Algorithms , Machine Learning , Humans

13.

Defining and estimating effects in cluster randomized trials: A methods comparison.

Benitez, Alejandra; Petersen, Maya L; van der Laan, Mark J; Santos, Nicole; Butrick, Elizabeth; Walker, Dilys; Ghosh, Rakesh; Otieno, Phelgona; Waiswa, Peter; Balzer, Laura B.

Stat Med ; 42(19): 3443-3466, 2023 08 30.

Article in English | MEDLINE | ID: mdl-37308115

ABSTRACT

Across research disciplines, cluster randomized trials (CRTs) are commonly implemented to evaluate interventions delivered to groups of participants, such as communities and clinics. Despite advances in the design and analysis of CRTs, several challenges remain. First, there are many possible ways to specify the causal effect of interest (eg, at the individual-level or at the cluster-level). Second, the theoretical and practical performance of common methods for CRT analysis remain poorly understood. Here, we present a general framework to formally define an array of causal effects in terms of summary measures of counterfactual outcomes. Next, we provide a comprehensive overview of CRT estimators, including the t-test, generalized estimating equations (GEE), augmented-GEE, and targeted maximum likelihood estimation (TMLE). Using finite sample simulations, we illustrate the practical performance of these estimators for different causal effects and when, as commonly occurs, there are limited numbers of clusters of different sizes. Finally, our application to data from the Preterm Birth Initiative (PTBi) study demonstrates the real-world impact of varying cluster sizes and targeting effects at the cluster-level or at the individual-level. Specifically, the relative effect of the PTBi intervention was 0.81 at the cluster-level, corresponding to a 19% reduction in outcome incidence, and was 0.66 at the individual-level, corresponding to a 34% reduction in outcome risk. Given its flexibility to estimate a variety of user-specified effects and ability to adaptively adjust for covariates for precision gains while maintaining Type-I error control, we conclude TMLE is a promising tool for CRT analysis.

Subject(s)

Premature Birth , Infant, Newborn , Female , Humans , Computer Simulation , Randomized Controlled Trials as Topic , Sample Size , Causality , Cluster Analysis

14.

Expert-augmented machine learning.

Gennatas, Efstathios D; Friedman, Jerome H; Ungar, Lyle H; Pirracchio, Romain; Eaton, Eric; Reichmann, Lara G; Interian, Yannet; Luna, José Marcio; Simone, Charles B; Auerbach, Andrew; Delgado, Elier; van der Laan, Mark J; Solberg, Timothy D; Valdes, Gilmer.

Proc Natl Acad Sci U S A ; 117(9): 4571-4577, 2020 03 03.

Article in English | MEDLINE | ID: mdl-32071251

ABSTRACT

Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.

Subject(s)

Expert Systems , Machine Learning/standards , Medical Informatics/methods , Data Management/methods , Database Management Systems , Medical Informatics/standards

15.

Data-Adaptive Selection of the Propensity Score Truncation Level for Inverse-Probability-Weighted and Targeted Maximum Likelihood Estimators of Marginal Point Treatment Effects.

Gruber, Susan; Phillips, Rachael V; Lee, Hana; van der Laan, Mark J.

Am J Epidemiol ; 191(9): 1640-1651, 2022 08 22.

Article in English | MEDLINE | ID: mdl-35512316

ABSTRACT

Inverse probability weighting (IPW) and targeted maximum likelihood estimation (TMLE) are methodologies that can adjust for confounding and selection bias and are often used for causal inference. Both estimators rely on the positivity assumption that within strata of confounders there is a positive probability of receiving treatment at all levels under consideration. Practical applications of IPW require finite inverse probability (IP) weights. TMLE requires that propensity scores (PS) be bounded away from 0 and 1. Although truncation can improve variance and finite sample bias, this artificial distortion of the IP weights and PS distribution introduces asymptotic bias. As sample size grows, truncation-induced bias eventually swamps variance, rendering nominal confidence interval coverage and hypothesis tests invalid. We present a simple truncation strategy based on the sample size, n, that sets the upper bound on IP weights at $\sqrt{\textit{n}}$ ln n/5. For TMLE, the lower bound on the PS should be set to 5/($\sqrt{\textit{n}}$ ln n/5). Our strategy was designed to optimize the mean squared error of the parameter estimate. It naturally extends to data structures with missing outcomes. Simulation studies and a data analysis demonstrate our strategy's ability to minimize both bias and mean squared error in comparison with other common strategies, including the popular but flawed quantile-based heuristic.

Subject(s)

Propensity Score , Bias , Causality , Computer Simulation , Humans , Likelihood Functions

16.

Evaluating the robustness of targeted maximum likelihood estimators via realistic simulations in nutrition intervention trials.

Li, Haodong; Rosete, Sonali; Coyle, Jeremy; Phillips, Rachael V; Hejazi, Nima S; Malenica, Ivana; Arnold, Benjamin F; Benjamin-Chung, Jade; Mertens, Andrew; Colford, John M; van der Laan, Mark J; Hubbard, Alan E.

Stat Med ; 41(12): 2132-2165, 2022 05 30.

Article in English | MEDLINE | ID: mdl-35172378

ABSTRACT

Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that can increase robustness, by adding a layer of cross-validation (cross-validated targeted maximum likelihood estimation and double machine learning, as applied to substitution and estimating equation approaches, respectively). While these methods have been evaluated individually on simulated and experimental data sets, a comprehensive analysis of their performance across real data based simulations have yet to be conducted. In this work, we benchmark multiple widely used methods for estimation of the average treatment effect using ten different nutrition intervention studies data. A nonparametric regression method, undersmoothed highly adaptive lasso, is used to generate the simulated distribution which preserves important features from the observed data and reproduces a set of true target parameters. For each simulated data, we apply the methods above to estimate the average treatment effects as well as their standard errors and resulting confidence intervals. Based on the analytic results, a general recommendation is put forth for use of the cross-validated variants of both substitution and estimating equation estimators. We conclude that the additional layer of cross-validation helps in avoiding unintentional over-fitting of nuisance parameter functionals and leads to more robust inferences.

Subject(s)

Machine Learning , Research Design , Causality , Computer Simulation , Humans , Likelihood Functions , Models, Statistical , Regression Analysis

17.

Targeted maximum likelihood estimation for causal inference in survival and competing risks analysis.

Rytgaard, Helene C W; van der Laan, Mark J.

Lifetime Data Anal ; 2022 Nov 07.

Article in English | MEDLINE | ID: mdl-36336732

ABSTRACT

Targeted maximum likelihood estimation (TMLE) provides a general methodology for estimation of causal parameters in presence of high-dimensional nuisance parameters. Generally, TMLE consists of a two-step procedure that combines data-adaptive nuisance parameter estimation with semiparametric efficiency and rigorous statistical inference obtained via a targeted update step. In this paper, we demonstrate the practical applicability of TMLE based causal inference in survival and competing risks settings where event times are not confined to take place on a discrete and finite grid. We focus on estimation of causal effects of time-fixed treatment decisions on survival and absolute risk probabilities, considering different univariate and multidimensional parameters. Besides providing a general guidance to using TMLE for survival and competing risks analysis, we further describe how the previous work can be extended with the use of loss-based cross-validated estimation, also known as super learning, of the conditional hazards. We illustrate the usage of the considered methods using publicly available data from a trial on adjuvant chemotherapy for colon cancer. R software code to implement all considered algorithms and to reproduce all analyses is available in an accompanying online appendix on Github.

18.

Transporting stochastic direct and indirect effects to new populations.

Rudolph, Kara E; Levy, Jonathan; van der Laan, Mark J.

Biometrics ; 77(1): 197-211, 2021 03.

Article in English | MEDLINE | ID: mdl-32277465

ABSTRACT

Transported mediation effects may contribute to understanding how interventions work differently when applied to new populations. However, we are not aware of any estimators for such effects. Thus, we propose two doubly robust, efficient estimators of transported stochastic (also called randomized interventional) direct and indirect effects. We demonstrate their finite sample properties in a simulation study. We then apply the preferred substitution estimator to longitudinal data from the Moving to Opportunity Study, a large-scale housing voucher experiment, to transport stochastic indirect effect estimates of voucher receipt in childhood on subsequent risk of mental health or substance use disorder mediated through parental employment across sites, thereby gaining understanding of drivers of the site differences.

Subject(s)

Substance-Related Disorders , Computer Simulation , Humans , Mental Health

19.

Efficient nonparametric inference on the effects of stochastic interventions under two-phase sampling, with applications to vaccine efficacy trials.

Hejazi, Nima S; van der Laan, Mark J; Janes, Holly E; Gilbert, Peter B; Benkeser, David C.

Biometrics ; 77(4): 1241-1253, 2021 12.

Article in English | MEDLINE | ID: mdl-32949147

ABSTRACT

The advent and subsequent widespread availability of preventive vaccines has altered the course of public health over the past century. Despite this success, effective vaccines to prevent many high-burden diseases, including human immunodeficiency virus (HIV), have been slow to develop. Vaccine development can be aided by the identification of immune response markers that serve as effective surrogates for clinically significant infection or disease endpoints. However, measuring immune response marker activity is often costly, which has motivated the usage of two-phase sampling for immune response evaluation in clinical trials of preventive vaccines. In such trials, the measurement of immunological markers is performed on a subset of trial participants, where enrollment in this second phase is potentially contingent on the observed study outcome and other participant-level information. We propose nonparametric methodology for efficiently estimating a counterfactual parameter that quantifies the impact of a given immune response marker on the subsequent probability of infection. Along the way, we fill in theoretical gaps pertaining to the asymptotic behavior of nonparametric efficient estimators in the context of two-phase sampling, including a multiple robustness property enjoyed by our estimators. Techniques for constructing confidence intervals and hypothesis tests are presented, and an open source software implementation of the methodology, the txshift R package, is introduced. We illustrate the proposed techniques using data from a recent preventive HIV vaccine efficacy trial.

Subject(s)

AIDS Vaccines , HIV Infections , Clinical Trials as Topic , HIV Infections/prevention & control , Humans , Probability , Vaccine Efficacy

20.

Exploiting nonsystematic covariate monitoring to broaden the scope of evidence about the causal effects of adaptive treatment strategies.

Kreif, Noémi; Sofrygin, Oleg; Schmittdiel, Julie A; Adams, Alyce S; Grant, Richard W; Zhu, Zheng; van der Laan, Mark J; Neugebauer, Romain.

Biometrics ; 77(1): 329-342, 2021 03.

Article in English | MEDLINE | ID: mdl-32297311

ABSTRACT

In studies based on electronic health records (EHR), the frequency of covariate monitoring can vary by covariate type, across patients, and over time, which can limit the generalizability of inferences about the effects of adaptive treatment strategies. In addition, monitoring is a health intervention in itself with costs and benefits, and stakeholders may be interested in the effect of monitoring when adopting adaptive treatment strategies. This paper demonstrates how to exploit nonsystematic covariate monitoring in EHR-based studies to both improve the generalizability of causal inferences and to evaluate the health impact of monitoring when evaluating adaptive treatment strategies. Using a real world, EHR-based, comparative effectiveness research (CER) study of patients with type II diabetes mellitus, we illustrate how the evaluation of joint dynamic treatment and static monitoring interventions can improve CER evidence and describe two alternate estimation approaches based on inverse probability weighting (IPW). First, we demonstrate the poor performance of the standard estimator of the effects of joint treatment-monitoring interventions, due to a large decrease in data support and concerns over finite-sample bias from near-violations of the positivity assumption (PA) for the monitoring process. Second, we detail an alternate IPW estimator using a no direct effect assumption. We demonstrate that this estimator can improve efficiency but at the potential cost of increase in bias from violations of the PA for the treatment process.

Subject(s)

Diabetes Mellitus, Type 2 , Bias , Causality , Diabetes Mellitus, Type 2/drug therapy , Electronic Health Records , Humans , Probability

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL