Search | VHL Regional Portal

1.

Targeted Learning with an Undersmoothed Lasso Propensity Score Model for Large-Scale Covariate Adjustment in Healthcare Database Studies.

Wyss, Richard; van der Laan, Mark; Gruber, Susan; Shi, Xu; Lee, Hana; Dutcher, Sarah K; Nelson, Jennifer C; Toh, Sengwee; Russo, Massimiliano; Wang, Shirley V; Desai, Rishi J; Lin, Kueiyu Joshua.

Am J Epidemiol ; 2024 Mar 21.

Article in English | MEDLINE | ID: mdl-38517025

ABSTRACT

Lasso regression is widely used for large-scale propensity score (PS) estimation in healthcare database studies. In these settings, previous work has shown that undersmoothing (overfitting) Lasso PS models can improve confounding control, but it can also cause problems of non-overlap in covariate distributions. It remains unclear how to select the degree of undersmoothing when fitting large-scale Lasso PS models to improve confounding control while avoiding issues that can result from reduced covariate overlap. Here, we used simulations to evaluate the performance of using collaborative-controlled targeted learning to data-adaptively select the degree of undersmoothing when fitting large-scale PS models within both singly and doubly robust frameworks to reduce bias in causal estimators. Simulations showed that collaborative learning can data-adaptively select the degree of undersmoothing to reduce bias in estimated treatment effects. Results further showed that when fitting undersmoothed Lasso PS-models, the use of cross-fitting was important for avoiding non-overlap in covariate distributions and reducing bias in causal estimates.

2.

Adaptive sequential surveillance with network and temporal dependence.

Malenica, Ivana; Coyle, Jeremy R; van der Laan, Mark J; Petersen, Maya L.

Biometrics ; 80(1), 2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38281772

ABSTRACT

Strategic test allocation is important for control of both emerging and existing pandemics (eg, COVID-19, HIV). It supports effective epidemic control by (1) reducing transmission via identifying cases and (2) tracking outbreak dynamics to inform targeted interventions. However, infectious disease surveillance presents unique statistical challenges. For instance, the true outcome of interest (positive infection status) is often a latent variable. In addition, presence of both network and temporal dependence reduces data to a single observation. In this work, we study an adaptive sequential design, which allows for unspecified dependence among individuals and across time. Our causal parameter is the mean latent outcome we would have obtained, if, starting at time t given the observed past, we had carried out a stochastic intervention that maximizes the outcome under a resource constraint. The key strength of the method is that we do not have to model network and time dependence: a short-term performance Online Super Learner is used to select among dependence models and randomization schemes. The proposed strategy learns the optimal choice of testing over time while adapting to the current state of the outbreak and learning across samples, through time, or both. We demonstrate the superior performance of the proposed strategy in an agent-based simulation modeling a residential university environment during the COVID-19 pandemic.

Subject(s)

COVID-19 , Communicable Diseases , Humans , Pandemics/prevention & control , COVID-19/epidemiology , Computer Simulation , Disease Outbreaks

3.

Case study of semaglutide and cardiovascular outcomes: An application of the Causal Roadmap to a hybrid design for augmenting an RCT control arm with real-world data.

Dang, Lauren E; Fong, Edwin; Tarp, Jens Magelund; Clemmensen, Kim Katrine Bjerring; Ravn, Henrik; Kvist, Kajsa; Buse, John B; van der Laan, Mark; Petersen, Maya.

J Clin Transl Sci ; 7(1): e231, 2023.

Article in English | MEDLINE | ID: mdl-38028337

ABSTRACT

Introduction: Increasing interest in real-world evidence has fueled the development of study designs incorporating real-world data (RWD). Using the Causal Roadmap, we specify three designs to evaluate the difference in risk of major adverse cardiovascular events (MACE) with oral semaglutide versus standard-of-care: (1) the actual sequence of non-inferiority and superiority randomized controlled trials (RCTs), (2) a single RCT, and (3) a hybrid randomized-external data study. Methods: The hybrid design considers integration of the PIONEER 6 RCT with RWD controls using the experiment-selector cross-validated targeted maximum likelihood estimator. We evaluate 95% confidence interval coverage, power, and average patient time during which participants would be precluded from receiving a glucagon-like peptide-1 receptor agonist (GLP1-RA) for each design using simulations. Finally, we estimate the effect of oral semaglutide on MACE for the hybrid PIONEER 6-RWD analysis. Results: In simulations, Designs 1 and 2 performed similarly. The tradeoff between decreased coverage and patient time without the possibility of a GLP1-RA for Designs 1 and 3 depended on the simulated bias. In real data analysis using Design 3, external controls were integrated in 84% of cross-validation folds, resulting in an estimated risk difference of -1.53%-points (95% CI -2.75%-points to -0.30%-points). Conclusions: The Causal Roadmap helps investigators to minimize potential bias in studies using RWD and to quantify tradeoffs between study designs. The simulation results help to interpret the level of evidence provided by the real data analysis in support of the superiority of oral semaglutide versus standard-of-care for cardiovascular risk reduction.

4.

Author Correction: Child wasting and concurrent stunting in low- and middle-income countries.

Mertens, Andrew; Benjamin-Chung, Jade; Colford, John M; Hubbard, Alan E; van der Laan, Mark J; Coyle, Jeremy; Sofrygin, Oleg; Cai, Wilson; Jilek, Wendy; Rosete, Sonali; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jung, Esther; Chung, Esther O; Malenica, Ivana; Hejazi, Nima; Li, Haodong; Hafen, Ryan; Subramoney, Vishak; Häggström, Jonas; Norman, Thea; Christian, Parul; Brown, Kenneth H; Arnold, Benjamin F.

Nature ; 623(7985): E1, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37833391

5.

Author Correction: Early-childhood linear growth faltering in low- and middle-income countries.

Benjamin-Chung, Jade; Mertens, Andrew; Colford, John M; Hubbard, Alan E; van der Laan, Mark J; Coyle, Jeremy; Sofrygin, Oleg; Cai, Wilson; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jilek, Wendy; Jung, Esther; Chung, Esther O; Rosete, Sonali; Hejazi, Nima; Malenica, Ivana; Li, Haodong; Hafen, Ryan; Subramoney, Vishak; Häggström, Jonas; Norman, Thea; Brown, Kenneth H; Christian, Parul; Arnold, Benjamin F.

Nature ; 623(7985): E2, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37833392

6.

Targeted malaria elimination interventions reduce Plasmodium falciparum infections up to 3 kilometers away.

Benjamin-Chung, Jade; Li, Haodong; Nguyen, Anna; Heitmann, Gabriella Barratt; Bennett, Adam; Ntuku, Henry; Prach, Lisa M; Tambo, Munyaradzi; Wu, Lindsey; Drakeley, Chris; Gosling, Roly; Mumbengegwi, Davis; Kleinschmidt, Immo; Smith, Jennifer L; Hubbard, Alan; van der Laan, Mark; Hsiang, Michelle S.

medRxiv ; 2023 Nov 30.

Article in English | MEDLINE | ID: mdl-37790419

ABSTRACT

Malaria elimination interventions in low-transmission settings aim to extinguish hot spots and prevent transmission to nearby areas. In malaria elimination settings, the World Health Organization recommends reactive, focal interventions targeted to the area near malaria cases shortly after they are detected. A key question is whether these interventions reduce transmission to nearby uninfected or asymptomatic individuals who did not receive interventions. Here, we measured direct effects (among intervention recipients) and spillover effects (among non-recipients) of reactive, focal interventions delivered within 500 m of confirmed malaria index cases in a cluster-randomized trial in Namibia. The trial delivered malaria chemoprevention (artemether-lumefantrine) and vector control (indoor residual spraying with Actellic) separately and in combination using a factorial design. We compared incidence, infection prevalence, and seroprevalence between study arms among intervention recipients (direct effects) and non-recipients (spillover effects) up to 3 km away from index cases. We calculated incremental cost-effectiveness ratios accounting for spillover effects. The combined chemoprevention and vector control intervention produced direct effects and spillover effects. In the primary analysis among non-recipients within 1 km from index cases, the combined intervention reduced malaria incidence by 43% (95% CI 20%, 59%). In secondary analyses among non-recipients 500 m to 3 km from interventions, the combined intervention reduced infection by 79% (6%, 95%) and seroprevalence by 34% (20%, 45%). Accounting for spillover effects increased the cost-effectiveness of the combined intervention by 37%. Our findings provide the first evidence that targeting hot spots with combined chemoprevention and vector control interventions can indirectly benefit non-recipients up to 3 km away.

7.

A causal roadmap for generating high-quality real-world evidence.

Dang, Lauren E; Gruber, Susan; Lee, Hana; Dahabreh, Issa J; Stuart, Elizabeth A; Williamson, Brian D; Wyss, Richard; Díaz, Iván; Ghosh, Debashis; Kiciman, Emre; Alemayehu, Demissie; Hoffman, Katherine L; Vossen, Carla Y; Huml, Raymond A; Ravn, Henrik; Kvist, Kajsa; Pratley, Richard; Shih, Mei-Chiung; Pennello, Gene; Martin, David; Waddy, Salina P; Barr, Charles E; Akacha, Mouna; Buse, John B; van der Laan, Mark; Petersen, Maya.

J Clin Transl Sci ; 7(1): e212, 2023.

Article in English | MEDLINE | ID: mdl-37900353

ABSTRACT

Increasing emphasis on the use of real-world evidence (RWE) to support clinical policy and regulatory decision-making has led to a proliferation of guidance, advice, and frameworks from regulatory agencies, academia, professional societies, and industry. A broad spectrum of studies use real-world data (RWD) to produce RWE, ranging from randomized trials with outcomes assessed using RWD to fully observational studies. Yet, many proposals for generating RWE lack sufficient detail, and many analyses of RWD suffer from implausible assumptions, other methodological flaws, or inappropriate interpretations. The Causal Roadmap is an explicit, itemized, iterative process that guides investigators to prespecify study design and analysis plans; it addresses a wide range of guidance within a single framework. By supporting the transparent evaluation of causal assumptions and facilitating objective comparisons of design and analysis choices based on prespecified criteria, the Roadmap can help investigators to evaluate the quality of evidence that a given study is likely to produce, specify a study to generate high-quality RWE, and communicate effectively with regulatory agencies and other stakeholders. This paper aims to disseminate and extend the Causal Roadmap framework for use by clinical and translational researchers; three companion papers demonstrate applications of the Causal Roadmap for specific use cases.

8.

Child wasting and concurrent stunting in low- and middle-income countries.

Mertens, Andrew; Benjamin-Chung, Jade; Colford, John M; Hubbard, Alan E; van der Laan, Mark J; Coyle, Jeremy; Sofrygin, Oleg; Cai, Wilson; Jilek, Wendy; Rosete, Sonali; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jung, Esther; Chung, Esther O; Malenica, Ivana; Hejazi, Nima; Li, Haodong; Hafen, Ryan; Subramoney, Vishak; Häggström, Jonas; Norman, Thea; Christian, Parul; Brown, Kenneth H; Arnold, Benjamin F.

Nature ; 621(7979): 558-567, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37704720

ABSTRACT

Sustainable Development Goal 2.2 (to end malnutrition by 2030) includes the elimination of child wasting, defined as a weight-for-length z-score that is more than two standard deviations below the median of the World Health Organization standards for child growth. Prevailing methods to measure wasting rely on cross-sectional surveys that cannot measure onset, recovery and persistence, key features that inform preventive interventions and estimates of disease burden. Here we analyse 21 longitudinal cohorts and show that wasting is a highly dynamic process of onset and recovery, with incidence peaking between birth and 3 months. Many more children experience an episode of wasting at some point during their first 24 months than prevalent cases at a single point in time suggest. For example, at the age of 24 months, 5.6% of children were wasted, but by the same age, 29.2% of children had experienced at least one wasting episode and 10.0% had experienced two or more episodes. Children who were wasted before the age of 6 months had a faster recovery and shorter episodes than did children who were wasted at older ages; however, early wasting increased the risk of later growth faltering, including concurrent wasting and stunting (low length-for-age z-score), and thus increased the risk of mortality. In diverse populations with high seasonal rainfall, the population average weight-for-length z-score varied substantially (more than 0.5 z in some cohorts), with the lowest mean z-scores occurring during the rainiest months; this indicates that seasonally targeted interventions could be considered. Our results show the importance of establishing interventions to prevent wasting from birth to the age of 6 months, probably through improved maternal nutrition, to complement current programmes that focus on children aged 6-59 months.

Subject(s)

Cachexia , Developing Countries , Growth Disorders , Malnutrition , Child, Preschool , Humans , Infant , Infant, Newborn , Cachexia/epidemiology , Cachexia/mortality , Cachexia/prevention & control , Cross-Sectional Studies , Growth Disorders/epidemiology , Growth Disorders/mortality , Growth Disorders/prevention & control , Incidence , Longitudinal Studies , Malnutrition/epidemiology , Malnutrition/mortality , Malnutrition/prevention & control , Rain , Seasons

9.

Early-childhood linear growth faltering in low- and middle-income countries.

Benjamin-Chung, Jade; Mertens, Andrew; Colford, John M; Hubbard, Alan E; van der Laan, Mark J; Coyle, Jeremy; Sofrygin, Oleg; Cai, Wilson; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jilek, Wendy; Jung, Esther; Chung, Esther O; Rosete, Sonali; Hejazi, Nima; Malenica, Ivana; Li, Haodong; Hafen, Ryan; Subramoney, Vishak; Häggström, Jonas; Norman, Thea; Brown, Kenneth H; Christian, Parul; Arnold, Benjamin F.

Nature ; 621(7979): 550-557, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37704719

ABSTRACT

Globally, 149 million children under 5 years of age are estimated to be stunted (length more than 2 standard deviations below international growth standards). Stunting, a form of linear growth faltering, increases the risk of illness, impaired cognitive development and mortality. Global stunting estimates rely on cross-sectional surveys, which cannot provide direct information about the timing of onset or persistence of growth faltering, a key consideration for defining critical windows to deliver preventive interventions. Here we completed a pooled analysis of longitudinal studies in low- and middle-income countries (n = 32 cohorts, 52,640 children, ages 0-24 months), allowing us to identify the typical age of onset of linear growth faltering and to investigate recurrent faltering in early life. The highest incidence of stunting onset occurred from birth to the age of 3 months, with substantially higher stunting at birth in South Asia. From 0 to 15 months, stunting reversal was rare; children who reversed their stunting status frequently relapsed, and relapse rates were substantially higher among children born stunted. Early onset and low reversal rates suggest that improving children's linear growth will require life course interventions for women of childbearing age and a greater emphasis on interventions for children under 6 months of age.

Subject(s)

Developing Countries , Growth Disorders , Adult , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Asia, Southern/epidemiology , Cognition , Cross-Sectional Studies , Developing Countries/statistics & numerical data , Developmental Disabilities/epidemiology , Developmental Disabilities/mortality , Developmental Disabilities/prevention & control , Growth Disorders/epidemiology , Growth Disorders/mortality , Growth Disorders/prevention & control , Longitudinal Studies , Mothers

10.

Causes and consequences of child growth faltering in low-resource settings.

Mertens, Andrew; Benjamin-Chung, Jade; Colford, John M; Coyle, Jeremy; van der Laan, Mark J; Hubbard, Alan E; Rosete, Sonali; Malenica, Ivana; Hejazi, Nima; Sofrygin, Oleg; Cai, Wilson; Li, Haodong; Nguyen, Anna; Pokpongkiat, Nolan N; Djajadi, Stephanie; Seth, Anmol; Jung, Esther; Chung, Esther O; Jilek, Wendy; Subramoney, Vishak; Hafen, Ryan; Häggström, Jonas; Norman, Thea; Brown, Kenneth H; Christian, Parul; Arnold, Benjamin F.

Nature ; 621(7979): 568-576, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37704722

ABSTRACT

Growth faltering in children (low length for age or low weight for length) during the first 1,000 days of life (from conception to 2 years of age) influences short-term and long-term health and survival. Interventions such as nutritional supplementation during pregnancy and the postnatal period could help prevent growth faltering, but programmatic action has been insufficient to eliminate the high burden of stunting and wasting in low- and middle-income countries. Identification of age windows and population subgroups on which to focus will benefit future preventive efforts. Here we use a population intervention effects analysis of 33 longitudinal cohorts (83,671 children, 662,763 measurements) and 30 separate exposures to show that improving maternal anthropometry and child condition at birth accounted for population increases in length-for-age z-scores of up to 0.40 and weight-for-length z-scores of up to 0.15 by 24 months of age. Boys had consistently higher risk of all forms of growth faltering than girls. Early postnatal growth faltering predisposed children to subsequent and persistent growth faltering. Children with multiple growth deficits exhibited higher mortality rates from birth to 2 years of age than children without growth deficits (hazard ratios 1.9 to 8.7). The importance of prenatal causes and severe consequences for children who experienced early growth faltering support a focus on pre-conception and pregnancy as a key opportunity for new preventive interventions.

Subject(s)

Cachexia , Developing Countries , Growth Disorders , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Pregnancy , Cachexia/economics , Cachexia/epidemiology , Cachexia/etiology , Cachexia/prevention & control , Cohort Studies , Developing Countries/economics , Developing Countries/statistics & numerical data , Dietary Supplements , Growth Disorders/epidemiology , Growth Disorders/prevention & control , Longitudinal Studies , Mothers , Sex Factors , Malnutrition/economics , Malnutrition/epidemiology , Malnutrition/etiology , Malnutrition/prevention & control , Anthropometry

11.

Evaluating and improving real-world evidence with Targeted Learning.

Gruber, Susan; Phillips, Rachael V; Lee, Hana; Concato, John; van der Laan, Mark.

BMC Med Res Methodol ; 23(1): 178, 2023 08 02.

Article in English | MEDLINE | ID: mdl-37533017

ABSTRACT

BACKGROUND: The Targeted Learning roadmap provides a systematic guide for generating and evaluating real-world evidence (RWE). From a regulatory perspective, RWE arises from diverse sources such as randomized controlled trials that make use of real-world data, observational studies, and other study designs. This paper illustrates a principled approach to assessing the validity and interpretability of RWE. METHODS: We applied the roadmap to a published observational study of the dose-response association between ritodrine hydrochloride and pulmonary edema among women pregnant with twins in Japan. The goal was to identify barriers to causal effect estimation beyond unmeasured confounding reported by the study's authors, and to explore potential options for overcoming the barriers that robustify results. RESULTS: Following the roadmap raised issues that led us to formulate alternative causal questions that produced more reliable, interpretable RWE. The process revealed a lack of information in the available data to identify a causal dose-response curve. However, under explicit assumptions the effect of treatment with any amount of ritodrine versus none, albeit a less ambitious parameter, can be estimated from data. CONCLUSIONS: Before RWE can be used in support of clinical and regulatory decision-making, its quality and reliability must be systematically evaluated. The TL roadmap prescribes how to carry out a thorough, transparent, and realistic assessment of RWE. We recommend this approach be a routine part of any decision-making process.

Subject(s)

Research Design , Female , Humans , Reproducibility of Results , Japan , Randomized Controlled Trials as Topic

12.

CVtreeMLE: Efficient Estimation of Mixed Exposures using Data Adaptive Decision Trees and Cross-Validated Targeted Maximum Likelihood Estimation in R.

McCoy, David; Hubbard, Alan; Van der Laan, Mark.

J Open Source Softw ; 8(82), 2023.

Article in English | MEDLINE | ID: mdl-37398941

ABSTRACT

Statistical causal inference of mixed exposures has been limited by reliance on parametric models and, until recently, by researchers considering only one exposure at a time, usually estimated as a beta coefficient in a generalized linear regression model (GLM). This independent assessment of exposures poorly estimates the joint impact of a collection of the same exposures in a realistic exposure setting. Marginal methods for mixture variable selection such as ridge/lasso regression are biased by linear assumptions and the interactions modeled are chosen by the user. Clustering methods such as principal component regression lose both interpretability and valid inference. Newer mixture methods such as quantile g-computation (Keil et al., 2020) are biased by linear/additive assumptions. More flexible methods such as Bayesian kernel machine regression (BKMR) (Bobb et al., 2014) are sensitive to the choice of tuning parameters, are computationally taxing and lack an interpretable and robust summary statistic of dose-response relationships. No method currently exists that finds the best flexible model to adjust for covariates while applying a non-parametric model that targets interactions in a mixture and delivers valid inference for a target parameter. Non-parametric methods such as decision trees are a useful tool to evaluate combined exposures by finding partitions in the joint-exposure (mixture) space that best explain the variance in an outcome. However, current methods using decision trees to assess statistical inference for interactions are biased and are prone to overfitting by using the full data to both identify nodes in the tree and make statistical inference given these nodes. Other methods have used an independent test set to derive inference, which does not use the full data.
The CVtreeMLE R package provides researchers in (bio)statistics, epidemiology, and environmental health sciences with access to state-of-the-art statistical methodology for evaluating the causal effects of a data-adaptively determined mixed exposure using decision trees. Our target audience is those analysts who would normally use a potentially biased GLM-based model for a mixed exposure. Instead, we hope to provide users with a non-parametric statistical machine where users simply specify the exposures, covariates and outcome; CVtreeMLE then determines whether a best-fitting decision tree exists and delivers interpretable results.

13.

Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions.

Boileau, Philippe; Hejazi, Nima S; van der Laan, Mark J; Dudoit, Sandrine.

J Comput Graph Stat ; 32(2): 601-612, 2023.

Article in English | MEDLINE | ID: mdl-37273839

ABSTRACT

The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimensional regimes do not admit such a convenience. Thus, a variety of estimators have been derived to overcome the shortcomings of the canonical estimator in such settings. Yet, selecting an optimal estimator from among the plethora available remains an open challenge. Using the framework of cross-validated loss-based estimation, we develop the theoretical underpinnings of just such an estimator selection procedure. We propose a general class of loss functions for covariance matrix estimation and establish accompanying finite-sample risk bounds and conditions for the asymptotic optimality of the cross-validation selector. In numerical experiments, we demonstrate the optimality of our proposed selector in moderate sample sizes and across diverse data-generating processes. The practical benefits of our procedure are highlighted in a dimension reduction application to single-cell transcriptome sequencing data.

14.

Defining and estimating effects in cluster randomized trials: A methods comparison.

Benitez, Alejandra; Petersen, Maya L; van der Laan, Mark J; Santos, Nicole; Butrick, Elizabeth; Walker, Dilys; Ghosh, Rakesh; Otieno, Phelgona; Waiswa, Peter; Balzer, Laura B.

Stat Med ; 42(19): 3443-3466, 2023 08 30.

Article in English | MEDLINE | ID: mdl-37308115

ABSTRACT

Across research disciplines, cluster randomized trials (CRTs) are commonly implemented to evaluate interventions delivered to groups of participants, such as communities and clinics. Despite advances in the design and analysis of CRTs, several challenges remain. First, there are many possible ways to specify the causal effect of interest (eg, at the individual-level or at the cluster-level). Second, the theoretical and practical performance of common methods for CRT analysis remain poorly understood. Here, we present a general framework to formally define an array of causal effects in terms of summary measures of counterfactual outcomes. Next, we provide a comprehensive overview of CRT estimators, including the t-test, generalized estimating equations (GEE), augmented-GEE, and targeted maximum likelihood estimation (TMLE). Using finite sample simulations, we illustrate the practical performance of these estimators for different causal effects and when, as commonly occurs, there are limited numbers of clusters of different sizes. Finally, our application to data from the Preterm Birth Initiative (PTBi) study demonstrates the real-world impact of varying cluster sizes and targeting effects at the cluster-level or at the individual-level. Specifically, the relative effect of the PTBi intervention was 0.81 at the cluster-level, corresponding to a 19% reduction in outcome incidence, and was 0.66 at the individual-level, corresponding to a 34% reduction in outcome risk. Given its flexibility to estimate a variety of user-specified effects and ability to adaptively adjust for covariates for precision gains while maintaining Type-I error control, we conclude TMLE is a promising tool for CRT analysis.

Subject(s)

Premature Birth , Infant, Newborn , Female , Humans , Computer Simulation , Randomized Controlled Trials as Topic , Sample Size , Causality , Cluster Analysis

15.

Practical considerations for specifying a super learner.

Phillips, Rachael V; van der Laan, Mark J; Lee, Hana; Gruber, Susan.

Int J Epidemiol ; 52(4): 1276-1285, 2023 08 02.

Article in English | MEDLINE | ID: mdl-36905602

ABSTRACT

Common tasks encountered in epidemiology, including disease incidence estimation and causal inference, rely on predictive modelling. Constructing a predictive model can be thought of as learning a prediction function (a function that takes as input covariate data and outputs a predicted value). Many strategies for learning prediction functions from data (learners) are available, from parametric regressions to machine learning algorithms. It can be challenging to choose a learner, as it is impossible to know in advance which one is the most suitable for a particular dataset and prediction task. The super learner (SL) is an algorithm that alleviates concerns over selecting the one 'right' learner by providing the freedom to consider many, such as those recommended by collaborators, used in related research or specified by subject-matter experts. Also known as stacking, SL is an entirely prespecified and flexible approach for predictive modelling. To ensure the SL is well specified for learning the desired prediction function, the analyst does need to make a few important choices. In this educational article, we provide step-by-step guidelines for making these decisions, walking the reader through each of them and providing intuition along the way. In doing so, we aim to empower the analyst to tailor the SL specification to their prediction task, thereby ensuring their SL performs as well as possible. A flowchart provides a concise, easy-to-follow summary of key suggestions and heuristics, based on our accumulated experience and guided by SL optimality theory.

Subject(s)

Algorithms , Machine Learning , Humans

16.

Estimation of time-specific intervention effects on continuously distributed time-to-event outcomes by targeted maximum likelihood estimation.

Rytgaard, Helene C W; Eriksson, Frank; van der Laan, Mark J.

Biometrics ; 79(4): 3038-3049, 2023 12.

Article in English | MEDLINE | ID: mdl-36988158

ABSTRACT

This work considers targeted maximum likelihood estimation (TMLE) of treatment effects on absolute risk and survival probabilities in classical time-to-event settings characterized by right-censoring and competing risks. TMLE is a general methodology combining flexible ensemble learning and semiparametric efficiency theory in a two-step procedure for substitution estimation of causal parameters. We specialize and extend the continuous-time TMLE methods for competing risks settings, proposing a targeting algorithm that iteratively updates cause-specific hazards to solve the efficient influence curve equation for the target parameter. As part of the work, we further detail and implement the recently proposed highly adaptive lasso estimator for continuous-time conditional hazards with L1-penalized Poisson regression. The resulting estimation procedure benefits from relying solely on very mild nonparametric restrictions on the statistical model, thus providing a novel tool for machine-learning-based semiparametric causal inference for continuous-time time-to-event data. We apply the methods to a publicly available dataset on follicular cell lymphoma where subjects are followed over time until disease relapse or death without relapse. The data display important time-varying effects that can be captured by the highly adaptive lasso. In our simulations that are designed to imitate the data, we compare our methods to a similar approach based on random survival forests and to the discrete-time TMLE.

Subject(s)

Algorithms , Models, Statistical , Humans , Likelihood Functions , Machine Learning , Recurrence

17.

Personalized online ensemble machine learning with applications for dynamic data streams.

Malenica, Ivana; Phillips, Rachael V; Chambaz, Antoine; Hubbard, Alan E; Pirracchio, Romain; van der Laan, Mark J.

Stat Med ; 42(7): 1013-1044, 2023 03 30.

Article in English | MEDLINE | ID: mdl-36897184

ABSTRACT

In this work we introduce the personalized online super learner (POSL), an online personalizable ensemble machine learning algorithm for streaming data. POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized, that is, optimization with respect to subject ID, to many individuals, that is, optimization with respect to common baseline covariates. As an online algorithm, POSL learns in real time. As a super learner, POSL is grounded in statistical optimality theory and can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed/offline algorithms that are not updated during POSL's fitting procedure, pooled algorithms that learn from many individuals' time series, and individualized algorithms that learn from within a single time series. POSL's ensembling of the candidates can depend on the amount of data collected, the stationarity of the time series, and the mutual characteristics of a group of time series. Depending on the underlying data-generating process and the information available in the data, POSL is able to adapt to learning across samples, through time, or both. For a range of simulations that reflect realistic forecasting scenarios and in a medical application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL is able to provide reliable predictions for both short and long time series, and it's able to adjust to changing data-generating environments. We further cultivate POSL's practicality by extending it to settings where time series dynamically enter and exit.

Subject(s)

Algorithms; Machine Learning; Humans

18.

Efficient targeted learning of heterogeneous treatment effects for multiple subgroups.

Wei, Waverly; Petersen, Maya; van der Laan, Mark J; Zheng, Zeyu; Wu, Chong; Wang, Jingshen.

Biometrics ; 79(3): 1934-1946, 2023 09.

Article in English | MEDLINE | ID: mdl-36416173

ABSTRACT

In biomedical science, analyzing treatment effect heterogeneity plays an essential role in assisting personalized medicine. The main goals of analyzing treatment effect heterogeneity include estimating treatment effects in clinically relevant subgroups and predicting whether a patient subpopulation might benefit from a particular treatment. Conventional approaches often evaluate the subgroup treatment effects via parametric modeling and can thus be susceptible to model mis-specifications. In this paper, we take a model-free semiparametric perspective and aim to efficiently evaluate the heterogeneous treatment effects of multiple subgroups simultaneously under the one-step targeted maximum-likelihood estimation (TMLE) framework. When the number of subgroups is large, we further expand this path of research by looking at a variation of the one-step TMLE that is robust to the presence of small estimated propensity scores in finite samples. From our simulations, our method demonstrates substantial finite sample improvements compared to conventional methods. In a case study, our method unveils the potential treatment effect heterogeneity of rs12916-T allele (a proxy for statin usage) in decreasing Alzheimer's disease risk.

Subject(s)

Machine Learning; Precision Medicine; Humans; Likelihood Functions; Computer Simulation; Propensity Score

19.

A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology.

Hejazi, Nima S; Boileau, Philippe; van der Laan, Mark J; Hubbard, Alan E.

Stat Methods Med Res ; 32(3): 539-554, 2023 03.

Article in English | MEDLINE | ID: mdl-36573044

ABSTRACT

The widespread availability of high-dimensional biological data has made the simultaneous screening of many biological characteristics a central problem in computational and high-dimensional biology. As the dimensionality of datasets continues to grow, so too does the complexity of identifying biomarkers linked to exposure patterns. The statistical analysis of such data often relies upon parametric modeling assumptions motivated by convenience, inviting opportunities for model misspecification. While estimation frameworks incorporating flexible, data adaptive regression strategies can mitigate this, their standard variance estimators are often unstable in high-dimensional settings, resulting in inflated Type-I error even after standard multiple testing corrections. We adapt a shrinkage approach compatible with parametric modeling strategies to semiparametric variance estimators of a family of efficient, asymptotically linear estimators of causal effects, defined by counterfactual exposure contrasts. Augmenting the inferential stability of these estimators in high-dimensional settings yields a data adaptive approach for robustly uncovering stable causal associations, even when sample sizes are limited. Our generalized variance estimator is evaluated against appropriate alternatives in numerical experiments, and an open source R/Bioconductor package, biotmle, is introduced. The proposal is demonstrated in an analysis of high-dimensional DNA methylation data from an observational study on the epigenetic effects of tobacco smoking.

Subject(s)

Biology; Research Design; Sample Size; Causality

20.

Statistics, philosophy, and health: the SMAC 2021 webconference.

Savy, Nicolas; Moodie, Erica Em; Drouet, Isabelle; Chambaz, Antoine; Falissard, Bruno; Kosorok, Michael R; Krakow, Elizabeth F; Mayo, Deborah G; Senn, Stephen; Van der Laan, Mark.

Int J Biostat ; 19(2): 261-270, 2023 11 01.

Article in English | MEDLINE | ID: mdl-36476947

ABSTRACT

SMAC 2021 was a webconference organized in June 2021. The aim of this conference was to bring together data scientists, (bio)statisticians, philosophers, and any person interested in the questions of causality and Bayesian statistics, ranging from technical to philosophical aspects. This webconference consisted of keynote speakers and contributed speakers, and closed with a round-table organized in an unusual fashion. Indeed, organisers asked world-renowned scientists to prepare two videos: a short video presenting a question of interest to them and a longer one presenting their point of view on the question. The first video served as a "teaser" for the conference and the second was presented during the conference as an introduction to the round-table. These videos and this round-table generated original scientific insights and discussion worthy of being shared with the community, which we do by means of this paper.

Subject(s)

Philosophy; Humans; Bayes Theorem; Causality
