ABSTRACT
AIMS/HYPOTHESIS: A precision medicine approach in type 2 diabetes could enhance targeting specific glucose-lowering therapies to individual patients most likely to benefit. We aimed to use the recently developed Bayesian causal forest (BCF) method to develop and validate an individualised treatment selection algorithm for two major type 2 diabetes drug classes, sodium-glucose cotransporter 2 inhibitors (SGLT2i) and glucagon-like peptide-1 receptor agonists (GLP1-RA). METHODS: We designed a predictive algorithm using BCF to estimate individual-level conditional average treatment effects for 12-month glycaemic outcome (HbA1c) between SGLT2i and GLP1-RA, based on routine clinical features of 46,394 people with type 2 diabetes in primary care in England (Clinical Practice Research Datalink; 27,319 for model development, 19,075 for hold-out validation), with additional external validation in 2252 people with type 2 diabetes from Scotland (SCI-Diabetes [Tayside & Fife]). Differences in glycaemic outcome with GLP1-RA by sex seen in clinical data were replicated in clinical trial data (HARMONY programme: liraglutide [n=389] and albiglutide [n=1682]). As secondary outcomes, we evaluated the impacts of targeting therapy based on glycaemic response on weight change, tolerability and longer-term risk of new-onset microvascular complications, macrovascular complications and adverse kidney events. RESULTS: Model development identified marked heterogeneity in glycaemic response, with 4787 (17.5%) of the development cohort having a predicted HbA1c benefit >3 mmol/mol (>0.3%) with SGLT2i over GLP1-RA and 5551 (20.3%) having a predicted HbA1c benefit >3 mmol/mol with GLP1-RA over SGLT2i. Calibration was good in hold-back validation, and external validation in an independent Scottish dataset identified clear differences in glycaemic outcomes between those predicted to benefit from each therapy. 
Sex, with women markedly more responsive to GLP1-RA, was identified as a major treatment effect modifier in both the UK observational datasets and in clinical trial data: HARMONY-7 liraglutide (GLP1-RA): 4.4 mmol/mol (95% credible interval [95% CrI] 2.2, 6.3) (0.4% [95% CrI 0.2, 0.6]) greater response in women than men. Targeting the two therapies based on predicted glycaemic response was also associated with improvements in short-term tolerability and long-term risk of new-onset microvascular complications. CONCLUSIONS/INTERPRETATION: Precision medicine approaches can facilitate effective individualised treatment choice between SGLT2i and GLP1-RA therapies, and the use of routinely collected clinical features for treatment selection could support low-cost deployment in many countries.
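The abstract reports HbA1c differences in both IFCC units (mmol/mol) and NGSP units (%). As a quick check on those paired figures, the sketch below applies the slope of the IFCC-NGSP master equation (NGSP % = 0.09148 x IFCC mmol/mol + 2.152). For a difference between two HbA1c values the intercept cancels, so only the slope is needed. The function name is illustrative and is not taken from the study.

```python
# Slope of the IFCC-NGSP master equation:
#   NGSP(%) = 0.09148 * IFCC(mmol/mol) + 2.152
IFCC_TO_NGSP_SLOPE = 0.09148


def hba1c_difference_to_percent(diff_mmol_mol: float) -> float:
    """Convert a difference in HbA1c from mmol/mol to percentage points.

    The intercept of the master equation cancels when two HbA1c values are
    subtracted, so only the slope is applied (hypothetical helper).
    """
    return diff_mmol_mol * IFCC_TO_NGSP_SLOPE


if __name__ == "__main__":
    # Values quoted in the abstract: the 3 mmol/mol benefit threshold and the
    # HARMONY-7 sex difference with its credible interval bounds.
    for value in (3.0, 4.4, 2.2, 6.3):
        print(f"{value} mmol/mol ~ {hba1c_difference_to_percent(value):.1f}%")
```

Rounded to one decimal place, the converted values agree with the percentages given in the abstract (0.3%, 0.4%, 0.2% and 0.6%).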
Subject(s)
Diabetes Mellitus, Type 2 , Sodium-Glucose Transporter 2 Inhibitors , Male , Humans , Female , Diabetes Mellitus, Type 2/complications , Sodium-Glucose Transporter 2 Inhibitors/therapeutic use , Sodium-Glucose Transporter 2 Inhibitors/pharmacology , Hypoglycemic Agents/adverse effects , Glucagon-Like Peptide-1 Receptor Agonists , Liraglutide/therapeutic use , Bayes Theorem , Glucose , Phenotype , Glucagon-Like Peptide-1 Receptor
ABSTRACT
Assessing heterogeneous treatment effects (HTEs) is an essential task in epidemiology. The recent integration of machine learning into causal inference has provided a new, flexible tool for evaluating complex HTEs: causal forest. In a recent paper, Jawadekar et al (Am J Epidemiol. 2023;192(7):1155-1165) introduced this innovative approach and offered practical guidelines for applied users. Building on their work, this commentary provides additional insights and guidance to promote the understanding and application of causal forest in epidemiologic research. We start with conceptual clarifications, differentiating between honesty and cross-fitting, and exploring the interpretation of estimated conditional average treatment effects. We then delve into practical considerations not addressed by Jawadekar et al, including motivations for estimating HTEs, calibration approaches, and ways to leverage causal forest output with examples from simulated data. We conclude by outlining challenges to consider for future advancements and applications of causal forest in epidemiologic research.
Subject(s)
Causality , Machine Learning , Humans , Epidemiologic Studies , Epidemiologic Methods , Models, Statistical
ABSTRACT
An endeavor central to precision medicine is predictive biomarker discovery: predictive biomarkers define patient subpopulations that stand to benefit most, or least, from a given treatment. The identification of these biomarkers is often the byproduct of the related but fundamentally different task of treatment rule estimation. Using treatment rule estimation methods to identify predictive biomarkers in clinical trials where the number of covariates exceeds the number of participants often results in high false discovery rates. The higher-than-expected number of false positives translates to wasted resources when conducting follow-up experiments for drug target identification and diagnostic assay development. Patient outcomes are in turn negatively affected. We propose a variable importance parameter for directly assessing the importance of potentially predictive biomarkers and develop a flexible nonparametric inference procedure for this estimand. We prove that our estimator is double robust and asymptotically linear under loose conditions on the data-generating process, permitting valid inference about the importance metric. The statistical guarantees of the method are verified in a thorough simulation study representative of randomized controlled trials with moderate and high-dimensional covariate vectors. Our procedure is then used to discover predictive biomarkers from among the tumor gene expression data of metastatic renal cell carcinoma patients enrolled in recently completed clinical trials. We find that our approach more readily discerns predictive from nonpredictive biomarkers than procedures whose primary purpose is treatment rule estimation. An open-source software implementation of the methodology, the uniCATE R package, is briefly introduced.
Subject(s)
Biomedical Research , Carcinoma, Renal Cell , Kidney Neoplasms , Humans , Carcinoma, Renal Cell/diagnosis , Carcinoma, Renal Cell/genetics , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Biomarkers , Computer Simulation
ABSTRACT
Methods have been developed for transporting evidence from randomised controlled trials (RCTs) to target populations. However, these approaches allow only for differences in characteristics observed in the RCT and real-world data (overt heterogeneity). These approaches do not recognise heterogeneity of treatment effects (HTE) according to unmeasured characteristics (essential heterogeneity). We use a target trial design and apply a local instrumental variable (LIV) approach to electronic health records from the Clinical Practice Research Datalink, and examine both forms of heterogeneity in assessing the comparative effectiveness of two second-line treatments for type 2 diabetes mellitus. We first obtain individualised estimates of HTE across the entire target population defined by applying eligibility criteria from national guidelines (n = 13,240) within an overall target trial framework. We define a subpopulation who meet a published RCT's eligibility criteria ('RCT-eligible', n = 6497), and a subpopulation who do not ('RCT-ineligible', n = 6743). We compare average treatment effects for pre-specified subgroups within the RCT-eligible subpopulation, the RCT-ineligible subpopulation, and within the overall target population. We find differences across these subpopulations in the magnitude of subgroup-level treatment effects, but the direction of estimated effects is stable. Our results highlight that LIV methods can provide useful evidence about treatment effect heterogeneity, including for those subpopulations excluded from RCTs.
ABSTRACT
Health care quality improvement (QI) initiatives are being implemented by a number of low- and middle-income countries. However, there is concern that these policies may not reduce, or may even worsen, inequities in access to high-quality care. Few studies have examined the distributional impact of QI programmes. We study the Ideal Clinic Realization and Maintenance program implemented in health facilities in South Africa, assessing whether the effects of the program are sensitive to previous quality performance. Implementing difference-in-difference-in-difference and changes-in-changes approaches we estimate the effect of the program on quality across the distribution of past facility quality performance. We find that the largest gains are realized by facilities with higher baseline quality, meaning this policy may have led to a worsening of pre-existing inequity in health care quality. Our study highlights that the full consequences of QI programmes cannot be gauged solely from examination of the mean impact.
ABSTRACT
Government-led national comprehensive demonstration cities for Energy Conservation and Emission Reduction Fiscal Policy (ECERFP) are pivotal for China in addressing environmental governance. Using a panel dataset covering 278 Chinese cities from 2003 to 2019, this study adopts the staggered difference-in-differences (DID) approach to investigate the synergistic impacts of ECERFP on pollution and carbon reduction. The findings indicate that ECERFP contributes to a 3% improvement in pollution reduction performance, a 1.5% enhancement in carbon reduction performance, and a 4% overall increase in combined pollution and carbon reduction efforts. Furthermore, the study examines the heterogeneous effects of ECERFP on environmental performance. ECERFP significantly influences the synergistic efforts in pollution and carbon reduction by fostering green innovation, enhancing energy allocation, and optimizing industrial structures. This study both theoretically and empirically outlines the specific pathways and mechanisms through which "incentive-based" green fiscal policy promotes synergistic pollution and carbon reduction, thus providing a pragmatic foundation for enhancing the role of fiscal policy in environmental governance.
Subject(s)
Conservation of Energy Resources , China , Conservation of Energy Resources/economics , Conservation of Energy Resources/methods , Fiscal Policy , Environmental Policy/legislation & jurisprudence , Environmental Pollution/prevention & control , Environmental Pollution/legislation & jurisprudence , Cities
ABSTRACT
INTRODUCTION: Sodium-glucose cotransporter 2 (SGLT2) inhibitors exhibit potential benefits in reducing dementia risk, yet the optimal beneficiary subgroups remain uncertain. METHODS: Individuals with type 2 diabetes (T2D) initiating either SGLT2 inhibitor or sulfonylurea were identified from OneFlorida+ Clinical Research Network (2016-2022). A doubly robust learning was deployed to estimate risk difference (RD) and 95% confidence interval (CI) of all-cause dementia. RESULTS: Among 35,458 individuals with T2D, 1.8% in the SGLT2 inhibitor group and 4.7% in the sulfonylurea group developed all-cause dementia over a 3.2-year follow-up, yielding a lower risk for SGLT2 inhibitors (RD, -2.5%; 95% CI, -3.0% to -2.1%). Hispanic ethnicity and chronic kidney disease were identified as the two important variables to define four subgroups in which RD ranged from -4.3% (-5.5 to -3.2) to -0.9% (-1.9 to 0.2). DISCUSSION: Compared to sulfonylureas, SGLT2 inhibitors were associated with a reduced risk of all-cause dementia, but the association varied among different subgroups. HIGHLIGHTS: New users of sodium-glucose cotransporter 2 (SGLT2) inhibitors were significantly associated with a lower risk of all-cause dementia as compared to those of sulfonylureas. The association varied among different subgroups defined by Hispanic ethnicity and chronic kidney disease. A significantly lower risk of Alzheimer's disease and vascular dementia was observed among new users of SGLT2 inhibitors compared to those of sulfonylureas.
Subject(s)
Dementia , Diabetes Mellitus, Type 2 , Sodium-Glucose Transporter 2 Inhibitors , Humans , Sodium-Glucose Transporter 2 Inhibitors/therapeutic use , Diabetes Mellitus, Type 2/drug therapy , Male , Female , Dementia/epidemiology , Aged , Cohort Studies , Sulfonylurea Compounds/therapeutic use , Middle Aged , Risk Factors , Hypoglycemic Agents/therapeutic use , Renal Insufficiency, Chronic/drug therapy , Treatment Effect Heterogeneity
ABSTRACT
Enhancing the green economy efficiency (GEE) is crucial for building a sustainable economy. How can the rapidly advancing digital transformation contribute to this process? The paper empirically examines the direct and spatial spillover effects of digital transformation on cities' GEE in China. This study utilizes the National E-commerce Pilot City (NEPC) policy as a quasi-natural experiment of regional digital transformation and employs the staggered difference-in-differences (DID) method with heterogeneous effects. The findings reveal that (i) implementing the NEPC policy significantly increases urban GEE by 2.6%, corresponding to a 16% increase in the mean of GEE. This effect is particularly pronounced in non-resource-based cities and cities with high Internet penetration. (ii) The mechanism test shows that the pilot policy positively affects GEE by promoting green structural transformation, enhancing green innovation, and strengthening public environmental concerns. (iii) The study highlights a positive spatial spillover effect of the NEPC policy on the GEE of nonpilot cities. (iv) The adoption of the NEPC plays a pivotal role in advancing energy use and carbon emission efficiency. This paper expands the existing knowledge on the green development effects of the digital economy while offering valuable policy insights for building an "Inclusive Green Economy".
Subject(s)
Carbon , Commerce , China , Cities , Internet , Economic Development , Efficiency
ABSTRACT
Psychologists are increasingly interested in whether treatment effects vary in randomized controlled trials. A number of tests have been proposed in the causal inference literature to test for such heterogeneity, which differ in the sample statistic they use (either using the variance terms of the experimental and control group, their empirical distribution functions, or specific quantiles), and in whether they make distributional assumptions or are based on a Fisher randomization procedure. In this manuscript, we present the results of a simulation study in which we examine the performance of the different tests while varying the amount of treatment effect heterogeneity, the type of underlying distribution, the sample size, and whether an additional covariate is considered. Altogether, our results suggest that researchers should use a randomization test to optimally control for type 1 errors. Furthermore, all tests studied are associated with low power in case of small and moderate samples even when the heterogeneity of the treatment effect is substantial. This suggests that current tests for treatment effect heterogeneity require much larger samples than those collected in current research.
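To make the idea of a randomization test for treatment effect heterogeneity concrete, here is a minimal sketch that uses the variance terms of the two groups as the sample statistic. It is not the procedure evaluated in the manuscript: the choice of statistic (absolute log variance ratio), the centring step, and all function names are assumptions made for illustration. The logic is that a constant treatment effect shifts outcomes without changing their spread, so unequal variances between arms point to heterogeneity (equal variances do not rule it out).

```python
import math
import random
import statistics


def log_variance_ratio(treated, control):
    """Absolute log ratio of the sample variances of the two arms."""
    return abs(math.log(statistics.variance(treated) / statistics.variance(control)))


def variance_randomization_test(treated, control, n_permutations=2000, seed=0):
    """Permutation p-value for equal outcome variances in the two arms.

    Assumption of this sketch: each arm is centred on its own mean first, so
    that a constant (homogeneous) treatment effect does not by itself make
    the arms look different when group labels are shuffled.
    """
    rng = random.Random(seed)
    mean_t = statistics.fmean(treated)
    mean_c = statistics.fmean(control)
    centred_t = [y - mean_t for y in treated]
    centred_c = [y - mean_c for y in control]

    observed = log_variance_ratio(centred_t, centred_c)
    pooled = centred_t + centred_c
    n_treated = len(centred_t)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        stat = log_variance_ratio(pooled[:n_treated], pooled[n_treated:])
        if stat >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    control = [rng.gauss(0.0, 1.0) for _ in range(200)]
    # Simulated heterogeneous effect: mean shift plus extra person-level spread.
    treated = [rng.gauss(0.5, 3.0) for _ in range(200)]
    print("p-value:", variance_randomization_test(treated, control))
```

Because the reference distribution is built by reshuffling the observed data, the test makes no distributional assumption about the outcome, which is the property the abstract associates with better type 1 error control.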
Subject(s)
Randomized Controlled Trials as Topic , Humans , Sample Size , Research Design , Computer Simulation , Models, Statistical , Data Interpretation, Statistical , Treatment Effect Heterogeneity
ABSTRACT
A stepped-wedge cluster randomized trial (CRT) is a unidirectional crossover study in which timings of treatment initiation for clusters are randomized. Because the timing of treatment initiation is different for each cluster, an emerging question is whether the treatment effect depends on the exposure time, namely, the time duration since the initiation of treatment. Existing approaches for assessing exposure-time treatment effect heterogeneity either assume a parametric functional form of exposure time or model the exposure time as a categorical variable, in which case the number of parameters increases with the number of exposure-time periods, leading to a potential loss in efficiency. In this article, we propose a new model formulation for assessing treatment effect heterogeneity over exposure time. Rather than a categorical term for each level of exposure time, the proposed model includes a random effect to represent varying treatment effects by exposure time. This allows for pooling information across exposure-time periods and may result in more precise average and exposure-time-specific treatment effect estimates. In addition, we develop an accompanying permutation test for the variance component of the heterogeneous treatment effect parameters. We conduct simulation studies to compare the proposed model and permutation test to alternative methods to elucidate their finite-sample operating characteristics, and to generate practical guidance on model choices for assessing exposure-time treatment effect heterogeneity in stepped-wedge CRTs.
Subject(s)
Research Design , Cross-Over Studies , Cluster Analysis , Randomized Controlled Trials as Topic , Sample Size
ABSTRACT
When data are available from individual patients receiving either a treatment or a control intervention in a randomized trial, various statistical and machine learning methods can be used to develop models for predicting future outcomes under the two conditions, and thus to predict treatment effect at the patient level. These predictions can subsequently guide personalized treatment choices. Although several methods for validating prediction models are available, little attention has been given to measuring the performance of predictions of personalized treatment effect. In this article, we propose a range of measures that can be used to this end. We start by defining two dimensions of model accuracy for treatment effects, for a single outcome: discrimination for benefit and calibration for benefit. We then amalgamate these two dimensions into an additional concept, decision accuracy, which quantifies the model's ability to identify patients for whom the benefit from treatment exceeds a given threshold. Subsequently, we propose a series of performance measures related to these dimensions and discuss estimating procedures, focusing on randomized data. Our methods are applicable for continuous or binary outcomes, for any type of prediction model, as long as it uses baseline covariates to predict outcomes under treatment and control. We illustrate all methods using two simulated datasets and a real dataset from a trial in depression. We implement all methods in the R package predieval. Results suggest that the proposed measures can be useful in evaluating and comparing the performance of competing models in predicting individualized treatment effect.
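One way to picture "calibration for benefit" in randomized data is to group patients by predicted benefit and compare, within each group, the mean predicted benefit with the observed difference in mean outcome between arms. The sketch below does this for a continuous outcome. It is an illustration only and does not reproduce the predieval package: the function name, the grouping into equal-sized bins, and the toy data are all assumptions.

```python
import statistics


def calibration_for_benefit(predicted_benefit, treated, outcome, n_groups=4):
    """Compare mean predicted benefit with observed benefit within groups.

    predicted_benefit: model-predicted treatment effect per patient
    treated:           1 if randomized to treatment, 0 if control
    outcome:           observed continuous outcome
    Returns one (mean_predicted, observed_difference) pair per group, where
    observed_difference is the treated-minus-control difference in means.
    """
    # Stable sort of patient indices by predicted benefit.
    order = sorted(range(len(predicted_benefit)), key=lambda i: predicted_benefit[i])
    group_size = len(order) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * group_size
        # The last group absorbs any remainder.
        stop = len(order) if g == n_groups - 1 else start + group_size
        idx = order[start:stop]
        y_treated = [outcome[i] for i in idx if treated[i] == 1]
        y_control = [outcome[i] for i in idx if treated[i] == 0]
        mean_pred = statistics.fmean([predicted_benefit[i] for i in idx])
        observed = statistics.fmean(y_treated) - statistics.fmean(y_control)
        rows.append((mean_pred, observed))
    return rows


if __name__ == "__main__":
    # Toy data: 80 patients in matched pairs, one treated and one control,
    # with a true effect equal to the predicted benefit (perfect calibration).
    predicted = [(i // 2) * 0.5 for i in range(80)]
    arm = [i % 2 for i in range(80)]
    y = [10.0 + arm[i] * predicted[i] for i in range(80)]
    for mean_pred, observed in calibration_for_benefit(predicted, arm, y):
        print(f"predicted {mean_pred:.2f}  observed {observed:.2f}")
```

In a well-calibrated model the two columns track each other across groups; systematic gaps suggest that predicted benefits are too extreme or too compressed.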
Subject(s)
Models, Statistical , Precision Medicine , Randomized Controlled Trials as Topic , Humans , Treatment Outcome , Clinical Decision Rules
ABSTRACT
Moderation analysis is an integral part of precision medicine research. Concerning moderation analysis with categorical outcomes, we start with an interesting observation, which shows that heterogeneous treatment effects could be equivalently estimated via a role exchange between the outcome and the treatment variable in logistic regression models. Hence two estimators of moderating effects can be obtained. We then established the joint asymptotic normality for the two estimators, on which basis refined inference can be made for moderation analysis. The improved precision is helpful in addressing the lack-of-power problem that is common in search of moderators. The above-mentioned results hold for both experimental and observational data. We investigate the proposed method by simulation and provide an illustration with data from a randomized trial on wart treatment.
Subject(s)
Precision Medicine , Humans , Computer Simulation , Logistic Models
ABSTRACT
An important consideration in the design and analysis of randomized trials is the need to account for outcome observations being positively correlated within groups or clusters. Two notable types of designs with this consideration are individually randomized group treatment trials and cluster randomized trials. While sample size methods for testing the average treatment effect are available for both types of designs, methods for detecting treatment effect modification are relatively limited. In this article, we present new sample size formulas for testing treatment effect modification based on either a univariate or multivariate effect modifier in both individually randomized group treatment and cluster randomized trials with a continuous outcome but any types of effect modifier, while accounting for differences across study arms in the outcome variance, outcome intracluster correlation coefficient (ICC) and the cluster size. We consider cases where the effect modifier can be measured at either the individual level or cluster level, and with a univariate effect modifier, our closed-form sample size expressions provide insights into the optimal allocation of groups or clusters to maximize design efficiency. Overall, our results show that the required sample size for testing treatment effect heterogeneity with an individual-level effect modifier can be affected by unequal ICCs and variances between arms, and accounting for such between-arm heterogeneity can lead to more accurate sample size determination. We use simulations to validate our sample size formulas and illustrate their application in the context of two real trials: an individually randomized group treatment trial (the AWARE study) and a cluster randomized trial (the K-DPP study).
Subject(s)
Research Design , Humans , Sample Size , Cluster Analysis , Randomized Controlled Trials as Topic
ABSTRACT
OBJECTIVE: Precision medicine requires reliable identification of variation in patient-level outcomes with different available treatments, often termed treatment effect heterogeneity. We aimed to evaluate the comparative utility of individualized treatment selection strategies based on predicted individual-level treatment effects from a causal forest machine learning algorithm and a penalized regression model. METHODS: Cohort study characterizing individual-level glucose-lowering response (6-month reduction in HbA1c) in people with type 2 diabetes initiating SGLT2-inhibitor or DPP4-inhibitor therapy. The model development set comprised 1,428 participants in the CANTATA-D and CANTATA-D2 randomised clinical trials of SGLT2-inhibitors versus DPP4-inhibitors. For external validation, calibration of observed versus predicted differences in HbA1c in patient strata defined by size of predicted HbA1c benefit was evaluated in 18,741 patients in UK primary care (Clinical Practice Research Datalink). RESULTS: Heterogeneity in treatment effects was detected in clinical trial participants with both approaches (proportion predicted to have a benefit on SGLT2-inhibitor therapy over DPP4-inhibitor therapy: causal forest: 98.6%; penalized regression: 81.7%). In validation, calibration was good with penalized regression but sub-optimal with causal forest. A stratum with an HbA1c benefit > 10 mmol/mol with SGLT2-inhibitors (3.7% of patients, observed benefit 11.0 mmol/mol [95%CI 8.0-14.0]) was identified using penalized regression but not causal forest, and a much larger stratum with an HbA1c benefit of 5-10 mmol/mol with SGLT2-inhibitors was identified with penalized regression (regression: 20.9% of patients, observed benefit 7.8 mmol/mol [95%CI 6.7-8.9]; causal forest: 11.6%, observed benefit 8.7 mmol/mol [95%CI 7.4-10.1]).
CONCLUSIONS: Consistent with recent results for outcome prediction with clinical data, when evaluating treatment effect heterogeneity researchers should not rely on causal forest or other similar machine learning algorithms alone, and must compare outputs with standard regression, which in this evaluation was superior.
Subject(s)
Diabetes Mellitus, Type 2 , Dipeptidyl-Peptidase IV Inhibitors , Sodium-Glucose Transporter 2 Inhibitors , Humans , Diabetes Mellitus, Type 2/drug therapy , Glycated Hemoglobin , Cohort Studies , Precision Medicine , Dipeptidyl Peptidase 4/therapeutic use , Sodium-Glucose Transporter 2/therapeutic use , Hypoglycemic Agents/therapeutic use , Dipeptidyl-Peptidase IV Inhibitors/therapeutic use , Sodium-Glucose Transporter 2 Inhibitors/therapeutic use , Treatment Outcome
ABSTRACT
The relationship between fiscal regimes and urban industrial pollution emissions is unclear. This paper aims to explore the effects and mechanisms of fiscal centralization on urban industrial pollution emissions and environmental quality. Using the vertical reform of environmental administrations (VREA) in China as a quasi-natural experiment of fiscal centralization, this study applies a staggered difference-in-differences (DID) model to explore the differences in industrial pollution emissions between centralization cities and decentralization cities. The main findings are: (1) VREA significantly inhibits regional industrial pollution emissions, and the reform effect increases over time. This conclusion still holds after considering a series of robustness issues. (2) Industrial sulfur dioxide (SO2) and solid particulate emissions in the fiscal centralization cities have decreased significantly by 0.3281% and 0.2240%, respectively. However, there is no significant change in industrial wastewater discharges. (3) Environmental regulations, environmental expenditures, and pollution control investments of local governments are the main channels through which VREA reduces industrial pollution emissions. (4) The effects of VREA are more significant in central and western cities and small cities. (5) Relative to decentralization cities, centralization cities have improved air and water quality by 0.0825% and 0.1628%, respectively. These findings help to accurately assess the effects of fiscal centralization on regional environmental governance and provide a decision-making reference for further deepening environmental centralization reform in China.
Subject(s)
Air Pollution , Conservation of Natural Resources , Environmental Policy , Environmental Pollution/prevention & control , Environmental Pollution/analysis , Dust , Cities , China , Water Quality , Air Pollution/prevention & control , Air Pollution/analysis , Economic Development
ABSTRACT
Social scientists have long been interested in the varying responses to a specific intervention, motivating the enterprise of heterogeneous treatment effects (HTE) analysis. The past five decades have witnessed the rapid development of HTE methods, from conventional multiplicative interactions in linear models to explorations based on machine learning techniques. This article presents a systematic review of major HTE methods, including multiplicative interaction modeling, generalized additive modeling, propensity-score-based methods, marginal treatment effect, separate LASSO constraints, causal trees, causal forests, Bayesian additive regression trees, and meta-learners (i.e., the S-learner, T-learner, X-learner, and R-learner). These methods, described in roughly chronological order to emphasize methodological developments, are discussed to highlight their respective strengths and limitations. Following an illustrative example, this article reflects on future methodological developments.
Subject(s)
Machine Learning , Humans , Bayes Theorem
ABSTRACT
Testing multiple treatments for heterogeneous (varying) effectiveness with respect to many underlying risk factors requires many pairwise tests; we would like to instead automatically discover and visualize patient archetypes and predictors of treatment effectiveness using multitask machine learning. In this paper, we present a method to estimate these heterogeneous treatment effects with an interpretable hierarchical framework that uses additive models to visualize expected treatment benefits as a function of patient factors (identifying personalized treatment benefits) and concurrent treatments (identifying combinatorial treatment benefits). This method achieves state-of-the-art predictive power for COVID-19 in-hospital mortality and interpretable identification of heterogeneous treatment benefits. We first validate this method on the large public MIMIC-IV dataset of ICU patients to test recovery of heterogeneous treatment effects. Next we apply this method to a proprietary dataset of over 3000 patients hospitalized for COVID-19, and find evidence of heterogeneous treatment effectiveness predicted largely by indicators of inflammation and thrombosis risk: patients with few indicators of thrombosis risk benefit most from treatments against inflammation, while patients with few indicators of inflammation risk benefit most from treatments against thrombosis. This approach provides an automated methodology to discover heterogeneous and individualized effectiveness of treatments.
Subject(s)
COVID-19 , Humans , Inflammation , Machine Learning , Risk Factors , Treatment Outcome
ABSTRACT
There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the conditional average treatment effect (CATE) function. Metaalgorithms build on base algorithms-such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks-to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz-continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our X-learner can be used to target treatment regimes and to shed light on underlying mechanisms. A software package is provided that implements our methods.
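The X-learner steps can be sketched in a few lines. The version below is a simplified illustration, not the authors' software: it assumes a single covariate, uses ordinary least squares as the base learner in place of RF or BART, and weights the two stage-two estimates by a constant propensity (the treated fraction). The helper names `fit_line` and `x_learner` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns a prediction function."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value


def x_learner(x, treated, y, propensity=None):
    """Return a function estimating the CATE tau(x) via the X-learner steps."""
    x1 = [xi for xi, w in zip(x, treated) if w == 1]
    y1 = [yi for yi, w in zip(y, treated) if w == 1]
    x0 = [xi for xi, w in zip(x, treated) if w == 0]
    y0 = [yi for yi, w in zip(y, treated) if w == 0]

    # Stage 1: response functions fitted separately in each arm.
    mu1 = fit_line(x1, y1)
    mu0 = fit_line(x0, y0)

    # Stage 2: impute individual effects, then regress them on x.
    d1 = [yi - mu0(xi) for xi, yi in zip(x1, y1)]  # treated: observed minus predicted control
    d0 = [mu1(xi) - yi for xi, yi in zip(x0, y0)]  # control: predicted treated minus observed
    tau1 = fit_line(x1, d1)
    tau0 = fit_line(x0, d0)

    # Stage 3: weighted combination; a constant propensity is assumed here.
    g = propensity if propensity is not None else len(x1) / len(x)
    return lambda value: g * tau0(value) + (1 - g) * tau1(value)


if __name__ == "__main__":
    # Noise-free toy data with few treated units and a linear CATE of 0.5 + x.
    x = [float(i) for i in range(20)]
    w = [1 if i % 4 == 0 else 0 for i in range(20)]
    y = [1.0 + 2.0 * xi + wi * (0.5 + xi) for xi, wi in zip(x, w)]
    tau = x_learner(x, w, y)
    print("estimated CATE at x = 2:", tau(2.0))
```

With a small treated group, most of the weight falls on the estimate built from treated units' imputed effects, which relies on the control response function fitted to the larger group. This is the imbalance setting the abstract highlights.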
ABSTRACT
A wide range of machine-learning-based approaches have been developed in the past decade, increasing our ability to accurately model nonlinear and nonadditive response surfaces. This has improved performance for inferential tasks such as estimating average treatment effects in situations where standard parametric models may not fit the data well. These methods have also shown promise for the related task of identifying heterogeneous treatment effects. However, the estimation of both overall and heterogeneous treatment effects can be hampered when data are structured within groups if we fail to correctly model the dependence between observations. Most machine learning methods do not readily accommodate such structure. This paper introduces a new algorithm, stan4bart, that combines the flexibility of Bayesian Additive Regression Trees (BART) for fitting nonlinear response surfaces with the computational and statistical efficiencies of using Stan for the parametric components of the model. We demonstrate how stan4bart can be used to estimate average, subgroup, and individual-level treatment effects with stronger performance than other flexible approaches that ignore the multilevel structure of the data as well as multilevel approaches that have strict parametric forms.
ABSTRACT
Treatment effects vary across different patients, and estimation of this variability is essential for clinical decision-making. We aimed to develop a model estimating the benefit of alternative treatment options for individual patients, extending a risk modeling approach in a network meta-analysis framework. We propose a two-stage prediction model for heterogeneous treatment effects by combining prognosis research and network meta-analysis methods where individual patient data are available. In the first stage, we develop a prognostic model to predict the baseline risk of the outcome. In the second stage, we use the baseline risk score from the first stage as a single prognostic factor and effect modifier in a network meta-regression model. We apply the approach to a network meta-analysis of three randomized clinical trials comparing relapses with Natalizumab, Glatiramer Acetate, and Dimethyl Fumarate, including 3590 patients diagnosed with relapsing-remitting multiple sclerosis. We find that the baseline risk score modifies the relative and absolute treatment effects. Several patient characteristics, such as age and disability status, impact the baseline risk of relapse, which in turn moderates the benefit expected for each of the treatments. For high-risk patients, the treatment that minimizes the risk of relapse in 2 years is Natalizumab, whereas Dimethyl Fumarate might be a better option for low-risk patients. Our approach can be easily extended to all outcomes of interest and has the potential to inform a personalized treatment approach.