Results 1 - 20 of 109
1.
Proc Natl Acad Sci U S A ; 121(23): e2322376121, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38809705

ABSTRACT

In this article, we develop CausalEGM, a deep learning framework for nonlinear dimension reduction and generative modeling of the dependency among covariate features affecting treatment and response. CausalEGM can be used for estimating causal effects in both binary and continuous treatment settings. By learning a bidirectional transformation between the high-dimensional covariate space and a low-dimensional latent space and then modeling the dependencies of different subsets of the latent variables on the treatment and response, CausalEGM can extract the latent covariate features that affect both treatment and response. By conditioning on these features, one can mitigate the confounding effect of the high-dimensional covariate on the estimation of the causal relation between treatment and response. In a series of experiments, the proposed method is shown to achieve superior performance over existing methods in both binary and continuous treatment settings. The improvement is substantial when the sample size is large and the covariate is of high dimension. Finally, we establish excess risk bounds and consistency results for our method and discuss how our approach relates to and improves upon other dimension reduction approaches in causal inference.

2.
Biostatistics ; 24(2): 518-537, 2023 04 14.
Article in English | MEDLINE | ID: mdl-34676400

ABSTRACT

Instrumental variable (IV) methods allow us the opportunity to address unmeasured confounding in causal inference. However, most IV methods are only applicable to discrete or continuous outcomes with very few IV methods for censored survival outcomes. In this article, we propose nonparametric estimators for the local average treatment effect on survival probabilities under both covariate-dependent and outcome-dependent censoring. We provide an efficient influence function-based estimator and a simple estimation procedure when the IV is either binary or continuous. The proposed estimators possess double-robustness properties and can easily incorporate nonparametric estimation using machine learning tools. In simulation studies, we demonstrate the flexibility and double robustness of our proposed estimators under various plausible scenarios. We apply our method to the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial for estimating the causal effect of screening on survival probabilities and investigate the causal contrasts between the two interventions under different censoring assumptions.


Subject(s)
Computer Simulation , Humans , Causality , Probability
3.
Biostatistics ; 24(2): 309-326, 2023 04 14.
Article in English | MEDLINE | ID: mdl-34382066

ABSTRACT

Scientists frequently generalize population level causal quantities such as average treatment effect from a source population to a target population. When the causal effects are heterogeneous, differences in subject characteristics between the source and target populations may make such a generalization difficult and unreliable. Reweighting or regression can be used to adjust for such differences when generalizing. However, these methods typically suffer from large variance if there is limited covariate distribution overlap between the two populations. We propose a generalizability score to address this issue. The score can be used as a yardstick to select target subpopulations for generalization. A simplified version of the score avoids using any outcome information and thus can prevent deliberate biases associated with inadvertent access to such information. Both simulation studies and real data analysis demonstrate convincing results for such selection.


Subject(s)
Research Design , Humans , Propensity Score , Computer Simulation , Causality , Bias
4.
Biostatistics ; 24(4): 985-999, 2023 10 18.
Article in English | MEDLINE | ID: mdl-35791753

ABSTRACT

When evaluating the effectiveness of a treatment, policy, or intervention, the desired measure of efficacy may be expensive to collect, not routinely available, or may take a long time to occur. In these cases, it is sometimes possible to identify a surrogate outcome that can more easily, quickly, or cheaply capture the effect of interest. Theory and methods for evaluating the strength of surrogate markers have been well studied in the context of a single surrogate marker measured in the course of a randomized clinical study. However, methods are lacking for quantifying the utility of surrogate markers when the dimension of the surrogate grows. We propose a robust and efficient method for evaluating a set of surrogate markers that may be high-dimensional. Our method does not require treatment to be randomized and may be used in observational studies. Our approach draws on a connection between quantifying the utility of a surrogate marker and the most fundamental tools of causal inference, namely methods for robust estimation of the average treatment effect. This connection facilitates the use of modern methods for estimating treatment effects, using machine learning to estimate nuisance functions and relaxing the dependence on model specification. We show that our proposed approach performs well, demonstrate connections between our approach and certain mediation effects, and illustrate it by evaluating whether gene expression can be used as a surrogate for immune activation in an Ebola study.


Subject(s)
Models, Statistical , Humans , Biomarkers , Causality , Computer Simulation
5.
Biometrics ; 80(3)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39011739

ABSTRACT

Electronic health records and other sources of observational data are increasingly used for drawing causal inferences. The estimation of a causal effect using these data not meant for research purposes is subject to confounding and irregularly-spaced covariate-driven observation times affecting the inference. A doubly-weighted estimator accounting for these features has previously been proposed that relies on the correct specification of two nuisance models used for the weights. In this work, we propose a novel consistent multiply robust estimator and demonstrate analytically and in comprehensive simulation studies that it is more flexible and more efficient than the only alternative estimator proposed for the same setting. It is further applied to data from the Add Health study in the United States to estimate the causal effect of therapy counseling on alcohol consumption in American adolescents.


Subject(s)
Computer Simulation , Models, Statistical , Observational Studies as Topic , Humans , Observational Studies as Topic/statistics & numerical data , Adolescent , Causality , United States , Data Interpretation, Statistical , Electronic Health Records/statistics & numerical data , Biometry/methods , Alcohol Drinking
6.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38640436

ABSTRACT

Several epidemiological studies have provided evidence that long-term exposure to fine particulate matter (PM2.5) increases mortality rate. Furthermore, some population characteristics (e.g., age, race, and socioeconomic status) might play a crucial role in understanding vulnerability to air pollution. To inform policy, it is necessary to identify groups of the population that are more or less vulnerable to air pollution. In causal inference literature, the group average treatment effect (GATE) is a distinctive facet of the conditional average treatment effect. This widely employed metric serves to characterize the heterogeneity of a treatment effect based on some population characteristics. In this paper, we introduce a novel Confounder-Dependent Bayesian Mixture Model (CDBMM) to characterize causal effect heterogeneity. More specifically, our method leverages the flexibility of the dependent Dirichlet process to model the distribution of the potential outcomes conditional on the covariates and the treatment levels, thus enabling us to: (i) identify heterogeneous and mutually exclusive population groups defined by similar GATEs in a data-driven way, and (ii) estimate and characterize the causal effects within each of the identified groups. Through simulations, we demonstrate the effectiveness of our method in uncovering key insights about treatment effect heterogeneity. We apply our method to claims data from Medicare enrollees in Texas. We found six mutually exclusive groups where the causal effects of PM2.5 on mortality rate are heterogeneous.


Subject(s)
Air Pollutants , Air Pollution , United States/epidemiology , Air Pollutants/adverse effects , Air Pollutants/analysis , Bayes Theorem , Medicare , Air Pollution/adverse effects , Air Pollution/analysis , Particulate Matter/adverse effects , Particulate Matter/analysis , Environmental Exposure/adverse effects
7.
Stat Med ; 43(8): 1640-1659, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38351516

ABSTRACT

The regression discontinuity (RD) design is a widely utilized approach for assessing treatment effects. It involves assigning treatment based on the value of an observed covariate in relation to a fixed threshold. Although the RD design has been widely employed across various problems, its application to specific data types has received limited attention. For instance, there has been little research on utilizing the RD design when the outcome variable exhibits zero-inflation. This study introduces a novel RD estimator using local likelihood, which overcomes the limitations of the local linear regression model, a popular approach for estimating treatment effects in RD design, by considering the data type of the outcome variable. To determine the optimal bandwidth, we propose a modified Ludwig-Miller cross validation method. A set of simulations is carried out, involving binary, count, and zero-inflated outcome variables, to showcase the superior performance of the suggested method over local linear regression models. Subsequently, the proposed local likelihood model is employed on HIV care data, where antiretroviral therapy eligibility is determined by a CD4 count threshold. A comparison is made between the results obtained using the local likelihood model and those obtained using local linear regression.
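
The local linear regression baseline mentioned in this abstract can be sketched roughly as follows. This is only a minimal illustration of a sharp RD estimate with a triangular kernel, not the paper's local likelihood estimator; the function name `local_linear_rd`, the kernel choice, the fixed bandwidth, and the noise-free toy data are all assumptions made for the example.

```python
import numpy as np

def local_linear_rd(x, y, cutoff, bandwidth):
    """Sharp RD estimate: jump at the cutoff from a kernel-weighted linear fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = x - cutoff                                         # centered running variable
    w = np.clip(1.0 - np.abs(z) / bandwidth, 0.0, None)    # triangular kernel weights
    keep = w > 0
    z, y, w = z[keep], y[keep], w[keep]
    d = (z >= 0).astype(float)                             # treated at or above the cutoff
    # Separate intercepts and slopes on each side of the cutoff.
    design = np.column_stack([np.ones_like(z), d, z, d * z])
    sw = np.sqrt(w)                                        # weighted least squares via rescaling
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta[1]                                         # coefficient on d = jump at the cutoff

# Hypothetical noise-free data with a true jump of 2.0 at x = 0.
x = np.linspace(-1.0, 1.0, 201)
y = 1.0 + 0.5 * x + 2.0 * (x >= 0)
rd_estimate = local_linear_rd(x, y, cutoff=0.0, bandwidth=0.5)
print(rd_estimate)
```

A linear fit of this kind treats the outcome as continuous, which is the limitation the abstract points to for binary, count, and zero-inflated outcomes.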


Subject(s)
Anti-HIV Agents , HIV Infections , Humans , South Africa , Anti-HIV Agents/therapeutic use , HIV Infections/drug therapy , Linear Models , Research Design
8.
BMC Med Res Methodol ; 24(1): 250, 2024 Oct 26.
Article in English | MEDLINE | ID: mdl-39462370

ABSTRACT

PURPOSE: We aim to thoroughly compare past and current methods that leverage baseline covariate information to estimate the average treatment effect (ATE) using data from randomized clinical trials (RCTs). We especially focus on their performance, efficiency gain, and power. METHODS: We compared 6 different methods using extensive Monte-Carlo simulation studies: the unadjusted estimator, i.e., analysis of variance (ANOVA), the analysis of covariance (ANCOVA), the analysis of heterogeneous covariance (ANHECOVA), the inverse probability weighting (IPW), the augmented inverse probability weighting (AIPW), and the overlap weighting (OW) as well as the augmented overlap weighting (AOW) estimators. The performance of these methods is assessed using the relative bias (RB), the root mean square error (RMSE), the model-based standard error (SE) estimation, the coverage probability (CP), and the statistical power. RESULTS: Even with a well-executed randomization, adjusting for baseline covariates by an appropriate method can be a good practice. When the outcome model(s) used in a covariate-adjusted method is closer to the correctly specified model(s), the efficiency and power gained can be substantial. We also found that most covariate-adjusted methods can suffer from the high-dimensional curse, i.e., when the number of covariates is relatively high compared to the sample size, they can have poor performance (along with lower efficiency) in estimating ATE. Among the different methods we compared, the OW performs the best overall with smaller RMSEs and smaller model-based SEs, which also result in higher power when the true effect is non-zero. Furthermore, the OW is more robust when dealing with the high-dimensional issue. CONCLUSION: To effectively use covariate adjustment methods, understanding their nature is important for practical investigators. Our study shows that outcome model misspecification and high dimension are two main obstacles to gaining higher efficiency and power with a covariate adjustment method. When these factors are appropriately considered, e.g., by performing variable selection before covariate adjustment if the data dimension is high, these methods are expected to be useful.
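
Three of the estimators compared in this abstract (unadjusted, IPW, and OW) can be sketched in a few lines. This is a rough illustration under stated assumptions, not the study's simulation code: the helper names `fit_propensity` and `weighted_diff`, the Newton-Raphson logistic fit, and the simulated trial with a true ATE of 1.0 are all invented for the example, and the augmented estimators (AIPW, AOW) are left out.

```python
import numpy as np

def fit_propensity(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; returns fitted P(A=1 | X)."""
    Xd = np.column_stack([np.ones(len(a)), X])        # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (a - p)                          # score vector
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def weighted_diff(v, a, w):
    """Difference in weighted means of v between treated and control arms."""
    return (np.sum(w * a * v) / np.sum(w * a)
            - np.sum(w * (1 - a) * v) / np.sum(w * (1 - a)))

# Hypothetical simulated trial: 1:1 randomization, true ATE = 1.0.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
a = rng.binomial(1, 0.5, size=n)
y = 1.0 * a + X @ np.array([1.0, -1.0]) + rng.normal(size=n)

e = fit_propensity(X, a)
weights = {
    "unadjusted": np.ones(n),
    "ipw": a / e + (1 - a) / (1 - e),   # inverse probability weights
    "ow": a * (1 - e) + (1 - a) * e,    # overlap weights
}
estimates = {name: weighted_diff(y, a, w) for name, w in weights.items()}
print(estimates)
```

When the propensity score comes from a logistic model with an intercept and main effects fitted by maximum likelihood, the overlap weights balance the means of the included covariates exactly, which can be checked with `weighted_diff(X[:, 0], a, weights["ow"])`.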


Subject(s)
Monte Carlo Method , Randomized Controlled Trials as Topic , Humans , Randomized Controlled Trials as Topic/methods , Randomized Controlled Trials as Topic/statistics & numerical data , Analysis of Variance , Models, Statistical , Computer Simulation , Data Interpretation, Statistical
9.
BMC Med Res Methodol ; 24(1): 122, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38831393

ABSTRACT

BACKGROUND: Two propensity score (PS) based balancing covariate methods, the overlap weighting method (OW) and the fine stratification method (FS), produce superb covariate balance. OW has been compared with various weighting methods while FS has been compared with the traditional stratification method and various matching methods. However, no study has yet compared OW and FS. In addition, OW has not yet been evaluated in large claims data with low prevalence exposure and with low frequency outcomes, a context in which optimal use of balancing methods is critical. In the study, we aimed to compare OW and FS using real-world data and simulations with low prevalence exposure and with low frequency outcomes. METHODS: We used the Texas State Medicaid claims data on adult beneficiaries with diabetes in 2012 as an empirical example (N = 42,628). Based on its real-world research question, we estimated an average treatment effect of health center vs. non-health center attendance in the total population. We also performed simulations to evaluate their relative performance. To preserve associations between covariates, we used the plasmode approach to simulate outcomes and/or exposures with N = 4,000. We simulated both homogeneous and heterogeneous treatment effects with various outcome risks (1-30% or observed: 27.75%) and/or exposure prevalence (2.5-30% or observed: 10.55%). We used a weighted generalized linear model to estimate the exposure effect and the cluster-robust standard error (SE) method to estimate its SE. RESULTS: In the empirical example, we found that OW had smaller standardized mean differences in all covariates (range: OW: 0.0-0.02 vs. FS: 0.22-3.26) and Mahalanobis balance distance (MB) (< 0.001 vs. > 0.049) than FS. In simulations, OW also achieved smaller MB (homogeneity: < 0.04 vs. > 0.04; heterogeneity: 0.0-0.11 vs. 0.07-0.29), relative bias (homogeneity: 4.04-56.20 vs. 20-61.63; heterogeneity: 7.85-57.6 vs. 15.0-60.4), and square root of mean squared error (homogeneity: 0.332-1.308 vs. 0.385-1.365; heterogeneity: 0.263-0.526 vs. 0.313-0.620), and higher coverage probability (homogeneity: 0.0-80.4% vs. 0.0-69.8%; heterogeneity: 0.0-97.6% vs. 0.0-92.8%), than FS, in most cases. CONCLUSIONS: These findings suggest that OW can yield nearly perfect covariate balance and therefore enhance the accuracy of average treatment effect estimation in the total population.


Subject(s)
Propensity Score , Humans , Male , Female , United States , Adult , Middle Aged , Texas/epidemiology , Diabetes Mellitus/epidemiology , Medicaid/statistics & numerical data , Computer Simulation , Insurance Claim Review/statistics & numerical data
10.
BMC Public Health ; 24(1): 2829, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39407154

ABSTRACT

BACKGROUND: Police officers are at a high risk of noise-induced hearing loss (NIHL) owing to the nature of their work. Therefore, this study aimed to compare the risk of NIHL in police officers and controls. METHODS: This study used the National Health Insurance claims data of workers aged 25-65 years obtained from 2005 to 2015. The case group comprised police officers, while the control group comprised general workers and public officers. The study followed a three-phase cohort design. The standardized incidence ratio (SIR) was calculated using an indirect standardization method based on age. Propensity score matching was performed using the greedy matching method, with a police officer-to-control group ratio of 1:3. Cox regression analysis was performed for each matched control group. Statistical significance was determined by a lower limit of greater than 1, based on the 95% confidence interval (CI). RESULTS: The SIR values for police officers were 1.62 (95% CI: 1.44-1.82) compared with general workers and 1.78 (95% CI: 1.66-1.73) compared with public officers. Police officers exhibited an increased risk of NIHL compared with general workers (hazard ratio (HR): 1.71, 95% CI: 1.49-1.98) and public officers (HR: 2.19, 95% CI: 1.88-2.56). CONCLUSIONS: It is necessary to prevent NIHL by reducing occupational noise exposure through measures such as wearing earplugs, improving shooting training methods, and improving the shift work system.
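
The indirect standardization step described in the methods can be illustrated with a small sketch. All numbers below are hypothetical and are not taken from the study; the function name `standardized_incidence_ratio` and the log-scale normal approximation for the confidence interval are assumptions, since the abstract does not say how the interval was computed.

```python
import math

def standardized_incidence_ratio(observed, person_years, reference_rates):
    """SIR by indirect standardization, with an approximate 95% CI.

    expected = sum over age strata of (study person-years x reference rate).
    The CI uses a log-scale normal approximation (an assumption for this sketch).
    """
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    sir = observed / expected
    half_width = 1.96 / math.sqrt(observed)
    return sir, sir * math.exp(-half_width), sir * math.exp(half_width)

# Hypothetical age strata: person-years in the study group and reference incidence rates.
person_years = [10000, 20000, 15000]
reference_rates = [0.001, 0.002, 0.004]
sir, ci_low, ci_high = standardized_incidence_ratio(165, person_years, reference_rates)

# The abstract's significance rule: the lower 95% limit must exceed 1.
significant = ci_low > 1.0
print(sir, ci_low, ci_high, significant)
```

With these made-up inputs the expected count is 110, so the SIR is 165 / 110 = 1.5.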


Subject(s)
Hearing Loss, Noise-Induced , Occupational Diseases , Police , Propensity Score , Humans , Police/statistics & numerical data , Male , Middle Aged , Republic of Korea/epidemiology , Adult , Hearing Loss, Noise-Induced/epidemiology , Aged , Occupational Diseases/epidemiology , Cohort Studies , Risk Factors , Incidence , Occupational Exposure/adverse effects , Occupational Exposure/statistics & numerical data , Noise, Occupational/adverse effects , Noise, Occupational/statistics & numerical data , Risk Assessment
11.
Biometrics ; 79(4): 3179-3190, 2023 12.
Article in English | MEDLINE | ID: mdl-36645231

ABSTRACT

In this paper, we focus on estimating the average treatment effect (ATE) of a target population when individual-level data from a source population and summary-level data (e.g., first or second moments of certain covariates) from the target population are available. In the presence of the heterogeneous treatment effect, the ATE of the target population can be different from that of the source population when distributions of treatment effect modifiers are dissimilar in these two populations, a phenomenon also known as covariate shift. Many methods have been developed to adjust for covariate shift, but most require individual covariates from a representative target sample. We develop a weighting approach based on the summary-level information from the target sample to adjust for possible covariate shift in effect modifiers. In particular, weights of the treated and control groups within a source sample are calibrated by the summary-level information of the target sample. Our approach also seeks additional covariate balance between the treated and control groups in the source sample. We study the asymptotic behavior of the corresponding weighted estimator for the target population ATE under a wide range of conditions. The theoretical implications are confirmed in simulation studies and a real-data application.


Subject(s)
Entropy , Computer Simulation , Causality , Propensity Score
12.
Stat Med ; 42(10): 1542-1564, 2023 05 10.
Article in English | MEDLINE | ID: mdl-36815690

ABSTRACT

Linkage between drug claims data and clinical outcome allows a data-driven experimental approach to drug repurposing. We develop an estimation procedure based on generalized random forests for estimation of time-point specific average treatment effects in a time-to-event setting with competing risks. To handle right-censoring, we propose a two-step procedure for estimation, applying inverse probability weighting to construct time-point specific weighted outcomes as input for the generalized random forest. The generalized random forests adaptively handle covariate effects on the treatment assignment by applying a splitting rule that targets a causal parameter. Using simulated data we demonstrate that the method is effective for a causal search through a list of treatments to be ranked according to the magnitude of their effect on clinical outcome. We illustrate the method using the Danish national health registries where it is of interest to discover drugs with an unexpected protective effect against relapse of severe depression.


Subject(s)
Random Forest , Humans , Probability
13.
Stat Med ; 42(11): 1760-1778, 2023 05 20.
Article in English | MEDLINE | ID: mdl-36863006

ABSTRACT

Matching is a popular design for inferring causal effects from observational data. Unlike model-based approaches, it is a nonparametric method that groups treated and control subjects with similar characteristics together, thereby re-creating a randomization-like scenario. The application of matched designs to real-world data may be limited by (1) the causal estimand of interest and (2) the sample sizes of the different treatment arms. We propose a flexible matching design, based on the idea of template matching, to overcome these challenges. It first identifies a template group that is representative of the target population, then matches subjects from the original data to this template group and makes inferences. We provide theoretical justification for how it unbiasedly estimates the average treatment effect using matched pairs and the average treatment effect on the treated when the treatment group has a bigger sample size. We also propose using the triplet matching algorithm to improve matching quality and devise a practical strategy to select the template size. One major advantage of matched designs is that they allow either randomization-based or model-based inference, with the former being more robust. For the commonly used binary outcome in medical research, we adopt a randomization inference framework of attributable effects in matched data, which allows heterogeneous effects and can incorporate sensitivity analysis for unmeasured confounding. We apply our design and analytical strategy to a trauma care evaluation study.


Subject(s)
Biomedical Research , Observational Studies as Topic , Humans , Algorithms , Causality , Research Design , Sample Size
14.
Stat Med ; 42(1): 33-51, 2023 01 15.
Article in English | MEDLINE | ID: mdl-36336460

ABSTRACT

In observational studies, causal inference relies on several key identifying assumptions. One identifiability condition is the positivity assumption, which requires the probability of treatment be bounded away from 0 and 1. That is, for every covariate combination, it should be possible to observe both treated and control subjects; in other words, the covariate distributions should overlap between treatment arms. If the positivity assumption is violated, population-level causal inference necessarily involves some extrapolation. Ideally, a greater amount of uncertainty about the causal effect estimate should be reflected in such situations. With that goal in mind, we construct a Gaussian process model for estimating treatment effects in the presence of practical violations of positivity. Advantages of our method include minimal distributional assumptions, a cohesive model for estimating treatment effects, and more uncertainty associated with areas in the covariate space where there is less overlap. We assess the performance of our approach with respect to bias and efficiency using simulation studies. The method is then applied to a study of critically ill female patients to examine the effect of undergoing right heart catheterization.


Subject(s)
Models, Statistical , Humans , Female , Probability , Computer Simulation , Bias
15.
Stat Med ; 42(15): 2637-2660, 2023 07 10.
Article in English | MEDLINE | ID: mdl-37012676

ABSTRACT

Most propensity score (PS) analysis methods rely on a correctly specified parametric PS model, which may result in biased estimation of the average treatment effect (ATE) when the model is misspecified. More flexible nonparametric models for treatment assignment alleviate this issue, but they do not always guarantee covariate balance. Methods that force balance in the means of covariates and their transformations between the treatment groups, termed global balance in this article, do not always lead to unbiased estimation of ATE. Their estimated propensity scores only ensure global balance but not the balancing property, which is defined as the conditional independence between treatment assignment and covariates given the propensity score. The balancing property implies not only global balance but also local balance, that is, the mean balance of covariates in propensity score stratified sub-populations. Local balance implies global balance, but the reverse is false. We propose the propensity score with local balance (PSLB) methodology, which incorporates nonparametric propensity score models and optimizes local balance. Extensive numerical studies showed that the proposed method can substantially outperform existing methods that estimate the propensity score by optimizing global balance, when the model is misspecified. The proposed method is implemented in the R package PSLB.
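
The distinction between global and local balance drawn in this abstract can be made concrete with a toy diagnostic. The sketch below is not the PSLB method or the API of the PSLB R package; the function names `mean_gap` and `local_gaps`, the quantile-based stratification, and the eight-subject data set are assumptions chosen so that the covariate is balanced overall yet imbalanced inside every propensity score stratum.

```python
import numpy as np

def mean_gap(x, a, w):
    """Absolute difference in weighted covariate means between arms."""
    treated, control = a == 1, a == 0
    return abs(np.average(x[treated], weights=w[treated])
               - np.average(x[control], weights=w[control]))

def local_gaps(x, a, w, ps, n_strata=5):
    """Mean gap within each propensity-score quantile stratum (NaN if an arm is empty)."""
    edges = np.quantile(ps, np.linspace(0.0, 1.0, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, n_strata - 1)
    gaps = []
    for s in range(n_strata):
        m = strata == s
        both_arms = m.any() and a[m].min() != a[m].max()
        gaps.append(mean_gap(x[m], a[m], w[m]) if both_arms else np.nan)
    return np.array(gaps)

# Hypothetical toy data: two propensity-score strata of four subjects each.
ps = np.array([0.3] * 4 + [0.7] * 4)
a = np.array([1, 1, 0, 0, 1, 1, 0, 0])
x = np.array([0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0])
w = np.ones(8)

global_gap = mean_gap(x, a, w)
stratum_gaps = local_gaps(x, a, w, ps, n_strata=2)
print(global_gap, stratum_gaps)
```

Here the overall treated and control means of `x` coincide while the within-stratum means differ by 2 in both strata, which is the "reverse is false" case the abstract describes.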


Subject(s)
Models, Statistical , Humans , Propensity Score , Computer Simulation
16.
Stat Med ; 42(21): 3764-3785, 2023 09 20.
Article in English | MEDLINE | ID: mdl-37339777

ABSTRACT

Cluster randomized trials (CRTs) are studies where treatment is randomized at the cluster level but outcomes are typically collected at the individual level. When CRTs are employed in pragmatic settings, baseline population characteristics may moderate treatment effects, leading to what is known as heterogeneous treatment effects (HTEs). Pre-specified, hypothesis-driven HTE analyses in CRTs can enable an understanding of how interventions may impact subpopulation outcomes. While closed-form sample size formulas have recently been proposed, assuming known intracluster correlation coefficients (ICCs) for both the covariate and outcome, guidance on optimal cluster randomized designs to ensure maximum power with pre-specified HTE analyses has not yet been developed. We derive new design formulas to determine the cluster size and number of clusters to achieve the locally optimal design (LOD) that minimizes variance for estimating the HTE parameter given a budget constraint. Given the LODs are based on covariate and outcome-ICC values that are usually unknown, we further develop the maximin design for assessing HTE, identifying the combination of design resources that maximize the relative efficiency of the HTE analysis in the worst case scenario. In addition, given the analysis of the average treatment effect is often of primary interest, we also establish optimal designs to accommodate multiple objectives by combining considerations for studying both the average and heterogeneous treatment effects. We illustrate our methods using the context of the Kerala Diabetes Prevention Program CRT, and provide an R Shiny app to facilitate calculation of optimal designs under a wide range of design parameters.


Subject(s)
Research Design , Humans , Cluster Analysis , Sample Size , Randomized Controlled Trials as Topic
17.
BMC Med Res Methodol ; 23(1): 297, 2023 12 15.
Article in English | MEDLINE | ID: mdl-38102563

ABSTRACT

BACKGROUND: Across studies of average treatment effects, some population subgroups consistently have lower representation than others, which can lead to discrepancies in how well results generalize. METHODS: We develop a framework for quantifying inequity due to systemic disparities in sample representation and a method for mitigation during data analysis. Assuming subgroup treatment effects are exchangeable, an unbiased sample average treatment effect estimator will have lower mean-squared error, on average across studies, for subgroups with less representation when treatment effects vary. We present a method for estimating average treatment effects in representation-adjusted samples which enables subgroups to optimally leverage information from the full sample rather than only their own subgroup's data. Two approaches for specifying representation adjustment are offered: one minimizes average mean-squared error for each subgroup separately and the other balances minimization of mean-squared error and equal representation. We conduct simulation studies to compare the performance of the proposed estimators to several subgroup-specific estimators. RESULTS: We find that the proposed estimators generally provide lower mean squared error, particularly for smaller subgroups, relative to the other estimators. As a case study, we apply this method to a subgroup analysis from a published study. CONCLUSIONS: We recommend the use of the proposed estimators to mitigate the impact of disparities in representation, though structural change is ultimately needed.


Subject(s)
Models, Statistical , Humans , Computer Simulation
18.
BMC Med Res Methodol ; 23(1): 231, 2023 10 11.
Article in English | MEDLINE | ID: mdl-37821829

ABSTRACT

BACKGROUND: In observational studies, double robust or multiply robust (MR) approaches provide more protection from model misspecification than the inverse probability weighting and g-computation for estimating the average treatment effect (ATE). However, the approaches are based on parametric models, leading to biased estimates when all models are incorrectly specified. Nonparametric methods, such as machine learning or nonparametric double robust approaches, are robust to model misspecification, but the efficiency of nonparametric methods is low. METHOD: In the study, we proposed an improved MR method combining parametric and nonparametric models based on the previous MR method (Han, JASA 109(507):1159-73, 2014) to improve the robustness to model misspecification and the efficiency. We performed comprehensive simulations to evaluate the performance of the proposed method. RESULTS: Our simulation study showed that the MR estimators with only outcome regression (OR) models, where one of the models was a nonparametric model, were the most recommended because of the robustness to model misspecification and the lowest root mean square error (RMSE) when including a correct parametric OR model. The performance of the recommended estimators was comparable even if all parametric models were misspecified. As an application, the proposed method was used to estimate the effect of social activity on depression levels in the China Health and Retirement Longitudinal Study dataset. CONCLUSIONS: The proposed estimator with nonparametric and parametric models is more robust to model misspecification.


Subject(s)
Machine Learning , Models, Statistical , Humans , Longitudinal Studies , Computer Simulation , Probability
19.
Eur J Epidemiol ; 38(2): 123-133, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36626100

ABSTRACT

Most work on extending (generalizing or transporting) inferences from a randomized trial to a target population has focused on estimating average treatment effects (i.e., averaged over the target population's covariate distribution). Yet, in the presence of strong effect modification by baseline covariates, the average treatment effect in the target population may be less relevant for guiding treatment decisions. Instead, the conditional average treatment effect (CATE) as a function of key effect modifiers may be a more useful estimand. Recent work on estimating target population CATEs using baseline covariate, treatment, and outcome data from the trial and covariate data from the target population only allows for the examination of heterogeneity over distinct subgroups. We describe flexible pseudo-outcome regression modeling methods for estimating target population CATEs conditional on discrete or continuous baseline covariates when the trial is embedded in a sample from the target population (i.e., in nested trial designs). We construct pointwise confidence intervals for the CATE at a specific value of the effect modifiers and uniform confidence bands for the CATE function. Last, we illustrate the methods using data from the Coronary Artery Surgery Study (CASS) to estimate CATEs given history of myocardial infarction and baseline ejection fraction value in the target population of all trial-eligible patients with stable ischemic heart disease.


Subject(s)
Myocardial Infarction , Humans , Regression Analysis , Research Design
20.
J Biomed Inform ; 137: 104256, 2023 01.
Article in English | MEDLINE | ID: mdl-36455806

ABSTRACT

Big data and (deep) machine learning have been ambitious tools in digital medicine, but these tools focus mainly on association. Intervention in medicine is about the causal effects. The average treatment effect has long been studied as a measure of causal effect, assuming that all populations have the same effect size. However, no "one-size-fits-all" treatment seems to work in some complex diseases. Treatment effects may vary by patient. Estimating heterogeneous treatment effects (HTE) may have a high impact on developing personalized treatment. Lots of advanced machine learning models for estimating HTE have emerged in recent years, but there has been limited translational research into the real-world healthcare domain. To fill the gap, we reviewed and compared eleven recent HTE estimation methodologies, including meta-learner, representation learning models, and tree-based models. We performed a comprehensive benchmark experiment based on nationwide healthcare claim data with application to Alzheimer's disease drug repurposing. We provided some challenges and opportunities in HTE estimation analysis in the healthcare domain to close the gap between innovative HTE models and deployment to real-world healthcare problems.


Subject(s)
Benchmarking , Machine Learning , Humans , Randomized Controlled Trials as Topic , Causality