Results 1 - 20 of 49
1.
Biostatistics ; 25(2): 289-305, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-36977366

ABSTRACT

Causally interpretable meta-analysis combines information from a collection of randomized controlled trials to estimate treatment effects in a target population in which experimentation may not be possible but from which covariate information can be obtained. In such analyses, a key practical challenge is the presence of systematically missing data when some trials have collected data on one or more baseline covariates, but other trials have not, such that the covariate information is missing for all participants in the latter. In this article, we provide identification results for potential (counterfactual) outcome means and average treatment effects in the target population when covariate data are systematically missing from some of the trials in the meta-analysis. We propose three estimators for the average treatment effect in the target population, examine their asymptotic properties, and show that they have good finite-sample performance in simulation studies. We use the estimators to analyze data from two large lung cancer screening trials and target population data from the National Health and Nutrition Examination Survey (NHANES). To accommodate the complex survey design of the NHANES, we modify the methods to incorporate survey sampling weights and allow for clustering.


Subject(s)
Early Detection of Cancer , Lung Neoplasms , Humans , Nutrition Surveys , Lung Neoplasms/epidemiology , Computer Simulation , Research Design
2.
Biostatistics ; 25(2): 323-335, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-37475638

ABSTRACT

The rich longitudinal individual level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding, repeated and temporally non-aligned measurements of covariates, treatment assignments and outcomes, and loss-to-follow-up due to dropout. Here, we develop the subgroup discovery for longitudinal data algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data by combining the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus who are at higher risk of weight gain when receiving dolutegravir (DTG)-containing antiretroviral therapies (ARTs) versus when receiving non-DTG-containing ARTs.


Subject(s)
Electronic Health Records , HIV Infections , Heterocyclic Compounds, 3-Ring , Piperazines , Pyridones , Humans , Treatment Effect Heterogeneity , Oxazines , HIV Infections/drug therapy
3.
Biostatistics ; 24(3): 728-742, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-35389429

ABSTRACT

Prediction models are often built and evaluated using data from a population that differs from the target population in which model-derived predictions are intended to be used. In this article, we present methods for evaluating model performance in the target population when some observations are right censored. The methods assume that outcome and covariate data are available from a source population used for model development and that covariate data, but no outcome data, are available from the target population. We evaluate the finite-sample performance of the proposed estimators using simulations and apply the methods to transport a prediction model built using data from a lung cancer screening trial to a nationally representative population of participants eligible for lung cancer screening.


Subject(s)
Early Detection of Cancer , Lung Neoplasms , Humans , Models, Statistical , Computer Simulation
4.
Eur J Epidemiol ; 2024 May 10.
Article in English | MEDLINE | ID: mdl-38724763

ABSTRACT

Investigators often believe that relative effect measures conditional on covariates, such as risk ratios and mean ratios, are "transportable" across populations. Here, we examine the identification of causal effects in a target population using an assumption that conditional relative effect measures are transportable from a trial to the target population. We show that transportability for relative effect measures is largely incompatible with transportability for difference effect measures, unless the treatment has no effect on average or one is willing to make even stronger transportability assumptions that imply the transportability of both relative and difference effect measures. We then describe how marginal (population-averaged) causal estimands in a target population can be identified under the assumption of transportability of relative effect measures, when we are interested in the effectiveness of a new experimental treatment in a target population where the only treatment in use is the control treatment evaluated in the trial. We extend these results to consider cases where the control treatment evaluated in the trial is only one of the treatments in use in the target population, under an additional partial exchangeability assumption in the target population (i.e., an assumption of no unmeasured confounding in the target population with respect to potential outcomes under the control treatment in the trial). We also develop identification results that allow for the covariates needed for transportability of relative effect measures to be only a small subset of the covariates needed to control confounding in the target population. Last, we propose estimators that can be easily implemented in standard statistical software and illustrate their use using data from a comprehensive cohort study of stable ischemic heart disease.

5.
Am J Epidemiol ; 192(2): 296-304, 2023 02 01.
Article in English | MEDLINE | ID: mdl-35872598

ABSTRACT

We considered methods for transporting a prediction model for use in a new target population, both when outcome and covariate data for model development are available from a source population that has a different covariate distribution compared with the target population and when covariate data (but not outcome data) are available from the target population. We discuss how to tailor the prediction model to account for differences in the data distribution between the source population and the target population. We also discuss how to assess the model's performance (e.g., by estimating the mean squared prediction error) in the target population. We provide identifiability results for measures of model performance in the target population for a potentially misspecified prediction model under a sampling design where the source and the target population samples are obtained separately. We introduce the concept of prediction error modifiers that can be used to reason about tailoring measures of model performance to the target population. We illustrate the methods in simulated data and apply them to transport a prediction model for lung cancer diagnosis from the National Lung Screening Trial to the nationally representative target population of trial-eligible individuals in the National Health and Nutrition Examination Survey.


Subject(s)
Models, Theoretical , Nutrition Surveys , Humans , Lung Neoplasms/diagnosis
6.
Am J Epidemiol ; 192(10): 1688-1700, 2023 10 10.
Article in English | MEDLINE | ID: mdl-37147861

ABSTRACT

Accurate forecasts can inform response to outbreaks. Most efforts in influenza forecasting have focused on predicting influenza-like activity, with fewer on influenza-related hospitalizations. We conducted a simulation study to evaluate a super learner's predictions of 3 seasonal measures of influenza hospitalizations in the United States: peak hospitalization rate, peak hospitalization week, and cumulative hospitalization rate. We trained an ensemble machine learning algorithm on 15,000 simulated hospitalization curves and generated weekly predictions. We compared the performance of the ensemble (weighted combination of predictions from multiple prediction algorithms), the best-performing individual prediction algorithm, and a naive prediction (median of a simulated outcome distribution). Ensemble predictions performed similarly to the naive predictions early in the season but consistently improved as the season progressed for all prediction targets. The best-performing prediction algorithm in each week typically had similar predictive accuracy compared with the ensemble, but the specific prediction algorithm selected varied by week. An ensemble super learner improved predictions of influenza-related hospitalizations, relative to a naive prediction. Future work should examine the super learner's performance using additional empirical data on influenza-related predictors (e.g., influenza-like illness). The algorithm should also be tailored to produce prospective probabilistic forecasts of selected prediction targets.
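The central idea of an ensemble super learner (a cross-validation-weighted combination of candidate prediction algorithms) can be sketched in a toy setting. This is an illustrative reconstruction, not the simulation pipeline used in the study: the two base learners, the synthetic data, and the grid search over a single convex weight are all hypothetical stand-ins.

```python
import random

def cv_mse(predict_fn, data, k=5):
    """k-fold cross-validated mean squared error of a learner.
    predict_fn takes a training set and a value x, returns a prediction."""
    folds = [data[i::k] for i in range(k)]
    err, count = 0.0, 0
    for i in range(k):
        train = [row for j, f in enumerate(folds) if j != i for row in f]
        for x, y in folds[i]:
            err += (predict_fn(train, x) - y) ** 2
            count += 1
    return err / count

# two simple candidate learners (toy stand-ins for richer algorithms)
def mean_learner(train, x):
    return sum(y for _, y in train) / len(train)

def linear_learner(train, x):
    # least-squares line fit on the training fold
    n = len(train)
    mx = sum(t[0] for t in train) / n
    my = sum(t[1] for t in train) / n
    sxy = sum((t[0] - mx) * (t[1] - my) for t in train)
    sxx = sum((t[0] - mx) ** 2 for t in train)
    b = sxy / sxx
    return my + b * (x - mx)

random.seed(7)
data = [(x, 0.5 * x + random.gauss(0, 1)) for x in range(40)]

# choose the convex ensemble weight that minimizes cross-validated error
weights = [i / 20 for i in range(21)]
def ensemble(w):
    return lambda train, x: w * linear_learner(train, x) + (1 - w) * mean_learner(train, x)
best_w = min(weights, key=lambda w: cv_mse(ensemble(w), data))
```

Because the weight grid includes 0 and 1, the selected ensemble can never have worse cross-validated error than either base learner alone, mirroring the result that the ensemble tracked the best individual algorithm even as that algorithm varied by week.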


Subject(s)
Hospitalization , Influenza, Human , Humans , Computer Simulation , Forecasting , Influenza, Human/epidemiology , Prospective Studies , Seasons , United States/epidemiology , Machine Learning , Public Health Surveillance
7.
Epidemiol Rev ; 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36752592

ABSTRACT

Comparisons between randomized trial analyses and observational analyses that attempt to address similar research questions have generated many controversies in epidemiology and the social sciences. There has been little consensus on when such comparisons are reasonable, what their implications are for the validity of observational analyses, or whether trial and observational analyses can be integrated to address effectiveness questions. Here, we consider methods for using observational analyses to complement trial analyses when assessing treatment effectiveness. First, we review the framework for designing observational analyses that emulate target trials and present an evidence map of its recent applications. We then review approaches for estimating the average treatment effect in the target population underlying the emulation: using observational analyses of the emulation data alone; and using transportability analyses to extend inferences from a trial to the target population. We explain how comparing treatment effect estimates from the emulation against those from the trial can provide evidence on whether observational analyses can be trusted to deliver valid estimates of effectiveness (a process we refer to as benchmarking) and, in some cases, allow the joint analysis of the trial and observational data. We illustrate different approaches using a simplified example of a pragmatic trial and its emulation in registry data. We conclude that synthesizing trial and observational data (in transportability, benchmarking, or joint analyses) can leverage their complementary strengths to enhance learning about comparative effectiveness, through a process combining quantitative methods and epidemiological judgements.

8.
J Gen Intern Med ; 38(4): 954-960, 2023 03.
Article in English | MEDLINE | ID: mdl-36175761

ABSTRACT

BACKGROUND: Low-value healthcare is costly and inefficient and may adversely affect patient outcomes. Despite increases in low-value service use, little is known about how the receipt of low-value care differs across payers. OBJECTIVE: To evaluate differences in the use of low-value care between patients with commercial versus Medicaid coverage. DESIGN: Retrospective observational analysis of the 2017 Rhode Island All-payer Claims Database, estimating the probability of receiving each of 14 low-value services between commercial and Medicaid enrollees, adjusting for patient sociodemographic and clinical characteristics. Ensemble machine learning minimized the possibility of model misspecification. PARTICIPANTS: Medicaid and commercial enrollees aged 18-64 with continuous coverage and an encounter at which they were at risk of receiving a low-value service. INTERVENTION: Enrollment in Medicaid or commercial insurance. MAIN MEASURES: Use of one of 14 validated measures of low-value care. KEY RESULTS: Among 110,609 patients, Medicaid enrollees were younger, had more comorbidities, and were more likely to be female than commercial enrollees. Medicaid enrollees had higher rates of use for 7 low-value care measures, and those with commercial coverage had higher rates for 5 measures. Across all measures of low-value care, commercial enrollees received more (risk difference [RD] 6.8 percentage points; CI: 6.6 to 7.0) low-value services than their counterparts with Medicaid. Commercial enrollees were also more likely to receive low-value services typically performed in the emergency room (RD 11.4 percentage points; CI: 10.7 to 12.2) and services that were less expensive (RD 15.3 percentage points; CI: 14.6 to 16.0). CONCLUSION: Differences in the provision of low-value care varied across measures, though average use was slightly higher among commercial than Medicaid enrollees. This difference was more pronounced for less expensive services, indicating that financial incentives may not be the sole driver of low-value care.


Subject(s)
Low-Value Care , Medicaid , United States/epidemiology , Humans , Female , Male , Retrospective Studies , Delivery of Health Care , Rhode Island
9.
Biometrics ; 79(3): 2382-2393, 2023 09.
Article in English | MEDLINE | ID: mdl-36385607

ABSTRACT

We propose methods for estimating the area under the receiver operating characteristic (ROC) curve (AUC) of a prediction model in a target population that differs from the source population that provided the data used for original model development. If covariates that are associated with model performance, as measured by the AUC, have a different distribution in the source and target populations, then AUC estimators that only use data from the source population will not reflect model performance in the target population. Here, we provide identification results for the AUC in the target population when outcome and covariate data are available from the sample of the source population, but only covariate data are available from the sample of the target population. In this setting, we propose three estimators for the AUC in the target population and show that they are consistent and asymptotically normal. We evaluate the finite-sample performance of the estimators using simulations and use them to estimate the AUC in a nationally representative target population from the National Health and Nutrition Examination Survey for a lung cancer risk prediction model developed using source population data from the National Lung Screening Trial.
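One simple weighting-style estimator in this spirit reweights case-control pairs in the source sample by inverse-odds-of-source-membership weights, so that the pairwise comparisons reflect the target population's covariate distribution. This is an illustration of the weighting idea, not necessarily one of the three estimators proposed in the article, and the scores, labels, and weights below are hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """AUC as a weighted probability that a randomly chosen case
    outscores a randomly chosen non-case; ties count one half."""
    num, den = 0.0, 0.0
    for si, yi, wi in zip(scores, labels, weights):
        if yi != 1:
            continue
        for sj, yj, wj in zip(scores, labels, weights):
            if yj != 0:
                continue
            w = wi * wj  # pair weight: product of individual weights
            den += w
            if si > sj:
                num += w
            elif si == sj:
                num += 0.5 * w
    return num / den

# toy source-population sample: model risk scores, observed outcomes,
# and hypothetical inverse-odds-of-source-membership weights
scores  = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels  = [1,   1,   0,   1,   0,   0]
weights = [1.0, 2.0, 1.0, 0.5, 1.0, 2.0]
auc = weighted_auc(scores, labels, weights)
```

With all weights equal to 1, the function reduces to the ordinary source-population AUC, which makes the role of the weights easy to check.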


Subject(s)
Models, Statistical , ROC Curve , Nutrition Surveys , Area Under Curve
10.
Biometrics ; 79(2): 1057-1072, 2023 06.
Article in English | MEDLINE | ID: mdl-35789478

ABSTRACT

We present methods for causally interpretable meta-analyses that combine information from multiple randomized trials to draw causal inferences for a target population of substantive interest. We consider identifiability conditions, derive implications of the conditions for the law of the observed data, and obtain identification results for transporting causal inferences from a collection of independent randomized trials to a new target population in which experimental data may not be available. We propose an estimator for the potential outcome mean in the target population under each treatment studied in the trials. The estimator uses covariate, treatment, and outcome data from the collection of trials, but only covariate data from the target population sample. We show that it is doubly robust in the sense that it is consistent and asymptotically normal when at least one of the models it relies on is correctly specified. We study the finite sample properties of the estimator in simulation studies and demonstrate its implementation using data from a multicenter randomized trial.


Subject(s)
Models, Statistical , Randomized Controlled Trials as Topic , Computer Simulation , Causality
11.
Eur J Epidemiol ; 38(2): 123-133, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36626100

ABSTRACT

Most work on extending (generalizing or transporting) inferences from a randomized trial to a target population has focused on estimating average treatment effects (i.e., averaged over the target population's covariate distribution). Yet, in the presence of strong effect modification by baseline covariates, the average treatment effect in the target population may be less relevant for guiding treatment decisions. Instead, the conditional average treatment effect (CATE) as a function of key effect modifiers may be a more useful estimand. Recent work on estimating target population CATEs using baseline covariate, treatment, and outcome data from the trial and covariate data from the target population allows only for the examination of heterogeneity across distinct subgroups. We describe flexible pseudo-outcome regression modeling methods for estimating target population CATEs conditional on discrete or continuous baseline covariates when the trial is embedded in a sample from the target population (i.e., in nested trial designs). We construct pointwise confidence intervals for the CATE at a specific value of the effect modifiers and uniform confidence bands for the CATE function. Last, we illustrate the methods using data from the Coronary Artery Surgery Study (CASS) to estimate CATEs given history of myocardial infarction and baseline ejection fraction value in the target population of all trial-eligible patients with stable ischemic heart disease.
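The pseudo-outcome idea can be sketched in a toy simulation. With 1:1 randomization, the inverse-probability-weighted pseudo-outcome AY/0.5 - (1-A)Y/0.5 has conditional mean equal to the CATE given X, so regressing it on an effect modifier estimates the CATE function. This is an illustrative reconstruction, not the authors' implementation: the data-generating model is hypothetical, and a plain least-squares line stands in for the flexible regression methods described above.

```python
import random

random.seed(3)

# hypothetical trial data: effect modifier X, randomized A (prob 1/2),
# outcome with true CATE(x) = 1 + 2x
n = 10000
rows = []
for _ in range(n):
    x = random.uniform(0, 1)
    a = random.randint(0, 1)
    y = x + a * (1 + 2 * x) + random.gauss(0, 1)
    rows.append((x, a, y))

# IPW pseudo-outcome: phi = A*Y/0.5 - (1-A)*Y/0.5, so E[phi | X] = CATE(X)
pseudo = [(x, 2 * y if a == 1 else -2 * y) for x, a, y in rows]

# regress the pseudo-outcome on X (a simple linear fit standing in
# for the flexible pseudo-outcome regression methods)
mx = sum(p[0] for p in pseudo) / n
mp = sum(p[1] for p in pseudo) / n
slope = (sum((x - mx) * (ph - mp) for x, ph in pseudo)
         / sum((x - mx) ** 2 for x, _ in pseudo))
intercept = mp - slope * mx

def cate_hat(x):
    """Estimated CATE as a function of the effect modifier."""
    return intercept + slope * x
```

In this toy example the fitted line should recover the true CATE function 1 + 2x up to sampling noise.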


Subject(s)
Myocardial Infarction , Humans , Regression Analysis , Research Design
12.
Prev Sci ; 24(8): 1648-1658, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37726579

ABSTRACT

Evidence synthesis involves drawing conclusions from trial samples that may differ from the target population of interest, and there is often heterogeneity among trials in sample characteristics, treatment implementation, study design, and assessment of covariates. Stitching together this patchwork of evidence requires subject-matter knowledge, a clearly defined target population, and guidance on how to weigh evidence from different trials. Transportability analysis has provided formal identifiability conditions required to make unbiased causal inference in the target population. In this manuscript, we review these conditions along with an additional assumption required to address systematic missing data. The identifiability conditions highlight the importance of accounting for differences in treatment effect modifiers between the populations underlying the trials and the target population. We perform simulations to evaluate the bias of conventional random effect models and multiply imputed estimates using the pooled trials sample and describe causal estimators that explicitly address trial-to-target differences in key covariates in the context of systematic missing data. Results indicate that the causal transportability estimators are unbiased when treatment effect modifiers are accounted for in the analyses. Results also highlight the importance of carefully evaluating identifiability conditions for each trial to reduce bias due to differences in participant characteristics between trials and the target population. Bias can be limited by adjusting for covariates that are strongly correlated with missing treatment effect modifiers, by including data from trials that do not differ from the target on treatment effect modifiers, and by removing trials that do differ from the target and did not assess a modifier.


Subject(s)
Health Services Needs and Demand , Research Design , Humans , Bias , Causality , Knowledge
13.
Am J Epidemiol ; 191(7): 1283-1289, 2022 06 27.
Article in English | MEDLINE | ID: mdl-34736280

ABSTRACT

In this paper, we consider methods for generating draws of a binary random variable whose expectation conditional on covariates follows a logistic regression model with known covariate coefficients. We examine approximations for finding a "balancing intercept," that is, a value for the intercept of the logistic model that leads to a desired marginal expectation for the binary random variable. We show that a recently proposed analytical approximation can produce inaccurate results, especially when targeting more extreme marginal expectations or when the linear predictor of the regression model has high variance. We then formulate the balancing intercept as a solution to an integral equation, implement a numerical approximation for solving the equation based on Monte Carlo methods, and show that the approximation works well in practice. Our approach to the basic problem of the balancing intercept provides an example of a broadly applicable strategy for formulating and solving problems that arise in the design of simulation studies used to evaluate or teach epidemiologic methods.
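The numerical approach described above can be sketched with Monte Carlo draws and bisection: the marginal expectation is a monotone function of the intercept, so the integral equation has a unique root that bisection finds reliably. This is an illustrative reconstruction, not the authors' code, and the linear-predictor distribution below is a hypothetical high-variance example of the kind where the analytical approximation is reported to break down.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def balancing_intercept(linpred_draws, target_marginal, lo=-20.0, hi=20.0, tol=1e-8):
    """Find the intercept b0 such that the Monte Carlo average of
    expit(b0 + linpred) over the draws equals the desired marginal
    expectation, via bisection (the average is increasing in b0)."""
    def marginal(b0):
        return sum(expit(b0 + lp) for lp in linpred_draws) / len(linpred_draws)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal(mid) < target_marginal:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(42)
# hypothetical linear predictor with high variance, targeting an
# extreme marginal expectation of 0.05
draws = [2.0 * random.gauss(0.0, 1.0) for _ in range(100_000)]
b0 = balancing_intercept(draws, target_marginal=0.05)
```

Plugging the solved intercept back into the same Monte Carlo draws recovers the desired marginal expectation to high precision, which is the defining property of the balancing intercept.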


Subject(s)
Monte Carlo Method , Computer Simulation , Humans , Logistic Models
14.
Am J Epidemiol ; 2022 Feb 28.
Article in English | MEDLINE | ID: mdl-35225329

ABSTRACT

Methods for extending (generalizing or transporting) inferences from a randomized trial to a target population involve conditioning on a large set of covariates that is sufficient for rendering the randomized and non-randomized groups exchangeable. Yet, decision-makers are often interested in examining treatment effects in subgroups of the target population defined in terms of only a few discrete covariates. Here, we propose methods for estimating subgroup-specific potential outcome means and average treatment effects in generalizability and transportability analyses, using outcome model-based (g-formula), weighting, and augmented weighting estimators. We consider estimating subgroup-specific average treatment effects in the target population and its non-randomized subset, and provide methods that are appropriate both for nested and non-nested trial designs. As an illustration, we apply the methods to data from the Coronary Artery Surgery Study to compare the effect of surgery plus medical therapy versus medical therapy alone for chronic coronary artery disease in subgroups defined by history of myocardial infarction.

15.
Biostatistics ; 22(2): 283-297, 2021 04 10.
Article in English | MEDLINE | ID: mdl-31420983

ABSTRACT

We consider the problem of designing a confirmatory randomized trial for comparing two treatments versus a common control in two disjoint subpopulations. The subpopulations could be defined in terms of a biomarker or disease severity measured at baseline. The goal is to determine which treatments benefit which subpopulations. We develop a new class of adaptive enrichment designs tailored to solving this problem. Adaptive enrichment designs involve a preplanned rule for modifying enrollment based on accruing data in an ongoing trial. At the interim analysis after each stage, for each subpopulation, the preplanned rule may decide to stop enrollment or to stop randomizing participants to one or more study arms. The motivation for this adaptive feature is that interim data may indicate that a subpopulation, such as those with lower disease severity at baseline, is unlikely to benefit from a particular treatment while uncertainty remains for the other treatment and/or subpopulation. We optimize these adaptive designs to have the minimum expected sample size under power and Type I error constraints. We compare the performance of the optimized adaptive design versus an optimized nonadaptive (single stage) design. Our approach is demonstrated in simulation studies that mimic features of a completed trial of a medical device for treating heart failure. The optimized adaptive design has 25% smaller expected sample size compared to the optimized nonadaptive design; however, the cost is that the optimized adaptive design has 8% greater maximum sample size. Open-source software that implements the trial design optimization is provided, allowing users to investigate the tradeoffs in using the proposed adaptive versus standard designs.


Subject(s)
Research Design , Software , Computer Simulation , Humans , Sample Size , Uncertainty
16.
Biometrics ; 78(2): 624-635, 2022 06.
Article in English | MEDLINE | ID: mdl-33527341

ABSTRACT

We introduce causal interaction tree (CIT) algorithms for finding subgroups of individuals with heterogeneous treatment effects in observational data. The CIT algorithms are extensions of the classification and regression tree algorithm that use splitting criteria based on subgroup-specific treatment effect estimators appropriate for observational data. We describe inverse probability weighting, g-formula, and doubly robust estimators of subgroup-specific treatment effects, derive their asymptotic properties, and use them to construct splitting criteria for the CIT algorithms. We study the performance of the algorithms in simulations and implement them to analyze data from an observational study that evaluated the effectiveness of right heart catheterization for critically ill patients.


Subject(s)
Algorithms , Models, Statistical , Causality , Computer Simulation , Humans , Probability
17.
Biometrics ; 78(2): 649-659, 2022 06.
Article in English | MEDLINE | ID: mdl-33728637

ABSTRACT

In this paper, we present a method for conducting global sensitivity analysis of randomized trials in which binary outcomes are scheduled to be collected on participants at prespecified points in time after randomization and these outcomes may be missing in a nonmonotone fashion. We introduce a class of missing data assumptions, indexed by sensitivity parameters, which are anchored around the missing not at random assumption introduced by Robins (Statistics in Medicine, 1997). For each assumption in the class, we establish that the joint distribution of the outcomes is identifiable from the distribution of the observed data. Our estimation procedure uses the plug-in principle, where the distribution of the observed data is estimated using random forests. We establish $\sqrt{n}$ asymptotic properties for our estimation procedure. We illustrate our methodology in the context of a randomized trial designed to evaluate a new approach to reducing substance use, assessed by testing urine samples twice weekly, among patients entering outpatient addiction treatment. We evaluate the finite sample properties of our method in a realistic simulation study. Our methods have been implemented in an R package entitled slabm.


Subject(s)
Research Design , Substance-Related Disorders , Computer Simulation , Data Interpretation, Statistical , Humans , Randomized Controlled Trials as Topic , Substance-Related Disorders/therapy
18.
Prev Sci ; 23(3): 403-414, 2022 04.
Article in English | MEDLINE | ID: mdl-34241752

ABSTRACT

Endowing meta-analytic results with a causal interpretation is challenging when there are differences in the distribution of effect modifiers among the populations underlying the included trials and the target population where the results of the meta-analysis will be applied. Recent work on transportability methods has described identifiability conditions under which the collection of randomized trials in a meta-analysis can be used to draw causal inferences about the target population. When the conditions hold, the methods enable estimation of causal quantities such as the average treatment effect and conditional average treatment effect in target populations that differ from the populations underlying the trial samples. The methods also facilitate comparison of treatments not directly compared in a head-to-head trial and assessment of comparative effectiveness within subgroups of the target population. We briefly describe these methods and present a worked example using individual participant data from three HIV prevention trials among adolescents in mental health care. We describe practical challenges in defining the target population, obtaining individual participant data from included trials and a sample of the target population, and addressing systematic missing data across datasets. When fully realized, methods for causally interpretable meta-analysis can provide decision-makers valid estimates of how treatments will work in target populations of substantive interest as well as in subgroups of these populations.


Subject(s)
HIV Infections , Adolescent , Causality , HIV Infections/prevention & control , Humans
19.
J Gen Intern Med ; 36(2): 265-273, 2021 02.
Article in English | MEDLINE | ID: mdl-33078300

ABSTRACT

BACKGROUND: Our objective was to assess the performance of machine learning methods to predict post-operative delirium using a prospective clinical cohort. METHODS: We analyzed data from an observational cohort study of 560 older adults (≥ 70 years) without dementia undergoing major elective non-cardiac surgery. Post-operative delirium was determined by the Confusion Assessment Method supplemented by a medical chart review (N = 134, 24%). Five machine learning algorithms and a standard stepwise logistic regression model were developed in a training sample (80% of participants) and evaluated in the remaining hold-out testing sample. We evaluated three overlapping feature sets, restricted to variables that are readily available or minimally burdensome to collect in clinical settings, including interview and medical record data. A large feature set included 71 potential predictors. A smaller set of 18 features was selected by an expert panel using a consensus process, and this smaller feature set was considered with and without a measure of pre-operative mental status. RESULTS: The area under the receiver operating characteristic curve (AUC) was higher in the large feature set conditions (range of AUC, 0.62-0.71 across algorithms) versus the selected feature set conditions (AUC range, 0.53-0.57). The restricted feature set with mental status had intermediate AUC values (range, 0.53-0.68). In the full feature set condition, algorithms such as gradient boosting, cross-validated logistic regression, and neural network (AUC = 0.71, 95% CI 0.58-0.83) were comparable with a model developed using traditional stepwise logistic regression (AUC = 0.69, 95% CI 0.57-0.82). Calibration for all models and feature sets was poor. CONCLUSIONS: We developed machine learning prediction models for post-operative delirium that performed better than chance and are comparable with traditional stepwise logistic regression. 
Delirium proved to be a phenotype that was difficult to predict with appreciable accuracy.


Subject(s)
Delirium , Machine Learning , Aged , Cohort Studies , Delirium/diagnosis , Delirium/epidemiology , Humans , Logistic Models , Prospective Studies
20.
Epidemiology ; 31(3): 334-344, 2020 05.
Article in English | MEDLINE | ID: mdl-32141921

ABSTRACT

We take steps toward causally interpretable meta-analysis by describing methods for transporting causal inferences from a collection of randomized trials to a new target population, one trial at a time and pooling all trials. We discuss identifiability conditions for average treatment effects in the target population and provide identification results. We show that the assumptions that allow inferences to be transported from all trials in the collection to the same target population have implications for the law underlying the observed data. We propose average treatment effect estimators that rely on different working models and provide code for their implementation in statistical software. We discuss how to use the data to examine whether transported inferences are homogeneous across the collection of trials, sketch approaches for sensitivity analysis to violations of the identifiability conditions, and describe extensions to address nonadherence in the trials. Last, we illustrate the proposed methods using data from the Hepatitis C Antiviral Long-Term Treatment Against Cirrhosis Trial.
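A weighting-based working model for transporting a trial effect to a target population can be sketched as follows: trial participants are reweighted by their odds of belonging to the target rather than the trial, so that weighted arm means estimate potential outcome means in the target. This is an illustrative inverse-odds-weighting example with a hypothetical data-generating model, and it uses the true sampling score where a real analysis would estimate one; it is not the authors' implementation.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(11)

# hypothetical setup: a trial (S=1) nested in a target population (S=0),
# with a covariate X that modifies the treatment effect: CATE(x) = 1 + x
n = 20000
trial, target = [], []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    p_trial = expit(0.5 - x)          # trial participation depends on X
    if random.random() < p_trial:
        a = random.randint(0, 1)      # randomized within the trial
        y = x + a * (1 + x) + random.gauss(0, 1)
        trial.append((x, a, y, p_trial))
    else:
        target.append(x)

# inverse-odds weighting: weight each trial participant by the odds of
# being in the target rather than the trial (true score used for simplicity)
def weighted_arm_mean(arm):
    num = sum(((1 - p) / p) * y for x, a, y, p in trial if a == arm)
    den = sum((1 - p) / p for x, a, y, p in trial if a == arm)
    return num / den

ate_target_hat = weighted_arm_mean(1) - weighted_arm_mean(0)
# benchmark: the true CATE averaged over the target sample
ate_target_true = sum(1 + x for x in target) / len(target)
```

Because trial participation decreases with X in this toy model, the target population has a larger average effect than the trial population, and the weighted estimator recovers it up to sampling noise.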


Subject(s)
Causality , Meta-Analysis as Topic , Humans , Randomized Controlled Trials as Topic