ABSTRACT
Distributed data networks enable large-scale epidemiologic studies, but protecting privacy while adequately adjusting for a large number of covariates continues to pose methodological challenges. Using 2 empirical examples within a 3-site distributed data network, we tested combinations of 3 aggregate-level data-sharing approaches (risk-set, summary-table, and effect-estimate), 4 confounding adjustment methods (matching, stratification, inverse probability weighting, and matching weighting), and 2 summary scores (propensity score and disease risk score) for binary and time-to-event outcomes. We assessed the performance of combinations of these data-sharing and adjustment methods by comparing their results with results from the corresponding pooled individual-level data analysis (reference analysis). For both types of outcomes, the method combinations examined yielded results identical or comparable to the reference results in most scenarios. Within each data-sharing approach, comparability between aggregate- and individual-level data analysis depended on adjustment method; for example, risk-set data-sharing with matched or stratified analysis of summary scores produced identical results, while weighted analysis showed some discrepancies. Across the adjustment methods examined, risk-set data-sharing generally performed better, while summary-table and effect-estimate data-sharing more often produced discrepancies in settings with rare outcomes and small sample sizes. Valid multivariable-adjusted analysis can be performed in distributed data networks without sharing of individual-level data.
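In effect-estimate data-sharing, each site shares only its adjusted effect estimate and standard error, and the analysis center combines them. The abstract does not say how the site estimates were pooled. This is a minimal sketch that assumes fixed-effect inverse-variance pooling of log hazard ratios, and the site values are hypothetical.

```python
import math

def pool_inverse_variance(log_estimates, std_errors):
    """Fixed-effect inverse-variance pooling of site-specific log effect estimates.

    Each site contributes only (log estimate, standard error); no
    individual-level data leave the site.
    """
    if len(log_estimates) != len(std_errors) or not log_estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled_log = sum(w * b for w, b in zip(weights, log_estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_log, pooled_se

# Hypothetical log hazard ratios and standard errors from a 3-site network.
site_log_hr = [math.log(0.90), math.log(0.85), math.log(0.95)]
site_se = [0.10, 0.20, 0.15]

pooled_log_hr, pooled_se = pool_inverse_variance(site_log_hr, site_se)
lower = math.exp(pooled_log_hr - 1.96 * pooled_se)
upper = math.exp(pooled_log_hr + 1.96 * pooled_se)
print(f"pooled HR = {math.exp(pooled_log_hr):.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

The pooled result depends only on site summaries. It can therefore drift from the pooled individual-level analysis when some sites contribute few events, which is the setting where the abstract reports more discrepancies for this approach.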
Subject(s)
Confidentiality/standards , Data Aggregation , Epidemiologic Research Design , Information Dissemination/methods , Information Services , Humans , Multivariate Analysis , Privacy , Propensity Score
ABSTRACT
BACKGROUND: This study was designed to adapt the Elixhauser comorbidity index for 4 cancer-specific populations (breast, prostate, lung, and colorectal) and compare 3 versions of the Elixhauser comorbidity score (individual comorbidities, summary comorbidity score, and cancer-specific summary comorbidity score) with 3 versions of the Charlson comorbidity score for predicting 2-year survival with 4 types of cancer. METHODS: This cohort study used Texas Cancer Registry-linked Medicare data from 2005 to 2011 for older patients diagnosed with breast (n = 19,082), prostate (n = 23,044), lung (n = 26,047), or colorectal cancer (n = 16,693). For each cancer cohort, the data were split into training and validation cohorts. In the training cohort, competing risk regression was used to model the association of Elixhauser comorbidities with 2-year noncancer mortality, and cancer-specific weights were derived for each comorbidity. In the validation cohort, competing risk regression was used to compare 3 versions of the Elixhauser comorbidity score with 3 versions of the Charlson comorbidity score. Model performance was evaluated with c statistics. RESULTS: The 2-year noncancer mortality rates were 14.5% (lung cancer), 11.5% (colorectal cancer), 5.7% (breast cancer), and 4.1% (prostate cancer). Cancer-specific Elixhauser comorbidity scores (c = 0.773 for breast cancer, c = 0.772 for prostate cancer, c = 0.579 for lung cancer, and c = 0.680 for colorectal cancer) performed slightly better than cancer-specific Charlson comorbidity scores (ie, the National Cancer Institute combined index; c = 0.762 for breast cancer, c = 0.767 for prostate cancer, c = 0.578 for lung cancer, and c = 0.674 for colorectal cancer). Individual Elixhauser comorbidities performed best (c = 0.779 for breast cancer, c = 0.783 for prostate cancer, c = 0.587 for lung cancer, and c = 0.687 for colorectal cancer). 
CONCLUSIONS: The cancer-specific Elixhauser comorbidity score performed as well as or slightly better than the cancer-specific Charlson comorbidity score in predicting 2-year survival. If the sample size permits, using individual Elixhauser comorbidities may be the best way to control for confounding in cancer outcomes research. Cancer 2018;124:2018-25. © 2018 American Cancer Society.
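A summary comorbidity score is typically a weighted sum of comorbidity indicators. The c statistic measures how well that score ranks patients by outcome. This is a minimal sketch that assumes hypothetical weights and a simple binary 2-year outcome. The study itself derived its weights with competing risk regression and evaluated time-to-event c statistics, and neither is reproduced here.

```python
from itertools import combinations

# Hypothetical cancer-specific weights (e.g., derived from regression coefficients).
WEIGHTS = {"chf": 3, "copd": 2, "diabetes": 1, "renal_failure": 2}

def summary_score(comorbidities):
    """Sum the weights of the comorbidities a patient has; unlisted ones count 0."""
    return sum(WEIGHTS.get(c, 0) for c in comorbidities)

def c_statistic(scores, outcomes):
    """Concordance for a binary outcome: share of (event, non-event) pairs in
    which the event has the higher score; tied scores count as one half."""
    concordant = 0.0
    usable = 0
    for (s1, y1), (s2, y2) in combinations(zip(scores, outcomes), 2):
        if y1 == y2:
            continue  # only pairs with one event and one non-event are informative
        usable += 1
        event_score, other_score = (s1, s2) if y1 == 1 else (s2, s1)
        if event_score > other_score:
            concordant += 1
        elif event_score == other_score:
            concordant += 0.5
    return concordant / usable if usable else float("nan")

# Hypothetical patients: (set of comorbidities, died of noncancer cause within 2 years)
patients = [
    ({"chf", "diabetes"}, 1),
    ({"copd"}, 0),
    ({"renal_failure", "copd"}, 1),
    (set(), 0),
    ({"diabetes"}, 0),
]
scores = [summary_score(c) for c, _ in patients]
outcomes = [y for _, y in patients]
print(scores, c_statistic(scores, outcomes))
```

Entering the individual comorbidities into the model in place of a single summed score lets each condition keep its own coefficient. That is one way to read the abstract's finding that individual Elixhauser comorbidities discriminated best.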
Subject(s)
Comorbidity , Health Status Indicators , Neoplasms/epidemiology , Aged , Aged, 80 and over , Cohort Studies , Female , Humans , Male , Medicare/statistics & numerical data , Prognosis , Retrospective Studies , Risk Assessment/methods , Survival Analysis , Survival Rate , Texas/epidemiology , United States/epidemiology
ABSTRACT
PURPOSE: We use simulations and an empirical example to evaluate the performance of disease risk score (DRS) matching compared with propensity score (PS) matching when controlling large numbers of covariates in settings involving newly introduced treatments. METHODS: We simulated a dichotomous treatment, a dichotomous outcome, and 100 baseline covariates that included both continuous and dichotomous random variables. For the empirical example, we evaluated the comparative effectiveness of dabigatran versus warfarin in preventing combined ischemic stroke and all-cause mortality. We matched treatment groups on a historically estimated DRS and again on the PS. We controlled for a high-dimensional set of covariates using 20% and 1% samples of Medicare claims data from October 2010 through December 2012. RESULTS: In simulations, matching on the DRS versus the PS generally yielded matches for more treated individuals and improved precision of the effect estimate. For the empirical example, PS and DRS matching in the 20% sample resulted in similar hazard ratios (0.88 and 0.87) and standard errors (0.04 for both methods). In the 1% sample, PS matching resulted in matches for only 92.0% of the treated population and a hazard ratio and standard error of 0.89 and 0.19, respectively, while DRS matching resulted in matches for 98.5% and a hazard ratio and standard error of 0.85 and 0.16, respectively. CONCLUSIONS: When PS distributions are separated, DRS matching can improve the precision of effect estimates and allow researchers to evaluate the treatment effect in a larger proportion of the treated population. However, accurately modeling the DRS can be challenging compared with the PS.
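Matching proceeds the same way whether the score is a PS or a historically estimated DRS: each treated patient is paired with the closest untreated patient on that score. The abstract does not state the matching algorithm or caliper. This is a minimal sketch of greedy 1:1 nearest-neighbor matching without replacement, assuming a hypothetical fixed caliper of 0.05 on the score scale and hypothetical scores.

```python
def greedy_caliper_match(treated, untreated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, untreated: dicts mapping patient id -> score (PS or DRS).
    Returns a list of (treated_id, untreated_id) pairs. Treated patients with
    no remaining untreated patient within the caliper stay unmatched.
    """
    available = dict(untreated)
    pairs = []
    # Visit treated patients in ascending score order so results are reproducible.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        u_id = min(available, key=lambda uid: abs(available[uid] - t_score))
        if abs(available[u_id] - t_score) <= caliper:
            pairs.append((t_id, u_id))
            del available[u_id]  # without replacement
    return pairs

treated = {"t1": 0.20, "t2": 0.50, "t3": 0.90}
untreated = {"u1": 0.21, "u2": 0.48, "u3": 0.10}
pairs = greedy_caliper_match(treated, untreated)
matched_share = len(pairs) / len(treated)
print(pairs, f"{matched_share:.1%} of treated matched")
```

The share of treated patients who find a match is the quantity the abstract reports (92.0% vs 98.5% in the 1% sample). When score distributions are separated, a high-scoring treated patient such as `t3` has no comparator within the caliper and drops out.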
Subject(s)
Comparative Effectiveness Research/methods , Computer Simulation , Dabigatran/therapeutic use , Propensity Score , Warfarin/therapeutic use , Aged , Aged, 80 and over , Atrial Fibrillation/drug therapy , Atrial Fibrillation/mortality , Female , Humans , Male , Mortality/trends , Pharmacoepidemiology/methods , Stroke/mortality , Stroke/prevention & control , Treatment Outcome , United States/epidemiology
ABSTRACT
PURPOSE: Little is known about how disease risk score (DRS) development should proceed under different pharmacoepidemiologic follow-up strategies. In an analysis of dabigatran vs. warfarin and risk of major bleeding, we compared the results of DRS adjustment when models were developed under "intention-to-treat" (ITT) and "as-treated" (AT) approaches. METHODS: We assessed DRS model discrimination, calibration, and ability to induce prognostic balance via the "dry run analysis". AT treatment effects stratified on each DRS were compared with each other and with a propensity score (PS)-stratified reference estimate. Bootstrap resampling of the historical cohort at sample sizes of 10% to 90% was performed to assess the impact of sample size on DRS estimation. RESULTS: Historically derived DRS models fit under AT showed greater decrements in discrimination and calibration than those fit under ITT when applied to the concurrent study population. Prognostic balance was approximately equal across DRS models (-6% to -7% "pseudo-bias" on the hazard ratio scale). Hazard ratios were between 0.76 and 0.78 with all methods of DRS adjustment, while the PS-stratified hazard ratio was 0.83. In resampling, AT DRS models showed more overfitting and worse prognostic balance, and led to hazard ratios further from the reference estimate than did ITT DRSs, across sample sizes. CONCLUSIONS: In a study of anticoagulant safety, DRSs developed under an AT principle showed signs of overfitting and reduced confounding control. More research is needed to determine whether development of DRSs under ITT is a viable solution to overfitting in other settings.
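The ITT and AT principles differ in when follow-up ends. That difference changes the outcome labels a DRS model is fit to. This is a minimal sketch that assumes each patient has hypothetical fields for outcome day, treatment discontinuation day, and last day of available data, all counted from cohort entry. Real AT definitions often add a grace period and censor at treatment switching, and both are omitted here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Patient:
    outcome_day: Optional[float]          # day of major bleeding; None if not observed
    discontinuation_day: Optional[float]  # day treatment stopped; None if never stopped
    end_of_data_day: float                # last day of available follow-up

def itt_follow_up(p: Patient) -> Tuple[float, bool]:
    """Intention-to-treat: follow until outcome or end of data, ignoring discontinuation."""
    if p.outcome_day is not None and p.outcome_day <= p.end_of_data_day:
        return p.outcome_day, True
    return p.end_of_data_day, False

def as_treated_follow_up(p: Patient) -> Tuple[float, bool]:
    """As-treated: additionally censor follow-up at treatment discontinuation."""
    censor_day = p.end_of_data_day
    if p.discontinuation_day is not None:
        censor_day = min(censor_day, p.discontinuation_day)
    if p.outcome_day is not None and p.outcome_day <= censor_day:
        return p.outcome_day, True
    return censor_day, False

# A patient who stops treatment on day 30 and bleeds on day 90:
p = Patient(outcome_day=90.0, discontinuation_day=30.0, end_of_data_day=365.0)
print(itt_follow_up(p))         # (90.0, True)
print(as_treated_follow_up(p))  # (30.0, False)
```

Under AT, this patient contributes a censored observation instead of an event. A DRS fit to AT outcomes therefore sees fewer events, which may be one contributor to the overfitting the abstract describes.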
ABSTRACT
Confounding distorts the estimation of causal relations in a population. Depending on whether confounders are known, measurable, or measured, they can be divided into four categories. Based on directed acyclic graphs (DAGs), strategies for confounding control can be classified as (1) the broken-confounding-path method, which can be further divided into single and dual broken paths, corresponding to complete exposure intervention, restriction, and stratification; and (2) the reserved-confounding-path method, which can be further divided into incomplete exposure intervention (in instrumental variable designs and imperfect randomized controlled trials), the mediator method, and the matching method. Among these, randomized controlled trials, instrumental variable (or Mendelian randomization) designs, and the mediator method can control all four types of confounders, whereas restriction, stratification, and matching are applicable only to confounders that are known, measurable, and measured. Identifying the mechanisms of confounding control is a prerequisite for obtaining correct causal effect estimates and can inform research design.
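A small simulation with an unmeasured confounder U shows why an instrumental variable design can handle confounders that stratification cannot. This is a minimal sketch that assumes a linear data-generating model with hypothetical coefficients. Because U is unobserved, it cannot be stratified on, so regressing Y on X is biased. The Wald (ratio) estimator cov(Z, Y)/cov(Z, X) recovers the true effect as long as the instrument Z is independent of U and affects Y only through X.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)
n = 200_000
true_effect = 0.5

u = rng.normal(size=n)                              # unmeasured confounder
z = rng.binomial(1, 0.5, size=n)                    # instrument (e.g., random assignment or a genotype)
x = 0.8 * z + 1.0 * u + rng.normal(size=n)          # exposure depends on Z and U
y = true_effect * x + 1.0 * u + rng.normal(size=n)  # outcome depends on X and U

# Naive estimate: OLS slope of Y on X; biased because the path X <- U -> Y stays open.
naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Wald / ratio IV estimate: effect of Z on Y divided by effect of Z on X.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"naive = {naive:.3f}, IV = {iv:.3f}, truth = {true_effect}")
```

Under these assumed coefficients, the naive slope converges to about 0.96 (2.08 / 2.16). The IV estimate stays close to 0.5 without U ever being measured.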
Subject(s)
Causality , Confounding Factors, Epidemiologic , Random Allocation , Research Design , Humans , Models, Statistical , Randomized Controlled Trials as Topic
ABSTRACT
Subgroup and stratification analyses have been widely applied in genetic association studies to compare the effects of different factors or to control for the effects of confounding variables associated with a disease. However, previous studies have not systematically provided application standards and computing methods for stratification analyses. Building on the Mantel-Haenszel and inverse-variance approaches and two practical computing methods described in previous studies, we propose a standard stratification method for meta-analyses that consists of two sequential steps: factorial stratification analysis and confounder-controlling stratification analysis. Examples of genetic association meta-analyses are used to illustrate these points. The standard stratification method identifies interaction effects among the investigated factors and controls for confounding variables, revealing the true effects of these factors and confounding variables on a disease in an overall study population. We also discuss important issues concerning stratification for meta-analyses, such as conceptual confusion between subgroup and stratification analyses and incorrect calculations previously used for factorial stratification analyses. This standard stratification method is expected to have extensive applications as research on the complex relationships between genetics and disease continues to grow.
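The Mantel-Haenszel approach pools stratum-specific 2x2 tables into a single confounder-adjusted odds ratio. This is a minimal sketch of the MH odds ratio using hypothetical counts for two strata of a confounder. It does not reproduce the abstract's two-step procedure or its specific computing methods.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata.

    Each table is (a, b, c, d):
      a = exposed cases,   b = exposed controls,
      c = unexposed cases, d = unexposed controls.
    OR_MH = sum(a*d/n) / sum(b*c/n), with n = a + b + c + d in each stratum.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # empty strata carry no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

def crude_or(tables):
    """Odds ratio from collapsing all strata into one 2x2 table."""
    a, b, c, d = (sum(t[i] for t in tables) for i in range(4))
    return (a * d) / (b * c)

# Hypothetical genotype-disease counts in two strata of a confounder.
strata = [
    (40, 60, 10, 30),  # stratum 1: OR = 2.0
    (10, 10, 20, 40),  # stratum 2: OR = 2.0
]
print(f"crude OR = {crude_or(strata):.2f}, MH OR = {mantel_haenszel_or(strata):.2f}")
```

Both strata share an odds ratio of 2.0. Collapsing them gives a crude odds ratio of about 1.67, which is the kind of distortion that confounder-controlling stratification is meant to remove.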