ABSTRACT
Least absolute shrinkage and selection operator (LASSO) regression is widely used for large-scale propensity score (PS) estimation in health-care database studies. In these settings, previous work has shown that undersmoothing (overfitting) LASSO PS models can improve confounding control, but it can also cause problems of nonoverlap in covariate distributions. It remains unclear how to select the degree of undersmoothing when fitting large-scale LASSO PS models to improve confounding control while avoiding issues that can result from reduced covariate overlap. Here, we used simulations to evaluate the performance of using collaborative-controlled targeted learning to data-adaptively select the degree of undersmoothing when fitting large-scale PS models within both singly and doubly robust frameworks to reduce bias in causal estimators. Simulations showed that collaborative learning can data-adaptively select the degree of undersmoothing to reduce bias in estimated treatment effects. Results further showed that when fitting undersmoothed LASSO PS models, the use of cross-fitting was important for avoiding nonoverlap in covariate distributions and reducing bias in causal estimates.
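To make the two moving parts concrete, the sketch below pairs an undersmoothed LASSO propensity score (a penalty relaxed beyond its cross-validation optimum) with cross-fitted predictions. This is a minimal illustration in scikit-learn on simulated data, not the authors' implementation; the relaxation grid and all variable names are ours, and collaborative targeted learning would choose the degree of relaxation data-adaptively rather than by inspection.

```python
# Minimal sketch (not the paper's code): undersmoothed, cross-fitted LASSO PS.
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, p = 2000, 50
W = rng.normal(size=(n, p))                       # baseline covariates
A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))   # treatment indicator

# Cross-validation picks a reference L1 penalty (C is the inverse penalty).
cv_fit = LogisticRegressionCV(Cs=20, penalty="l1", solver="saga",
                              max_iter=5000).fit(W, A)
c_cv = cv_fit.C_[0]

# Undersmooth by relaxing the penalty; larger C means less shrinkage.
for relax in (1, 2, 5, 10):
    model = LogisticRegression(penalty="l1", C=c_cv * relax,
                               solver="saga", max_iter=5000)
    # Cross-fitting: each unit's PS comes from folds it was not trained on,
    # which the simulations found important for preserving covariate overlap.
    ps = cross_val_predict(model, W, A, cv=5, method="predict_proba")[:, 1]
    print(f"relax={relax}: PS range [{ps.min():.3f}, {ps.max():.3f}]")
```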
Subject(s)
Propensity Score; Humans; Databases, Factual; Computer Simulation; Bias; Models, Statistical; Confounding Factors, Epidemiologic
ABSTRACT
We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve the performance of automated health-care claims-based algorithms for identifying anaphylaxis events, using data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015-2019 in 2 integrated health-care institutions in the Northwest United States. We used one site's manually reviewed gold-standard outcomes data for model development and the other's for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site, 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis, compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved the cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development and 0.67 (0.63, 0.71) in external validation data. A classification threshold with cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.
Subject(s)
Anaphylaxis; Natural Language Processing; Humans; Anaphylaxis/diagnosis; Anaphylaxis/epidemiology; Machine Learning; Algorithms; Emergency Service, Hospital; Electronic Health Records
ABSTRACT
BACKGROUND: The Targeted Learning roadmap provides a systematic guide for generating and evaluating real-world evidence (RWE). From a regulatory perspective, RWE arises from diverse sources such as randomized controlled trials that make use of real-world data, observational studies, and other study designs. This paper illustrates a principled approach to assessing the validity and interpretability of RWE. METHODS: We applied the roadmap to a published observational study of the dose-response association between ritodrine hydrochloride and pulmonary edema among women pregnant with twins in Japan. The goal was to identify barriers to causal effect estimation beyond the unmeasured confounding reported by the study's authors, and to explore options for overcoming those barriers and making the results more robust. RESULTS: Following the roadmap raised issues that led us to formulate alternative causal questions that produced more reliable, interpretable RWE. The process revealed a lack of information in the available data to identify a causal dose-response curve. However, under explicit assumptions, the effect of treatment with any amount of ritodrine versus none, albeit a less ambitious parameter, can be estimated from the data. CONCLUSIONS: Before RWE can be used in support of clinical and regulatory decision-making, its quality and reliability must be systematically evaluated. The Targeted Learning roadmap prescribes how to carry out a thorough, transparent, and realistic assessment of RWE. We recommend that this approach be a routine part of any decision-making process.
Subject(s)
Research Design; Female; Humans; Reproducibility of Results; Japan; Randomized Controlled Trials as Topic
ABSTRACT
Inverse probability weighting (IPW) and targeted maximum likelihood estimation (TMLE) are methodologies that can adjust for confounding and selection bias and are often used for causal inference. Both estimators rely on the positivity assumption that within strata of confounders there is a positive probability of receiving treatment at all levels under consideration. Practical applications of IPW require finite inverse probability (IP) weights. TMLE requires that propensity scores (PS) be bounded away from 0 and 1. Although truncation can improve variance and finite-sample bias, this artificial distortion of the IP weight and PS distributions introduces asymptotic bias. As sample size grows, truncation-induced bias eventually swamps variance, rendering nominal confidence interval coverage and hypothesis tests invalid. We present a simple truncation strategy based on the sample size, $n$, that sets the upper bound on IP weights at $\sqrt{n}\,\ln n/5$. For TMLE, the lower bound on the PS should be set to the reciprocal, $5/(\sqrt{n}\,\ln n)$. Our strategy was designed to optimize the mean squared error of the parameter estimate. It naturally extends to data structures with missing outcomes. Simulation studies and a data analysis demonstrate our strategy's ability to minimize both bias and mean squared error in comparison with other common strategies, including the popular but flawed quantile-based heuristic.
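As a worked example of how the rule scales with sample size (assuming the natural logarithm; the function and variable names are ours):

```python
# Sketch of the sample-size-based truncation rule described above.
import numpy as np

def truncation_bounds(n: int):
    """Upper bound for IP weights and the matching lower bound for the PS."""
    w_max = np.sqrt(n) * np.log(n) / 5   # cap on inverse-probability weights
    ps_min = 1.0 / w_max                 # = 5 / (sqrt(n) ln n), TMLE PS floor
    return w_max, ps_min

for n in (1_000, 10_000, 100_000):
    w_max, ps_min = truncation_bounds(n)
    print(f"n={n:>7,}: max weight {w_max:9.1f}, min PS {ps_min:.5f}")
```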
Subject(s)
Propensity Score; Bias; Causality; Computer Simulation; Humans; Likelihood Functions
ABSTRACT
BACKGROUND: Anaphylaxis is a life-threatening allergic reaction that is difficult to identify accurately with administrative data. We conducted a population-based validation study to assess the accuracy of ICD-10 diagnosis codes for anaphylaxis in outpatient, emergency department, and inpatient settings. METHODS: In an integrated healthcare system in Washington State, we obtained medical records from healthcare encounters with anaphylaxis diagnosis codes (potential events) from October 2015 to December 2018. To capture events missed by anaphylaxis diagnosis codes, we also obtained records on a sample of serious allergic and drug reactions. Two physicians determined whether potential events met established clinical criteria for anaphylaxis (validated events). RESULTS: Out of 239 potential events with anaphylaxis diagnosis codes, the overall positive predictive value (PPV) for validated events was 64% (95% CI = 58 to 70). The PPV decreased with increasing age. Common precipitants for anaphylaxis were food (39%), medications (35%), and insect bite or sting (12%). The sensitivity of emergency department and inpatient anaphylaxis diagnosis codes for all validated events was 58% (95% CI = 51 to 65), but sensitivity increased to 95% (95% CI = 74 to 99) when outpatient diagnosis codes were included. Using information from all validated events and sampling weights, the incidence rate for anaphylaxis was 3.6 events per 10,000 person-years (95% CI = 3.1 to 4.0). CONCLUSIONS: In this population-based setting, ICD-10 diagnosis codes for anaphylaxis from emergency department and inpatient settings had moderate PPV and sensitivity for validated events. These findings have implications for epidemiologic studies that seek to estimate risks of anaphylaxis using electronic health data.
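As a quick arithmetic check, the headline PPV and its interval can be reproduced from the counts reported above; a Wilson score interval matches the reported figures, although the authors' exact interval method is not stated in the abstract.

```python
# 154 validated events out of 239 potential events, Wilson 95% interval.
import math

tp, total, z = 154, 239, 1.96
p = tp / total
denom = 1 + z**2 / total
center = (p + z**2 / (2 * total)) / denom
half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
print(f"PPV = {p:.0%}, 95% CI = ({center - half:.0%}, {center + half:.0%})")
# -> PPV = 64%, 95% CI = (58%, 70%), matching the abstract.
```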
Subject(s)
Anaphylaxis; Anaphylaxis/diagnosis; Anaphylaxis/epidemiology; Electronic Health Records; Humans; International Classification of Diseases; Predictive Value of Tests; Washington/epidemiology
ABSTRACT
BACKGROUND: A substantial fraction of sexually transmitted infections (STIs) occur in patients who have previously been treated for an STI. We assessed whether routine electronic health record (EHR) data can predict which patients presenting with an incident STI are at greatest risk for additional STIs in the next 1 to 2 years. METHODS: We used structured EHR data on patients 15 years or older who received an incident STI diagnosis in 2008 to 2015 in eastern Massachusetts. We applied machine learning algorithms to model the risk of acquiring ≥1 or ≥2 additional STI diagnoses within 365 or 730 days after the initial diagnosis, using more than 180 different EHR variables. We performed a sensitivity analysis incorporating state health department surveillance data to assess whether improving the accuracy of identifying STI cases improved algorithm performance. RESULTS: We identified 8723 incident episodes of laboratory-confirmed gonorrhea, chlamydia, or syphilis. Bayesian additive regression trees, the best-performing single method, had a cross-validated area under the receiver operating characteristic curve of 0.75. Receiver operating characteristic curves for this algorithm showed a poor balance between sensitivity and positive predictive value (PPV). A predictive probability threshold with a sensitivity of 91.5% had a corresponding PPV of 3.9%. A higher threshold with a PPV of 29.5% had a sensitivity of 11.7%. Attempting to improve the classification of patients with and without repeat STI diagnoses by incorporating health department surveillance data had minimal impact on the cross-validated area under the receiver operating characteristic curve. CONCLUSIONS: Machine learning algorithms using structured EHR data did not differentiate well between patients with and without repeat STI diagnoses. Alternative strategies that can account for sociobehavioral characteristics could be explored.
Subject(s)
Chlamydia Infections; Gonorrhea; HIV Infections; Sexually Transmitted Diseases; Syphilis; Bayes Theorem; Chlamydia Infections/diagnosis; Chlamydia Infections/epidemiology; Gonorrhea/diagnosis; Gonorrhea/epidemiology; Humans; Machine Learning; Massachusetts/epidemiology; Sexually Transmitted Diseases/diagnosis; Sexually Transmitted Diseases/epidemiology; Syphilis/diagnosis; Syphilis/epidemiology
ABSTRACT
We use simulated data to examine the consequences of depletion of susceptibles for hazard ratio (HR) estimators based on a propensity score (PS). First, we show that the depletion of susceptibles attenuates marginal HRs toward the null by amounts that increase with the incidence of the outcome, the variance of susceptibility, and the impact of susceptibility on the outcome. If susceptibility is binary then the Bross bias multiplier, originally intended to quantify bias in a risk ratio from a binary confounder, also quantifies the ratio of the instantaneous marginal HR to the conditional HR as susceptibles are depleted differentially. Second, we show how HR estimates that are conditioned on a PS tend to be between the true conditional and marginal HRs, closer to the conditional HR if treatment status is strongly associated with susceptibility and closer to the marginal HR if treatment status is weakly associated with susceptibility. We show that associations of susceptibility with the PS matter to the marginal HR in the treated (ATT) though not to the marginal HR in the entire cohort (ATE). Third, we show how the PS can be updated periodically to reduce depletion-of-susceptibles bias in conditional estimators. Although marginal estimators can hit their ATE or ATT targets consistently without updating the PS, we show how their targets themselves can be misleading as they are attenuated toward the null. Finally, we discuss implications for the interpretation of HRs and their relevance to underlying scientific and clinical questions. See video Abstract: http://links.lww.com/EDE/B727.
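For reference, a common form of the Bross bias multiplier for a binary factor is sketched below; the notation is ours, with $p_1$ and $p_0$ the prevalence of susceptibility among the treated and untreated and $RR_C$ the risk ratio relating susceptibility to the outcome.

```latex
% Bross bias multiplier for a binary factor C (here, susceptibility):
% p_1 = P(C=1 | treated), p_0 = P(C=1 | untreated),
% RR_C = outcome risk ratio associated with C.
\[
  B \;=\; \frac{p_{1}\,(RR_{C}-1)+1}{p_{0}\,(RR_{C}-1)+1}
\]
% Per the abstract, the same multiplier quantifies the ratio of the
% instantaneous marginal HR to the conditional HR as susceptibles deplete.
```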
Subject(s)
Bias; Propensity Score; Proportional Hazards Models; Cohort Studies; Humans
ABSTRACT
Human immunodeficiency virus (HIV) pre-exposure prophylaxis (PrEP) protects high-risk patients from becoming infected with HIV. Clinicians need help identifying candidates for PrEP based on information routinely collected in electronic health records (EHRs). The greatest statistical challenge in developing a risk prediction model is that HIV acquisition is extremely rare. METHODS: Data consisted of 180 covariates (demographics, diagnoses, treatments, prescriptions) extracted from records on 399,385 patients (150 cases) seen at Atrius Health (2007-2015), a clinical network in Massachusetts. Super learner is an ensemble machine learning algorithm that uses k-fold cross-validation to evaluate and combine predictions from a collection of algorithms. We trained 42 variants of sophisticated algorithms, using different sampling schemes that more evenly balanced the ratio of cases to controls. We compared super learner's cross-validated area under the receiver operating characteristic curve (cv-AUC) with that of each individual algorithm. RESULTS: The least absolute shrinkage and selection operator (LASSO) using a 1:20 class ratio outperformed the super learner (cv-AUC = 0.86 vs 0.84). A traditional logistic regression model restricted to 23 clinician-selected main terms was slightly inferior (cv-AUC = 0.81). CONCLUSION: Machine learning was successful at developing a model to predict 1-year risk of acquiring HIV based on a physician-curated set of predictors extracted from EHRs.
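For orientation, a super-learner-style ensemble can be sketched with scikit-learn's cross-validated stacking. This is a rough stand-in, not the study's library of 42 algorithm variants, and the class-ratio subsampling schemes are omitted; X and y stand for an already-assembled covariate matrix and outcome vector.

```python
# Rough super-learner-style sketch via cross-validated stacking.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

learners = [
    ("lasso", LogisticRegressionCV(penalty="l1", solver="saga", max_iter=5000)),
    ("rf", RandomForestClassifier(n_estimators=200)),
]
sl = StackingClassifier(estimators=learners,
                        final_estimator=LogisticRegression(),
                        cv=5, stack_method="predict_proba")

# cv-AUC comparison as in the abstract (X, y assumed already assembled):
# from sklearn.model_selection import cross_val_score
# print(cross_val_score(sl, X, y, cv=5, scoring="roc_auc").mean())
```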
Subject(s)
HIV Infections; Pre-Exposure Prophylaxis; Electronic Health Records; HIV; HIV Infections/prevention & control; Humans; Machine Learning
ABSTRACT
Introduction: The role of acute mood states as mediating factors in cognitive impairment in patients with mania or depression is not sufficiently clear. Similarly, the extent to which cognitive impairment is trait- or state-specific remains an open question. Therefore, the aim of this study was to investigate the effect of a mood induction on attention in patients with an affective disorder. Methods: Twenty-two depressed bipolar patients, 10 manic bipolar patients, 17 patients with a major depressive episode (MDE), and 24 healthy controls performed the Attention Network Test (ANT). In a within-participants design, elated and sad moods were induced by autobiographical recall and measured on a self-report scale. Subsequently, participants performed the ANT again. Results: The modulating effect of the elated mood induction on attention was small. Only the MDE group displayed moderate improvements in selective attention and tonic alertness. Surprisingly, after the sad mood induction, patients with MDE improved moderately on phasic and tonic alertness. Phasic alertness was also enhanced in patients with mania. Finally, after the mood induction, patients with MDE showed the largest variability in attentional performance. Conclusions: Results showed only small effects of mood induction on attention. This supports the view that attention deficits reflect trait variables.
Subject(s)
Affect/physiology; Attention/physiology; Bipolar Disorder/psychology; Depressive Disorder/psychology; Psychomotor Performance/physiology; Adult; Bipolar Disorder/physiopathology; Depressive Disorder/physiopathology; Female; Humans; Male; Mental Recall/physiology; Middle Aged; Self Report
ABSTRACT
Postapproval drug safety studies often use propensity scores (PSs) to adjust for a large number of baseline confounders. These studies may involve examining whether treatment safety varies across subgroups. There are many ways a PS could be used to adjust for confounding in subgroup analyses, and the trade-offs among these methods are not well understood. We conducted a plasmode simulation to compare the relative performance of 5 PS-matching methods for subgroup analysis, including methods frequently used in the applied literature whose performance has not previously been directly compared. These methods differed in whether the overall PS or a subgroup-specific PS was used for rematching in subgroup analyses (or no rematching was done), and in whether subgroups were fully nested within the main analytical cohort. The evaluated PS subgroup matching methods performed similarly in terms of balance, bias, and precision across 12 simulated scenarios that varied the size of the cohort, the prevalence of exposure and outcome, the strength of relationships between baseline covariates and exposure, the true effect within subgroups, and the degree of confounding within subgroups. Each method had strengths and limitations with respect to other performance metrics that could inform the choice of method.
Subject(s)
Product Surveillance, Postmarketing/methods; Propensity Score; Research Design; Adrenergic Antagonists/adverse effects; Aged; Aged, 80 and over; Angioedema/chemically induced; Angiotensin-Converting Enzyme Inhibitors/adverse effects; Computer Simulation; Female; Humans; Male; Middle Aged; Models, Statistical; United States; United States Food and Drug Administration
ABSTRACT
PURPOSE: Privacy-protecting analytic and data-sharing methods that minimize the disclosure risk of sensitive information are increasingly important due to the growing interest in utilizing data across multiple sources. We conducted a simulation study to examine how avoiding the sharing of individual-level data in a distributed data network can affect analytic results. METHODS: The base scenario had four sites of varying sizes with 5% outcome incidence, 50% treatment prevalence, and seven confounders. We varied treatment prevalence, outcome incidence, treatment effect, site size, number of sites, and covariate distribution. Confounding adjustment was conducted using the propensity score or the disease risk score. We compared analyses of three types of aggregate-level data requested from sites (risk-set, summary-table, or effect-estimate data, the last combined by meta-analysis) with benchmark results from analysis of pooled individual-level data. We assessed bias and precision of hazard ratio estimates as well as the accuracy of standard error estimates. RESULTS: All the aggregate-level data-sharing approaches, regardless of confounding adjustment method, successfully approximated pooled individual-level data analysis in most simulation scenarios. Meta-analysis showed minor bias when using inverse probability of treatment weights (IPTW) in infrequent-exposure (5%), rare-outcome (0.01%), and small-site (5,000 patients) settings. Standard error estimates became less accurate for the IPTW risk-set approach with less frequent exposure and for the propensity score-matched meta-analysis approach with rare outcomes. CONCLUSIONS: Overall, we found that we can avoid sharing individual-level data and obtain valid results in many settings, although care must be taken with the meta-analysis approach in infrequent-exposure and rare-outcome scenarios, particularly when confounding adjustment is performed with IPTW.
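The effect-estimate (meta-analysis) approach in particular reduces to a standard inverse-variance combination of site-level estimates; a sketch with made-up numbers:

```python
# Fixed-effect (inverse-variance) pooling of site-specific log hazard ratios;
# all numbers are illustrative, not from the simulation study.
import numpy as np

log_hr = np.array([0.25, 0.31, 0.18, 0.40])  # per-site log HR estimates
se = np.array([0.10, 0.15, 0.12, 0.20])      # per-site standard errors

w = 1.0 / se**2                              # inverse-variance weights
pooled = np.sum(w * log_hr) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled HR = {np.exp(pooled):.2f}, "
      f"95% CI = ({np.exp(lo):.2f}, {np.exp(hi):.2f})")
```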
Subject(s)
Computer Security; Confidentiality; Data Analysis; Information Dissemination/methods; Bias; Computer Simulation; Humans
ABSTRACT
BACKGROUND: Many patients started on antibiotics for possible ventilator-associated pneumonia (VAP) do not have pneumonia. Patients with minimal and stable ventilator settings may be suitable candidates for early antibiotic discontinuation. We compared outcomes among patients with suspected VAP but minimal and stable ventilator settings treated with 1-3 days vs >3 days of antibiotics. METHODS: We identified consecutive adult patients started on antibiotics for possible VAP with daily minimum positive end-expiratory pressure of ≤5 cm H2O and fraction of inspired oxygen ≤40% for at least 3 days within a large tertiary care hospital between 2006 and 2014. We compared time to extubation alive vs ventilator death and time to hospital discharge alive vs hospital death using competing risks models among patients prescribed 1-3 days vs >3 days of antibiotics. All models were adjusted for patient demographics, comorbidities, severity of illness, clinical signs of infection, and pathogens. RESULTS: There were 1290 eligible patients, 259 treated for 1-3 days and 1031 treated for >3 days. The 2 groups had similar demographics, comorbidities, and clinical signs. There were no significant differences between groups in time to extubation alive (hazard ratio [HR], 1.16 for short- vs long-course treatment; 95% confidence interval [CI], .98-1.36), ventilator death (HR, 0.82 [95% CI, .55-1.22]), time to hospital discharge alive (HR, 1.07 [95% CI, .91-1.26]), or hospital death (HR, 0.99 [95% CI, .75-1.31]). CONCLUSIONS: Very short antibiotic courses (1-3 days) were associated with outcomes similar to longer courses (>3 days) in patients with suspected VAP but minimal and stable ventilator settings. Assessing serial ventilator settings may help clinicians identify candidates for early antibiotic discontinuation.
Subject(s)
Anti-Bacterial Agents/therapeutic use; Pneumonia, Ventilator-Associated/drug therapy; Aged; Anti-Bacterial Agents/administration & dosage; Biomarkers; Comorbidity; Female; Humans; Male; Middle Aged; Pneumonia, Ventilator-Associated/diagnosis; Pneumonia, Ventilator-Associated/microbiology; Proportional Hazards Models; Retrospective Studies; Time Factors; Treatment Outcome
ABSTRACT
Importance: Estimates from claims-based analyses suggest that the incidence of sepsis is increasing and mortality rates from sepsis are decreasing. However, estimates from claims data may lack clinical fidelity and can be affected by changing diagnosis and coding practices over time. Objective: To estimate the US national incidence of sepsis and trends using detailed clinical data from the electronic health record (EHR) systems of diverse hospitals. Design, Setting, and Population: Retrospective cohort study of adult patients admitted to 409 academic, community, and federal hospitals from 2009-2014. Exposures: Sepsis was identified using clinical indicators of presumed infection and concurrent acute organ dysfunction, adapting Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3) criteria for objective and consistent EHR-based surveillance. Main Outcomes and Measures: Sepsis incidence, outcomes, and trends from 2009-2014 were calculated using regression models and compared with claims-based estimates using International Classification of Diseases, Ninth Revision, Clinical Modification codes for severe sepsis or septic shock. Case-finding criteria were validated against Sepsis-3 criteria using medical record reviews. Results: A total of 173,690 sepsis cases (mean age, 66.5 [SD, 15.5] y; 77,660 [42.4%] women) were identified using clinical criteria among 2,901,019 adults admitted to study hospitals in 2014 (6.0% incidence). Of these, 26,061 (15.0%) died in the hospital and 10,731 (6.2%) were discharged to hospice. From 2009-2014, sepsis incidence using clinical criteria was stable (+0.6% relative change/y [95% CI, -2.3% to 3.5%], P = .67) whereas incidence per claims increased (+10.3%/y [95% CI, 7.2% to 13.3%], P < .001). In-hospital mortality using clinical criteria declined (-3.3%/y [95% CI, -5.6% to -1.0%], P = .004), but there was no significant change in the combined outcome of death or discharge to hospice (-1.3%/y [95% CI, -3.2% to 0.6%], P = .19). In contrast, mortality using claims declined significantly (-7.0%/y [95% CI, -8.8% to -5.2%], P < .001), as did death or discharge to hospice (-4.5%/y [95% CI, -6.1% to -2.8%], P < .001). Clinical criteria were more sensitive in identifying sepsis than claims (69.7% [95% CI, 52.9% to 92.0%] vs 32.3% [95% CI, 24.4% to 43.0%], P < .001), with comparable positive predictive value (70.4% [95% CI, 64.0% to 76.8%] vs 75.2% [95% CI, 69.8% to 80.6%], P = .23). Conclusions and Relevance: In clinical data from 409 hospitals, sepsis was present in 6% of adult hospitalizations, and in contrast to claims-based analyses, neither the incidence of sepsis nor the combined outcome of death or discharge to hospice changed significantly between 2009-2014. The findings also suggest that EHR-based clinical data provide more objective estimates than claims-based data for sepsis surveillance.
Subject(s)
Electronic Health Records; Sepsis/epidemiology; Adult; Aged; Clinical Coding; Female; Hospital Mortality/trends; Hospitalization/trends; Humans; Incidence; Insurance Claim Reporting; Male; Medical Audit; Middle Aged; Mortality/trends; Retrospective Studies; Sepsis/mortality; United States/epidemiology
ABSTRACT
Controversy over non-reproducible published research reporting statistically significant results has produced substantial discussion in the literature. p-value calibration is a recently proposed procedure for adjusting p-values to account for both random and systematic error, addressing one aspect of this problem. The method's validity rests on the key assumption that bias in an effect estimate is drawn from a normal distribution whose mean and variance can be correctly estimated. We investigated the method's control of type I and type II error rates using simulated and real-world data. Under mild violations of the underlying assumption, control of the type I error rate can be conservative, while under more extreme departures it can be anti-conservative. The extent to which the assumption is violated in real-world data analyses is unknown, and barriers to testing its plausibility using historical data are discussed. Our studies of the type II error rate using simulated and real-world electronic health care data demonstrated that calibrating p-values can substantially increase the type II error rate. The use of calibrated p-values may reduce the number of false-positive results, but there will be a commensurate drop in the ability to detect a true safety or efficacy signal. While p-value calibration can sometimes offer advantages in controlling the type I error rate, its adoption for routine use in studies of real-world health care datasets is premature. Separate characterizations of random and systematic errors provide a richer context for evaluating uncertainty surrounding effect estimates.
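To make the key assumption concrete, the core calculation can be sketched as follows: treat systematic error as an additional normal component N(mu, sigma^2) and widen the null accordingly. In the published procedure, mu and sigma are estimated elsewhere (e.g., from negative-control analyses); here they are illustrative inputs, and the function name is ours.

```python
# Sketch of p-value calibration under the normal-bias assumption.
import numpy as np
from scipy.stats import norm

def calibrated_p(beta_hat, se_hat, mu, sigma):
    """Two-sided p-value after folding systematic error N(mu, sigma^2)
    into the null distribution of the estimate."""
    total_sd = np.sqrt(se_hat**2 + sigma**2)
    return 2 * norm.sf(abs(beta_hat - mu) / total_sd)

# An estimate that looks significant under random error alone...
print(calibrated_p(beta_hat=0.40, se_hat=0.15, mu=0.0, sigma=0.0))    # ~0.008
# ...can lose significance once plausible systematic error is admitted.
print(calibrated_p(beta_hat=0.40, se_hat=0.15, mu=0.10, sigma=0.10))  # ~0.10
```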
Subject(s)
Bias; Data Interpretation, Statistical; Calibration; Delivery of Health Care; Humans; Observational Studies as Topic
ABSTRACT
BACKGROUND: We reviewed the results of the Observational Medical Outcomes Partnership (OMOP) 2010 Experiment in hopes of finding examples where apparently well-designed drug studies repeatedly produce anomalous findings. OMOP had applied thousands of designs and design parameters to 53 drug-outcome pairs across 10 electronic data resources. Our intent was to use this repository to elucidate some sources of error in observational studies. METHOD: From the 2010 OMOP Experiment, we sought drug-outcome-method combinations (DOMCs) that met consensus design criteria yet repeatedly produced results contrary to expectation. We set aside DOMCs for which we could not agree on the suitability of the designs, then selected for in-depth scrutiny one drug-outcome pair analyzed by a seemingly plausible methodological approach whose results consistently disagreed with the a priori expectation. RESULTS: The OMOP "all-by-all" assessment of possible DOMCs yielded many combinations that would not be chosen by researchers as actual study options. Among those that passed a first level of scrutiny, two of seven drug-outcome pairs for which there were plausible research designs had anomalous results. The use of benzodiazepines was unexpectedly associated with acute renal failure and upper gastrointestinal bleeding. We chose the latter as an example for in-depth study. The factitious appearance of a bleeding risk may have been partly driven by an excess of procedures on the first day of treatment. A risk window definition that excluded the first day largely removed the spurious association. CONCLUSION: One cause of reproducible "error" may be repeated failure to tie design choices closely enough to the research question at hand.
Subject(s)
Observational Studies as Topic/methods; Outcome Assessment, Health Care/methods; Product Surveillance, Postmarketing/methods; Research Design; Drug-Related Side Effects and Adverse Reactions/diagnosis; Humans; Observational Studies as Topic/standards; Reproducibility of Results
ABSTRACT
Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling that can further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross-validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV-positive subjects. Both ensemble approaches produced hazard ratio estimates further from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL, with comparable results.
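For context, the quantity being modeled can be shown in its simplest point-treatment form; in the longitudinal marginal structural model setting of this paper, the weights are products over time, and the denominator model is where SL or EL would be plugged in. All data and names below are simulated placeholders.

```python
# Stabilized IP weights for a point treatment (simplified illustration).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
W = rng.normal(size=(n, 3))                       # measured covariates
A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))   # treatment

p_denom = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]  # P(A=1 | W)
p_num = A.mean()                                                 # P(A=1)

# sw_i = P(A = a_i) / P(A = a_i | W_i)
sw = np.where(A == 1, p_num / p_denom, (1 - p_num) / (1 - p_denom))
print(f"mean weight {sw.mean():.3f} (should be near 1), max {sw.max():.1f}")
```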
Subject(s)
Antiretroviral Therapy, Highly Active/statistics & numerical data; HIV Infections/drug therapy; Machine Learning; Models, Statistical; Bias; Computer Simulation; Confidence Intervals; Confounding Factors, Epidemiologic; Data Interpretation, Statistical; HIV Infections/mortality; HIV Infections/prevention & control; Humans; Logistic Models; Mortality/trends; Probability; Spain
ABSTRACT
OBJECTIVE: To present a general framework providing high-level guidance to developers of computable algorithms for identifying patients with specific clinical conditions (phenotypes) through a variety of approaches, including but not limited to machine learning and natural language processing methods that incorporate rich electronic health record data. MATERIALS AND METHODS: Drawing on extensive prior phenotyping experience and insights derived from 3 algorithm development projects conducted specifically for this purpose, our team, with expertise in clinical medicine, statistics, informatics, pharmacoepidemiology, and healthcare data science methods, conceptualized stages of development and corresponding sets of principles, strategies, and practical guidelines for improving the algorithm development process. RESULTS: We propose 5 stages of algorithm development with corresponding principles, strategies, and guidelines: (1) assessing fitness-for-purpose, (2) creating gold-standard data, (3) feature engineering, (4) model development, and (5) model evaluation. DISCUSSION AND CONCLUSION: This framework is intended to provide practical guidance and to serve as a basis for future elaboration and extension.