ABSTRACT
OBJECTIVES: Traditional methods for medical device post-market surveillance often fail to accurately account for operator learning effects, leading to biased assessments of device safety. These methods struggle with non-linearity, complex learning curves, and time-varying covariates, such as physician experience. To address these limitations, we sought to develop a machine learning (ML) framework to detect and adjust for operator learning effects. MATERIALS AND METHODS: A gradient-boosted decision tree ML method was used to analyze synthetic datasets that replicate the complexity of clinical scenarios involving high-risk medical devices. We designed this process to detect learning effects using a risk-adjusted cumulative sum method, quantify the excess adverse event rate attributable to operator inexperience, and adjust for these alongside patient factors in evaluating device safety signals. To maintain integrity, we employed blinding between data generation and analysis teams. Synthetic data used underlying distributions and patient feature correlations based on clinical data from the Department of Veterans Affairs between 2005 and 2012. We generated 2494 synthetic datasets with widely varying characteristics including number of patient features, operators and institutions, and the operator learning form. Each dataset contained a hypothetical study device, Device B, and a reference device, Device A. We evaluated accuracy in identifying learning effects and identifying and estimating the strength of the device safety signal. Our approach also evaluated different clinically relevant thresholds for safety signal detection. RESULTS: Our framework accurately identified the presence or absence of learning effects in 93.6% of datasets and correctly determined device safety signals in 93.4% of cases. The estimated device odds ratios' 95% confidence intervals were accurately aligned with the specified ratios in 94.7% of datasets. 
In contrast, a comparative model excluding operator learning effects significantly underperformed in detecting device signals and in accuracy. Notably, our framework achieved 100% specificity for clinically relevant safety signal thresholds, although sensitivity varied with the threshold applied. DISCUSSION: A machine learning framework, tailored for the complexities of post-market device evaluation, may provide superior performance compared to standard parametric techniques when operator learning is present. CONCLUSION: Demonstrating the capacity of ML to overcome complex evaluative challenges, our framework addresses the limitations of traditional statistical methods in current post-market surveillance processes. By offering a reliable means to detect and adjust for learning effects, it may significantly improve medical device safety evaluation.
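The abstract names a risk-adjusted cumulative sum method but does not give its formulation. As a minimal sketch, assuming the widely used log-likelihood-ratio form of a risk-adjusted CUSUM, the monitoring step could look like the following. The function name, the alternative odds ratio of 2.0, and the signal threshold of 4.5 are illustrative assumptions, not values from the study.

```python
import math


def risk_adjusted_cusum(outcomes, predicted_risks, odds_ratio_alt=2.0, threshold=4.5):
    """Return the CUSUM path and the 0-based index of the first signal (or None).

    outcomes        : iterable of 0/1 adverse event indicators, in case order
    predicted_risks : patient-level expected risks from a risk model
    odds_ratio_alt  : hypothetical odds ratio under the "elevated risk" alternative
    threshold       : hypothetical control limit; larger values signal later
    """
    score = 0.0
    path = []
    first_signal = None
    for index, (y, p) in enumerate(zip(outcomes, predicted_risks)):
        # Log-likelihood ratio weight for one case: positive after an event,
        # slightly negative after an event-free case, scaled by expected risk.
        weight = y * math.log(odds_ratio_alt) - math.log(1.0 - p + odds_ratio_alt * p)
        # The statistic is reset at zero so that good runs do not bank credit.
        score = max(0.0, score + weight)
        path.append(score)
        if first_signal is None and score >= threshold:
            first_signal = index
    return path, first_signal
```

Run per operator on cases ordered by accrual, an early signal that later fades is the kind of pattern a learning effect would be expected to produce, though how the framework interprets signals is not described in the abstract.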
ABSTRACT
BACKGROUND: Validating new algorithms, such as methods to disentangle intrinsic treatment risk from risk associated with experiential learning of novel treatments, often requires knowing the ground truth for data characteristics under investigation. Since the ground truth is inaccessible in real world data, simulation studies using synthetic datasets that mimic complex clinical environments are essential. We describe and evaluate a generalizable framework for injecting hierarchical learning effects within a robust data generation process that incorporates the magnitude of intrinsic risk and accounts for known critical elements in clinical data relationships. METHODS: We present a multi-step data generating process with customizable options and flexible modules to support a variety of simulation requirements. Synthetic patients with nonlinear and correlated features are assigned to provider and institution case series. The probabilities of treatment and outcome assignment are associated with patient features based on user definitions. Risk due to experiential learning by providers and/or institutions when novel treatments are introduced is injected at various speeds and magnitudes. To further reflect real-world complexity, users can request missing values and omitted variables. We illustrate an implementation of our method in a case study using MIMIC-III data for reference patient feature distributions. RESULTS: Realized data characteristics in the simulated data reflected specified values. Apparent deviations in treatment effects and feature distributions, though not statistically significant, were most common in small datasets (n < 3000) and attributable to random noise and variability in estimating realized values in small samples.
When learning effects were specified, synthetic datasets exhibited changes in the probability of adverse outcomes as cases accrued for the treatment group impacted by learning and stable probabilities as cases accrued for the treatment group not affected by learning. CONCLUSIONS: Our framework extends clinical data simulation techniques beyond generation of patient features to incorporate hierarchical learning effects. This enables the complex simulation studies required to develop and rigorously test algorithms developed to disentangle treatment safety signals from the effects of experiential learning. By supporting such efforts, this work can help identify training opportunities, avoid unwarranted restriction of access to medical advances, and hasten treatment improvements.
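To make the idea of injecting a learning effect concrete, here is a minimal sketch that adds a decaying penalty to the log-odds of an adverse outcome as a provider accrues cases. The exponential decay shape, the parameter names, and the default values are assumptions chosen for illustration; the abstract states only that learning is injected at various speeds and magnitudes.

```python
import math
import random


def adverse_event_probability(base_risk, case_number, learning_log_odds=1.0, learning_rate=0.05):
    """Probability of an adverse outcome for a provider's k-th case (0-based).

    learning_log_odds : hypothetical extra log-odds of harm on the very first case
    learning_rate     : hypothetical speed at which that extra risk decays
    Setting learning_log_odds to 0 gives a group with no learning effect.
    """
    logit = math.log(base_risk / (1.0 - base_risk))
    # The learning penalty shrinks toward zero as experience accumulates.
    logit += learning_log_odds * math.exp(-learning_rate * case_number)
    return 1.0 / (1.0 + math.exp(-logit))


def simulate_case_series(n_cases, base_risk, seed=0, **learning_kwargs):
    """Draw 0/1 outcomes for one provider's consecutive cases."""
    rng = random.Random(seed)
    return [
        int(rng.random() < adverse_event_probability(base_risk, k, **learning_kwargs))
        for k in range(n_cases)
    ]
```

With these assumptions, the probability for early cases sits above `base_risk` and converges to it as cases accrue, while a series generated with `learning_log_odds=0.0` stays flat, which mirrors the contrast described in the RESULTS.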
Subjects
Deep Learning, Humans, Computer Simulation, Algorithms
ABSTRACT
BACKGROUND: Up to 14% of patients in the United States undergoing cardiac catheterization each year experience AKI. Consistent use of risk minimization preventive strategies may improve outcomes. We hypothesized that team-based coaching in a Virtual Learning Collaborative (Collaborative) would reduce postprocedural AKI compared with Technical Assistance (Assistance), both with and without Automated Surveillance Reporting (Surveillance). METHODS: The IMPROVE AKI trial was a 2×2 factorial cluster-randomized trial across 20 Veterans Affairs medical centers (VAMCs). Participating VAMCs received Assistance, Assistance with Surveillance, Collaborative, or Collaborative with Surveillance for 18 months to implement AKI prevention strategies. The Assistance and Collaborative approaches promoted hydration and limited NPO and contrast dye dosing. We fit logistic regression models for AKI with site-level random effects accounting for the clustering of patients within medical centers with a prespecified interest in exploring differences across the four intervention arms. RESULTS: Among VAMCs' 4517 patients, 510 experienced AKI (235 AKI events among 1314 patients with preexisting CKD). AKI events in each intervention cluster were 110 (13%) in Assistance, 122 (11%) in Assistance with Surveillance, 190 (13%) in Collaborative, and 88 (8%) in Collaborative with Surveillance. Compared with sites receiving Assistance alone, case-mix-adjusted differences in AKI event proportions were -3% (95% confidence interval [CI], -4 to -3) for Assistance with Surveillance, -3% (95% CI, -3 to -2) for Collaborative, and -5% (95% CI, -6 to -5) for Collaborative with Surveillance. The Collaborative with Surveillance intervention cluster had a substantial 46% reduction in AKI compared with Assistance alone (adjusted odds ratio=0.54; 0.40-0.74). 
CONCLUSIONS: This implementation trial estimates that the combination of Collaborative with Surveillance reduced the odds of AKI by 46% at VAMCs and is suggestive of a reduction among patients with CKD. CLINICAL TRIAL REGISTRY NAME AND REGISTRATION NUMBER: IMPROVE AKI Cluster-Randomized Trial (IMPROVE-AKI), NCT03556293.
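The 46% figure follows directly from the adjusted odds ratio of 0.54. As a small worked check, the helper below (a hypothetical name, not from the trial) converts an odds ratio into a percent reduction in odds; note that this is a reduction in odds, not in absolute risk.

```python
def percent_odds_reduction(odds_ratio):
    """Percent reduction in odds implied by an odds ratio below 1."""
    return (1.0 - odds_ratio) * 100.0


# Point estimate and confidence limits reported in the abstract.
point = round(percent_odds_reduction(0.54))
smallest = round(percent_odds_reduction(0.74))
largest = round(percent_odds_reduction(0.40))
print(point, smallest, largest)
```

Applied to the reported interval of 0.40 to 0.74, the same conversion gives the range of odds reductions compatible with the estimate.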
Subjects
Acute Kidney Injury, Mentoring, Chronic Renal Insufficiency, Humans, United States, Contrast Media/adverse effects, United States Department of Veterans Affairs, Chronic Renal Insufficiency/chemically induced, Acute Kidney Injury/chemically induced, Acute Kidney Injury/prevention & control
ABSTRACT
BACKGROUND: The utility of quality dashboards to inform decision-making and improve clinical outcomes is tightly linked to the accuracy of the information they provide and, in turn, accuracy of underlying prediction models. Despite recognition of the need to update prediction models to maintain accuracy over time, there is limited guidance on updating strategies. We compare predefined and surveillance-based updating strategies applied to a model supporting quality evaluations among US veterans. METHODS: We evaluated the performance of a US Department of Veterans Affairs-specific model for postcardiac catheterization acute kidney injury using routinely collected observational data over the 6 years following model development (n=90 295 procedures in 2013-2019). Predicted probabilities were generated from the original model, an annually retrained model, and a surveillance-based approach that monitored performance to inform the timing and method of updates. We evaluated how updating the national model impacted regional quality profiles. We compared observed-to-expected outcome ratios, where values above and below 1 indicated more and fewer adverse outcomes than expected, respectively. RESULTS: The original model overpredicted risk at the national level (observed-to-expected outcome ratio, 0.75 [0.74-0.77]). Annual retraining updated the model 5×; surveillance-based updating retrained once and recalibrated twice. While both strategies improved performance, the surveillance-based approach provided superior calibration (observed-to-expected outcome ratio, 1.01 [0.99-1.03] versus 0.94 [0.92-0.96]). Overprediction by the original model led to optimistic quality assessments, incorrectly indicating most of the US Department of Veterans Affairs' 18 regions observed fewer acute kidney injury events than predicted. Both updating strategies revealed 16 regions performed as expected and 2 regions increasingly underperformed, having more acute kidney injury events than predicted. 
CONCLUSIONS: Miscalibrated clinical prediction models provide inaccurate pictures of performance across clinical units, and degrading calibration further complicates our understanding of quality. Updating strategies tailored to health system needs and capacity should be incorporated into model implementation plans to promote the utility and longevity of quality reporting tools.
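The observed-to-expected outcome ratio used throughout this abstract can be computed as the count of observed events divided by the sum of predicted probabilities. The sketch below assumes that definition; the function name and the toy inputs are illustrative and do not come from the study data.

```python
def observed_to_expected(outcomes, predicted_probabilities):
    """Ratio of observed events to the number expected under the model.

    Values below 1 mean fewer events occurred than the model predicted
    (overprediction); values above 1 mean more events than predicted.
    """
    expected = sum(predicted_probabilities)
    if expected == 0:
        raise ValueError("expected event count is zero; ratio is undefined")
    return sum(outcomes) / expected


# Toy example: one event observed where the model expected two.
ratio = observed_to_expected([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])
print(ratio)
```

Computed per region rather than nationally, the same ratio gives the regional quality profiles the abstract describes, which is why a model that overpredicts makes most regions look better than expected.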
Subjects
Acute Kidney Injury, Benchmarking, Acute Kidney Injury/diagnosis, Acute Kidney Injury/epidemiology, Acute Kidney Injury/therapy, Data Collection, Humans
ABSTRACT
BACKGROUND: Despite its high prevalence and clinical impact, research on peripheral artery disease (PAD) remains limited due to poor accuracy of billing codes. Ankle-brachial index (ABI) and toe-brachial index can be used to identify PAD patients with high accuracy within electronic health records. METHODS: We developed a novel natural language processing (NLP) algorithm for extracting ABI and toe-brachial index values and laterality (right or left) from ABI reports. A random sample of 800 reports from 94 Veterans Affairs facilities during 2015 to 2017 was selected and annotated by clinical experts. We trained the NLP system using random forest models and optimized it through sequential iterations of 10-fold cross-validation and error analysis on 600 test reports and evaluated its final performance on a separate set of 200 reports. We also assessed the accuracy of NLP-extracted ABI and toe-brachial index values for identifying patients with PAD in a separate cohort undergoing ABI testing. RESULTS: The NLP system had an overall precision (positive predictive value) of 0.85, recall (sensitivity) of 0.93, and F1 measure (harmonic mean of precision and recall) of 0.89 to correctly identify ABI/toe-brachial index values and laterality. Among 261 patients with ABI testing (49% PAD), the NLP system achieved a positive predictive value of 92.3%, sensitivity of 83.1%, and specificity of 93.1% to identify PAD when compared with a structured chart review. The above findings were consistent across a range of sensitivity analyses. CONCLUSIONS: We successfully developed and validated an NLP system for identifying patients with PAD within the Veterans Affairs electronic health record. Our findings have broad implications for PAD research and quality improvement.
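The reported F1 measure can be reproduced from the reported precision and recall, since F1 is their harmonic mean. The short sketch below shows the arithmetic; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Values reported for extraction of ABI/toe-brachial index values and laterality.
print(round(f1_score(0.85, 0.93), 2))
```

Because the harmonic mean is pulled toward the smaller of the two inputs, the result sits closer to the precision of 0.85 than a simple average would.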
Subjects
Ankle Brachial Index, Peripheral Arterial Disease, Ankle, Ankle Brachial Index/methods, Humans, Lower Extremity, Peripheral Arterial Disease/diagnosis, Peripheral Arterial Disease/epidemiology, Predictive Value of Tests, Treatment Outcome
ABSTRACT
BACKGROUND: There are gaps in delivering evidence-based care for patients with chronic liver disease and cirrhosis. OBJECTIVE: Our objective was to use interactive user-centered design methods to develop the Cirrhosis Order Set and Clinical Decision Support (CirrODS) tool in order to improve clinical decision-making and workflow. METHODS: Two work groups were convened with clinicians, user experience designers, human factors and health services researchers, and information technologists to create user interface designs. CirrODS prototypes underwent several rounds of formative design. Physicians (n=20) at three hospitals were provided with clinical scenarios of patients with cirrhosis, and the admission orders made with and without the CirrODS tool were compared. The physicians rated their experience using CirrODS and provided comments, which we coded into categories and themes. We assessed the safety, usability, and quality of CirrODS using qualitative and quantitative methods. RESULTS: We created an interactive CirrODS prototype that displays an alert when existing electronic data indicate a patient is at risk for cirrhosis. The tool consists of two primary frames, presenting relevant patient data and allowing recommended evidence-based tests and treatments to be ordered and categorized. Physicians viewed the tool positively and suggested that it would be most useful at the time of admission. When using the tool, the clinicians placed fewer orders than they placed when not using the tool, but more of the orders placed were considered to be high priority when the tool was used than when it was not used. The physicians' ratings of CirrODS indicated above average usability. CONCLUSIONS: We developed a novel Web-based combined clinical decision-making and workflow support tool to alert and assist clinicians caring for patients with cirrhosis. Further studies are underway to assess the impact on quality of care for patients with cirrhosis in actual practice.