ABSTRACT
Along with the increasing availability of health data has come the rise of data-driven models to inform decision making and policy. These models have the potential to benefit both patients and health care providers but can also exacerbate health inequities. Existing "algorithmic fairness" methods for measuring and correcting model bias fall short of what is needed for health policy in two key ways. First, methods typically focus on a single grouping along which discrimination may occur rather than considering multiple, intersecting groups. Second, in clinical applications, risk prediction is typically used to guide treatment, creating distinct statistical issues that invalidate most existing techniques. We present novel unfairness metrics that address both challenges. We also develop a complete framework of estimation and inference tools for our metrics, including the unfairness value ("u-value"), used to determine the relative extremity of unfairness, and standard errors and confidence intervals employing an alternative to the standard bootstrap. We demonstrate application of our framework to a COVID-19 risk prediction model deployed in a major Midwestern health system.
ABSTRACT
PURPOSE: A machine learning-based anterior cruciate ligament (ACL) revision prediction model has been developed using Norwegian Knee Ligament Register (NKLR) data, but lacks external validation outside Scandinavia. This study aimed to assess the external validity of the NKLR model (https://swastvedt.shinyapps.io/calculator_rev/) using the STABILITY 1 randomized clinical trial (RCT) data set. The hypothesis was that model performance would be similar. METHODS: The NKLR Cox Lasso model was selected for external validation owing to its superior performance in the original study. STABILITY 1 patients with all five predictors required by the Cox Lasso model were included. The STABILITY 1 RCT was a prospective study which randomized patients to receive either a hamstring tendon autograft (HT) alone or HT plus a lateral extra-articular tenodesis (LET). Since all patients in the STABILITY 1 trial received HT ± LET, three configurations were tested: 1: all patients coded as HT, 2: HT + LET group coded as bone-patellar tendon-bone (BPTB) autograft, 3: HT + LET group coded as unknown/other graft choice. Model performance was assessed via concordance and calibration. RESULTS: In total, 591/618 (95.6%) STABILITY 1 patients were eligible for inclusion, with 39 undergoing revisions within 2 years (6.6%). Model performance was best when patients receiving HT + LET were coded as BPTB. Concordance was similar to the original NKLR prediction model for 1- and 2-year revision prediction (STABILITY: 0.71; NKLR: 0.68-0.69). Concordance 95% confidence interval (CI) ranged from 0.63 to 0.79. The model was well calibrated for 1-year prediction while the 2-year prediction demonstrated evidence of miscalibration. CONCLUSION: When patients in STABILITY 1 who received HT + LET were coded as BPTB in the NKLR prediction model, concordance was similar to the index study. 
However, due to a wide 95% CI, the true performance of the prediction model with this Canadian and European cohort is unclear and a larger data set is required to definitively determine the external validity. Further, better calibration for 1-year predictions aligns with general prediction modelling challenges over longer periods. While not a large enough sample size to elicit the true accuracy and external validity of the prediction model when applied to North American patients, this analysis provides more support for the notion that HT plus LET performs similarly to BPTB reconstruction. In addition, despite the wide confidence interval, this study suggests optimism regarding the accuracy of the model when applied outside of Scandinavia. LEVEL OF EVIDENCE: Level 3, cohort study.
Subject(s)
Anterior Cruciate Ligament Injuries , Anterior Cruciate Ligament Reconstruction , Hamstring Tendons , Patellar Ligament , Humans , Canada , Knee Joint/surgery , Anterior Cruciate Ligament/surgery , Patellar Ligament/surgery , Hamstring Tendons/transplantation , Transplantation, Autologous , Anterior Cruciate Ligament Injuries/surgery , Autografts/surgery
ABSTRACT
PURPOSE: Accurate prediction of outcome following hip arthroscopy is challenging and machine learning has the potential to improve our predictive capability. The purpose of this study was to determine if machine learning analysis of the Danish Hip Arthroscopy Registry (DHAR) can develop a clinically meaningful calculator for predicting the probability of a patient undergoing subsequent revision surgery following primary hip arthroscopy. METHODS: Machine learning analysis was performed on the DHAR. The primary outcome for the models was probability of revision hip arthroscopy within 1, 2, and/or 5 years after primary hip arthroscopy. Data were split randomly into training (75%) and test (25%) sets. Four models intended for these types of data were tested: Cox elastic net, random survival forest, gradient boosted regression (GBM), and super learner. These four models represent a range of approaches to statistical details like variable selection and model complexity. Model performance was assessed by calculating calibration and area under the curve (AUC). Analysis was performed using only variables available in the pre-operative clinical setting and then repeated to compare model performance using all variables available in the registry. RESULTS: In total, 5581 patients were included for analysis. Average follow-up time or time-to-revision was 4.25 (± 2.51) years and overall revision rate was 11%. All four models were generally well calibrated and demonstrated concordance in the moderate range when restricted to only pre-operative variables (0.62-0.67), and when considering all variables available in the registry (0.63-0.66). The 95% confidence intervals for model concordance were wide for both analyses, ranging from a low of 0.53 to a high of 0.75, indicating uncertainty about the true accuracy of the models. CONCLUSION: The association between pre-surgical factors and outcome following hip arthroscopy is complex. 
Machine learning analysis of the DHAR produced a model capable of predicting revision surgery risk following primary hip arthroscopy that demonstrated moderate accuracy but likely limited clinical usefulness. Prediction accuracy would benefit from enhanced data quality within the registry and this preliminary study holds promise for future model generation as the DHAR matures. Ongoing collection of high-quality data by the DHAR should enable improved patient-specific outcome prediction that is generalisable across the population. LEVEL OF EVIDENCE: Level III.
Subject(s)
Femoracetabular Impingement , Humans , Femoracetabular Impingement/surgery , Arthroscopy , Treatment Outcome , Registries , Machine Learning , Hip Joint/surgery , Retrospective Studies
ABSTRACT
PURPOSE: External validation of machine learning predictive models is achieved through evaluation of model performance on different groups of patients than were used for algorithm development. This important step is uncommonly performed, inhibiting clinical translation of newly developed models. Machine learning analysis of the Norwegian Knee Ligament Register (NKLR) recently led to the development of a tool capable of estimating the risk of anterior cruciate ligament (ACL) revision ( https://swastvedt.shinyapps.io/calculator_rev/ ). The purpose of this study was to determine the external validity of the NKLR model by assessing algorithm performance when applied to patients from the Danish Knee Ligament Registry (DKLR). METHODS: The primary outcome measure of the NKLR model was probability of revision ACL reconstruction within 1, 2, and/or 5 years. For external validation, all DKLR patients with complete data for the five variables required for NKLR prediction were included. The five variables included graft choice, femur fixation device, KOOS QOL score at surgery, years from injury to surgery, and age at surgery. Predicted revision probabilities were calculated for all DKLR patients. The model performance was assessed using the same metrics as the NKLR study: concordance and calibration. RESULTS: In total, 10,922 DKLR patients were included for analysis. Average follow-up time or time-to-revision was 8.4 (± 4.3) years and overall revision rate was 6.9%. Surgical technique trends (i.e., graft choice and fixation devices) and injury characteristics (i.e., concomitant meniscus and cartilage pathology) were dissimilar between registries. The model produced similar concordance when applied to the DKLR population compared to the original NKLR test data (DKLR: 0.68; NKLR: 0.68-0.69). Calibration was poorer for the DKLR population at one and five years post primary surgery but similar to the NKLR at two years. 
CONCLUSION: The NKLR machine learning algorithm demonstrated similar performance when applied to patients from the DKLR, suggesting that it is valid for application outside of the initial patient population. This represents the first machine learning model for predicting revision ACL reconstruction that has been externally validated. Clinicians can use this in-clinic calculator to estimate revision risk at a patient specific level when discussing outcome expectations pre-operatively. While encouraging, it should be noted that the performance of the model on patients undergoing ACL reconstruction outside of Scandinavia remains unknown. LEVEL OF EVIDENCE: III.
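Concordance, the discrimination metric reported in the validation above, can be illustrated with a minimal sketch of Harrell's concordance index for right-censored data. This is not the study's code: the function name `harrell_c`, the toy data, and the simplification of skipping pairs with tied follow-up times are assumptions made for illustration only.

```python
from itertools import combinations


def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored data (illustrative).

    times  : observed follow-up time per patient
    events : 1 if revision was observed, 0 if censored
    risk   : predicted risk score (higher means earlier revision expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped. A pair is also
        # not comparable when the earlier patient was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk[a] == risk[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.1, 0.5, 0.7]
toy_c = harrell_c(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why values near 0.68 are described as moderate.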
Subject(s)
Anterior Cruciate Ligament Injuries , Anterior Cruciate Ligament , Anterior Cruciate Ligament/surgery , Anterior Cruciate Ligament Injuries/diagnosis , Anterior Cruciate Ligament Injuries/surgery , Humans , Machine Learning , Quality of Life , Registries , Reoperation
ABSTRACT
In this single-center, retrospective cohort study, we aimed to identify simple metabolic markers or surrogate indices of β-cell function that best predict long-term insulin independence and goal glycemic control (HbA1c ≤ 6.5%) after total pancreatectomy with islet autotransplantation (TP-IAT). Patients who underwent TP-IAT (n = 371) were reviewed for metabolic measures before TP-IAT and for insulin independence and glycemic control at 1, 3, and 5 years after TP-IAT. Insulin independence and goal glycemic control were achieved in 33% and 68% at 1 year, respectively. Although the insulin-independent and insulin-dependent groups overlapped substantially on baseline measures, an individual who has abnormal glycemia (prediabetes HbA1c or fasting glucose) or estimated IEQs/kg < 2500 has a very high likelihood of remaining insulin dependent after surgery. In multivariate logistic regression modelling, metabolic measures correctly predicted insulin independence in about 70% of patients at 1, 3, and 5 years after TP-IAT. In conclusion, metabolic testing measures before surgery are highly associated with diabetes outcomes after TP-IAT at a population level and correctly predict outcomes in approximately two out of three patients. These findings may aid in prognostic counseling for chronic pancreatitis patients who are likely to eventually need TP-IAT.
Subject(s)
Diabetes Mellitus , Islets of Langerhans Transplantation , Pancreatitis, Chronic , Humans , Pancreatectomy , Pancreatitis, Chronic/surgery , Retrospective Studies , Transplantation, Autologous , Treatment Outcome
ABSTRACT
BACKGROUND: Most clinical machine learning applications use a supervised learning approach using labeled variables. In contrast, unsupervised learning enables pattern detection without a prespecified outcome. PURPOSE/HYPOTHESIS: The purpose of this study was to apply unsupervised learning to the combined Danish and Norwegian knee ligament register (KLR) with the goal of detecting distinct subgroups. It was hypothesized that resulting groups would have differing rates of subsequent anterior cruciate ligament reconstruction (ACLR) revision. STUDY DESIGN: Cohort study; Level of evidence, 3. METHODS: K-prototypes clustering was performed on the complete case KLR data. After performing the unsupervised learning analysis, the authors defined clinically relevant characteristics of each cluster using variable summaries, surgeons' domain knowledge, and Shapley Additive exPlanations analysis. RESULTS: Five clusters were identified. Cluster 1 (revision rate, 9.9%) patients were young (mean age, 22 years; SD, 6 years), received hamstring tendon (HT) autograft (91%), and had lower baseline Knee injury and Osteoarthritis Outcome Score (KOOS) Sport and Recreation (Sports) scores (mean, 25.0; SD, 15.6). Cluster 2 (revision rate, 6.9%) patients received HT autograft (89%) and had higher baseline KOOS Sports scores (mean, 67.2; SD, 16.5). Cluster 3 (revision rate, 4.7%) patients received bone-patellar tendon-bone (BPTB) or quadriceps tendon (QT) autograft (94%) and had higher baseline KOOS Sports scores (mean, 65.8; SD, 16.4). Cluster 4 (revision rate, 4.1%) patients received BPTB or QT autograft (88%) and had low baseline KOOS Sports scores (mean, 20.5; SD, 14.0). Cluster 5 (revision rate, 3.1%) patients were older (mean age, 42 years; SD, 7 years), received HT autograft (89%), and had low baseline KOOS Sports scores (mean, 23.4; SD, 17.6). 
CONCLUSION: Unsupervised learning identified 5 distinct KLR patient subgroups and each grouping was associated with a unique ACLR revision rate. Patients can be approximately classified into 1 of the 5 clusters based on only 3 variables: age, graft choice (HT, BPTB, or QT autograft), and preoperative KOOS Sports subscale score. If externally validated, the resulting groupings may enable quick risk stratification for future patients undergoing ACLR in the clinical setting. Patients in cluster 1 are considered high risk (9.9%), cluster 2 patients medium risk (6.9%), and patients in clusters 3 to 5 low risk (3.1%-4.7%) for revision ACLR.
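As a rough illustration of how k-prototypes handles mixed data such as age, KOOS score, and graft choice, the sketch below implements the standard mixed dissimilarity (squared Euclidean distance on numeric features plus a weight `gamma` times the number of categorical mismatches) and assigns a patient to the nearest prototype. The prototypes, the `gamma` value, and the function names are hypothetical and are not taken from the study; in practice numeric variables would usually be standardized first.

```python
def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma):
    """k-prototypes dissimilarity between a record and a prototype."""
    # Numeric part: squared Euclidean distance.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    # Categorical part: simple mismatch count, weighted by gamma.
    categorical = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * categorical


def nearest_prototype(x_num, x_cat, prototypes, gamma):
    """Return the index of the prototype with the smallest dissimilarity."""
    costs = [
        mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return costs.index(min(costs))


# Hypothetical prototypes on standardized (age, KOOS Sports) plus graft type.
example_prototypes = [
    ([-1.0, -1.0], ["HT"]),
    ([0.0, 1.0], ["BPTB"]),
    ([1.5, -1.0], ["HT"]),
]
# Hypothetical patient: younger than average, low baseline score, HT graft.
example_cluster = nearest_prototype([-0.8, -0.9], ["HT"], example_prototypes, gamma=0.5)
```

The weight `gamma` controls how much a categorical mismatch counts relative to numeric distance, so its choice can change cluster membership.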
Subject(s)
Anterior Cruciate Ligament Injuries , Hamstring Tendons , Patellar Ligament , Humans , Young Adult , Adult , Cohort Studies , Unsupervised Machine Learning , Anterior Cruciate Ligament Injuries/surgery , Autografts , Patellar Ligament/transplantation , Hamstring Tendons/transplantation , Transplantation, Autologous , Denmark
ABSTRACT
BACKGROUND: Clinical tools based on machine learning analysis now exist for outcome prediction after primary anterior cruciate ligament reconstruction (ACLR). Relying partly on data volume, the general principle is that more data may lead to improved model accuracy. PURPOSE/HYPOTHESIS: The purpose was to apply machine learning to a combined data set from the Norwegian and Danish knee ligament registers (NKLR and DKRR, respectively), with the aim of producing an algorithm that can predict revision surgery with improved accuracy relative to a previously published model developed using only the NKLR. The hypothesis was that the additional patient data would result in an algorithm that is more accurate. STUDY DESIGN: Cohort study; Level of evidence, 3. METHODS: Machine learning analysis was performed on combined data from the NKLR and DKRR. The primary outcome was the probability of revision ACLR within 1, 2, and 5 years. Data were split randomly into training sets (75%) and test sets (25%). There were 4 machine learning models examined: Cox lasso, random survival forest, gradient boosting, and super learner. Concordance and calibration were calculated for all 4 models. RESULTS: The data set included 62,955 patients in which 5% underwent a revision surgical procedure with a mean follow-up of 7.6 ± 4.5 years. The 3 nonparametric models (random survival forest, gradient boosting, and super learner) performed best, demonstrating moderate concordance (0.67 [95% CI, 0.64-0.70]), and were well calibrated at 1 and 2 years. Model performance was similar to that of the previously published model (NKLR-only model: concordance, 0.67-0.69; well calibrated). CONCLUSION: Machine learning analysis of the combined NKLR and DKRR enabled prediction of the revision ACLR risk with moderate accuracy. 
However, the resulting algorithms were less user-friendly and did not demonstrate superior accuracy in comparison with the previously developed model based on patients from the NKLR alone, despite the analysis of nearly 63,000 patients. This ceiling effect suggests that simply adding more patients to current national knee ligament registers is unlikely to improve predictive capability and may prompt future changes to increase variable inclusion.
Subject(s)
Anterior Cruciate Ligament Injuries , Anterior Cruciate Ligament Reconstruction , Humans , Anterior Cruciate Ligament/surgery , Cohort Studies , Anterior Cruciate Ligament Injuries/surgery , Knee Joint/surgery , Anterior Cruciate Ligament Reconstruction/methods , Reoperation , Norway/epidemiology , Denmark
ABSTRACT
BACKGROUND: Relaxation of federal regulations for methadone take-out dosing during the COVID-19 pandemic is unprecedented. The impact of this change on drug use is unknown. This study explores the impact of the federal take-out variance on drug use in one urban opioid treatment program as measured by drug testing. METHODS: This study collected drug test results from 613 patients receiving methadone from July 2020, following COVID-19-related take-out dose adjustments, and July 2019 for comparison. Using a generalized linear mixed model, we computed the average estimated probability of a positive drug test for each year for each take-out phase. To isolate the effect of changing take-out, we removed the main effect of year, while retaining the main effect of take-out phase and the interaction between year and phase. RESULTS: The percent of drug tests positive for opiates, benzodiazepines, and methamphetamine was greater in July 2020 than in July 2019 (p < 0.001 for each), while the percent of tests negative for methadone increased (p < 0.001). Oxycodone, barbiturate, and cocaine positive tests remained stable. In a separate analysis of opioid and non-opioid test results, take-out phase was associated with both opioid and non-opioid positive results (p < 0.001, each outcome). The association of take-out phase with opioid and non-opioid positive results differed in the two years (year-by-phase interaction p < 0.025, each outcome). After removing the year main effect, the rate of positive tests was lower in 2020 for the smallest number of take-out doses, higher for a moderate number of take-out doses, and about the same for the highest number of take-out doses. CONCLUSIONS: Positive opioid and non-opioid drug tests increased following the federal variance allowing more methadone take-out doses, but these findings cannot fully be attributed to alterations in the take-out schedule.
Subject(s)
COVID-19 , Opioid-Related Disorders , Pharmaceutical Preparations , Analgesics, Opioid/therapeutic use , Humans , Methadone/therapeutic use , Opiate Substitution Treatment , Opioid-Related Disorders/drug therapy , Pandemics , SARS-CoV-2
ABSTRACT
OBJECTIVES: Accurate prediction of outcome following anterior cruciate ligament (ACL) reconstruction is challenging, and machine learning has the potential to improve our predictive capability. The purpose of this study was to determine if machine learning analysis of the Norwegian Knee Ligament Register (NKLR) can (1) identify the most important risk factors associated with subjective failure of ACL reconstruction and (2) develop a clinically meaningful calculator for predicting the probability of subjective failure following ACL reconstruction. METHODS: Machine learning analysis was performed on the NKLR. All patients with 2-year follow-up data were included. The primary outcome was the probability of subjective failure 2 years following primary surgery, defined as a Knee Injury and Osteoarthritis Outcome Score (KOOS) Quality of Life (QoL) subscale score of <44. Data were split randomly into training (75%) and test (25%) sets. Four models intended for this type of data were tested: Lasso logistic regression, random forest, generalized additive model (GAM), and gradient boosted regression (GBM). These four models represent a range of approaches to statistical details like variable selection and model complexity. Model performance was assessed by calculating calibration and area under the curve (AUC). RESULTS: Of the 20,818 patients who met the inclusion criteria, 11,630 (56%) completed the 2-year follow-up KOOS QoL questionnaire. Of those with complete KOOS data, 22% reported subjective failure. The lasso logistic regression, GBM, and GAM all demonstrated AUC in the moderate range (0.67-0.68), with the GAM performing best (0.68; 95% CI 0.64-0.71). Lasso logistic regression, GBM, and the GAM were well-calibrated, while the random forest showed evidence of mis-calibration. The GAM was selected to create an in-clinic calculator to predict subjective failure risk at a patient-specific level (https://swastvedt.shinyapps.io/calculator_koosqol/). 
CONCLUSION: Machine learning analysis of the NKLR can predict subjective failure risk following ACL reconstruction with fair accuracy. This algorithm supports the creation of an easy-to-use in-clinic calculator for point-of-care risk stratification. Clinicians can use this calculator to estimate subjective failure risk at a patient-specific level when discussing outcome expectations preoperatively. LEVEL OF EVIDENCE: Level-III Retrospective review of a prospective national register.
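The outcome definition and the AUC reported above can be sketched as follows. The threshold of 44 on the KOOS QoL subscale comes from the abstract; the function names and the toy scores are hypothetical. The AUC is computed here through its rank interpretation, the probability that a randomly chosen failure receives a higher predicted risk than a randomly chosen non-failure, which is a generic approach and not necessarily the one used in the study.

```python
def subjective_failure(koos_qol):
    """Outcome label from the abstract: KOOS QoL subscale score below 44."""
    return 1 if koos_qol < 44 else 0


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical 2-year KOOS QoL scores and predicted failure risks.
toy_koos = [30, 60, 40, 75]
toy_pred = [0.8, 0.3, 0.4, 0.5]
toy_labels = [subjective_failure(k) for k in toy_koos]
toy_auc = auc(toy_labels, toy_pred)
```

An AUC of 0.5 means the predictions rank patients no better than chance, which gives context for the 0.67-0.68 range reported.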
Subject(s)
Anterior Cruciate Ligament Injuries , Anterior Cruciate Ligament/surgery , Anterior Cruciate Ligament Injuries/epidemiology , Anterior Cruciate Ligament Injuries/surgery , Humans , Machine Learning , Patient Reported Outcome Measures , Prospective Studies , Quality of Life
ABSTRACT
BACKGROUND: Several factors are associated with an increased risk of anterior cruciate ligament (ACL) reconstruction revision. However, the ability to accurately translate these factors into a quantifiable risk of revision at a patient-specific level has remained elusive. We sought to determine if machine learning analysis of the Norwegian Knee Ligament Register (NKLR) can identify the most important risk factors associated with subsequent revision of primary ACL reconstruction and develop a clinically meaningful calculator for predicting revision of primary ACL reconstruction. METHODS: Machine learning analysis was performed on the NKLR data set. The primary outcome was the probability of revision ACL reconstruction within 1, 2, and/or 5 years. Data were split randomly into training sets (75%) and test sets (25%). Four machine learning models were tested: Cox Lasso, survival random forest, generalized additive model, and gradient boosted regression. Concordance and calibration were calculated for all 4 models. RESULTS: The data set included 24,935 patients, and 4.9% underwent a revision surgical procedure during a mean follow-up (and standard deviation) of 8.1 ± 4.1 years. All 4 models were well-calibrated, with moderate concordance (0.67 to 0.69). The Cox Lasso model required only 5 variables for outcome prediction. The other models either used more variables without an appreciable improvement in accuracy or had slightly lower accuracy overall. An in-clinic calculator was developed that can estimate the risk of ACL revision (Revision Risk Calculator). This calculator can quantify risk at a patient-specific level, with a plausible range from near 0% for low-risk patients to 20% for high-risk patients at 5 years. CONCLUSIONS: Machine learning analysis of a national knee ligament registry can predict the risk of ACL reconstruction revision with moderate accuracy. 
This algorithm supports the creation of an in-clinic calculator for point-of-care risk stratification based on the input of only 5 variables. Similar analysis using a larger or more comprehensive data set may improve the accuracy of risk prediction, and future studies incorporating patients who have experienced failure of ACL reconstruction but have not undergone subsequent revision may better predict the true risk of ACL reconstruction failure. LEVEL OF EVIDENCE: Prognostic Level III. See Instructions for Authors for a complete description of levels of evidence.
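The random 75%/25% training and test split described in the methods can be sketched in a few lines. The function name, the fixed seed, and the use of integer patient identifiers are assumptions for illustration; the abstract does not say how the split was implemented.

```python
import random


def split_train_test(records, test_fraction=0.25, seed=0):
    """Randomly partition records into training and test sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical patient identifiers.
patient_ids = list(range(100))
train_ids, test_ids = split_train_test(patient_ids)
```

Models are fitted on the training set only, and concordance and calibration are then computed on the held-out test set so that performance is not overstated.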
Subject(s)
Anterior Cruciate Ligament Reconstruction , Machine Learning , Reoperation/statistics & numerical data , Female , Humans , Male , Norway , Predictive Value of Tests , Registries , Risk Factors
ABSTRACT
BACKGROUND: Total pancreatectomy with islet autotransplantation (TPIAT) involves pancreatectomy, splenectomy, and reinjection of the patient's pancreatic islets into the portal vein. This process triggers a local inflammatory reaction and increase in portal pressure, threatening islet survival and potentially causing portal vein thrombosis. Recent research has highlighted a high frequency of extreme thrombocytosis (platelets ≥1000 × 10⁹/L) after TPIAT, but its cause and association with thrombotic risk remain unclear. METHODS: This retrospective single-site study of a contemporary cohort of 409 pediatric and adult patients analyzed the frequency of thrombocytosis, risk factors for thrombosis, and antiplatelet and anticoagulation strategies. RESULTS: Of 409 patients, 67% developed extreme thrombocytosis, peaking around postoperative day 16. Extreme thrombocytosis was significantly associated with infused islet volumes. Thromboembolic events occurred in 12.2% of patients, with portal vein thromboses occurring significantly earlier than peripheral thromboses. Portal vein thromboses were associated with infused islet volumes and portal pressures but not platelet counts or other measures. Most thromboembolic events (82.7%) occurred before the postoperative day of maximum platelet count. Only 4 of 27 (14.8%) portal vein thromboses occurred at platelet counts ≥500 × 10⁹/L. Perioperative heparin was given to all patients. Treatment of reactive thrombocytosis using aspirin in adults and hydroxyurea in children was not associated with significantly decreased thromboembolic risk. CONCLUSIONS: These results suggest that post-TPIAT thrombocytosis and portal vein thromboses may be linked to the islet infusion inflammation, not directly to each other, and further reducing this inflammation may reduce thrombosis and thrombocytosis frequencies simultaneously.
Subject(s)
Islets of Langerhans Transplantation , Thrombocytosis , Thrombosis , Adult , Child , Humans , Islets of Langerhans Transplantation/adverse effects , Islets of Langerhans Transplantation/methods , Pancreatectomy/adverse effects , Pancreatectomy/methods , Portal Vein , Retrospective Studies , Thrombocytosis/diagnosis , Thrombocytosis/etiology , Thrombosis/etiology , Transplantation, Autologous/adverse effects
ABSTRACT
OBJECTIVES: Smoking and alcohol use are risk factors for acute and chronic pancreatitis, but their role in anxiety, depression, and opioid use in patients who undergo total pancreatectomy and islet autotransplantation (TPIAT) is unknown. METHODS: We included adults enrolled in the Prospective Observational Study of TPIAT (POST). Measured variables included smoking (never, former, current) and alcohol abuse or dependency history (yes vs no). Using univariable and multivariable analyses, we investigated the association of smoking and alcohol dependency history with anxiety and depression, opioid use, and postsurgical outcomes. RESULTS: Of 195 adults studied, 25 were current smokers and 77 former smokers, whereas 18 had a history of alcohol dependency (of whom 10 were current smokers). A diagnosis of anxiety was associated with current smoking (P = 0.005), and depression was associated with history of alcohol abuse/dependency (P = 0.0001). However, active symptoms of anxiety and depression at the time of TPIAT were not associated with smoking or alcohol status. Opioid use in the past 14 days was associated with being a former smoker (P = 0.005). CONCLUSIONS: Active smoking and alcohol abuse history were associated with a diagnosis of anxiety and depression, respectively; however, at the time of TPIAT, symptom scores suggested that they were being addressed.