ABSTRACT
Importance: Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications. Objective: To evaluate risk-prediction model performance when trained on risk-specific cohorts. Design, Setting, and Participants: This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109,445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined. Exposures: The model was trained de novo on separate cohorts of high-risk, medium-risk, and low-risk Current Procedural Terminology codes defined empirically by the incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for complications were defined by the lower-third and upper-third prevalence in the dataset, except for mortality, whose cutoffs were set at 1% or less and greater than 3%, respectively. Main Outcomes and Measures: Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 score, and accuracy for each model. Results: A total of 109,445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77,921 procedures [71.2%]) and Jacksonville (31,524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4).
Among 109,445 operations, 55,646 patients were male (50.8%), and 66,495 patients (60.8%) underwent a nonemergent, inpatient operation. Training on the high-risk cohort had variable impact on AUROC but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved the F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and AKI (0.57; 95% CI, 0.54-0.59). After controlling for baseline model performance on high-risk cohorts, AUPRC increased significantly for in-hospital mortality only (0.53; 95% CI, 0.42-0.65 vs 0.29; 95% CI, 0.21-0.40). Conclusions and Relevance: In this cross-sectional study, training separate models on procedure-specific risk classes defined a priori improved standard evaluation metrics, especially for low-prevalence complications such as in-hospital mortality. Used cautiously, this approach may represent an optimal training strategy for surgical risk-prediction models.
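The cohort-splitting rule described above can be sketched as follows. This is an illustrative reconstruction, not the study's code: the function name, argument layout, and treatment of tertile boundaries are assumptions; only the cutoffs (lower-/upper-third prevalence, and fixed 1%/3% cutoffs for mortality) come from the abstract.

```python
# Hypothetical sketch: assign a procedure code to a low/medium/high risk
# cohort for one complication, per the empirical cutoffs described above.

def classify_procedure(complication: str, incidence: float,
                       lower_tertile: float, upper_tertile: float) -> str:
    """Risk-cohort label for one procedure code and one complication."""
    if complication == "mortality":
        # Mortality uses fixed cutoffs: <=1% low, >3% high, else medium.
        if incidence <= 0.01:
            return "low"
        return "high" if incidence > 0.03 else "medium"
    # Other complications use lower-/upper-third prevalence in the dataset.
    if incidence <= lower_tertile:
        return "low"
    return "high" if incidence > upper_tertile else "medium"
```

A procedure with 2% observed mortality would fall in the medium-risk mortality cohort, while the same procedure could land in a different cohort for sepsis, since cohorts are defined per complication.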
ABSTRACT
The degree to which artificial intelligence healthcare research is informed by data and stakeholders from community settings has not been previously described. As communities are the principal location of healthcare delivery, engaging them could represent an important opportunity to improve scientific quality. This scoping review systematically maps what is known and unknown about community-engaged artificial intelligence research and identifies opportunities to optimize the generalizability of these applications through involvement of community stakeholders and data throughout model development, validation, and implementation. Embase, PubMed, and MEDLINE databases were searched for articles describing artificial intelligence or machine learning healthcare applications with community involvement in model development, validation, or implementation. Model architecture and performance, the nature of community engagement, and barriers or facilitators to community engagement were reported according to the PRISMA extension for Scoping Reviews guidelines. Of approximately 10,880 articles describing artificial intelligence healthcare applications, 21 (0.2%) described community involvement. All articles derived data from community settings, most commonly by leveraging existing datasets and sources that included community subjects, often bolstered by internet-based data acquisition and subject recruitment. Only one article described inclusion of community stakeholders in designing an application: a natural language processing model that detected cases of likely child abuse with 90% accuracy using harmonized electronic health record notes from both hospital and community practice settings. The primary barrier to including community-derived data was small sample sizes, which may have affected 11 of the 21 studies (52%), introducing substantial risk of overfitting that threatens generalizability.
Community engagement in artificial intelligence healthcare application development, validation, or implementation is rare. As healthcare delivery occurs primarily in community settings, investigators should consider engaging community stakeholders in user-centered design, usability, and clinical implementation studies to optimize generalizability.
ABSTRACT
Objective: To determine whether certain patients are vulnerable to errant triage decisions immediately after major surgery and whether there are unique sociodemographic phenotypes within overtriaged and undertriaged cohorts. Background: In a fair system, overtriage of low-acuity patients to intensive care units (ICUs) and undertriage of high-acuity patients to general wards would affect all sociodemographic subgroups equally. Methods: This multicenter, longitudinal cohort study of hospital admissions immediately after major surgery compared hospital mortality and value of care (risk-adjusted mortality/total costs) across 4 cohorts: overtriage (N = 660), risk-matched overtriage controls admitted to general wards (N = 3077), undertriage (N = 2335), and risk-matched undertriage controls admitted to ICUs (N = 4774). K-means clustering identified sociodemographic phenotypes within the overtriage and undertriage cohorts. Results: Compared with controls, overtriaged admissions had a predominance of male patients (56.2% vs 43.1%, P < 0.001) and commercial insurance (6.4% vs 2.5%, P < 0.001); undertriaged admissions had a predominance of Black patients (28.4% vs 24.4%, P < 0.001) and greater socioeconomic deprivation. Overtriage was associated with increased total direct costs [$16.2K ($11.4K-$23.5K) vs $14.1K ($9.1K-$20.7K), P < 0.001] and low value of care; undertriage was associated with increased hospital mortality (1.5% vs 0.7%, P = 0.002), increased hospice care (2.2% vs 0.6%, P < 0.001), and low value of care. Unique sociodemographic phenotypes within both the overtriage and undertriage cohorts had similar outcomes and value of care, suggesting that triage decisions, rather than patient characteristics, drive outcomes and value of care. Conclusions: Postoperative triage should ensure equality across sociodemographic groups by anchoring decisions to objective patient acuity assessments, circumventing cognitive shortcuts and mitigating bias.
ABSTRACT
Standard race adjustments for estimating glomerular filtration rate (GFR) and reference creatinine can yield a lower acute kidney injury (AKI) and chronic kidney disease (CKD) prevalence among African American patients than non-race-adjusted estimates. We developed two race-agnostic computable phenotypes that assess kidney health among 139,152 subjects admitted to University of Florida Health from January 2012 to August 2019: one removing the race modifier from the estimated GFR and estimated creatinine formulas used by the race-adjusted algorithm (race-agnostic algorithm 1) and one using the 2021 CKD-EPI refit-without-race formula (race-agnostic algorithm 2) for calculations of the estimated GFR and estimated creatinine. We compared results from these algorithms with the race-adjusted algorithm in African American patients. Using clinical adjudication, we validated the race-agnostic computable phenotypes developed for preadmission CKD and AKI presence on 300 cases. Race adjustment reclassified 2,113 patients (8%) to no CKD and 7,901 (29%) to a less severe CKD stage compared with race-agnostic algorithm 1, and reclassified 1,208 (5%) to no CKD and 4,606 (18%) to a less severe CKD stage compared with race-agnostic algorithm 2. Of 12,451 AKI encounters based on race-agnostic algorithm 1, race adjustment reclassified 591 to no AKI and 305 to a less severe AKI stage. Of 12,251 AKI encounters based on race-agnostic algorithm 2, race adjustment reclassified 382 to no AKI and 196 (1.6%) to a less severe AKI stage. The phenotyping algorithm based on the refit-without-race formula performed well in identifying patients with CKD and AKI, with sensitivities of 100% (95% confidence interval [CI] 97%-100%) and 99% (95% CI 97%-100%) and specificities of 88% (95% CI 82%-93%) and 98% (95% CI 93%-100%), respectively. Race-agnostic algorithms identified substantial proportions of additional patients with CKD and AKI compared with the race-adjusted algorithm in African American patients.
The phenotyping algorithm is promising in identifying patients with kidney disease and improving clinical decision-making.
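The 2021 CKD-EPI creatinine equation ("refit without race") underlying race-agnostic algorithm 2 can be sketched as below. The coefficients are from the published 2021 refit equation; the function name and interface are illustrative, and this is not presented as the study's actual implementation.

```python
# Sketch of the 2021 CKD-EPI creatinine eGFR equation without a race
# modifier. Coefficients: kappa = 0.7 (female) / 0.9 (male),
# alpha = -0.241 (female) / -0.302 (male), base 142, age factor 0.9938,
# and a 1.012 multiplier for female patients.

def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """Estimated GFR (mL/min/1.73 m^2) from serum creatinine, age, and sex."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    egfr = (142.0
            * min(scr_mg_dl / kappa, 1.0) ** alpha
            * max(scr_mg_dl / kappa, 1.0) ** -1.200
            * 0.9938 ** age_years)
    return egfr * 1.012 if female else egfr
```

For example, a 55-year-old woman with serum creatinine 0.8 mg/dL receives an eGFR of roughly 87 mL/min/1.73 m^2 regardless of race, which is the behavior the race-agnostic phenotypes exploit.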
Subject(s)
Acute Kidney Injury , Black or African American , Glomerular Filtration Rate , Hospitalization , Renal Insufficiency, Chronic , Adult , Aged , Female , Humans , Male , Middle Aged , Acute Kidney Injury/diagnosis , Acute Kidney Injury/epidemiology , Algorithms , Creatinine/blood , Kidney/physiopathology , Phenotype , Renal Insufficiency, Chronic/physiopathology , Renal Insufficiency, Chronic/epidemiology , Renal Insufficiency, Chronic/diagnosis
ABSTRACT
Using clustering analysis of early vital signs, unique patient phenotypes with distinct pathophysiological signatures and clinical outcomes may be revealed to support early clinical decision-making. Phenotyping using early vital signs has proven challenging, as vital signs are typically sampled sporadically. We proposed a novel deep temporal interpolation and clustering network to simultaneously extract latent representations from irregularly sampled vital signs and derive phenotypes. Four distinct clusters were identified. Phenotype A (18%) had the greatest prevalence of comorbid disease, with increased prevalence of prolonged respiratory insufficiency, acute kidney injury, sepsis, and long-term (3-year) mortality. Phenotypes B (33%) and C (31%) had a diffuse pattern of mild organ dysfunction. Phenotype B's favorable short-term clinical outcomes were tempered by the second-highest rate of long-term mortality. Phenotype C had favorable clinical outcomes. Phenotype D (17%) exhibited early and persistent hypotension, a high incidence of early surgery, and substantial biomarker evidence of inflammation. Despite early and severe illness, phenotype D had the second-lowest long-term mortality. Comparison with Sequential Organ Failure Assessment scores showed that the clustering results were not simply a recapitulation of previous acuity assessments. This tool may inform triage decisions and has significant implications for clinical decision support under time constraints and uncertainty.
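The cluster-assignment step can be illustrated with a minimal one-dimensional k-means (Lloyd's algorithm) over a toy vital-sign feature. This is only a stand-in: the study's model couples a deep temporal interpolation network with clustering over learned latent representations, which this sketch does not attempt, and all data below are invented.

```python
# Minimal 1-D k-means sketch over illustrative per-patient mean arterial
# pressures; alternates nearest-center assignment and center recomputation.

def kmeans_1d(values, centers, iters=20):
    """Cluster scalar features; returns the final cluster centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest current center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Recompute each center as the mean of its assigned values.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers

maps = [55, 58, 60, 85, 88, 90]  # invented per-patient mean arterial pressures
centers = kmeans_1d(maps, centers=[50.0, 95.0])  # hypotensive vs normotensive
```

In practice, the interesting part of the paper is upstream of this step: producing a fixed-length latent representation from irregularly sampled vital signs so that a clustering objective like this one becomes applicable.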
Subject(s)
Organ Dysfunction Scores , Sepsis , Humans , Acute Disease , Phenotype , Biomarkers , Cluster Analysis
ABSTRACT
BACKGROUND: There is no consensus regarding safe intraoperative blood pressure thresholds that protect against postoperative acute kidney injury (AKI). This review aims to examine the existing literature to delineate safe intraoperative hypotension (IOH) parameters to prevent postoperative AKI. METHODS: PubMed, Cochrane Central, and Web of Science were systematically searched for articles published between 2015 and 2022 relating the effects of IOH on postoperative AKI. RESULTS: Our search yielded 19 articles. IOH risk thresholds ranged from <50 to <75 mmHg for mean arterial pressure (MAP) and from <70 to <100 mmHg for systolic blood pressure (SBP). MAP below 65 mmHg for over 5 min was the most cited threshold (n = 13) consistently associated with increased postoperative AKI. Greater magnitude and duration of MAP and SBP below the thresholds were generally associated with a dose-dependent increase in postoperative AKI incidence. CONCLUSIONS: While a consistent definition for IOH remains elusive, the evidence suggests that MAP below 65 mmHg for over 5 min is strongly associated with postoperative AKI, with the risk increasing with the magnitude and duration of IOH.
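Flagging exposure under the most cited threshold above (MAP below 65 mmHg for over 5 minutes) can be sketched as follows. This sketch assumes MAP sampled at a fixed interval and uses a cumulative-time definition; some studies instead require a consecutive episode, so the aggregation rule is an assumption.

```python
# Sketch: cumulative intraoperative hypotension exposure from a MAP series.

def minutes_below(map_series, threshold=65.0, interval_min=1.0):
    """Total minutes with MAP strictly below the threshold."""
    return sum(interval_min for m in map_series if m < threshold)

def ioh_exposure(map_series, threshold=65.0, interval_min=1.0,
                 duration_min=5.0):
    """True if cumulative time below the threshold exceeds duration_min."""
    return minutes_below(map_series, threshold, interval_min) > duration_min

maps = [72, 68, 64, 62, 61, 60, 63, 66, 70]  # invented 1-minute MAP samples
```

With these samples, five minutes fall below 65 mmHg, so the over-5-minute criterion is not yet met; one additional hypotensive minute would flag the case.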
Subject(s)
Acute Kidney Injury , Hypotension , Intraoperative Complications , Postoperative Complications , Humans , Acute Kidney Injury/etiology , Acute Kidney Injury/epidemiology , Acute Kidney Injury/prevention & control , Hypotension/etiology , Hypotension/epidemiology , Hypotension/prevention & control , Postoperative Complications/epidemiology , Postoperative Complications/prevention & control , Postoperative Complications/etiology , Intraoperative Complications/prevention & control , Intraoperative Complications/epidemiology , Intraoperative Complications/etiology
ABSTRACT
Persistence of acute kidney injury (AKI) or insufficient recovery of renal function is associated with reduced long-term survival and quality of life. We quantified AKI trajectories and describe transitions through progression and recovery among hospitalized patients. A total of 245,663 encounters from 128,271 patients admitted to UF Health between 2012 and 2019 were retrospectively categorized according to the worst AKI stage experienced within 24-hour periods. Multistate models were fit to describe characteristics influencing transitions toward progressed or regressed AKI, discharge, and death. Effects of age, sex, race, admission comorbidities, and prolonged intensive care unit (ICU) stay on transition rates were examined via Cox proportional hazards models. About 20% of encounters had AKI; of those, 66% had Stage 1 as their worst AKI severity during hospitalization, 18% had Stage 2, and 16% had Stage 3 AKI (12% with kidney replacement therapy [KRT] and 4% without KRT). At 3 days following Stage 1 AKI, 71.1% (70.5-71.6%) were either resolved to No AKI or discharged, while the recovery proportion was 38% (37.4-38.6%) and the discharge proportion was 7.1% (6.9-7.3%) following AKI Stage 2. At 14 days following Stage 1 AKI, patients with additional frail conditions had a lower proportion of transitions toward the No AKI or discharge states. The multistate modeling framework facilitates understanding of the AKI clinical course and examination of characteristics influencing the disease process and transition rates.
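The descriptive core of this analysis can be illustrated by estimating empirical daily transition proportions from per-encounter state sequences. This is a simplified stand-in for the fitted multistate models, which additionally handle censoring, competing risks, and covariate effects; the state names and data below are invented.

```python
# Sketch: empirical P(next state | current state) from daily AKI state
# sequences, one list of states per encounter.

from collections import Counter

def transition_proportions(sequences):
    """Map (current_state, next_state) -> observed transition proportion."""
    counts = Counter()
    totals = Counter()
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[(cur, nxt)] += 1
            totals[cur] += 1
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}

encounters = [
    ["No AKI", "Stage 1", "No AKI", "Discharged"],
    ["Stage 1", "Stage 2", "Stage 1", "Stage 1"],
    ["No AKI", "No AKI", "Discharged"],
]
probs = transition_proportions(encounters)
```

A proper multistate model replaces these raw proportions with transition intensities so that state-occupancy probabilities at, say, day 3 or day 14 can be estimated with confidence intervals, as reported above.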
Subject(s)
Acute Kidney Injury , Intensive Care Units , Humans , Retrospective Studies , Acute Kidney Injury/epidemiology , Acute Kidney Injury/therapy , Renal Replacement Therapy , Disease Progression , Risk Factors
ABSTRACT
Background: Machine learning-enabled clinical information systems (ML-CISs) have the potential to drive health care delivery and research. The Fast Healthcare Interoperability Resources (FHIR) data standard has been increasingly applied in developing these systems. However, methods for applying FHIR to ML-CISs are variable. Objective: This study evaluates and compares the functionalities, strengths, and weaknesses of existing systems and proposes guidelines for optimizing future work with ML-CISs. Methods: Embase, PubMed, and Web of Science were searched for articles describing machine learning systems that were used for clinical data analytics or decision support in compliance with FHIR standards. Information regarding each system's functionality, data sources, formats, security, performance, resource requirements, scalability, strengths, and limitations was compared across systems. Results: A total of 39 articles describing FHIR-based ML-CISs were divided into the following three categories according to their primary focus: clinical decision support systems (n=18), data management and analytic platforms (n=10), or auxiliary modules and application programming interfaces (n=11). Model strengths included novel use of cloud systems, Bayesian networks, visualization strategies, and techniques for translating unstructured or free-text data to FHIR frameworks. Many intelligent systems lacked electronic health record interoperability and externally validated evidence of clinical efficacy. Conclusions: Shortcomings in current ML-CISs can be addressed by incorporating modular and interoperable data management, analytic platforms, secure interinstitutional data exchange, and application programming interfaces with adequate scalability to support both real-time and prospective clinical applications that use electronic health record platforms with diverse implementations.
ABSTRACT
BACKGROUND: Acute kidney injury is a common postoperative complication affecting between 10% and 30% of surgical patients. Acute kidney injury is associated with increased resource use and chronic kidney disease development, with more severe acute kidney injury suggesting more aggressive deterioration in clinical outcomes and mortality. METHODS: We considered 42,906 surgical patients (51,806 admissions) admitted to University of Florida Health between 2014 and 2021. Acute kidney injury stages were determined using the Kidney Disease: Improving Global Outcomes serum creatinine criteria. We developed a recurrent neural network-based model to continuously predict acute kidney injury risk and state in the following 24 hours and compared it with logistic regression, random forest, and multilayer perceptron models. We used medications, laboratory and vital sign measurements, and features derived from the past one year of records as inputs. We analyzed the proposed model with integrated gradients for enhanced explainability. RESULTS: Postoperative acute kidney injury at any stage developed in 20% (10,664) of the cohort. The recurrent neural network model was more accurate in predicting nearly all categories of next-day acute kidney injury stages (including the no acute kidney injury group). The areas under the receiver operating characteristic curve with 95% confidence intervals for the recurrent neural network and logistic regression models were for no acute kidney injury (0.98 [0.98-0.98] vs 0.93 [0.93-0.93]), stage 1 (0.95 [0.95-0.95] vs 0.81 [0.80-0.82]), stage 2/3 (0.99 [0.99-0.99] vs 0.96 [0.96-0.97]), and stage 3 with renal replacement therapy (1.0 [1.0-1.0] vs 1.0 [1.0-1.0]). CONCLUSION: The proposed model demonstrates that temporal processing of patient information can lead to more granular and dynamic modeling of acute kidney injury status and result in more continuous and accurate acute kidney injury prediction.
We showcase the integrated gradients framework's utility as a mechanism for enhancing model explainability, potentially facilitating clinical trust for future implementation.
Subject(s)
Acute Kidney Injury , Deep Learning , Humans , Acute Kidney Injury/diagnosis , Acute Kidney Injury/epidemiology , Acute Kidney Injury/etiology , Logistic Models , Forecasting , Kidney
ABSTRACT
OBJECTIVES: We aim to quantify longitudinal acute kidney injury (AKI) trajectories and to describe transitions through progressing and recovery states and outcomes among hospitalized patients using multistate models. METHODS: In this large, longitudinal cohort study, 138,449 adult patients admitted to a quaternary care hospital between 2012 and 2019 were staged based on Kidney Disease: Improving Global Outcomes serum creatinine criteria for the first 14 days of their hospital stay. We fit multistate models to estimate the probability of being in a certain clinical state at a given time after entering each of the AKI stages. We investigated the effects of selected variables on transition rates via Cox proportional hazards regression models. RESULTS: Twenty percent of hospitalized encounters (49,325/246,964) had AKI; among patients with AKI, 66% had Stage 1 AKI, 18% had Stage 2 AKI, and 17% had Stage 3 AKI with or without renal replacement therapy. At seven days following Stage 1 AKI, 69% (95% confidence interval [CI]: 68.8%-70.5%) were either resolved to No AKI or discharged, while smaller proportions of recovery (26.8%, 95% CI: 26.1%-27.5%) and discharge (17.4%, 95% CI: 16.8%-18.0%) were observed following AKI Stage 2. At 14 days following Stage 1 AKI, patients with more frail conditions (Charlson comorbidity index of 3 or greater and prolonged intensive care unit stay) had a lower proportion of transitions to the No AKI or discharge states. DISCUSSION: Multistate analyses showed that the majority of patients with Stage 2 or higher severity AKI did not resolve within seven days; therefore, strategies preventing the persistence or progression of AKI would improve patients' quality of life. CONCLUSIONS: We demonstrate the multistate modeling framework's utility as a mechanism for better understanding the clinical course of AKI, with the potential to facilitate treatment and resource planning.
ABSTRACT
BACKGROUND: In traumatic hemorrhage, hybrid operating rooms offer near-simultaneous performance of endovascular and open techniques, with correlations to earlier hemorrhage control, fewer transfusions, and possibly decreased mortality. However, hybrid operating rooms are resource intensive. This study quantifies and describes a single-center experience with the complications, cost-utility, and value of a dedicated trauma hybrid operating room. METHODS: This retrospective cohort study evaluated 292 consecutive adult trauma patients who underwent immediate (<4 hours) operative intervention at a Level I trauma center. A total of 106 patients treated before the construction of a hybrid operating room served as historical controls to the 186 patients treated thereafter. Demographics, hemorrhage-control procedures, and financial data, as well as postoperative complications and outcomes, were collected via electronic medical records. Value and incremental cost-utility ratio were calculated. RESULTS: Demographics and severity of illness were similar between cohorts. Resuscitative endovascular balloon occlusion of the aorta was more frequently used in the hybrid operating room. Hemorrhage control occurred faster (60 vs. 49 minutes, p = 0.005) and, in the 4- to 24-hour postadmission period, required fewer red blood cell (mean, 1.0 vs. 0 U, p = 0.001) and plasma (mean, 1.0 vs. 0 U, p < 0.001) transfusions. Complications were similar except for a significant decrease in pneumonia (7% vs. 4%, p = 0.008). Severe complications (Clavien-Dindo classification ≥3) were similar. Across the patient admission, costs were not significantly different ($50,023 vs. $54,740, p = 0.637). There was no change in overall value (1.00 vs. 1.07, p = 0.778).
CONCLUSION: The conversion of our standard trauma operating room to an endovascular hybrid operating room provided measurable improvements in hemorrhage control, red blood cell and plasma transfusions, and postoperative pneumonia without significant increase in cost. Value was unchanged. LEVEL OF EVIDENCE: Economic/Value-Based Evaluations; Level III.
Subject(s)
Endovascular Procedures , Operating Rooms , Adult , Humans , Retrospective Studies , Hemorrhage/etiology , Hemorrhage/therapy , Resuscitation/methods , Blood Transfusion , Endovascular Procedures/methods , Trauma Centers
ABSTRACT
OBJECTIVES: To evaluate the methodologic rigor and predictive performance of models predicting ICU readmission; to understand the characteristics of ideal prediction models; and to elucidate relationships between appropriate triage decisions and patient outcomes. DATA SOURCES: PubMed, Web of Science, Cochrane, and Embase. STUDY SELECTION: Primary literature that reported the development or validation of ICU readmission prediction models from 2010 to 2021. DATA EXTRACTION: Relevant study information was extracted independently by two authors using the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies checklist. Bias was evaluated using the Prediction model Risk Of Bias ASsessment Tool. Data sources, modeling methodology, definition of outcomes, performance, and risk of bias were critically evaluated to elucidate relevant relationships. DATA SYNTHESIS: Thirty-three articles describing models were included. Six studies had a high overall risk of bias due to improper inclusion criteria or omission of critical analysis details. Four other studies had an unclear overall risk of bias due to lack of detail describing the analysis. Overall, the most common source of bias (50% of studies) was the filtering of candidate predictors via univariate analysis. The poorest-performing models used existing clinical risk or acuity scores such as the Acute Physiologic Assessment and Chronic Health Evaluation II, Sequential Organ Failure Assessment, or Stability and Workload Index for Transfer as the sole predictor. The higher-performing ICU readmission prediction models used homogeneous patient populations, specifically defined outcomes, and routinely collected predictors that were analyzed over time. CONCLUSIONS: Models predicting ICU readmission can achieve performance advantages by using longitudinal time series modeling, homogeneous patient populations, and predictor variables tailored to those populations.
ABSTRACT
Accurate prediction of postoperative complications can inform shared decisions regarding prognosis, preoperative risk-reduction, and postoperative resource use. We hypothesized that multi-task deep learning models would outperform conventional machine learning models in predicting postoperative complications, and that integrating high-resolution intraoperative physiological time series would result in more granular and personalized health representations that would improve prognostication compared to preoperative predictions. In a longitudinal cohort study of 56,242 patients undergoing 67,481 inpatient surgical procedures at a university medical center, we compared deep learning models with random forests and XGBoost for predicting nine common postoperative complications using preoperative, intraoperative, and perioperative patient data. Our study indicated several significant results across experimental settings that suggest the utility of deep learning for capturing more precise representations of patient health for augmented surgical decision support. Multi-task learning improved efficiency by reducing computational resources without compromising predictive performance. Integrated gradients interpretability mechanisms identified potentially modifiable risk factors for each complication. Monte Carlo dropout methods provided a quantitative measure of prediction uncertainty that has the potential to enhance clinical trust. Multi-task learning, interpretability mechanisms, and uncertainty metrics demonstrated potential to facilitate effective clinical implementation.
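The Monte Carlo dropout idea mentioned above, keeping dropout active at inference and reading prediction spread as uncertainty, can be sketched with a toy model. Everything here is illustrative: the "model" is a weighted sum rather than the paper's multi-task network, and the weights, features, and dropout rate are invented.

```python
# Sketch of Monte Carlo dropout uncertainty: many stochastic forward
# passes with dropout left on, summarized by mean and standard deviation.

import random
import statistics

def predict_with_dropout(weights, features, drop_p=0.2, rng=None):
    """One stochastic forward pass of a toy linear risk model."""
    rng = rng or random
    # Drop each feature's contribution with probability drop_p, rescaling
    # survivors by 1/(1 - drop_p) so the expected prediction is unchanged.
    kept = [w * x / (1.0 - drop_p)
            for w, x in zip(weights, features)
            if rng.random() >= drop_p]
    return sum(kept)

def mc_dropout(weights, features, passes=200, seed=0):
    """Mean prediction and its spread across stochastic passes."""
    rng = random.Random(seed)
    preds = [predict_with_dropout(weights, features, rng=rng)
             for _ in range(passes)]
    return statistics.mean(preds), statistics.stdev(preds)

mean_risk, uncertainty = mc_dropout([0.4, 0.3, 0.3], [1.0, 0.5, 0.2])
```

A wide spread across passes signals that the model's prediction is sensitive to which internal units are active, which is the quantitative uncertainty measure the abstract suggests could support clinical trust.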
Subject(s)
Neural Networks, Computer , Postoperative Complications , Humans , Longitudinal Studies , Uncertainty , Postoperative Complications/etiology , Machine Learning
ABSTRACT
Objective. In 2019, the University of Florida College of Medicine launched the MySurgeryRisk algorithm to predict eight major postoperative complications using automatically extracted data from the electronic health record. Approach. This project was developed in parallel with our Intelligent Critical Care Center and represents a culmination of efforts to build an efficient and accurate model for data processing and predictive analytics. Main Results and Significance. This paper discusses how our model was constructed and improved upon. We highlight the consolidation of the database, processing of fixed and time-series physiologic measurements, development and training of predictive models, and expansion of those models into different aspects of patient assessment and treatment. We end by discussing future directions of the model.
Subject(s)
Electronic Health Records , Machine Learning , Humans , Forecasting
ABSTRACT
BACKGROUND: In single-institution studies, overtriaging low-risk postoperative patients to ICUs has been associated with low value of care; undertriaging high-risk postoperative patients to general wards has been associated with increased mortality and morbidity. This study tested the reproducibility of an automated postoperative triage classification system in generating an actionable, explainable decision support system. STUDY DESIGN: This longitudinal cohort study included adults undergoing inpatient surgery at two university hospitals. Triage classifications were generated by an explainable deep learning model using preoperative and intraoperative electronic health record features. Nearest neighbor algorithms identified risk-matched controls. Primary outcomes were mortality, morbidity, and value of care (inverted risk-adjusted mortality/total direct costs). RESULTS: Among 4,669 ICU admissions, 237 (5.1%) were overtriaged. Compared with 1,021 control ward admissions, overtriaged admissions had similar outcomes but higher costs ($15.9K [interquartile range $9.8K to $22.3K] vs $10.7K [$7.0K to $17.6K], p < 0.001) and lower value of care (0.2 [0.1 to 0.3] vs 1.5 [0.9 to 2.2], p < 0.001). Among 8,594 ward admissions, 1,029 (12.0%) were undertriaged. Compared with 2,498 control ICU admissions, undertriaged admissions had longer hospital lengths of stay (6.4 [3.4 to 12.4] vs 5.4 [2.6 to 10.4] days, p < 0.001); greater incidence of hospital mortality (1.7% vs 0.7%, p = 0.03), cardiac arrest (1.4% vs 0.5%, p = 0.04), and persistent acute kidney injury without renal recovery (5.2% vs 2.8%, p = 0.002); similar costs ($21.8K [$13.3K to $34.9K] vs $21.9K [$13.1K to $36.3K]); and lower value of care (0.8 [0.5 to 1.3] vs 1.2 [0.7 to 2.0], p < 0.001). CONCLUSIONS: Overtriage was associated with low value of care; undertriage was associated with both low value of care and increased mortality and morbidity.
The proposed framework for generating automated postoperative triage classifications is reproducible.
Subject(s)
Deep Learning , Adult , Humans , Longitudinal Studies , Reproducibility of Results , Triage , Cohort Studies , Retrospective Studies
ABSTRACT
OBJECTIVE: We test the hypothesis that for low-acuity surgical patients, postoperative intensive care unit (ICU) admission is associated with lower value of care compared with ward admission. BACKGROUND: Overtriaging low-acuity patients to the ICU consumes valuable resources and may not confer better patient outcomes. Associations among postoperative overtriage, patient outcomes, costs, and value of care have not been previously reported. METHODS: In this longitudinal cohort study, postoperative ICU admissions were classified as overtriaged or appropriately triaged according to machine learning-based patient acuity assessments and requirements for immediate postoperative mechanical ventilation or vasopressor support. The nearest neighbors algorithm identified risk-matched control ward admissions. The primary outcome was value of care, calculated as the inverse observed-to-expected mortality ratio divided by total costs. RESULTS: Acuity assessments had an area under the receiver operating characteristic curve of 0.92 in generating predictions for triage classifications. Of 8,592 postoperative ICU admissions, 423 (4.9%) were overtriaged. These were matched with 2,155 control ward admissions with similar comorbidities, incidence of emergent surgery, immediate postoperative vital signs, and do-not-resuscitate order placement and rescindment patterns. Compared with controls, overtriaged admissions did not have a lower incidence of any measured complications. Total costs for admission were $16.4K for overtriage and $15.9K for controls (P = 0.03). Value of care was lower for overtriaged admissions (2.9 [2.0-4.0]) compared with controls (24.2 [14.1-34.5], P < 0.001). CONCLUSIONS: Low-acuity postoperative patients who were overtriaged to ICUs had increased total costs, no improvements in outcomes, and received low-value care.
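The value-of-care metric defined above (inverse observed-to-expected mortality ratio divided by total costs) can be sketched as below. The per-$10,000 cost scaling is an assumption introduced here so magnitudes are readable; the study does not specify its scaling, so only the direction of the metric (lower O/E ratio and lower cost yield higher value) should be taken from this sketch.

```python
# Sketch of the value-of-care calculation: inverse O/E mortality ratio
# divided by total direct costs (expressed per $10,000, an assumption).

def value_of_care(observed_mortality, expected_mortality, total_cost_usd):
    """Higher is better: fewer deaths than expected at lower cost."""
    oe_ratio = observed_mortality / expected_mortality
    return (1.0 / oe_ratio) / (total_cost_usd / 10_000)
```

For example, a cohort with observed mortality half its risk-adjusted expectation (O/E = 0.5) and $10.7K total costs scores about 1.87, while the same outcomes at double the cost would score half that.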
Subject(s)
Hospitalization, Intensive Care Units, Humans, Longitudinal Studies, Retrospective Studies, Cohort Studies
ABSTRACT
Established guidelines describe minimum requirements for reporting algorithms in healthcare; it is equally important to objectify the characteristics of ideal algorithms that confer maximum potential benefits to patients, clinicians, and investigators. We propose a framework for ideal algorithms, including 6 desiderata: explainable (convey the relative importance of features in determining outputs), dynamic (capture temporal changes in physiologic signals and clinical events), precise (use high-resolution, multimodal data and aptly complex architecture), autonomous (learn with minimal supervision and execute without human input), fair (evaluate and mitigate implicit bias and social inequity), and reproducible (validated externally and prospectively and shared with academic communities). We present an ideal algorithms checklist and apply it to highly cited algorithms. Strategies and tools such as the predictive, descriptive, relevant (PDR) framework, the Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence (SPIRIT-AI) extension, sparse regression methods, and minimizing concept drift can help healthcare algorithms achieve these objectives, toward ideal algorithms in healthcare.
ABSTRACT
Generalizability, external validity, and reproducibility are high priorities for artificial intelligence applications in healthcare. Traditional approaches to addressing these elements involve sharing patient data between institutions or practice settings, which can compromise data privacy (individuals' right to prevent the sharing and disclosure of information about themselves) and data security (simultaneously preserving confidentiality, accuracy, fidelity, and availability of data). This article describes insights from real-world implementation of federated learning techniques that offer opportunities to maintain both data privacy and availability via collaborative machine learning that shares knowledge, not data. Local models are trained separately on local data. As they train, they send local model updates (e.g. coefficients or gradients) for consolidation into a global model. In some use cases, global models outperform local models on new, previously unseen local datasets, suggesting that collaborative learning from a greater number of examples, including a greater number of rare cases, may improve predictive performance. Even when sharing model updates rather than data, privacy leakage can occur when adversaries perform property or membership inference attacks which can be used to ascertain information about the training set. Emerging techniques mitigate risk from adversarial attacks, allowing investigators to maintain both data privacy and availability in collaborative healthcare research. When data heterogeneity between participating centers is high, personalized algorithms may offer greater generalizability by improving performance on data from centers with proportionately smaller training sample sizes. Properly applied, federated learning has the potential to optimize the reproducibility and performance of collaborative learning while preserving data security and privacy.
ABSTRACT
Healthcare systems are hampered by incomplete and fragmented patient health records. Record linkage is widely accepted as a solution to improve the quality and completeness of patient records. However, there does not exist a systematic approach for manually reviewing patient records to create gold standard record linkage data sets. We propose a robust framework for creating and evaluating manually reviewed gold standard data sets for measuring the performance of patient matching algorithms. Our 8-point approach covers data preprocessing, blocking, record adjudication, linkage evaluation, and reviewer characteristics. This framework can help record linkage method developers provide necessary transparency when creating and validating gold standard reference matching data sets. In turn, this transparency will support both the internal and external validity of recording linkage studies and improve the robustness of new record linkage strategies.
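Of the framework's components, blocking lends itself to a short illustration: records are grouped by a cheap key so that human reviewers adjudicate only within-block candidate pairs rather than all pairwise combinations. The blocking key below (name initial plus birth year) is a hypothetical choice for demonstration:

```python
from collections import defaultdict


def block_records(records, key_fn):
    """Group records by a blocking key to limit pairwise comparisons."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_fn(rec)].append(rec)
    return blocks


def candidate_pairs(blocks):
    """Yield only within-block record pairs for manual adjudication."""
    for recs in blocks.values():
        for i in range(len(recs)):
            for j in range(i + 1, len(recs)):
                yield recs[i], recs[j]
```

With three records, two sharing a key, blocking reduces three possible pairs to one pair for review; at scale this is what makes manual gold-standard adjudication tractable.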
Subject(s)
Personal Health Records, Medical Record Linkage, Humans, Medical Record Linkage/methods, Algorithms, Information Storage and Retrieval, Data Collection
ABSTRACT
Importance: Predicting postoperative complications has the potential to inform shared decisions regarding the appropriateness of surgical procedures, targeted risk-reduction strategies, and postoperative resource use. Realizing these advantages requires that accurate real-time predictions be integrated with clinical and digital workflows; artificial intelligence predictive analytic platforms using automated electronic health record (EHR) data inputs offer an intriguing possibility for achieving this, but there is a lack of high-level evidence from prospective studies supporting their use. Objective: To examine whether the MySurgeryRisk artificial intelligence system has stable predictive performance between development and prospective validation phases and whether it is feasible to provide automated outputs directly to surgeons' mobile devices. Design, Setting, and Participants: In this prognostic study, the platform used automated EHR data inputs and machine learning algorithms to predict postoperative complications and provide predictions to surgeons, previously through a web portal and currently through a mobile device application. All patients 18 years or older who were admitted for any type of inpatient surgical procedure (74 417 total procedures involving 58 236 patients) between June 1, 2014, and September 20, 2020, were included. Models were developed using retrospective data from 52 117 inpatient surgical procedures performed between June 1, 2014, and November 27, 2018. Validation was performed using data from 22 300 inpatient surgical procedures collected prospectively from November 28, 2018, to September 20, 2020. Main Outcomes and Measures: Algorithms for generalized additive models and random forest models were developed and validated using real-time EHR data. Model predictive performance was evaluated primarily using area under the receiver operating characteristic curve (AUROC) values.
Results: Among 58 236 total adult patients who received 74 417 major inpatient surgical procedures, the mean (SD) age was 57 (17) years; 29 226 patients (50.2%) were male. Results reported in this article focus primarily on the validation cohort. The validation cohort included 22 300 inpatient surgical procedures involving 19 132 patients (mean [SD] age, 58 [17] years; 9672 [50.6%] male). A total of 2765 patients (14.5%) were Black or African American, 14 777 (77.2%) were White, 1235 (6.5%) were of other races (including American Indian or Alaska Native, Asian, Native Hawaiian or Pacific Islander, and multiracial), and 355 (1.9%) were of unknown race because of missing data; 979 patients (5.1%) were Hispanic, 17 663 (92.3%) were non-Hispanic, and 490 (2.6%) were of unknown ethnicity because of missing data. A greater number of input features was associated with stable or improved model performance. For example, the random forest model trained with 135 input features had the highest AUROC values for predicting acute kidney injury (0.82; 95% CI, 0.82-0.83); cardiovascular complications (0.81; 95% CI, 0.81-0.82); neurological complications, including delirium (0.87; 95% CI, 0.87-0.88); prolonged intensive care unit stay (0.89; 95% CI, 0.88-0.89); prolonged mechanical ventilation (0.91; 95% CI, 0.90-0.91); sepsis (0.86; 95% CI, 0.85-0.87); venous thromboembolism (0.82; 95% CI, 0.81-0.83); wound complications (0.78; 95% CI, 0.78-0.79); 30-day mortality (0.84; 95% CI, 0.82-0.86); and 90-day mortality (0.84; 95% CI, 0.82-0.85), with accuracy similar to surgeons' predictions. Compared with the original web portal, the mobile device application allowed efficient fingerprint login access and loaded data approximately 10 times faster. The application output displayed patient information, risk of postoperative complications, top 3 risk factors for each complication, and patterns of complications for individual surgeons compared with their colleagues.
Conclusions and Relevance: In this study, automated real-time predictions of postoperative complications with mobile device outputs had good performance in clinical settings with prospective validation, matching surgeons' predictive accuracy.
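The primary performance metric across these studies, AUROC, has a direct probabilistic reading: it is the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case (ties counted as half). A brute-force sketch makes that concrete (this is the rank-statistic definition, not the implementation used by any of the models above):

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the probability that a positive case outscores a
    negative case, with ties credited 0.5 (equivalent to the area
    under the ROC curve)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This O(P x N) form is only practical for small samples; production evaluations sort scores once instead, but the value computed is the same.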