ABSTRACT
Importance: Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications. Objective: To evaluate risk-prediction model performance when trained on risk-specific cohorts. Design, Setting, and Participants: This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109,445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined. Exposures: The model was trained de novo on separate cohorts of high-risk, medium-risk, and low-risk Current Procedural Terminology codes defined empirically by the incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for complications were defined by the lower-third and upper-third prevalence in the dataset, except for mortality, for which cutoffs were set at 1% or less and greater than 3%, respectively. Main Outcomes and Measures: Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 score, and accuracy for each model. Results: A total of 109,445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77,921 procedures [71.2%]) and Jacksonville (31,524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4). 
Among 109,445 operations, 55,646 patients were male (50.8%), and 66,495 patients (60.8%) underwent a nonemergent, inpatient operation. Training on the high-risk cohort had variable impact on AUROC but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved the F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and AKI (0.57; 95% CI, 0.54-0.59). After controlling for baseline model performance on high-risk cohorts, AUPRC increased significantly for in-hospital mortality only (0.53; 95% CI, 0.42-0.65 vs 0.29; 95% CI, 0.21-0.40). Conclusions and Relevance: In this cross-sectional study, training separate models using a priori knowledge of procedure-specific risk classes improved performance on standard evaluation metrics, especially for low-prevalence complications such as in-hospital mortality. Used cautiously, this approach may represent an optimal training strategy for surgical risk-prediction models.
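The significance criterion used above, nonoverlapping 95% confidence intervals for metrics such as AUPRC, can be sketched with a small bootstrap. The labels, prevalence, and score distributions below are simulated illustrations, not the study's data or pipeline:

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

# Simulated low-prevalence outcome (~3%, loosely echoing in-hospital mortality)
# and two score sets: a weaker "baseline" and a stronger "risk-specific" model.
y = (rng.random(5000) < 0.03).astype(int)
scores_baseline = np.clip(y * 0.3 + rng.normal(0.2, 0.15, 5000), 0, 1)
scores_risk_specific = np.clip(y * 0.5 + rng.normal(0.2, 0.15, 5000), 0, 1)

def bootstrap_auprc(y_true, y_score, n_boot=200, seed=1):
    """95% bootstrap CI for AUPRC, resampling cases with replacement."""
    rng = np.random.default_rng(seed)
    stats = []
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y_true[idx].sum() == 0:  # AUPRC is undefined without positives
            continue
        stats.append(average_precision_score(y_true[idx], y_score[idx]))
    return np.percentile(stats, [2.5, 97.5])

lo_b, hi_b = bootstrap_auprc(y, scores_baseline)
lo_r, hi_r = bootstrap_auprc(y, scores_risk_specific)
# Nonoverlapping intervals would be read as a significant difference.
print(f"baseline AUPRC 95% CI: [{lo_b:.2f}, {hi_b:.2f}]")
print(f"risk-specific AUPRC 95% CI: [{lo_r:.2f}, {hi_r:.2f}]")
```

AUPRC is the natural headline metric here because, unlike AUROC, its baseline equals the outcome prevalence, which is exactly what class imbalance degrades.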
ABSTRACT
On average, more than 5 million patients are admitted to intensive care units (ICUs) in the US each year, with mortality rates ranging from 10% to 29%. The acuity state of patients in the ICU can quickly change from stable to unstable, sometimes leading to life-threatening conditions. Early detection of deteriorating conditions can assist in more timely interventions and improved survival rates. While Artificial Intelligence (AI)-based models show potential for assessing acuity in a more granular and automated manner, they typically use mortality as a proxy of acuity in the ICU. Furthermore, these methods do not determine the acuity state of a patient (i.e., stable or unstable), the transition between acuity states, or the need for life-sustaining therapies. In this study, we propose APRICOT-M (Acuity Prediction in Intensive Care Unit-Mamba), a 1M-parameter state space-based neural network to predict acuity state, transitions, and the need for life-sustaining therapies in real-time among ICU patients. The model integrates ICU data from the preceding four hours (including vital signs, laboratory results, assessment scores, and medications) and patient characteristics (age, sex, race, and comorbidities) to predict the acuity outcomes in the next four hours. Our state space-based model can process sparse and irregularly sampled data without manual imputation, thus reducing the noise in input data and increasing inference speed. The model was trained on data from 107,473 patients (142,062 ICU admissions) from 55 hospitals between 2014-2017 and validated externally on data from 74,901 patients (101,356 ICU admissions) from 143 hospitals. Additionally, it was validated temporally on data from 12,927 patients (15,940 ICU admissions) from one hospital in 2018-2019 and prospectively on data from 215 patients (369 ICU admissions) from one hospital in 2021-2023. 
Three datasets were used for training and evaluation: the University of Florida Health (UFH) dataset, the electronic ICU Collaborative Research Database (eICU), and the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. APRICOT-M significantly outperforms the baseline acuity assessment, Sequential Organ Failure Assessment (SOFA), for mortality prediction in both external (AUROC 0.95 CI: 0.94-0.95 compared to 0.78 CI: 0.78-0.79) and prospective (AUROC 0.99 CI: 0.97-1.00 compared to 0.80 CI: 0.65-0.92) cohorts, as well as for instability prediction (external AUROC 0.75 CI: 0.74-0.75 compared to 0.51 CI: 0.51-0.51, and prospective AUROC 0.69 CI: 0.64-0.74 compared to 0.53 CI: 0.50-0.57). This tool has the potential to help clinicians make timely interventions by predicting the transition between acuity states and decisions on life-sustaining therapies within the next four hours in the ICU.
ABSTRACT
BACKGROUND: Surrogates, proxies, and clinicians making shared treatment decisions for patients who have lost decision-making capacity often fail to honor patients' wishes, due to stress, time pressures, misunderstanding patient values, and projecting personal biases. Advance directives intend to align care with patient values but are limited by low completion rates and application to only a subset of medical decisions. Here, we investigate the potential of large language models (LLMs) to incorporate patient values in supporting critical care clinical decision-making for incapacitated patients in a proof-of-concept study. METHODS: We simulated text-based scenarios for 50 decisionally incapacitated patients for whom a medical condition required imminent clinical decisions regarding specific interventions. For each patient, we also simulated five unique value profiles captured using alternative formats: numeric ranking questionnaires, text-based questionnaires, and free-text narratives. We used pre-trained generative LLMs for two tasks: 1) text extraction of the treatments under consideration and 2) prompt-based question-answering to generate a recommendation in response to the scenario information, extracted treatment, and patient value profiles. Model outputs were compared with adjudications by three domain experts who independently evaluated each scenario and decision. RESULTS AND CONCLUSIONS: Automated extractions of the treatment in question were accurate for 88% (n = 44/50) of scenarios. LLM treatment recommendations received an average Likert score by the adjudicators of 3.92 of 5.00 (five being best) across all patients for being medically plausible and reasonable treatment recommendations, and 3.58 of 5.00 for reflecting the documented values of the patient. Scores were highest when patient values were captured as short, unstructured, and free-text narratives based on simulated patient profiles. 
This proof-of-concept study demonstrates the potential for LLMs to function as support tools for surrogates, proxies, and clinicians aiming to honor the wishes and values of decisionally incapacitated patients.
Subject(s)
Proxy, Humans, Advance Directives, Decision Making, Clinical Decision-Making/methods, Proof of Concept Study, Surveys and Questionnaires, Language, Critical Care/methods
ABSTRACT
The degree to which artificial intelligence healthcare research is informed by data and stakeholders from community settings has not been previously described. As communities are the principal location of healthcare delivery, engaging them could represent an important opportunity to improve scientific quality. This scoping review systematically maps what is known and unknown about community-engaged artificial intelligence research and identifies opportunities to optimize the generalizability of these applications through involvement of community stakeholders and data throughout model development, validation, and implementation. Embase, PubMed, and MEDLINE databases were searched for articles describing artificial intelligence or machine learning healthcare applications with community involvement in model development, validation, or implementation. Model architecture and performance, the nature of community engagement, and barriers or facilitators to community engagement were reported according to the PRISMA extension for Scoping Reviews guidelines. Of approximately 10,880 articles describing artificial intelligence healthcare applications, 21 (0.2%) described community involvement. All articles derived data from community settings, most commonly by leveraging existing datasets and sources that included community subjects, and often bolstered by internet-based data acquisition and subject recruitment. Only one article described inclusion of community stakeholders in designing an application: a natural language processing model that detected cases of likely child abuse with 90% accuracy using harmonized electronic health record notes from both hospital and community practice settings. The primary barrier to including community-derived data was small sample sizes, which may have affected 11 of the 21 studies (53%), introducing substantial risk for overfitting that threatens generalizability. 
Community engagement in artificial intelligence healthcare application development, validation, or implementation is rare. As healthcare delivery occurs primarily in community settings, investigators should consider engaging community stakeholders in user-centered design, usability, and clinical implementation studies to optimize generalizability.
ABSTRACT
Background: The use of Intra-aortic Balloon Pump (IABP) and Impella devices as a bridge to heart transplantation (HTx) has increased significantly in recent years. This study aimed to create and validate an explainable machine learning (ML) model that can predict the failure of status two listings and identify the clinical features that significantly impact this outcome. Methods: We used the UNOS registry database to identify HTx candidates listed as UNOS Status 2 between 2018 and 2022 and supported with either Impella (5.0 or 5.5) or IABP. We used the eXtreme Gradient Boosting (XGBoost) algorithm to build and validate ML models. We developed two models: (1) a comprehensive model that included all patients in our cohort and (2) separate models designed for each of the 11 UNOS regions. Results: We analyzed data from 4,178 patients listed as Status 2. Of these, 12% experienced the primary outcome, Status 2 failure. Our ML models were based on 19 variables from the UNOS data. The comprehensive model had an area under the curve (AUC) of 0.71 (±0.03), with a range between 0.44 (±0.08) and 0.74 (±0.01) across different regions. The models' specificity ranged from 0.75 to 0.96. The top five most important predictors were the number of inotropes, creatinine, sodium, BMI, and blood group. Conclusion: Using ML is clinically valuable for highlighting patients at risk, enabling healthcare providers to offer intensified monitoring, optimization, and care escalation selectively.
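The modeling step described above can be sketched in a few lines. This illustration uses scikit-learn's gradient boosting as a stand-in for XGBoost and fully synthetic data in place of the UNOS registry; the feature names mirror the abstract's top-five predictors, but their distributions and the outcome mechanism are invented:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 4178                          # cohort size reported in the abstract
# Illustrative predictors (names from the abstract's top-5 list).
X = np.column_stack([
    rng.integers(0, 4, n),        # number of inotropes
    rng.normal(1.2, 0.5, n),      # creatinine (mg/dL)
    rng.normal(138, 4, n),        # sodium (mEq/L)
    rng.normal(28, 5, n),         # BMI
    rng.integers(0, 4, n),        # blood group (integer-encoded)
])
# ~12% event rate, loosely driven by inotrope count and creatinine.
logit = -2.6 + 0.4 * X[:, 0] + 0.5 * (X[:, 1] - 1.2)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0)
clf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.2f}")
```

In practice, `clf.feature_importances_` (or SHAP values, as "explainable ML" studies typically use) would back the kind of predictor ranking the abstract reports.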
ABSTRACT
Objective: To determine whether certain patients are vulnerable to errant triage decisions immediately after major surgery and whether there are unique sociodemographic phenotypes within overtriaged and undertriaged cohorts. Background: In a fair system, overtriage of low-acuity patients to intensive care units (ICUs) and undertriage of high-acuity patients to general wards would affect all sociodemographic subgroups equally. Methods: This multicenter, longitudinal cohort study of hospital admissions immediately after major surgery compared hospital mortality and value of care (risk-adjusted mortality/total costs) across 4 cohorts: overtriage (N = 660), risk-matched overtriage controls admitted to general wards (N = 3077), undertriage (N = 2335), and risk-matched undertriage controls admitted to ICUs (N = 4774). K-means clustering identified sociodemographic phenotypes within overtriage and undertriage cohorts. Results: Compared with controls, overtriaged admissions had a predominance of male patients (56.2% vs 43.1%, P < 0.001) and commercial insurance (6.4% vs 2.5%, P < 0.001); undertriaged admissions had a predominance of Black patients (28.4% vs 24.4%, P < 0.001) and greater socioeconomic deprivation. Overtriage was associated with increased total direct costs [$16.2K ($11.4K-$23.5K) vs $14.1K ($9.1K-$20.7K), P < 0.001] and low value of care; undertriage was associated with increased hospital mortality (1.5% vs 0.7%, P = 0.002) and hospice care (2.2% vs 0.6%, P < 0.001) and low value of care. Unique sociodemographic phenotypes within both overtriage and undertriage cohorts had similar outcomes and value of care, suggesting that triage decisions, rather than patient characteristics, drive outcomes and value of care. Conclusions: Postoperative triage decisions should ensure equality across sociodemographic groups by anchoring triage decisions to objective patient acuity assessments, circumventing cognitive shortcuts and mitigating bias.
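The phenotyping step named above (k-means clustering within the overtriage and undertriage cohorts) can be sketched as follows. The features and values are illustrative stand-ins, not the study's sociodemographic variables:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Toy cohort with two latent subgroups: columns are
# [age, deprivation index, insurance type (integer-encoded)].
cohort = np.vstack([
    rng.normal([45, 0.2, 1], [8, 0.1, 0.4], (300, 3)),
    rng.normal([70, 0.7, 0], [8, 0.1, 0.4], (300, 3)),
])
# Standardize first: k-means uses Euclidean distance, so unscaled features
# with large ranges (age) would dominate the clustering.
X = StandardScaler().fit_transform(cohort)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Characterize each phenotype by its cluster-level feature means.
for k in range(2):
    print(f"phenotype {k}: mean age {cohort[labels == k, 0].mean():.0f}")
```

In a real analysis, the number of clusters would be chosen with a criterion such as silhouette score or the elbow method rather than fixed in advance.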
ABSTRACT
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. We also include articles that used the Transformer architecture for generating surgical instructions and predicting adverse outcomes after surgery under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we discuss the benefits and limitations of using Transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Subject(s)
Deep Learning, Natural Language Processing, Humans, Artificial Intelligence, Delivery of Health Care/organization & administration, Neural Networks, Computer, Electronic Health Records
ABSTRACT
Acuity assessments are vital for timely interventions and fair resource allocation in critical care settings. Conventional acuity scoring systems heavily depend on subjective patient assessments, leaving room for implicit bias and errors. These assessments are often manual, time-consuming, and intermittent, and are challenging for healthcare providers to interpret accurately. This risk of bias and error is likely most pronounced in time-constrained, high-stakes environments such as critical care settings. Furthermore, such scores do not incorporate other information, such as patients' mobility level, which can indicate recovery or deterioration in the intensive care unit (ICU), especially at a granular level. We hypothesized that wearable sensor data could assist in assessing patient acuity granularly, especially in conjunction with clinical data from electronic health records (EHR). In this prospective study, we evaluated the impact of integrating mobility data collected from wrist-worn accelerometers with clinical data obtained from EHR for estimating acuity. Accelerometry data were collected from 87 patients wearing accelerometers on their wrists in an academic hospital setting. The data were evaluated using five deep neural network models: VGG, ResNet, MobileNet, SqueezeNet, and a custom Transformer network. These models outperformed a rule-based clinical score (Sequential Organ Failure Assessment, SOFA) used as a baseline when predicting acuity state (for ground truth, patients were labeled unstable if they needed life-supporting therapies and stable otherwise), particularly regarding precision, sensitivity, and F1 score. The results demonstrate that integrating accelerometer data with demographics and clinical variables improves predictive performance compared to traditional scoring systems in healthcare. 
Deep learning models consistently outperformed the SOFA score baseline across various scenarios, showing notable enhancements in metrics such as the area under the receiver operating characteristic (ROC) curve (AUC), precision, sensitivity, specificity, and F1 score. The most comprehensive scenario, leveraging accelerometer, demographic, and clinical data, achieved the highest AUC of 0.73, compared to 0.53 when using the SOFA score as the baseline, with significant improvements in precision (0.80 vs. 0.23), specificity (0.79 vs. 0.73), and F1 score (0.77 vs. 0.66). This study demonstrates a novel approach beyond the simplistic differentiation between stable and unstable conditions. By incorporating mobility and comprehensive patient information, we distinguish between these states in critically ill patients and capture essential nuances in physiology and functional status. Unlike rudimentary definitions, such as equating low blood pressure with instability, our methodology delves deeper, offering a more holistic understanding and potentially valuable insights for acuity assessment.
ABSTRACT
Using clustering analysis for early vital signs, unique patient phenotypes with distinct pathophysiological signatures and clinical outcomes may be revealed and support early clinical decision-making. Phenotyping using early vital signs has proven challenging, as vital signs are typically sampled sporadically. We proposed a novel, deep temporal interpolation and clustering network to simultaneously extract latent representations from irregularly sampled vital signs and derive phenotypes. Four distinct clusters were identified. Phenotype A (18%) had the greatest prevalence of comorbid disease with increased prevalence of prolonged respiratory insufficiency, acute kidney injury, sepsis, and long-term (3-year) mortality. Phenotypes B (33%) and C (31%) had a diffuse pattern of mild organ dysfunction. Phenotype B's favorable short-term clinical outcomes were tempered by the second highest rate of long-term mortality. Phenotype C had favorable clinical outcomes. Phenotype D (17%) exhibited early and persistent hypotension, high incidence of early surgery, and substantial biomarker incidence of inflammation. Despite early and severe illness, phenotype D had the second lowest long-term mortality. After comparing the sequential organ failure assessment scores, the clustering results did not simply provide a recapitulation of previous acuity assessments. This tool may impact triage decisions and have significant implications for clinical decision-support under time constraints and uncertainty.
Subject(s)
Organ Dysfunction Scores, Sepsis, Humans, Acute Disease, Phenotype, Biomarkers, Cluster Analysis
ABSTRACT
BACKGROUND: Perhaps nowhere in the healthcare system are the challenges of creating useful models with direct, time-critical clinical applications more relevant, and the obstacles to achieving those goals more formidable, than in the intensive care unit. Machine learning-based artificial intelligence (AI) techniques to define states and predict future events are commonplace activities of modern life. However, their penetration into acute care medicine has been slow, stuttering and uneven. Major obstacles to widespread effective application of AI approaches to the real-time care of the critically ill patient exist and need to be addressed. MAIN BODY: Clinical decision support systems (CDSSs) in acute and critical care environments should support clinicians at the bedside, not replace them. As discussed in this review, the reasons are many: the immaturity of AI-based systems' situational awareness; the fundamental bias in many large databases, which do not reflect the target population of patients being treated, making fairness an important issue to address; and technical barriers to timely access to valid data and its display in a fashion useful for clinical workflow. The inherent "black-box" nature of many predictive algorithms and CDSSs makes trustworthiness and acceptance by the medical community difficult. Logistically, collating and curating multidimensional data streams from various sources in real time to inform the algorithms, and ultimately displaying relevant decision support in a format that adapts to individual patient responses and signatures, represent the efferent limb of these systems and are often ignored during initial validation efforts. Similarly, legal and commercial barriers to access to many existing clinical databases limit studies addressing the fairness and generalizability of predictive models and management tools. CONCLUSIONS: AI-based CDSSs are evolving and are here to stay. 
It is our obligation to be good shepherds of their use and further development.
Subject(s)
Algorithms, Artificial Intelligence, Humans, Critical Care, Intensive Care Units, Delivery of Health Care
ABSTRACT
OBJECTIVE: Gender disparities in surgical training and assessment are described in the general surgery literature. Assessment disparities have not been explored in vascular surgery. We sought to investigate gender disparities in operative assessment in a national cohort of vascular surgery integrated residents (VIRs) and fellows (VSFs). METHODS: Operative performance and autonomy ratings from the Society for Improving Medical Professional Learning (SIMPL) application database were collected for all vascular surgery participating institutions from 2018 to 2023. Logistic generalized linear mixed models were conducted to examine the association of faculty and trainee gender on faculty and self-assessment of autonomy and performance. Data were adjusted for post-graduate year and case complexity. Random effects were included to account for clustering effects due to participant, program, and procedure. RESULTS: One hundred three trainees (n = 63 VIRs; n = 40 VSFs; 63.1% men) and 99 faculty (73.7% men) from 17 institutions (n = 12 VIR and n = 13 VSF programs) contributed 4951 total assessments (44.4% by faculty, 55.6% by trainees) across 235 unique procedures. Faculty and trainee gender were not associated with faculty ratings of performance (faculty gender: odds ratio [OR], 0.78; 95% confidence interval [CI], 0.27-2.29; trainee gender: OR, 1.80; 95% CI, 0.76-0.43) or autonomy (faculty gender: OR, 0.99; 95% CI, 0.41-2.39; trainee gender: OR, 1.23; 95% CI, 0.62-2.45) of trainees. All trainees self-assessed at lower performance and autonomy ratings as compared with faculty assessments. However, women trainees rated themselves significantly lower than men for both autonomy (OR, 0.57; 95% CI, 0.43-0.74) and performance (OR, 0.40; 95% CI, 0.30-0.54). 
CONCLUSIONS: Although gender was not associated with differences in faculty assessment of performance or autonomy among vascular surgery trainees, women trainees perceive themselves as performing with lower competency and less autonomy than their male colleagues. These findings suggest utility for exploring gender differences in real-time feedback delivered to and received by trainees and targeted interventions to align trainee self-perception with actual operative performance and autonomy to optimize surgical skill acquisition.
Subject(s)
Clinical Competence, Education, Medical, Graduate, Internship and Residency, Professional Autonomy, Surgeons, Vascular Surgical Procedures, Humans, Female, Male, Vascular Surgical Procedures/education, Surgeons/education, Surgeons/psychology, Sex Factors, Physicians, Women, United States, Sexism, Faculty, Medical, Adult
ABSTRACT
Diabetic nephropathy (DN) in the context of type 2 diabetes is the leading cause of end-stage renal disease (ESRD) in the United States. DN is graded based on glomerular morphology and has a spatially heterogeneous presentation in kidney biopsies that complicates pathologists' predictions of disease progression. Artificial intelligence and deep learning methods for pathology have shown promise for quantitative pathological evaluation and clinical trajectory estimation, but they often fail to capture the large-scale spatial anatomy and relationships found in whole slide images (WSIs). In this study, we present a transformer-based, multi-stage ESRD prediction framework built upon nonlinear dimensionality reduction, relative Euclidean pixel distance embeddings between every pair of observable glomeruli, and a corresponding spatial self-attention mechanism for a robust contextual representation. We developed a deep transformer network for encoding WSIs and predicting future ESRD using a dataset of 56 kidney biopsy WSIs from DN patients at Seoul National University Hospital. Using a leave-one-out cross-validation scheme, our modified transformer framework outperformed RNN, XGBoost, and logistic regression baseline models and achieved an area under the receiver operating characteristic curve (AUC) of 0.97 (95% CI: 0.90-1.00) for predicting two-year ESRD, compared with an AUC of 0.86 (95% CI: 0.66-0.99) without our relative distance embedding and an AUC of 0.76 (95% CI: 0.59-0.92) without a denoising autoencoder module. While the variability and generalizability induced by smaller sample sizes are challenging, our distance-based embedding approach and overfitting mitigation techniques yielded results that suggest opportunities for future spatially aware WSI research using limited pathology datasets.
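The relative Euclidean pixel distance embedding described above reduces, at its core, to a pairwise distance computation over glomerulus coordinates. A minimal sketch under assumed 2-D centroids (the paper's actual embedding and attention mechanism are not reproduced here, and the bias transform at the end is one illustrative choice, not theirs):

```python
import numpy as np

rng = np.random.default_rng(3)
# 12 mock glomerulus centroids, in pixel coordinates on a large WSI.
centroids = rng.uniform(0, 40_000, size=(12, 2))

# Pairwise Euclidean distance matrix via broadcasting: entry (i, j) is the
# pixel distance between glomeruli i and j.
diff = centroids[:, None, :] - centroids[None, :, :]
dist = np.sqrt((diff ** 2).sum(-1))          # shape (12, 12), zero diagonal

# One simple way to feed distances into self-attention: a negative,
# scale-normalized additive bias so distant glomeruli attend to each
# other less than nearby ones.
bias = -dist / dist.max()
print(bias.shape)
```

A matrix like `bias` can be added to the attention logits before the softmax, which is how many spatially aware transformer variants inject relative position information.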
ABSTRACT
Acute kidney injury (AKI), which is a common complication of acute illnesses, affects the health of individuals in community, acute care and post-acute care settings. Although the recognition, prevention and management of AKI has advanced over the past decades, its incidence and related morbidity, mortality and health care burden remain overwhelming. The rapid growth of digital technologies has provided a new platform to improve patient care, and reports show demonstrable benefits in care processes and, in some instances, in patient outcomes. However, despite great progress, the potential benefits of using digital technology to manage AKI have not yet been fully explored or implemented in clinical practice. Digital health studies in AKI have shown variable evidence of benefits, and the digital divide means that access to digital technologies is not equitable. Upstream research and development costs, limited stakeholder participation and acceptance, and poor scalability of digital health solutions have hindered their widespread implementation and use. Here, we provide recommendations from the Acute Disease Quality Initiative consensus meeting, which involved experts in adult and paediatric nephrology, critical care, pharmacy and data science, at which the use of digital health for risk prediction, prevention, identification and management of AKI and its consequences was discussed.
Subject(s)
Acute Kidney Injury, Nephrology, Adult, Child, Humans, Acute Disease, Consensus, Acute Kidney Injury/diagnosis, Acute Kidney Injury/therapy, Acute Kidney Injury/etiology, Critical Care
ABSTRACT
Objectives: This study tests the null hypotheses that overall sentiment and gendered words in verbal feedback and resident operative autonomy relative to performance are similar for female and male residents. Background: Female and male surgical residents may experience training differently, affecting the quality of learning and graduated autonomy. Methods: A longitudinal, observational study using a Society for Improving Medical Professional Learning collaborative dataset describing resident and attending evaluations of resident operative performance and autonomy and recordings of verbal feedback from attendings from surgical procedures performed at 54 US general surgery residency training programs from 2016 to 2021. Overall sentiment, adjectives, and gendered words in verbal feedback were quantified by natural language processing. Resident operative autonomy and performance, as evaluated by attendings, were reported on 5-point ordinal scales. Performance-adjusted autonomy was calculated as autonomy minus performance. Results: The final dataset included objective assessments and dictated feedback for 2683 surgical procedures. Sentiment scores were higher for female residents (95 [interquartile range (IQR), 4-100] vs 86 [IQR 2-100]; P < 0.001). Gendered words were present in a greater proportion of dictations for female residents (29% vs 25%; P = 0.04) due to male attendings disproportionately using male-associated words in feedback for female residents (28% vs 23%; P = 0.01). Overall, attendings reported that male residents received greater performance-adjusted autonomy compared with female residents (P < 0.001). Conclusions: Sentiment and gendered words in verbal feedback and performance-adjusted operative autonomy differed for female and male general surgery residents. These findings suggest a need to ensure that trainees are given appropriate and equitable operative autonomy and feedback.
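The gendered-word quantification step above can be sketched as a simple lexicon lookup. The two word lists below are tiny invented stand-ins; studies of this kind use validated lexicons of male- and female-associated language, and the full pipeline also includes sentiment scoring:

```python
import re

# Hypothetical, illustrative word lists -- not a validated lexicon.
MALE_WORDS = {"confident", "decisive", "independent"}
FEMALE_WORDS = {"gentle", "compassionate", "helpful"}

def gendered_word_counts(text):
    """Count male- and female-associated words in a feedback transcript."""
    tokens = re.findall(r"[a-z']+", text.lower())
    male = sum(t in MALE_WORDS for t in tokens)
    female = sum(t in FEMALE_WORDS for t in tokens)
    return male, female

feedback = "She was decisive and confident, and very helpful with closure."
print(gendered_word_counts(feedback))  # (male-associated, female-associated)
```

Aggregating such counts per dictation, stratified by resident and attending gender, yields the proportions the abstract compares (e.g., male-associated words in feedback for female residents).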
ABSTRACT
BACKGROUND: Acute kidney injury is a common postoperative complication affecting between 10% and 30% of surgical patients. Acute kidney injury is associated with increased resource usage and chronic kidney disease development, with more severe acute kidney injury suggesting more aggressive deterioration in clinical outcomes and mortality. METHODS: We considered 42,906 surgical patients (n = 51,806 admissions) at University of Florida Health between 2014 and 2021. Acute kidney injury stages were determined using the Kidney Disease: Improving Global Outcomes serum creatinine criteria. We developed a recurrent neural network-based model to continuously predict acute kidney injury risk and stage in the following 24 hours and compared it with logistic regression, random forest, and multi-layer perceptron models. We used medications, laboratory and vital sign measurements, and features derived from records in the preceding year as inputs. We analyzed the proposed model with integrated gradients for enhanced explainability. RESULTS: Postoperative acute kidney injury at any stage developed in 20% (10,664) of the cohort. The recurrent neural network model was more accurate in predicting nearly all categories of next-day acute kidney injury stages (including the no acute kidney injury group). The areas under the receiver operating characteristic curve and 95% confidence intervals for the recurrent neural network and logistic regression models were, for no acute kidney injury (0.98 [0.98-0.98] vs 0.93 [0.93-0.93]), stage 1 (0.95 [0.95-0.95] vs 0.81 [0.80-0.82]), stage 2/3 (0.99 [0.99-0.99] vs 0.96 [0.96-0.97]), and stage 3 with renal replacement therapy (1.0 [1.0-1.0] vs 1.0 [1.0-1.0]). CONCLUSION: The proposed model demonstrates that temporal processing of patient information can lead to more granular and dynamic modeling of acute kidney injury status and result in more continuous and accurate acute kidney injury prediction. 
We showcase the integrated gradients framework's utility as a mechanism for enhancing model explainability, potentially facilitating clinical trust for future implementation.
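Integrated gradients attributes a model's prediction to its input features by integrating the gradient along a straight-line path from a reference baseline to the actual input. A minimal, framework-free sketch follows; the two-feature logistic toy model, its weights, and all names are illustrative assumptions, not the study's recurrent network:

```python
import math

def integrated_gradients(f, grad_f, x, baseline, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    attribution_i = (x_i - baseline_i) * integral over alpha in [0, 1] of
    dF/dx_i evaluated at baseline + alpha * (x - baseline).
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [baseline[i] + alpha * (x[i] - baseline[i]) for i in range(n)]
        g = grad_f(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(x[i] - baseline[i]) * avg_grad[i] for i in range(n)]

# Toy stand-in for a risk model: a logistic unit over two features
# (weights 0.8 and -0.5 are arbitrary illustrative values).
def f(z):
    return 1.0 / (1.0 + math.exp(-(0.8 * z[0] - 0.5 * z[1])))

def grad_f(z):
    p = f(z)
    return [0.8 * p * (1.0 - p), -0.5 * p * (1.0 - p)]
```

By the completeness axiom, the attributions sum to f(x) minus f(baseline), which is what makes per-feature contributions directly comparable to the change in predicted risk and, in a clinical setting, interpretable as each input's share of the risk estimate.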
Subject(s)
Acute Kidney Injury , Deep Learning , Humans , Acute Kidney Injury/diagnosis , Acute Kidney Injury/epidemiology , Acute Kidney Injury/etiology , Logistic Models , Forecasting , Kidney
ABSTRACT
BACKGROUND: Intra-aortic balloon pump (IABP) and Impella device utilization as a bridge to heart transplantation (HTx) have risen exponentially. We aimed to explore the influence of device selection on HTx outcomes, considering regional practice variation. METHODS: A retrospective longitudinal study was performed on a United Network for Organ Sharing (UNOS) registry dataset. We included adult patients listed for HTx between October 2018 and April 2022 as status 2, as justified by requiring IABP or Impella support. The primary endpoint was successful bridging to HTx as status 2. RESULTS: Of 32,806 HTx during the study period, 4178 met inclusion criteria (Impella n = 650, IABP n = 3528). Waitlist mortality increased from a nadir of 16 (in 2019) to a peak of 36 (in 2022) per 1000 status 2-listed patients. Impella annual use increased from 8% in 2019 to 19% in 2021. Compared with IABP patients, Impella patients demonstrated higher medical acuity and a lower rate of successful transplantation as status 2 (92.1% vs 88.9%, p < 0.001). The IABP:Impella utilization ratio varied widely between regions, ranging from 1.77 to 21.31, with high Impella use in Southern and Western states. However, this difference was not justified by medical acuity, regional transplant volume, or waitlist time and did not correlate with waitlist mortality. CONCLUSIONS: The shift toward utilizing Impella rather than IABP did not improve waitlist outcomes. Our results suggest that clinical practice patterns beyond mere device selection determine successful bridging to HTx. There is a critical need for objective evidence to guide temporary mechanical circulatory support (tMCS) utilization and a paradigm shift in the UNOS allocation system to achieve equitable HTx practice across the United States.
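The comparison of status 2 transplant success rates is the kind of result a two-proportion z-test would produce. The sketch below reconstructs approximate event counts from the reported percentages and group sizes; the counts, and the choice of test, are our assumptions — the abstract does not state the exact analysis, so the reconstructed p-value need not match the reported one:

```python
import math

def two_proportion_z(successes1, n1, successes2, n2):
    """Two-sided two-proportion z-test using the pooled standard error."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value

# Approximate counts back-calculated from the abstract's figures:
# IABP 92.1% of 3528 patients, Impella 88.9% of 650 patients.
iabp_success = round(0.921 * 3528)
impella_success = round(0.889 * 650)
z, p = two_proportion_z(iabp_success, 3528, impella_success, 650)
```

With these reconstructed counts the difference remains statistically significant at conventional thresholds, though the imbalance in group sizes (3528 vs 650) is a reminder that the pooled standard error is dominated by the smaller Impella arm.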
ABSTRACT
Introduction: Artificial intelligence can recognize complex patterns in large datasets. It is a promising technology to advance heart failure practice, as many decisions rely on expert opinions in the absence of high-quality data-driven evidence. Methods: We searched Embase, Web of Science, and PubMed databases for articles containing "artificial intelligence," "machine learning," or "deep learning" and any of the phrases "heart transplantation," "ventricular assist device," or "cardiogenic shock" from inception until August 2022. We only included original research addressing clinical care after heart transplantation (HTx) or with mechanical circulatory support (MCS). Review and data extraction were performed in accordance with PRISMA-ScR guidelines. Results: Of 584 unique publications detected, 31 met the inclusion criteria. The majority focused on outcome prediction after HTx (n = 13) and after durable MCS (n = 7), as well as on management after HTx (n = 7) and during MCS (n = 3). One study addressed temporary mechanical circulatory support. Most studies advocated for rapid integration of AI into clinical practice, acknowledging potential improvements in management guidance and reliability of outcome prediction. There was a notable paucity of external data validation and integration of multiple data modalities. Conclusion: Our review showed mounting innovation in AI application to the management of MCS and HTx, with the strongest evidence for improved mortality outcome prediction.