ABSTRACT
OBJECTIVE: Gender disparities in surgical training and assessment are described in the general surgery literature. Assessment disparities have not been explored in vascular surgery. We sought to investigate gender disparities in operative assessment in a national cohort of vascular surgery integrated residents (VIRs) and fellows (VSFs). METHODS: Operative performance and autonomy ratings from the Society for Improving Medical Professional Learning (SIMPL) application database were collected for all vascular surgery participating institutions from 2018 to 2023. Logistic generalized linear mixed models were conducted to examine the association of faculty and trainee gender on faculty and self-assessment of autonomy and performance. Data were adjusted for post-graduate year and case complexity. Random effects were included to account for clustering effects due to participant, program, and procedure. RESULTS: One hundred three trainees (n = 63 VIRs; n = 40 VSFs; 63.1% men) and 99 faculty (73.7% men) from 17 institutions (n = 12 VIR and n = 13 VSF programs) contributed 4951 total assessments (44.4% by faculty, 55.6% by trainees) across 235 unique procedures. Faculty and trainee gender were not associated with faculty ratings of performance (faculty gender: odds ratio [OR], 0.78; 95% confidence interval [CI], 0.27-2.29; trainee gender: OR, 1.80; 95% CI, 0.76-0.43) or autonomy (faculty gender: OR, 0.99; 95% CI, 0.41-2.39; trainee gender: OR, 1.23; 95% CI, 0.62-2.45) of trainees. All trainees self-assessed at lower performance and autonomy ratings as compared with faculty assessments. However, women trainees rated themselves significantly lower than men for both autonomy (OR, 0.57; 95% CI, 0.43-0.74) and performance (OR, 0.40; 95% CI, 0.30-0.54). CONCLUSIONS: Although gender was not associated with differences in faculty assessment of performance or autonomy among vascular surgery trainees, women trainees perceive themselves as performing with lower competency and less autonomy than their male colleagues. These findings suggest utility for exploring gender differences in real-time feedback delivered to and received by trainees and targeted interventions to align trainee self-perception with actual operative performance and autonomy to optimize surgical skill acquisition.
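As one way to make the modeling concrete, the following minimal sketch fits a Bayesian logistic mixed model of the kind described, using statsmodels; the file name, column names, and dichotomized outcome are assumptions of this sketch, not the study's actual specification.

```python
# Hedged sketch: logistic mixed model with random effects for trainee,
# program, and procedure. All file and column names below are assumptions.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("simpl_assessments.csv")  # hypothetical extract

model = BinomialBayesMixedGLM.from_formula(
    # Fixed effects: genders, post-graduate year, case complexity.
    # Outcome assumed dichotomized (e.g., meaningful autonomy yes/no).
    "meaningful_autonomy ~ faculty_gender + trainee_gender + pgy + complexity",
    # Variance components standing in for random intercepts per cluster.
    {
        "trainee": "0 + C(trainee_id)",
        "program": "0 + C(program_id)",
        "procedure": "0 + C(procedure_id)",
    },
    df,
)
result = model.fit_vb()  # variational Bayes approximation
print(result.summary())
```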
Subjects
Clinical Competence , Education, Medical, Graduate , Internship and Residency , Professional Autonomy , Surgeons , Vascular Surgical Procedures , Humans , Female , Male , Vascular Surgical Procedures/education , Surgeons/education , Surgeons/psychology , Sex Factors , Physicians, Women , United States , Sexism , Faculty, Medical , Adult
ABSTRACT
BACKGROUND: Perhaps nowhere else in the healthcare system are the challenges of creating useful models with direct, time-critical clinical applications more relevant, and the obstacles to achieving those goals more formidable, than in the intensive care unit environment. Machine learning-based artificial intelligence (AI) techniques to define states and predict future events are commonplace activities of modern life. However, their penetration into acute care medicine has been slow, stuttering, and uneven. Major obstacles to widespread, effective application of AI approaches to the real-time care of the critically ill patient exist and need to be addressed. MAIN BODY: Clinical decision support systems (CDSSs) in acute and critical care environments are meant to support clinicians at the bedside, not replace them. As discussed in this review, the reasons are many and include the immaturity of AI-based systems with respect to situational awareness; the fundamental bias in many large databases, which do not reflect the target population of patients being treated, making fairness an important issue to address; and technical barriers to timely access to valid data and its display in a fashion useful for clinical workflow. The inherent "black-box" nature of many predictive algorithms and CDSSs makes trustworthiness and acceptance by the medical community difficult. Logistically, collating and curating, in real time, multidimensional data streams from various sources to inform the algorithms, and ultimately displaying relevant clinical decision support in a format that adapts to individual patient responses and signatures, constitute the efferent limb of these systems and are often ignored during initial validation efforts. Similarly, legal and commercial barriers to accessing many existing clinical databases limit studies that would address the fairness and generalizability of predictive models and management tools. CONCLUSIONS: AI-based CDSSs are evolving and are here to stay. It is our obligation to be good shepherds of their use and further development.
Subjects
Algorithms , Artificial Intelligence , Humans , Critical Care , Intensive Care Units , Delivery of Health Care
ABSTRACT
OBJECTIVE: We test the hypothesis that for low-acuity surgical patients, postoperative intensive care unit (ICU) admission is associated with lower value of care compared with ward admission. BACKGROUND: Overtriaging low-acuity patients to the ICU consumes valuable resources and may not confer better patient outcomes. Associations among postoperative overtriage, patient outcomes, costs, and value of care have not been previously reported. METHODS: In this longitudinal cohort study, postoperative ICU admissions were classified as overtriaged or appropriately triaged according to machine learning-based patient acuity assessments and requirements for immediate postoperative mechanical ventilation or vasopressor support. The nearest neighbors algorithm identified risk-matched control ward admissions. The primary outcome was value of care, calculated as the inverse observed-to-expected mortality ratio divided by total costs. RESULTS: Acuity assessments had an area under the receiver operating characteristic curve of 0.92 in generating predictions for triage classifications. Of 8592 postoperative ICU admissions, 423 (4.9%) were overtriaged. These were matched with 2155 control ward admissions with similar comorbidities, incidence of emergent surgery, immediate postoperative vital signs, and do-not-resuscitate order placement and rescindment patterns. Compared with controls, overtriaged admissions did not have a lower incidence of any measured complications. Total costs for admission were $16.4K for overtriage and $15.9K for controls (P = 0.03). Value of care was lower for overtriaged admissions [2.9 (2.0-4.0)] compared with controls [24.2 (14.1-34.5), P < 0.001]. CONCLUSIONS: Low-acuity postoperative patients who were overtriaged to ICUs had increased total costs, no improvements in outcomes, and received low-value care.
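A minimal sketch of the risk-matching step and the value-of-care definition might look as follows, assuming placeholder features and a hypothetical neighbor count; it is not the study's actual pipeline.

```python
# Hedged sketch: select risk-matched ward controls for overtriaged ICU
# admissions via k-nearest neighbors in a standardized acuity-feature space.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_overtriage = rng.normal(size=(423, 8))   # placeholder acuity features
X_ward = rng.normal(size=(9000, 8))        # candidate ward admissions

scaler = StandardScaler().fit(np.vstack([X_overtriage, X_ward]))
nn = NearestNeighbors(n_neighbors=5).fit(scaler.transform(X_ward))
_, neighbor_idx = nn.kneighbors(scaler.transform(X_overtriage))
control_idx = np.unique(neighbor_idx.ravel())  # deduplicated control pool

# Value of care as defined in the abstract: inverse observed-to-expected
# mortality ratio divided by total costs (inputs are illustrative).
def value_of_care(observed_deaths, expected_deaths, total_cost_usd):
    oe = observed_deaths / expected_deaths
    return (1.0 / oe) / total_cost_usd
```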
Subjects
Hospitalization , Intensive Care Units , Humans , Longitudinal Studies , Retrospective Studies , Cohort Studies
ABSTRACT
Background: The use of intra-aortic balloon pump (IABP) and Impella devices as a bridge to heart transplantation (HTx) has increased significantly in recent years. This study aimed to create and validate an explainable machine learning (ML) model that can predict failure of Status 2 listings and identify the clinical features that significantly impact this outcome. Methods: We used the UNOS registry database to identify HTx candidates listed as UNOS Status 2 between 2018 and 2022 and supported with either Impella (5.0 or 5.5) or IABP. We used the eXtreme Gradient Boosting (XGBoost) algorithm to build and validate ML models. We developed two models: (1) a comprehensive model that included all patients in our cohort and (2) separate models designed for each of the 11 UNOS regions. Results: We analyzed data from 4,178 patients listed as Status 2. Of these, 12% had a primary outcome indicating Status 2 failure. Our ML models were based on 19 variables from the UNOS data. The comprehensive model had an area under the curve (AUC) of 0.71 (±0.03), with a range between 0.44 (±0.08) and 0.74 (±0.01) across different regions. The models' specificity ranged from 0.75 to 0.96. The top five most important predictors were the number of inotropes, creatinine, sodium, BMI, and blood group. Conclusion: Using ML is clinically valuable for highlighting patients at risk, enabling healthcare providers to offer intensified monitoring, optimization, and care escalation selectively.
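A minimal sketch of such an XGBoost classifier, with placeholder columns standing in for the 19 UNOS variables and illustrative hyperparameters:

```python
# Hedged sketch of the XGBoost classifier; the 19 predictors are represented
# by a few placeholder columns, and hyperparameters are illustrative.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_parquet("unos_status2.parquet")  # hypothetical extract
features = ["num_inotropes", "creatinine", "sodium", "bmi", "blood_group_enc"]
X, y = df[features], df["status2_failure"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = xgb.XGBClassifier(
    n_estimators=300, max_depth=4, learning_rate=0.05,
    # Reweight positives to reflect the ~12% failure rate.
    scale_pos_weight=(y_tr == 0).sum() / (y_tr == 1).sum(),
)
clf.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```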
ABSTRACT
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks that has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we also include articles that used the Transformer architecture to generate surgical instructions and predict adverse outcomes after surgery, under the umbrella of critical care. Across diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we also discuss the benefits and limitations of using Transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
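For readers new to the architecture, a minimal NumPy sketch of scaled dot-product self-attention, the core Transformer operation, is shown below (single head, no masking, toy dimensions):

```python
# Minimal scaled dot-product self-attention, the core operation of the
# Transformer architecture surveyed above.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over tokens
    return weights @ V                               # context-mixed values

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 16))                        # 10 tokens, dim 16
W = [rng.normal(size=(16, 16)) for _ in range(3)]
out = self_attention(X, *W)                          # shape (10, 16)
```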
Subjects
Deep Learning , Natural Language Processing , Humans , Artificial Intelligence , Delivery of Health Care/organization & administration , Neural Networks, Computer , Electronic Health Records
ABSTRACT
On average, more than 5 million patients are admitted to intensive care units (ICUs) in the US, with mortality rates ranging from 10% to 29%. The acuity state of patients in the ICU can quickly change from stable to unstable, sometimes leading to life-threatening conditions. Early detection of deteriorating conditions can assist in more timely interventions and improved survival rates. While Artificial Intelligence (AI)-based models show potential for assessing acuity in a more granular and automated manner, they typically use mortality as a proxy of acuity in the ICU. Furthermore, these methods do not determine the acuity state of a patient (i.e., stable or unstable), the transition between acuity states, or the need for life-sustaining therapies. In this study, we propose APRICOT-M (Acuity Prediction in Intensive Care Unit-Mamba), a 1M-parameter state space-based neural network to predict acuity state, transitions, and the need for life-sustaining therapies in real-time among ICU patients. The model integrates ICU data from the preceding four hours (including vital signs, laboratory results, assessment scores, and medications) and patient characteristics (age, sex, race, and comorbidities) to predict the acuity outcomes in the next four hours. Our state space-based model can process sparse and irregularly sampled data without manual imputation, thus reducing the noise in input data and increasing inference speed. The model was trained on data from 107,473 patients (142,062 ICU admissions) from 55 hospitals between 2014 and 2017 and validated externally on data from 74,901 patients (101,356 ICU admissions) from 143 hospitals. Additionally, it was validated temporally on data from 12,927 patients (15,940 ICU admissions) from one hospital in 2018-2019 and prospectively on data from 215 patients (369 ICU admissions) from one hospital in 2021-2023. Three datasets were used for training and evaluation: the University of Florida Health (UFH) dataset, the electronic ICU Collaborative Research Database (eICU), and the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. APRICOT-M significantly outperforms the baseline acuity assessment, Sequential Organ Failure Assessment (SOFA), for mortality prediction in both external (AUROC 0.95 CI: 0.94-0.95 compared to 0.78 CI: 0.78-0.79) and prospective (AUROC 0.99 CI: 0.97-1.00 compared to 0.80 CI: 0.65-0.92) cohorts, as well as for instability prediction (external AUROC 0.75 CI: 0.74-0.75 compared to 0.51 CI: 0.51-0.51, and prospective AUROC 0.69 CI: 0.64-0.74 compared to 0.53 CI: 0.50-0.57). This tool has the potential to help clinicians make timely interventions by predicting transitions between acuity states and the need for life-sustaining therapies within the next four hours in the ICU.
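The 4-hour-lookback / 4-hour-horizon framing can be sketched as a labeling step as follows; the file and column names are assumptions, and the state-space network itself is elided:

```python
# Hedged sketch of the 4-hour-in / 4-hour-ahead framing; APRICOT-M itself is
# a Mamba-style state-space network, which this labeling step would only feed.
import pandas as pd

# Hypothetical extract: one row per timestamped measurement with columns
# stay_id, time, variable, value.
events = pd.read_parquet("icu_events.parquet")

HORIZON = pd.Timedelta("4h")

def make_examples(stay):
    stay = stay.sort_values("time")
    anchors = pd.date_range(stay.time.min(), stay.time.max(), freq="4h")
    for t in anchors:
        x = stay[(stay.time > t - HORIZON) & (stay.time <= t)]   # lookback
        y = stay[(stay.time > t) & (stay.time <= t + HORIZON)]   # outcome window
        yield x, y  # sequences kept irregular; no manual imputation applied

examples = [ex for _, g in events.groupby("stay_id") for ex in make_examples(g)]
```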
ABSTRACT
Importance: Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications. Objective: To evaluate risk-prediction model performance when trained on risk-specific cohorts. Design, Setting, and Participants: This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109,445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined. Exposures: The model was trained de novo on separate cohorts for high-risk, medium-risk, and low-risk Current Procedural Terminology (CPT) codes defined empirically by incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for complications were defined by the lower-third and upper-third prevalence in the dataset, except for mortality, cutoffs for which were set at 1% or less and greater than 3%, respectively. Main Outcomes and Measures: Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 scores, and accuracy for each model. Results: A total of 109,445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77,921 procedures [71.2%]) and Jacksonville (31,524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4). Among 109,445 operations, 55,646 patients were male (50.8%), and 66,495 patients (60.8%) underwent a nonemergent, inpatient operation. Training on the high-risk cohort had variable impact on AUROC, but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and AKI (0.57; 95% CI, 0.54-0.59). After controlling for baseline model performance on high-risk cohorts, AUPRC increased significantly for in-hospital mortality only (0.53; 95% CI, 0.42-0.65 vs 0.29; 95% CI, 0.21-0.40). Conclusions and Relevance: In this cross-sectional study, training separate models using a priori knowledge of procedure-specific risk classes improved performance on standard evaluation metrics, especially for low-prevalence complications like in-hospital mortality. Used cautiously, this approach may represent an optimal training strategy for surgical risk-prediction models.
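A sketch of the empirical risk-tercile assignment for one complication (AKI) follows; column names are assumptions, and mortality would instead use the fixed ≤1%/>3% cutoffs:

```python
# Hedged sketch of empirical risk-tier cohort assignment by CPT code,
# mirroring the abstract's lower-/upper-third rule for most complications.
import pandas as pd

ops = pd.read_parquet("operations.parquet")  # hypothetical extract
inc = ops.groupby("cpt_code")["aki"].mean()  # per-code AKI incidence

lo, hi = inc.quantile([1 / 3, 2 / 3])
tier = pd.cut(inc, bins=[-1, lo, hi, 1], labels=["low", "medium", "high"])
ops["aki_risk_tier"] = ops["cpt_code"].map(tier)

# One model would then be trained de novo per tier (model code elided);
# for mortality, fixed cutoffs apply instead: low <= 1%, high > 3%.
cohorts = {t: df for t, df in ops.groupby("aki_risk_tier")}
```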
ABSTRACT
Clustering analysis of early vital signs may reveal unique patient phenotypes with distinct pathophysiological signatures and clinical outcomes, supporting early clinical decision-making. Phenotyping using early vital signs has proven challenging, as vital signs are typically sampled sporadically. We proposed a novel deep temporal interpolation and clustering network to simultaneously extract latent representations from irregularly sampled vital signs and derive phenotypes. Four distinct clusters were identified. Phenotype A (18%) had the greatest prevalence of comorbid disease, with increased prevalence of prolonged respiratory insufficiency, acute kidney injury, sepsis, and long-term (3-year) mortality. Phenotypes B (33%) and C (31%) had a diffuse pattern of mild organ dysfunction. Phenotype B's favorable short-term clinical outcomes were tempered by the second-highest rate of long-term mortality. Phenotype C had favorable clinical outcomes. Phenotype D (17%) exhibited early and persistent hypotension, a high incidence of early surgery, and substantial biomarker evidence of inflammation. Despite early and severe illness, phenotype D had the second-lowest long-term mortality. A comparison with sequential organ failure assessment scores showed that the clustering results were not simply a recapitulation of previous acuity assessments. This tool may impact triage decisions and has significant implications for clinical decision support under time constraints and uncertainty.
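A greatly simplified stand-in for the interpolation-plus-clustering idea, using regular-grid interpolation and k-means rather than the jointly learned latent representations of the actual network (file and column names assumed):

```python
# Simplified stand-in for the deep interpolation-clustering network:
# regular-grid interpolation of sparse vitals followed by k-means with k=4.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical extract: stay_id, time, and vital-sign columns.
vitals = pd.read_parquet("early_vitals.parquet")

def trajectory(stay, cols=("hr", "sbp", "spo2"), n_grid=24):
    stay = stay.set_index("time").sort_index()
    grid = pd.date_range(stay.index.min(), stay.index.max(), periods=n_grid)
    interp = (stay[list(cols)]
              .reindex(stay.index.union(grid))
              .interpolate("time").ffill().bfill())
    return interp.loc[grid].to_numpy().ravel()  # fixed-length trajectory

X = np.vstack([trajectory(g) for _, g in vitals.groupby("stay_id")])
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
```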
Subjects
Organ Dysfunction Scores , Sepsis , Humans , Acute Disease , Phenotype , Biomarkers , Cluster Analysis
ABSTRACT
Acuity assessments are vital for timely interventions and fair resource allocation in critical care settings. Conventional acuity scoring systems depend heavily on subjective patient assessments, leaving room for implicit bias and errors. These assessments are often manual, time-consuming, intermittent, and challenging for healthcare providers to interpret accurately. The risk of bias and error is likely most pronounced in time-constrained, high-stakes environments such as critical care settings. Furthermore, such scores do not incorporate other information, such as patients' mobility level, which can indicate recovery or deterioration in the intensive care unit (ICU), especially at a granular level. We hypothesized that wearable sensor data could assist in assessing patient acuity at a granular level, especially in conjunction with clinical data from electronic health records (EHR). In this prospective study, we evaluated the impact of integrating mobility data collected from wrist-worn accelerometers with clinical data obtained from EHR for estimating acuity. Accelerometry data were collected from 87 patients wearing accelerometers on their wrists in an academic hospital setting. The data were evaluated using five deep neural network models: VGG, ResNet, MobileNet, SqueezeNet, and a custom Transformer network. These models outperformed a rule-based clinical score (the Sequential Organ Failure Assessment, SOFA) used as a baseline when predicting acuity state (for ground truth, patients were labeled unstable if they needed life-supporting therapies and stable otherwise), particularly in precision, sensitivity, and F1 score. The results demonstrate that integrating accelerometer data with demographics and clinical variables improves predictive performance compared with traditional scoring systems. Deep learning models consistently outperformed the SOFA baseline across various scenarios, showing notable enhancements in metrics such as the area under the receiver operating characteristic curve (AUC), precision, sensitivity, specificity, and F1 score. The most comprehensive scenario, leveraging accelerometer, demographic, and clinical data, achieved the highest AUC of 0.73, compared with 0.53 for the SOFA baseline, with significant improvements in precision (0.80 vs. 0.23), specificity (0.79 vs. 0.73), and F1 score (0.77 vs. 0.66). This study demonstrates an approach that moves beyond the simplistic differentiation between stable and unstable conditions. By incorporating mobility and comprehensive patient information, we distinguish between these states in critically ill patients and capture essential nuances in physiology and functional status. Unlike rudimentary definitions, such as equating low blood pressure with instability, our methodology delves deeper, offering a more holistic understanding and potentially valuable insights for acuity assessment.
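One way to sketch the multimodal fusion, with a toy 1D-CNN encoder standing in for the deep architectures evaluated (sizes and names are illustrative, not the study's VGG/ResNet/Transformer variants):

```python
# Hedged sketch of accelerometer-EHR fusion: a small 1D-CNN encodes the wrist
# signal and its embedding is concatenated with tabular clinical features.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, n_ehr: int):
        super().__init__()
        self.cnn = nn.Sequential(            # accelerometer encoder (x, y, z)
            nn.Conv1d(3, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32 + n_ehr, 32), nn.ReLU(),
                                  nn.Linear(32, 1))

    def forward(self, accel, ehr):           # accel: (B, 3, T), ehr: (B, n_ehr)
        z = self.cnn(accel)
        return self.head(torch.cat([z, ehr], dim=1))  # stable/unstable logit

model = FusionNet(n_ehr=20)
logit = model(torch.randn(4, 3, 3000), torch.randn(4, 20))
```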
ABSTRACT
BACKGROUND: Surrogates, proxies, and clinicians making shared treatment decisions for patients who have lost decision-making capacity often fail to honor patients' wishes, due to stress, time pressures, misunderstanding of patient values, and projection of personal biases. Advance directives intend to align care with patient values but are limited by low completion rates and application to only a subset of medical decisions. Here, we investigate the potential of large language models (LLMs) to incorporate patient values in supporting critical care clinical decision-making for incapacitated patients in a proof-of-concept study. METHODS: We simulated text-based scenarios for 50 decisionally incapacitated patients for whom a medical condition required imminent clinical decisions regarding specific interventions. For each patient, we also simulated five unique value profiles captured using alternative formats: numeric ranking questionnaires, text-based questionnaires, and free-text narratives. We used pre-trained generative LLMs for two tasks: (1) text extraction of the treatments under consideration and (2) prompt-based question-answering to generate a recommendation in response to the scenario information, extracted treatment, and patient value profiles. Model outputs were compared with adjudications by three domain experts who independently evaluated each scenario and decision. RESULTS AND CONCLUSIONS: Automated extractions of the treatment in question were accurate for 88% (n = 44/50) of scenarios. LLM treatment recommendations received an average Likert score from the adjudicators of 3.92 out of 5.00 (5 being best) across all patients for being medically plausible and reasonable, and 3.58 out of 5.00 for reflecting the documented values of the patient. Scores were highest when patient values were captured as short, unstructured, free-text narratives based on simulated patient profiles. This proof-of-concept study demonstrates the potential for LLMs to function as support tools for surrogates, proxies, and clinicians aiming to honor the wishes and values of decisionally incapacitated patients.
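A minimal sketch of the two LLM tasks, using the OpenAI chat API as one possible backend; the model name, prompts, and placeholder strings are assumptions of this sketch, not the study's actual implementation:

```python
# Hedged sketch of the two tasks: (1) treatment extraction, then
# (2) a value-aware recommendation generated by prompting.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model; name is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

scenario = "..."  # simulated clinical scenario text (placeholder)
values = "..."    # free-text narrative of the patient's values (placeholder)

treatment = ask(f"Extract the treatment under consideration:\n{scenario}")
recommendation = ask(
    f"Scenario:\n{scenario}\n\nTreatment in question: {treatment}\n\n"
    f"Patient values:\n{values}\n\n"
    "Recommend for or against the treatment, honoring the stated values."
)
```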
Subjects
Proxy , Humans , Advance Directives , Decision Making , Clinical Decision-Making/methods , Proof of Concept Study , Surveys and Questionnaires , Language , Critical Care/methods
ABSTRACT
The degree to which artificial intelligence healthcare research is informed by data and stakeholders from community settings has not been previously described. As communities are the principal location of healthcare delivery, engaging them could represent an important opportunity to improve scientific quality. This scoping review systematically maps what is known and unknown about community-engaged artificial intelligence research and identifies opportunities to optimize the generalizability of these applications through involvement of community stakeholders and data throughout model development, validation, and implementation. Embase, PubMed, and MEDLINE databases were searched for articles describing artificial intelligence or machine learning healthcare applications with community involvement in model development, validation, or implementation. Model architecture and performance, the nature of community engagement, and barriers or facilitators to community engagement were reported according to PRISMA extension for Scoping Reviews guidelines. Of approximately 10,880 articles describing artificial intelligence healthcare applications, 21 (0.2%) described community involvement. All articles derived data from community settings, most commonly by leveraging existing datasets and sources that included community subjects, and often bolstered by internet-based data acquisition and subject recruitment. Only one article described inclusion of community stakeholders in designing an application-a natural language processing model that detected cases of likely child abuse with 90% accuracy using harmonized electronic health record notes from both hospital and community practice settings. The primary barrier to including community-derived data was small sample sizes, which may have affected 11 of the 21 studies (53%), introducing substantial risk for overfitting that threatens generalizability. Community engagement in artificial intelligence healthcare application development, validation, or implementation is rare. As healthcare delivery occurs primarily in community settings, investigators should consider engaging community stakeholders in user-centered design, usability, and clinical implementation studies to optimize generalizability.
ABSTRACT
Objective: To determine whether certain patients are vulnerable to errant triage decisions immediately after major surgery and whether there are unique sociodemographic phenotypes within overtriaged and undertriaged cohorts. Background: In a fair system, overtriage of low-acuity patients to intensive care units (ICUs) and undertriage of high-acuity patients to general wards would affect all sociodemographic subgroups equally. Methods: This multicenter, longitudinal cohort study of hospital admissions immediately after major surgery compared hospital mortality and value of care (risk-adjusted mortality/total costs) across 4 cohorts: overtriage (N = 660), risk-matched overtriage controls admitted to general wards (N = 3077), undertriage (N = 2335), and risk-matched undertriage controls admitted to ICUs (N = 4774). K-means clustering identified sociodemographic phenotypes within overtriage and undertriage cohorts. Results: Compared with controls, overtriaged admissions had a predominance of male patients (56.2% vs 43.1%, P < 0.001) and commercial insurance (6.4% vs 2.5%, P < 0.001); undertriaged admissions had a predominance of Black patients (28.4% vs 24.4%, P < 0.001) and greater socioeconomic deprivation. Overtriage was associated with increased total direct costs [$16.2K ($11.4K-$23.5K) vs $14.1K ($9.1K-$20.7K), P < 0.001] and low value of care; undertriage was associated with increased hospital mortality (1.5% vs 0.7%, P = 0.002) and hospice care (2.2% vs 0.6%, P < 0.001) and low value of care. Unique sociodemographic phenotypes within both overtriage and undertriage cohorts had similar outcomes and value of care, suggesting that triage decisions, rather than patient characteristics, drive outcomes and value of care. Conclusions: Postoperative triage decisions should ensure equality across sociodemographic groups by anchoring triage decisions to objective patient acuity assessments, circumventing cognitive shortcuts and mitigating bias.
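A compact sketch of sociodemographic phenotyping within a triage cohort via k-means; the attribute list, encoding, and choice of k are placeholders rather than the study's specification:

```python
# Hedged sketch: one-hot encode categorical sociodemographic attributes,
# then cluster with k-means to surface phenotypes within a cohort.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder

cohort = pd.read_parquet("undertriage_cohort.parquet")  # hypothetical extract
cols = ["sex", "race", "insurance", "deprivation_quartile"]

X = OneHotEncoder(sparse_output=False).fit_transform(cohort[cols])
cohort["phenotype"] = KMeans(n_clusters=4, n_init=10,
                             random_state=0).fit_predict(X)

# Summarize each phenotype by its most common attribute values.
print(cohort.groupby("phenotype")[cols].agg(lambda s: s.mode().iat[0]))
```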
ABSTRACT
Diabetic nephropathy (DN) in the context of type 2 diabetes is the leading cause of end-stage renal disease (ESRD) in the United States. DN is graded based on glomerular morphology and has a spatially heterogeneous presentation in kidney biopsies that complicates pathologists' predictions of disease progression. Artificial intelligence and deep learning methods for pathology have shown promise for quantitative pathological evaluation and clinical trajectory estimation, but they often fail to capture large-scale spatial anatomy and relationships found in whole slide images (WSIs). In this study, we present a transformer-based, multi-stage ESRD prediction framework built upon nonlinear dimensionality reduction, relative Euclidean pixel distance embeddings between every pair of observable glomeruli, and a corresponding spatial self-attention mechanism for a robust contextual representation. We developed a deep transformer network for encoding WSIs and predicting future ESRD using a dataset of 56 kidney biopsy WSIs from DN patients at Seoul National University Hospital. Using a leave-one-out cross-validation scheme, our modified transformer framework outperformed RNNs, XGBoost, and logistic regression baseline models, and resulted in an area under the receiver operating characteristic curve (AUC) of 0.97 (95% CI: 0.90-1.00) for predicting two-year ESRD, compared with an AUC of 0.86 (95% CI: 0.66-0.99) without our relative distance embedding and an AUC of 0.76 (95% CI: 0.59-0.92) without a denoising autoencoder module. While the variability and generalizability induced by smaller sample sizes are challenging, our distance-based embedding approach and overfitting mitigation techniques yielded results that suggest opportunities for future spatially aware WSI research using limited pathology datasets.
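The relative-distance input can be sketched as follows: pairwise Euclidean pixel distances between glomerulus centroids with a sinusoidal embedding analogous to positional encodings; coordinates are fake, and the transformer encoder and denoising autoencoder are elided:

```python
# Hedged sketch of the relative-distance input for spatial self-attention.
import numpy as np

rng = np.random.default_rng(1)
centroids = rng.uniform(0, 40_000, size=(120, 2))  # (x, y) per glomerulus, px

diff = centroids[:, None, :] - centroids[None, :, :]
dist = np.linalg.norm(diff, axis=-1)               # (n, n) distance matrix

# A sinusoidal embedding of distance, analogous to positional encodings,
# could then bias the pairwise self-attention scores.
def embed(d, dim=16, scale=10_000.0):
    freqs = scale ** (-np.arange(0, dim, 2) / dim)
    ang = d[..., None] * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)  # (n, n, dim)

E = embed(dist)
```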
ABSTRACT
Accurate prediction of postoperative complications can inform shared decisions regarding prognosis, preoperative risk-reduction, and postoperative resource use. We hypothesized that multi-task deep learning models would outperform conventional machine learning models in predicting postoperative complications, and that integrating high-resolution intraoperative physiological time series would result in more granular and personalized health representations that would improve prognostication compared to preoperative predictions. In a longitudinal cohort study of 56,242 patients undergoing 67,481 inpatient surgical procedures at a university medical center, we compared deep learning models with random forests and XGBoost for predicting nine common postoperative complications using preoperative, intraoperative, and perioperative patient data. Our study indicated several significant results across experimental settings that suggest the utility of deep learning for capturing more precise representations of patient health for augmented surgical decision support. Multi-task learning improved efficiency by reducing computational resources without compromising predictive performance. Integrated gradients interpretability mechanisms identified potentially modifiable risk factors for each complication. Monte Carlo dropout methods provided a quantitative measure of prediction uncertainty that has the potential to enhance clinical trust. Multi-task learning, interpretability mechanisms, and uncertainty metrics demonstrated potential to facilitate effective clinical implementation.
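The Monte Carlo dropout idea can be sketched in a few lines of PyTorch; the toy network stands in for the multi-task model described:

```python
# Hedged sketch of Monte Carlo dropout uncertainty: keep dropout active at
# inference and treat the spread of repeated predictions as uncertainty.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Dropout(p=0.3),
                    nn.Linear(128, 9), nn.Sigmoid())  # 9 complication heads

def mc_dropout_predict(model, x, n_samples=50):
    model.train()  # keeps Dropout stochastic during inference
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    return draws.mean(0), draws.std(0)  # prediction and uncertainty

x = torch.randn(1, 64)                  # one patient's preoperative features
mean_risk, uncertainty = mc_dropout_predict(net, x)
```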
Subjects
Neural Networks, Computer , Postoperative Complications , Humans , Longitudinal Studies , Uncertainty , Postoperative Complications/etiology , Machine Learning
ABSTRACT
In the United States, more than 5 million patients are admitted annually to ICUs, with ICU mortality of 10%-29% and costs over $82 billion. Acute brain dysfunction, including delirium, is often underdiagnosed or undervalued. This study's objective was to develop automated computable phenotypes for acute brain dysfunction states and to describe transitions among brain dysfunction states to illustrate the clinical trajectories of ICU patients. We created two single-center, longitudinal EHR datasets for 48,817 adult patients admitted to an ICU at University of Florida Health (UFH) Gainesville (GNV) and Jacksonville (JAX). We developed algorithms to quantify acute brain dysfunction status, including coma, delirium, normal, or death, at 12-hour intervals of each ICU admission, and to identify acute brain dysfunction phenotypes using continuous acute brain dysfunction status and a k-means clustering approach. There were 49,770 admissions for 37,835 patients in the UFH GNV dataset and 18,472 admissions for 10,982 patients in the UFH JAX dataset. In total, 18% of patients had coma as the worst brain dysfunction status; every 12 hours, around 4%-7% would transition to delirium, 22%-25% would recover, 3%-4% would expire, and 67%-68% would remain in a coma in the ICU. Additionally, 7% of patients had delirium as the worst brain dysfunction status; around 6%-7% would transition to coma, 40%-42% would recover, 1% would expire, and 51%-52% would remain delirious in the ICU. There were three phenotypes: persistent coma/delirium, persistently normal, and transition from coma/delirium to normal, occurring almost exclusively in the first 48 hours after ICU admission. We developed phenotyping scoring algorithms that determined acute brain dysfunction status every 12 hours while patients were admitted to the ICU. This approach may be useful in developing prognostic and decision-support tools to aid patients and clinicians in decision-making on resource use and escalation of care.
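Given one status label per 12-hour interval, the reported transition percentages amount to a row-normalized transition matrix, sketched here with assumed file and column names:

```python
# Hedged sketch of the 12-hour state-transition summary: tabulate transition
# probabilities among coma, delirium, normal, and death.
import pandas as pd

# Hypothetical extract: admission_id, interval_idx, and a status column
# taking values in {coma, delirium, normal, death}.
states = pd.read_parquet("brain_status_12h.parquet")

states = states.sort_values(["admission_id", "interval_idx"])
states["next_status"] = states.groupby("admission_id")["status"].shift(-1)

transitions = pd.crosstab(states["status"], states["next_status"],
                          normalize="index")  # each row sums to 1
print(transitions.round(2))
```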
ABSTRACT
Objectives: This study tests the null hypotheses that overall sentiment and gendered words in verbal feedback and resident operative autonomy relative to performance are similar for female and male residents. Background: Female and male surgical residents may experience training differently, affecting the quality of learning and graduated autonomy. Methods: A longitudinal, observational study using a Society for Improving Medical Professional Learning collaborative dataset describing resident and attending evaluations of resident operative performance and autonomy and recordings of verbal feedback from attendings from surgical procedures performed at 54 US general surgery residency training programs from 2016 to 2021. Overall sentiment, adjectives, and gendered words in verbal feedback were quantified by natural language processing. Resident operative autonomy and performance, as evaluated by attendings, were reported on 5-point ordinal scales. Performance-adjusted autonomy was calculated as autonomy minus performance. Results: The final dataset included objective assessments and dictated feedback for 2683 surgical procedures. Sentiment scores were higher for female residents (95 [interquartile range (IQR), 4-100] vs 86 [IQR, 2-100]; P < 0.001). Gendered words were present in a greater proportion of dictations for female residents (29% vs 25%; P = 0.04) due to male attendings disproportionately using male-associated words in feedback for female residents (28% vs 23%; P = 0.01). Overall, attendings reported that male residents received greater performance-adjusted autonomy compared with female residents (P < 0.001). Conclusions: Sentiment and gendered words in verbal feedback and performance-adjusted operative autonomy differed for female and male general surgery residents. These findings suggest a need to ensure that trainees are given appropriate and equitable operative autonomy and feedback.
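The two NLP measurements can be sketched as follows, with VADER as one available sentiment lexicon and a deliberately tiny, illustrative gendered-word list; the study's actual lexicons and pipeline may differ:

```python
# Hedged sketch: lexicon-based sentiment plus gendered-word counts.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

MALE_WORDS = {"aggressive", "confident", "decisive"}      # illustrative only
FEMALE_WORDS = {"gentle", "compassionate", "pleasant"}    # illustrative only

def score_feedback(text: str) -> dict:
    tokens = {t.strip(".,").lower() for t in text.split()}
    return {
        "sentiment": sia.polarity_scores(text)["compound"],
        "male_words": len(tokens & MALE_WORDS),
        "female_words": len(tokens & FEMALE_WORDS),
    }

print(score_feedback("She was confident and decisive throughout the case."))
```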
ABSTRACT
BACKGROUND: Acute kidney injury is a common postoperative complication affecting between 10% and 30% of surgical patients. Acute kidney injury is associated with increased resource usage and chronic kidney disease development, with more severe acute kidney injury suggesting more aggressive deterioration in clinical outcomes and mortality. METHODS: We considered 42,906 surgical patients admitted to University of Florida Health (n = 51,806 admissions) between 2014 and 2021. Acute kidney injury stages were determined using the Kidney Disease: Improving Global Outcomes serum creatinine criteria. We developed a recurrent neural network-based model to continuously predict acute kidney injury risk and state in the following 24 hours and compared it with logistic regression, random forest, and multi-layer perceptron models. We used medications, laboratory and vital measurements, and derived features from the past one-year records as inputs. We analyzed the proposed model with integrated gradients for enhanced explainability. RESULTS: Postoperative acute kidney injury at any stage developed in 20% (10,664) of the cohort. The recurrent neural network model was more accurate in predicting nearly all categories of next-day acute kidney injury stages (including the no acute kidney injury group). The areas under the receiver operating characteristic curve and 95% confidence intervals for the recurrent neural network and logistic regression models were for no acute kidney injury (0.98 [0.98-0.98] vs 0.93 [0.93-0.93]), stage 1 (0.95 [0.95-0.95] vs 0.81 [0.80-0.82]), stage 2/3 (0.99 [0.99-0.99] vs 0.96 [0.96-0.97]), and stage 3 with renal replacement therapy (1.0 [1.0-1.0] vs 1.0 [1.0-1.0]). CONCLUSION: The proposed model demonstrates that temporal processing of patient information can lead to more granular and dynamic modeling of acute kidney injury status, resulting in more continuous and accurate acute kidney injury prediction. We showcase the integrated gradients framework's utility as a mechanism for enhancing model explainability, potentially facilitating clinical trust for future implementation.
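A toy sketch of the recurrent predictor and integrated-gradients attribution via Captum follows; dimensions are illustrative, and only the four-class output mirrors the abstract:

```python
# Hedged sketch: GRU-based next-24-hour AKI stage predictor plus
# integrated-gradients feature attribution.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

class AKIRNN(nn.Module):
    def __init__(self, n_features=40, hidden=64, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, time, features)
        out, _ = self.gru(x)
        # Classes: no AKI, stage 1, stage 2/3, stage 3 with RRT.
        return self.fc(out[:, -1])

model = AKIRNN().eval()
x = torch.randn(1, 48, 40, requires_grad=True)   # 48 time steps, toy features

ig = IntegratedGradients(model)
attr = ig.attribute(x, target=0)     # attribution toward the "no AKI" class
```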
Subjects
Acute Kidney Injury , Deep Learning , Humans , Acute Kidney Injury/diagnosis , Acute Kidney Injury/epidemiology , Acute Kidney Injury/etiology , Logistic Models , Forecasting , Kidney
ABSTRACT
Delirium is a syndrome of acute brain failure that is prevalent among older adults in the intensive care unit (ICU). Incident delirium can significantly worsen prognosis and increase mortality, necessitating its rapid and continual assessment in the ICU. Currently, the common approach for delirium assessment is manual and sporadic. Hence, there exists a critical need for a robust and automated system for predicting delirium in the ICU. In this work, we develop a machine learning (ML) system for real-time prediction of delirium using Electronic Health Record (EHR) data. Unlike prior approaches, which provide one delirium prediction label per entire ICU stay, our approach provides predictions every 12 hours. We use the latest 12 hours of ICU data, along with patient demographic and medical history data, to predict delirium risk in the next 12-hour window. This enables delirium risk prediction as soon as 12 hours after ICU admission. We train and test four ML classification algorithms on longitudinal EHR data pertaining to 16,327 ICU stays of 13,395 patients, covering a total of 56,297 12-hour windows in the ICU, to predict the dynamic incidence of delirium. The best-performing algorithm was Categorical Boosting, which achieved an area under the receiver operating characteristic curve (AUROC) of 0.87 (95% confidence interval [CI], 0.86-0.87). The deployment of this ML system in ICUs can enable early identification of delirium, thereby reducing its deleterious impact on long-term adverse outcomes such as ICU cost, length of stay, and mortality.
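A minimal sketch of the categorical boosting setup (CatBoost as one implementation), with grouped splitting so windows from one ICU stay never straddle train and test; the file, columns, and parameters are placeholders:

```python
# Hedged sketch: 12-hour-window delirium prediction with categorical boosting.
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import GroupShuffleSplit
from sklearn.metrics import roc_auc_score

win = pd.read_parquet("icu_windows_12h.parquet")  # hypothetical extract
X = win.drop(columns=["delirium_next_12h", "stay_id"])  # assumed numeric/encoded
y, groups = win["delirium_next_12h"], win["stay_id"]

# Split by ICU stay so a stay's windows never leak across train and test.
tr, te = next(GroupShuffleSplit(n_splits=1, random_state=0).split(X, y, groups))
clf = CatBoostClassifier(iterations=500, depth=6, verbose=False)
clf.fit(X.iloc[tr], y.iloc[tr])
print("AUROC:", roc_auc_score(y.iloc[te], clf.predict_proba(X.iloc[te])[:, 1]))
```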