Results 1 - 17 of 17
1.
Ann Surg Oncol ; 31(1): 488-498, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37782415

ABSTRACT

BACKGROUND: While lower socioeconomic status has been shown to correlate with worse outcomes in cancer care, data correlating neighborhood-level metrics with outcomes are scarce. We aim to explore the association between neighborhood disadvantage and both short- and long-term postoperative outcomes in patients undergoing pancreatectomy for pancreatic ductal adenocarcinoma (PDAC). PATIENTS AND METHODS: We retrospectively analyzed 243 patients who underwent resection for PDAC at a single institution between 1 January 2010 and 15 September 2021. To measure neighborhood disadvantage, the cohort was divided into tertiles by Area Deprivation Index (ADI). Short-term outcomes of interest were minor complications, major complications, unplanned readmission within 30 days, prolonged hospitalization, and delayed gastric emptying (DGE). The long-term outcome of interest was overall survival. Logistic regression was used to test short-term outcomes; Cox proportional hazards models and Kaplan-Meier method were used for long-term outcomes. RESULTS: The median ADI of the cohort was 49 (IQR 32-64.5). On adjusted analysis, the high-ADI group demonstrated greater odds of suffering a major complication (odds ratio [OR], 2.78; 95% confidence interval [CI], 1.26-6.40; p = 0.01) and of an unplanned readmission (OR, 3.09; 95% CI, 1.16-9.28; p = 0.03) compared with the low-ADI group. There were no significant differences between groups in the odds of minor complications, prolonged hospitalization, or DGE (all p > 0.05). High ADI did not confer an increased hazard of death (p = 0.63). CONCLUSIONS: We found that worse neighborhood disadvantage is associated with a higher risk of major complication and unplanned readmission after pancreatectomy for PDAC.
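The tertile-based comparison described above can be sketched as follows. This is an illustrative toy example, not the study's code: the data, variable names, and the unadjusted 2x2-table odds ratio are invented for demonstration (the paper used adjusted logistic regression).

```python
import pandas as pd

# Hypothetical toy data: ADI scores and major-complication flags for 9 patients.
df = pd.DataFrame({
    "adi": [10, 20, 30, 40, 49, 55, 60, 70, 80],
    "major_complication": [0, 0, 1, 0, 0, 0, 1, 1, 0],
})

# Divide the cohort into tertiles by ADI, mirroring the study's grouping.
df["adi_tertile"] = pd.qcut(df["adi"], q=3, labels=["low", "mid", "high"])

def odds_ratio(data, group_col, outcome_col, exposed, reference):
    """Unadjusted odds ratio from the 2x2 contingency table."""
    exp = data[data[group_col] == exposed][outcome_col]
    ref = data[data[group_col] == reference][outcome_col]
    a, b = exp.sum(), (1 - exp).sum()   # events / non-events, exposed group
    c, d = ref.sum(), (1 - ref).sum()   # events / non-events, reference group
    return (a * d) / (b * c)

or_high_vs_low = odds_ratio(df, "adi_tertile", "major_complication", "high", "low")
```

An adjusted analysis, as in the paper, would instead fit a logistic regression with covariates; the 2x2 version above only shows the grouping step.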


Subject(s)
Carcinoma, Pancreatic Ductal, Pancreatic Neoplasms, Humans, Pancreatectomy/adverse effects, Pancreatectomy/methods, Retrospective Studies, Pancreatic Neoplasms/pathology, Carcinoma, Pancreatic Ductal/pathology, Neighborhood Characteristics
2.
J Biomed Inform ; 157: 104707, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39142598

ABSTRACT

OBJECTIVE: Traditional knowledge-based and machine learning diagnostic decision support systems have benefited from integrating the medical domain knowledge encoded in the Unified Medical Language System (UMLS). The emergence of Large Language Models (LLMs) to supplant traditional systems poses questions of the quality and extent of the medical knowledge in the models' internal knowledge representations and the need for external knowledge sources. The objective of this study is three-fold: to probe the diagnosis-related medical knowledge of popular LLMs, to examine the benefit of providing the UMLS knowledge to LLMs (grounding the diagnosis predictions), and to evaluate the correlations between human judgments and the UMLS-based metrics for generations by LLMs. METHODS: We evaluated diagnoses generated by LLMs from consumer health questions and daily care notes in the electronic health records using the ConsumerQA and Problem Summarization datasets. Probing LLMs for the UMLS knowledge was performed by prompting the LLM to complete the diagnosis-related UMLS knowledge paths. Grounding the predictions was examined in an approach that integrated the UMLS graph paths and clinical notes in prompting the LLMs. The results were compared to prompting without the UMLS paths. The final experiments examined the alignment of different evaluation metrics, UMLS-based and non-UMLS, with human expert evaluation. RESULTS: In probing the UMLS knowledge, GPT-3.5 significantly outperformed Llama2 and a simple baseline yielding an F1 score of 10.9% in completing one-hop UMLS paths for a given concept. Grounding diagnosis predictions with the UMLS paths improved the results for both models on both tasks, with the highest improvement (4%) in SapBERT score. There was a weak correlation between the widely used evaluation metrics (ROUGE and SapBERT) and human judgments. 
CONCLUSION: We found that while popular LLMs contain some medical knowledge in their internal representations, augmentation with UMLS knowledge provides performance gains in diagnosis generation. The UMLS needs to be tailored to the task to improve the LLMs' predictions. Finding evaluation metrics that align with human judgments better than the traditional ROUGE and BERT-based scores remains an open research question.
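The probing setup described above, prompting a model to complete a one-hop UMLS path and scoring the completion with F1, can be sketched as below. This is a minimal illustration, not the study's code: the prompt template, the concept/relation names, and the "predicted" set standing in for a model's output are all invented.

```python
# Sketch of one-hop UMLS path probing: build a prompt for a concept/relation
# pair, then score a (pretend) model completion against gold neighbors with
# set-based F1.

def one_hop_prompt(concept: str, relation: str) -> str:
    # Hypothetical template; the paper's actual prompts may differ.
    return (f"In the UMLS, list the concepts related to '{concept}' "
            f"via the relation '{relation}'.")

def set_f1(predicted: set, gold: set) -> float:
    """F1 between predicted and gold concept sets."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

prompt = one_hop_prompt("Type 2 Diabetes Mellitus", "may_be_treated_by")
# Pretend model output vs. invented gold one-hop neighbors:
predicted = {"metformin", "insulin", "aspirin"}
gold = {"metformin", "insulin", "glipizide", "empagliflozin"}
f1 = set_f1(predicted, gold)
```

Grounding, as examined in the paper, would then prepend retrieved UMLS paths to the clinical note in the prompt rather than asking the model to recall them.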

3.
BMC Emerg Med ; 24(1): 110, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982351

ABSTRACT

BACKGROUND: Substance misuse poses a significant public health challenge, characterized by premature morbidity and mortality, and heightened healthcare utilization. While studies have demonstrated that previous hospitalizations and emergency department visits are associated with increased mortality in patients with substance misuse, it is unknown whether prior utilization of emergency medical service (EMS) is similarly associated with poor outcomes among this population. The objective of this study is to determine the association between EMS utilization in the 30 days before a hospitalization or emergency department visit and in-hospital outcomes among patients with substance misuse. METHODS: We conducted a retrospective analysis of adult emergency department visits and hospitalizations (referred to as a hospital encounter) between 2017 and 2021 within the Substance Misuse Data Commons, which maintains electronic health records from substance misuse patients seen at two University of Wisconsin hospitals, linked with state agency, claims, and socioeconomic datasets. Using regression models, we examined the association between EMS use and the outcomes of in-hospital death, hospital length of stay, intensive care unit (ICU) admission, and critical illness events, defined by invasive mechanical ventilation or vasoactive drug administration. Models were adjusted for age, comorbidities, initial severity of illness, substance misuse type, and socioeconomic status. RESULTS: Among 19,402 encounters, individuals with substance misuse who had at least one EMS incident within 30 days of a hospital encounter experienced a higher likelihood of in-hospital mortality (OR 1.52, 95% CI [1.05 - 2.14]) compared to those without prior EMS use, after adjusting for confounders. Using EMS in the 30 days prior to an encounter was associated with a small increase in hospital length of stay but was not associated with ICU admission or critical illness events. 
CONCLUSIONS: Individuals with substance misuse who have used EMS in the month preceding a hospital encounter are at an increased risk of in-hospital mortality. Enhanced monitoring of EMS users in this population could improve overall patient outcomes.


Subject(s)
Emergency Medical Services, Hospital Mortality, Substance-Related Disorders, Humans, Retrospective Studies, Male, Female, Middle Aged, Adult, Risk Factors, Emergency Medical Services/statistics & numerical data, Wisconsin/epidemiology, Length of Stay/statistics & numerical data, Aged
4.
medRxiv ; 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38585973

ABSTRACT

Objective: The application of Natural Language Processing (NLP) in the clinical domain is important due to the rich unstructured information in clinical documents, which often remains inaccessible in structured data. When applying NLP methods to a given domain, benchmark datasets are crucial: they not only guide the selection of best-performing models but also enable assessment of the reliability of the generated outputs. Despite the recent availability of language models (LMs) capable of handling longer context, benchmark datasets targeting long clinical document classification are absent. Materials and Methods: To address this gap, we propose the LCD benchmark, a benchmark for predicting 30-day out-of-hospital mortality using discharge notes from MIMIC-IV linked to statewide death data. We evaluated this benchmark dataset using baseline models ranging from bag-of-words and CNN to instruction-tuned large language models. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations. Results and Discussion: In F1 score, the best-performing supervised baseline reached 28.9% and GPT-4 reached 32.2%. Notes in our dataset have a median word count of 1687. Our analysis of the model outputs showed that the dataset is challenging for both models and human experts, yet the models can extract meaningful signals from the text. Conclusion: We expect the LCD benchmark to serve as a resource for developing advanced supervised models and prompting methods tailored to clinical text. The benchmark dataset is available at https://github.com/Machine-Learning-for-Medical-Language/long-clinical-doc.
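A bag-of-words baseline of the kind mentioned above can be sketched as follows. This is an invented minimal featurizer, not the benchmark's code; the toy discharge-note snippets are made up, and a real baseline would feed these count vectors into a classifier.

```python
from collections import Counter

def bag_of_words(notes):
    """Build a sorted vocabulary over all notes and a count vector per note."""
    vocab = sorted({tok for note in notes for tok in note.lower().split()})
    rows = []
    for note in notes:
        counts = Counter(note.lower().split())
        rows.append([counts.get(tok, 0) for tok in vocab])
    return vocab, rows

# Toy stand-ins for discharge notes (real MIMIC-IV notes are much longer,
# with a median word count of 1687 per the abstract).
notes = [
    "Patient discharged home in stable condition",
    "Patient discharged to hospice with comfort measures",
]
vocab, X = bag_of_words(notes)
```

In practice the resulting matrix `X` would be paired with 30-day mortality labels and passed to, e.g., a logistic regression.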

5.
JAMIA Open ; 7(2): ooae039, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38779571

ABSTRACT

Objectives: Numerous studies have identified information overload as a key issue for electronic health records (EHRs). This study describes the amount of text data across all notes available to emergency physicians in the EHR, trended over the time since EHR establishment. Materials and Methods: We conducted a retrospective analysis of EHR data from a large healthcare system, examining the number of notes and the corresponding total word and token counts across all notes available to physicians during patient encounters in the emergency department (ED). We assessed the change in these metrics over a 17-year period between 2006 and 2023. Results: The study cohort included 730 968 ED visits made by 293 559 unique patients and a total note count of 132 574 964. The median note count for all encounters in 2006 was 5 (IQR 1-16), accounting for 1735 (IQR 447-5521) words. By 2022, the last full year of the study period, the median number of notes had grown to 359 (IQR 84-943). Note and word counts were higher for admitted patients. Discussion: The volume of notes available for review by providers has increased by over 30-fold in the 17 years since the implementation of the EHR at a large health system, and the task of reviewing these notes has become commensurately more difficult. These data point to the critical need for new strategies and tools for filtering, synthesizing, and summarizing information to achieve the promise of the medical record.
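The per-encounter summary described above (note counts and word totals, reported as median with IQR) can be sketched with a small groupby. The data below are invented toy rows, not study data.

```python
import pandas as pd

# Toy stand-in for the note table: one row per note, keyed by encounter.
notes = pd.DataFrame({
    "encounter_id": [1, 1, 1, 2, 2, 3],
    "note_text": [
        "chest pain workup", "ekg normal", "discharged home",
        "fall from ladder", "xray negative",
        "fever and cough",
    ],
})
notes["word_count"] = notes["note_text"].str.split().str.len()

# Aggregate to one row per ED encounter: how many notes, how many words total.
per_encounter = notes.groupby("encounter_id").agg(
    note_count=("note_text", "size"),
    total_words=("word_count", "sum"),
)
median_notes = per_encounter["note_count"].median()
q1, q3 = per_encounter["total_words"].quantile([0.25, 0.75])
```

The study's trend analysis would additionally group by calendar year before computing these medians and IQRs.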

6.
medRxiv ; 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38562730

ABSTRACT

In the evolving landscape of clinical Natural Language Generation (NLG), assessing abstractive text quality remains challenging, as existing methods often overlook the complexities of generative tasks. This work examined the current state of automated evaluation metrics for NLG in healthcare. To establish a robust, well-validated baseline against which to examine the alignment of these metrics, we created a comprehensive human evaluation framework. Using generative output from ChatGPT-3.5-turbo, we correlated human judgments with each metric. None of the metrics demonstrated high alignment; however, the SapBERT score, a Unified Medical Language System (UMLS)-based metric, showed the best results. This underscores the importance of incorporating domain-specific knowledge into evaluation efforts. Our work reveals the deficiency in quality evaluations for generated text and introduces our comprehensive human evaluation framework as a baseline. Future efforts should prioritize integrating medical knowledge databases to enhance the alignment of automated metrics, particularly by refining the SapBERT score for improved assessments.

7.
J Addict Med ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38776423

ABSTRACT

OBJECTIVE: A trial comparing extended-release naltrexone and sublingual buprenorphine-naloxone demonstrated higher relapse rates in individuals randomized to extended-release naltrexone. The effectiveness of treatment might vary based on patient characteristics. We hypothesized that causal machine learning would identify individualized treatment effects for each medication. METHODS: This is a secondary analysis of a multicenter randomized trial that compared the effectiveness of extended-release naltrexone versus buprenorphine-naloxone for preventing relapse of opioid misuse. Three machine learning models were derived using all trial participants with 50% randomly selected for training (n = 285) and the remaining 50% for validation. Individualized treatment effect was measured by the Qini value and c-for-benefit, with the absence of relapse denoting treatment success. Patients were grouped into quartiles by predicted individualized treatment effect to examine differences in characteristics and the observed treatment effects. RESULTS: The best-performing model had a Qini value of 4.45 (95% confidence interval, 1.02-7.83) and a c-for-benefit of 0.63 (95% confidence interval, 0.53-0.68). The quartile most likely to benefit from buprenorphine-naloxone had a 35% absolute benefit from this treatment, and at study entry, they had a high median opioid withdrawal score (P < 0.001), used cocaine on more days over the prior 30 days than other quartiles (P < 0.001), and had highest proportions with alcohol and cocaine use disorder (P ≤ 0.02). Quartile 4 individuals were predicted to be most likely to benefit from extended-release naltrexone, with the greatest proportion having heroin drug preference (P = 0.02) and all experiencing homelessness (P < 0.001). CONCLUSIONS: Causal machine learning identified differing individualized treatment effects between medications based on characteristics associated with preventing relapse.

8.
J Am Med Inform Assoc ; 31(6): 1322-1330, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38679906

ABSTRACT

OBJECTIVES: To compare and externally validate popular deep learning model architectures and data transformation methods for variable-length time series data in 3 clinical tasks (clinical deterioration, severe acute kidney injury [AKI], and suspected infection). MATERIALS AND METHODS: This multicenter retrospective study included admissions at 2 medical centers spanning 2007-2022. Distinct datasets were created for each clinical task, with 1 site used for training and the other for testing. Three feature engineering methods (normalization, standardization, and piece-wise linear encoding with decision trees [PLE-DTs]) and 3 architectures (long short-term memory/gated recurrent unit [LSTM/GRU], temporal convolutional network, and time-distributed wrapper with convolutional neural network [TDW-CNN]) were compared on each clinical task. Model discrimination was evaluated using the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC). RESULTS: The study comprised 373 825 admissions for training and 256 128 admissions for testing. LSTM/GRU and TDW-CNN models tied, each obtaining the highest mean AUPRC in 2 tasks, and LSTM/GRU had the highest mean AUROC across all tasks (deterioration: 0.81, AKI: 0.92, infection: 0.87). PLE-DT with LSTM/GRU achieved the highest AUPRC in all tasks. DISCUSSION: When externally validated in 3 clinical tasks, the LSTM/GRU architecture with PLE-DT-transformed data demonstrated the highest AUPRC in all tasks. Multiple models achieved similar performance when evaluated using AUROC. CONCLUSION: The LSTM architecture performs as well as or better than some newer architectures, and PLE-DT may enhance the AUPRC of variable-length time series models predicting clinical outcomes during external validation.
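Piece-wise linear encoding, the transformation compared above, maps a numeric feature to one slot per bin: bins entirely below the value get 1, the bin containing it gets the fractional position within the bin, and higher bins get 0. In PLE-DT the bin edges come from decision-tree split thresholds; in this sketch (not the study's code) the edges are supplied by hand and the example value is invented.

```python
def ple_encode(value, edges):
    """Piece-wise linear encoding of a scalar given sorted bin edges.

    edges lists the outer min/max plus interior boundaries,
    e.g. [0, 10, 20, 40] defines three bins.
    """
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if value >= hi:
            out.append(1.0)            # bin lies entirely below the value
        elif value <= lo:
            out.append(0.0)            # bin lies entirely above the value
        else:
            out.append((value - lo) / (hi - lo))  # fractional position
    return out

# Toy vital-sign-style feature with hand-picked edges (PLE-DT would derive
# these from decision-tree splits against the outcome):
encoding = ple_encode(15.0, [0.0, 10.0, 20.0, 40.0])
```

Each time step's features would be encoded this way before being fed to the LSTM/GRU.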


Subject(s)
Deep Learning, Female, Humans, Male, Middle Aged, Acute Kidney Injury, Datasets as Topic, Neural Networks, Computer, Retrospective Studies, ROC Curve
9.
J Am Med Inform Assoc ; 31(6): 1291-1302, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38587875

ABSTRACT

OBJECTIVE: The timely stratification of trauma injury severity can enhance the quality of trauma care but it requires intense manual annotation from certified trauma coders. The objective of this study is to develop machine learning models for the stratification of trauma injury severity across various body regions using clinical text and structured electronic health records (EHRs) data. MATERIALS AND METHODS: Our study utilized clinical documents and structured EHR variables linked with the trauma registry data to create 2 machine learning models with different approaches to representing text. The first one fuses concept unique identifiers (CUIs) extracted from free text with structured EHR variables, while the second one integrates free text with structured EHR variables. Temporal validation was undertaken to ensure the models' temporal generalizability. Additionally, analyses to assess the variable importance were conducted. RESULTS: Both models demonstrated impressive performance in categorizing leg injuries, achieving high accuracy with macro-F1 scores of over 0.8. Additionally, they showed considerable accuracy, with macro-F1 scores exceeding or near 0.7, in assessing injuries in the areas of the chest and head. We showed in our variable importance analysis that the most important features in the model have strong face validity in determining clinically relevant trauma injuries. DISCUSSION: The CUI-based model achieves comparable performance, if not higher, compared to the free-text-based model, with reduced complexity. Furthermore, integrating structured EHR data improves performance, particularly when the text modalities are insufficiently indicative. CONCLUSIONS: Our multi-modal, multiclass models can provide accurate stratification of trauma injury severity and clinically relevant interpretations.


Subject(s)
Electronic Health Records, Machine Learning, Wounds and Injuries, Humans, Wounds and Injuries/classification, Injury Severity Score, Registries, Trauma Severity Indices, Natural Language Processing
10.
medRxiv ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38562803

ABSTRACT

Rationale: Early detection of clinical deterioration using early warning scores may improve outcomes. However, most implemented scores were developed using logistic regression, only underwent retrospective internal validation, and were not tested in important patient subgroups. Objectives: To develop a gradient boosted machine model (eCARTv5) for identifying clinical deterioration and then validate it externally, test it prospectively, and evaluate it across patient subgroups. Methods: All adult patients hospitalized on the wards in seven hospitals from 2008-2022 were used to develop eCARTv5, with demographics, vital signs, clinician documentation, and laboratory values utilized to predict intensive care unit transfer or death in the next 24 hours. The model was externally validated retrospectively in 21 hospitals from 2009-2023 and prospectively in 10 hospitals from February to May 2023. eCARTv5 was compared to the Modified Early Warning Score (MEWS) and the National Early Warning Score (NEWS) using the area under the receiver operating characteristic curve (AUROC). Measurements and Main Results: The development cohort included 901,491 admissions, the retrospective validation cohort included 1,769,461 admissions, and the prospective validation cohort included 46,330 admissions. In retrospective validation, eCART had the highest AUROC (0.835; 95% CI 0.834-0.835), followed by NEWS (0.766; 95% CI 0.766-0.767) and MEWS (0.704; 95% CI 0.703-0.704). eCART's performance remained high (AUROC ≥0.80) across a range of patient demographics and clinical conditions, and during prospective validation. Conclusions: We developed eCARTv5, which accurately identifies early clinical deterioration in hospitalized ward patients. Our model performed better than NEWS and MEWS retrospectively, prospectively, and across a range of subgroups.

11.
Crit Care Explor ; 6(3): e1066, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38505174

ABSTRACT

OBJECTIVES: Alcohol withdrawal syndrome (AWS) may progress to require high-intensity care. Approaches to identify hospitalized patients with AWS who received a higher level of care have not been previously examined. This study aimed to examine the utility of Clinical Institute Withdrawal Assessment for Alcohol, Revised (CIWA-Ar) scale scores and medication doses for alcohol withdrawal management in identifying patients who received high-intensity care. DESIGN: A multicenter observational cohort study of hospitalized adults with alcohol withdrawal. SETTING: University of Chicago Medical Center and University of Wisconsin Hospital. PATIENTS: Inpatient encounters between November 2008 and February 2022 with a CIWA-Ar score greater than 0 and a benzodiazepine or barbiturate administered within the first 24 hours. The primary composite outcome was progression to high-intensity care (intermediate care or ICU). INTERVENTIONS: None. MAIN RESULTS: Among the 8742 patients included in the study, 37.5% (n = 3280) progressed to high-intensity care. The odds ratio (OR) for the composite outcome increased above 1.0 at a CIWA-Ar score of 24. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) at this threshold were 0.12 (95% CI, 0.11-0.13), 0.95 (95% CI, 0.94-0.95), 0.58 (95% CI, 0.54-0.61), and 0.64 (95% CI, 0.63-0.65), respectively. The OR increased above 1.0 at a 24-hour lorazepam milligram-equivalent dose cutoff of 15 mg. The sensitivity, specificity, PPV, and NPV at this threshold were 0.16 (95% CI, 0.14-0.17), 0.96 (95% CI, 0.95-0.96), 0.68 (95% CI, 0.65-0.72), and 0.65 (95% CI, 0.64-0.66), respectively. CONCLUSIONS: Neither CIWA-Ar scores nor medication dose cutoffs were effective measures for identifying patients with alcohol withdrawal who received high-intensity care. Research examining outcomes in patients who deteriorate with AWS will require better methods for cohort identification.
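The threshold evaluation reported above (sensitivity, specificity, PPV, and NPV at a score cutoff) can be sketched from a 2x2 table. The scores and outcomes below are invented toy values, not study data.

```python
def threshold_metrics(scores, outcomes, cutoff):
    """Classify as positive when score >= cutoff; compare to binary outcomes."""
    tp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and not y)
    fn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y)
    tn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and not y)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Toy data: peak CIWA-Ar scores and whether the encounter reached
# high-intensity care (1) or not (0).
scores = [5, 30, 12, 26, 8, 24, 3, 18]
outcomes = [0, 1, 0, 1, 0, 0, 0, 1]
metrics = threshold_metrics(scores, outcomes, cutoff=24)
```

Sweeping `cutoff` over the observed score range yields the trade-off curve the study examined at a cutoff of 24.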

12.
JAMIA Open ; 7(3): ooae080, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39166170

ABSTRACT

Background: Large language models (LLMs) can assist providers in drafting responses to patient inquiries. We examined a prompt engineering strategy to draft responses for providers in the electronic health record. The aim was to evaluate the change in usability after prompt engineering. Materials and Methods: A pre-post study over 8 months was conducted across 27 providers. The primary outcome was the provider use of LLM-generated messages from Generative Pre-Trained Transformer 4 (GPT-4) in a mixed-effects model, and the secondary outcome was provider sentiment analysis. Results: Of the 7605 messages generated, 17.5% (n = 1327) were used. There was a reduction in negative sentiment with an odds ratio of 0.43 (95% CI, 0.36-0.52), but message use decreased (P < .01). The addition of nurses after the study period led to an increase in message use to 35.8% (P < .01). Discussion: The improvement in sentiment with prompt engineering suggests better content quality, but the initial decrease in usage highlights the need for integration with human factors design. Conclusion: Future studies should explore strategies for optimizing the integration of LLMs into the provider workflow to maximize both usability and effectiveness.

13.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370788

ABSTRACT

OBJECTIVE: Timely intervention for clinically deteriorating ward patients requires that care teams accurately diagnose and treat their underlying medical conditions. However, the most common diagnoses leading to deterioration and the relevant therapies provided are poorly characterized. Therefore, we aimed to determine the diagnoses responsible for clinical deterioration, the relevant diagnostic tests ordered, and the treatments administered among high-risk ward patients using manual chart review. DESIGN: Multicenter retrospective observational study. SETTING: Inpatient medical-surgical wards at four health systems from 2006-2020. PATIENTS: Randomly selected patients (1,000 from each health system) with clinical deterioration, defined by reaching the 95th percentile of a validated early warning score, electronic Cardiac Arrest Risk Triage (eCART), were included. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Clinical deterioration was confirmed by a trained reviewer or marked as a false alarm if no deterioration occurred. For true deterioration events, the condition causing deterioration, relevant diagnostic tests ordered, and treatments provided were collected. Of the 4,000 included patients, 2,484 (62%) had clinical deterioration confirmed by chart review. Sepsis was the most common cause of deterioration (41%; n=1,021), followed by arrhythmia (19%; n=473), while liver failure had the highest in-hospital mortality (41%). The most common diagnostic tests ordered were complete blood counts (47% of events), followed by chest x-rays (42%) and cultures (40%), while the most common medication orders were antimicrobials (46%), followed by fluid boluses (34%) and antiarrhythmics (19%). CONCLUSIONS: We found that sepsis was the most common cause of deterioration, while liver failure had the highest mortality. Complete blood counts and chest x-rays were the most common diagnostic tests ordered, and antimicrobials and fluid boluses were the most common medication interventions. These results provide important insights for clinical decision-making at the bedside, training of rapid response teams, and the development of institutional treatment pathways for clinical deterioration. KEY POINTS: Question: What are the most common diagnoses, diagnostic test orders, and treatments for ward patients experiencing clinical deterioration? Findings: In manual chart review of 2,484 encounters with deterioration across four health systems, we found that sepsis was the most common cause of clinical deterioration, followed by arrhythmia, while liver failure had the highest mortality. Complete blood counts and chest x-rays were the most common diagnostic test orders, while antimicrobials and fluid boluses were the most common treatments. Meaning: Our results provide new insights into clinical deterioration events, which can inform institutional treatment pathways, rapid response team training, and patient care.

14.
J Clin Med ; 13(5)2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38592057

ABSTRACT

(1) Background: SeptiCyte RAPID is a molecular test for discriminating sepsis from non-infectious systemic inflammation and for estimating sepsis probabilities. The objective of this study was the clinical validation of SeptiCyte RAPID based on testing retrospectively banked and prospectively collected patient samples. (2) Methods: The cartridge-based SeptiCyte RAPID test accepts a PAXgene blood RNA sample and provides sample-to-answer processing in ~1 h. The test output (SeptiScore, range 0-15) falls into four interpretation bands, with higher scores indicating higher probabilities of sepsis. Retrospective (N = 356) and prospective (N = 63) samples were tested from adult ICU patients who either had the systemic inflammatory response syndrome (SIRS) or were suspected of having/diagnosed with sepsis. Patients were clinically evaluated by a panel of three expert physicians blinded to the SeptiCyte test results. Results were interpreted under either the Sepsis-2 or Sepsis-3 framework. (3) Results: Under the Sepsis-2 framework, SeptiCyte RAPID performance for the combined retrospective and prospective cohorts had areas under the ROC curve (AUCs) ranging from 0.82 to 0.85, a negative predictive value of 0.91 (sensitivity 0.94) for SeptiScore Band 1 (score range 0.1-5.0; lowest risk of sepsis), and a positive predictive value of 0.81 (specificity 0.90) for SeptiScore Band 4 (score range 7.4-15; highest risk of sepsis). Performance estimates for the prospective cohort ranged from AUC 0.86 to 0.95. For physician-adjudicated sepsis cases with positive blood cultures, or positive blood and urine cultures, 43/48 (90%) of SeptiCyte scores fell in Bands 3 or 4. In multivariable analysis with up to 14 additional clinical variables, SeptiScore was the most important variable for sepsis diagnosis. Comparable performance was obtained for the majority of patients reanalyzed under the Sepsis-3 definition, although a subgroup of 16 patients was identified who were classified as septic under Sepsis-2 but not under Sepsis-3. (4) Conclusions: This study validates SeptiCyte RAPID for estimating sepsis probability, under both the Sepsis-2 and Sepsis-3 frameworks, for hospitalized patients on their first day of ICU admission.

15.
Front Pediatr ; 11: 1284672, 2023.
Article in English | MEDLINE | ID: mdl-38188917

ABSTRACT

Introduction: Critical deterioration in hospitalized children, defined as ward to pediatric intensive care unit (PICU) transfer followed by mechanical ventilation (MV) or vasoactive infusion (VI) within 12 h, has been used as a primary metric to evaluate the effectiveness of clinical interventions or quality improvement initiatives. We explore the association between critical events (CEs), i.e., MV or VI events, within the first 48 h of PICU transfer from the ward or emergency department (ED) and in-hospital mortality. Methods: We conducted a retrospective study of a cohort of PICU transfers from the ward or the ED at two tertiary-care academic hospitals. We determined the association between mortality and occurrence of CEs within 48 h of PICU transfer after adjusting for age, gender, hospital, and prior comorbidities. Results: Experiencing a CE within 48 h of PICU transfer was associated with an increased risk of mortality [OR 12.40 (95% CI: 8.12-19.23, P < 0.05)]. The increased risk of mortality was highest in the first 12 h [OR 11.32 (95% CI: 7.51-17.15, P < 0.05)] but persisted in the 12-48 h time interval [OR 2.84 (95% CI: 1.40-5.22, P < 0.05)]. Varying levels of risk were observed when considering ED or ward transfers only, when considering different age groups, and when considering individual 12-h time intervals. Discussion: We demonstrate that occurrence of a CE within 48 h of PICU transfer was associated with mortality after adjusting for confounders. Studies focusing on the impact of quality improvement efforts may benefit from using CEs within 48 h of PICU transfer as an additional evaluation metric, provided these events could have been influenced by the initiative.

16.
JAMIA Open ; 6(4): ooad109, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38144168

ABSTRACT

Objectives: To develop and externally validate machine learning models using structured and unstructured electronic health record data to predict postoperative acute kidney injury (AKI) across inpatient settings. Materials and Methods: Data for adult postoperative admissions to Loyola University Medical Center (2009-2017) were used for model development, and admissions to the University of Wisconsin-Madison (2009-2020) were used for validation. Structured features included demographics, vital signs, laboratory results, and nurse-documented scores. Unstructured text from clinical notes was converted into concept unique identifiers (CUIs) using the clinical Text Analysis and Knowledge Extraction System. The primary outcome was the development of Kidney Disease: Improving Global Outcomes stage 2 AKI within 7 days after leaving the operating room. We derived unimodal extreme gradient boosting (XGBoost) and elastic net logistic regression (GLMNET) models using structured-only data, and multimodal models combining structured data with CUI features. Model comparison was performed using the area under the receiver operating characteristic curve (AUROC), with DeLong's test for statistical differences. Results: The study cohort included 138 389 adult patient admissions (mean [SD] age 58 [16] years; 11 506 [8%] African-American; and 70 826 [51%] female) across the 2 sites. Of those, 2959 (2.1%) developed stage 2 AKI or higher. Across all data types, XGBoost outperformed GLMNET (mean AUROC 0.81 [95% confidence interval (CI), 0.80-0.82] vs 0.78 [95% CI, 0.77-0.79]). The multimodal XGBoost model incorporating CUIs parameterized as term frequency-inverse document frequency (TF-IDF) showed the highest discrimination (AUROC 0.82 [95% CI, 0.81-0.83]) over unimodal models (AUROC 0.79 [95% CI, 0.78-0.80]). Discussion: A multimodal approach combining structured data with TF-IDF-weighted CUIs increased model performance over structured-data-only models.
Conclusion: These findings highlight the predictive power of CUIs when merged with structured data for clinical prediction models, which may improve the detection of postoperative AKI.
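The fusion step described above, TF-IDF weighting of per-admission CUI bags concatenated with structured features, can be sketched as follows. This is an illustrative toy, not the study's pipeline: the CUIs, structured values, and the plain log-IDF formula are invented for demonstration, and the fused vectors would feed a classifier such as XGBoost.

```python
import math

def tfidf(docs):
    """Plain TF-IDF over bags of CUIs: tf = count/length, idf = log(N/df)."""
    vocab = sorted({cui for doc in docs for cui in doc})
    n = len(docs)
    df = {cui: sum(1 for d in docs if cui in d) for cui in vocab}
    idf = {cui: math.log(n / df[cui]) for cui in vocab}
    rows = []
    for doc in docs:
        total = len(doc)
        rows.append([(doc.count(cui) / total) * idf[cui] for cui in vocab])
    return vocab, rows

# Each admission is a bag of CUIs extracted from its notes (toy examples).
admissions = [
    ["C0022660", "C0020538"],              # AKI, hypertension
    ["C0020538", "C0011849", "C0011849"],  # hypertension, diabetes x2
]
vocab, cui_features = tfidf(admissions)

# Concatenate with structured features (e.g. age, baseline creatinine)
# to form the multimodal input vectors.
structured = [[58, 1.2], [71, 2.4]]
fused = [s + c for s, c in zip(structured, cui_features)]
```

Note that a CUI present in every admission gets IDF 0 under this weighting, so only discriminative concepts contribute to the fused representation.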
