ABSTRACT
BACKGROUND: Predicting hospitalization from nurse triage notes has the potential to augment care. However, the choice of model for this goal requires careful consideration. Specifically, health systems differ in their available computational infrastructure and budget constraints. OBJECTIVE: To this end, we compared the performance of Bio-Clinical-BERT, a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT), with a bag-of-words (BOW) logistic regression (LR) model incorporating term frequency-inverse document frequency (TF-IDF). These choices represent different levels of computational requirements. METHODS: A retrospective analysis was conducted using data from 1,391,988 patients who visited emergency departments in the Mount Sinai Health System from 2017 to 2022. The models were trained on 4 hospitals' data and externally validated on a fifth hospital's data. RESULTS: The Bio-Clinical-BERT model achieved higher areas under the receiver operating characteristic curve (0.82, 0.84, and 0.85) compared to the BOW-LR-TF-IDF model (0.81, 0.83, and 0.84) across training sets of 10,000; 100,000; and ~1,000,000 patients, respectively. Notably, both models proved effective at using triage notes for prediction, despite the modest performance gap. CONCLUSIONS: Our findings suggest that simpler machine learning models such as BOW-LR-TF-IDF could serve adequately in resource-limited settings. Given the potential implications for patient care and hospital resource management, further exploration of alternative models and techniques is warranted to enhance predictive performance in this critical domain. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1101/2023.08.07.23293699.
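To make the bag-of-words TF-IDF representation in the abstract above concrete, the following is a minimal sketch using only the Python standard library. The toy triage notes, the whitespace tokenizer, and the smoothed IDF formula are illustrative assumptions; the abstract does not state the study's actual preprocessing or weighting variant. The resulting weights would be the inputs to a logistic regression classifier.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (sorted vocabulary, list of {term: TF-IDF weight}) for the given texts."""
    # Assumption: lowercase + whitespace tokenization (not taken from the study).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many notes each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency is the share of the note's tokens taken by the term.
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return sorted(df), vectors

# Hypothetical triage notes, invented for illustration only.
notes = [
    "chest pain shortness of breath",
    "ankle pain after fall",
    "chest pain radiating to arm",
]
vocab, vecs = tfidf_vectors(notes)
```

In this toy corpus "pain" appears in every note, so it receives a lower weight than a rarer term such as "ankle" within the same note.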
ABSTRACT
This study was designed to assess how different prompt engineering techniques, specifically direct prompts, Chain of Thought (CoT), and a modified CoT approach, influence the ability of GPT-3.5 to answer clinical and calculation-based medical questions, particularly those styled like the USMLE Step 1 exams. To achieve this, we analyzed the responses of GPT-3.5 to two distinct sets of questions: a batch of 1000 questions generated by GPT-4, and another set comprising 95 real USMLE Step 1 questions. These questions spanned a range of medical calculations and clinical scenarios across various fields and difficulty levels. Our analysis revealed that there were no significant differences in the accuracy of GPT-3.5's responses when using direct prompts, CoT, or modified CoT methods. For instance, in the USMLE sample, the success rates were 61.7% for direct prompts, 62.8% for CoT, and 57.4% for modified CoT, with a p-value of 0.734. Similar trends were observed in the responses to GPT-4 generated questions, both clinical and calculation-based, with p-values above 0.05 indicating no significant difference between the prompt types. The conclusion drawn from this study is that the use of CoT prompt engineering does not significantly alter GPT-3.5's effectiveness in handling medical calculations or clinical scenario questions styled like those in USMLE exams. This finding is crucial as it suggests that performance of ChatGPT remains consistent regardless of whether a CoT technique is used instead of direct prompts. This consistency could be instrumental in simplifying the integration of AI tools like ChatGPT into medical education, enabling healthcare professionals to utilize these tools with ease, without the necessity for complex prompt engineering.
Subject(s)
Educational Measurement , Humans , Educational Measurement/methods , Licensure, Medical , Clinical Competence , United States , Education, Medical, Undergraduate/methods
ABSTRACT
Malnutrition is a frequently underdiagnosed condition leading to increased morbidity, mortality, and healthcare costs. The Mount Sinai Health System (MSHS) deployed a machine learning model (MUST-Plus) to detect malnutrition upon hospital admission. However, in diverse patient groups, a poorly calibrated model may lead to misdiagnosis, exacerbating health care disparities. We explored the model's calibration across different variables and methods to improve calibration. Data from adult patients admitted to five MSHS hospitals from January 1, 2021 - December 31, 2022, were analyzed. We compared MUST-Plus prediction to the registered dietitian's formal assessment. Hierarchical calibration was assessed and compared between the recalibration sample (N = 49,562) of patients admitted between January 1, 2021 - December 31, 2022, and the hold-out sample (N = 17,278) of patients admitted between January 1, 2023 - September 30, 2023. Statistical differences in calibration metrics were tested using bootstrapping with replacement. Before recalibration, the overall model calibration intercept was -1.17 (95% CI: -1.20, -1.14), slope was 1.37 (95% CI: 1.34, 1.40), and Brier score was 0.26 (95% CI: 0.25, 0.26). Both weak and moderate measures of calibration were significantly different between White and Black patients and between male and female patients. Logistic recalibration significantly improved calibration of the model across race and gender in the hold-out sample. The original MUST-Plus model showed significant differences in calibration between White vs. Black patients. It also overestimated malnutrition in females compared to males. Logistic recalibration effectively reduced miscalibration across all patient subgroups. Continual monitoring and timely recalibration can improve model accuracy.
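The logistic recalibration described in the abstract above can be sketched as follows: a two-parameter logistic model (intercept a, slope b) is fitted on the logit of the original predictions, and the Brier score is computed as the mean squared difference between predicted probability and outcome. This is a minimal standard-library sketch under stated assumptions: the toy predictions and labels are invented, plain gradient descent is used for fitting, and the function names are hypothetical. It is not the MUST-Plus pipeline.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(probs, labels):
    """Mean negative log-likelihood of binary labels under the predicted probabilities."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probs, labels)) / len(labels)

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def fit_recalibration(probs, labels, lr=0.1, n_iter=5000):
    """Fit P(y=1) = sigmoid(a + b * logit(p)) by gradient descent on the log-loss."""
    xs = [logit(p) for p in probs]
    a, b = 0.0, 1.0  # start from the identity mapping (no recalibration)
    n = len(labels)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(a + b * x) - y
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def recalibrate(probs, a, b):
    return [sigmoid(a + b * logit(p)) for p in probs]

# Invented toy data: the original model overestimates risk (mean prediction 0.67, prevalence 0.40).
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.8, 0.7, 0.6, 0.9]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1]
a, b = fit_recalibration(probs, labels)
new_probs = recalibrate(probs, a, b)
```

In practice the parameters would be fitted on a recalibration sample and then evaluated on a separate hold-out sample, as the abstract describes; the toy example fits and evaluates on the same data only to stay short.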
ABSTRACT
The decision to extubate patients on invasive mechanical ventilation is critical; however, clinician performance in identifying patients to liberate from the ventilator is poor. Machine Learning-based predictors using tabular data have been developed; however, these fail to capture the wide spectrum of data available. Here, we develop and validate a deep learning-based model using routinely collected chest X-rays to predict the outcome of attempted extubation. We included 2288 serial patients admitted to the Medical ICU at an urban academic medical center, who underwent invasive mechanical ventilation, with at least one intubated CXR, and a documented extubation attempt. The last CXR before extubation for each patient was taken and split 79/21 for training/testing sets, then transfer learning with k-fold cross-validation was used on a pre-trained ResNet50 deep learning architecture. The top three models were ensembled to form a final classifier. The Grad-CAM technique was used to visualize image regions driving predictions. The model achieved an AUC of 0.66, AUPRC of 0.94, sensitivity of 0.62, and specificity of 0.60. The model performance was improved compared to the Rapid Shallow Breathing Index (AUC 0.61) and the only identified previous study in this domain (AUC 0.55), but significant room for improvement and experimentation remains.
ABSTRACT
BACKGROUND: Artificial intelligence (AI) and large language models (LLMs) can play a critical role in emergency room operations by augmenting decision-making about patient admission. However, there are no studies for LLMs using real-world data and scenarios, in comparison to and being informed by traditional supervised machine learning (ML) models. We evaluated the performance of GPT-4 for predicting patient admissions from emergency department (ED) visits. We compared performance to traditional ML models both naively and when informed by few-shot examples and/or numerical probabilities. METHODS: We conducted a retrospective study using electronic health records across 7 NYC hospitals. We trained Bio-Clinical-BERT and XGBoost (XGB) models on unstructured and structured data, respectively, and created an ensemble model reflecting ML performance. We then assessed GPT-4 capabilities in many scenarios: through Zero-shot, Few-shot with and without retrieval-augmented generation (RAG), and with and without ML numerical probabilities. RESULTS: The Ensemble ML model achieved an area under the receiver operating characteristic curve (AUC) of 0.88, an area under the precision-recall curve (AUPRC) of 0.72 and an accuracy of 82.9%. The naïve GPT-4's performance (0.79 AUC, 0.48 AUPRC, and 77.5% accuracy) showed substantial improvement when given limited, relevant data to learn from (ie, RAG) and underlying ML probabilities (0.87 AUC, 0.71 AUPRC, and 83.1% accuracy). Interestingly, RAG alone boosted performance to near peak levels (0.82 AUC, 0.56 AUPRC, and 81.3% accuracy). CONCLUSIONS: The naïve LLM had limited performance but showed significant improvement in predicting ED admissions when supplemented with real-world examples to learn from, particularly through RAG, and/or numerical probabilities from traditional ML models. Its peak performance, although slightly lower than the pure ML model, is noteworthy given its potential for providing reasoning behind predictions. 
Further refinement of LLMs with real-world data is necessary for successful integration as decision-support tools in care settings.
Subject(s)
Electronic Health Records , Emergency Service, Hospital , Patient Admission , Humans , Retrospective Studies , Artificial Intelligence , Natural Language Processing , Machine Learning , Supervised Machine Learning
ABSTRACT
OBJECTIVES: Machine learning algorithms can outperform older methods in predicting clinical deterioration, but rigorous prospective data on their real-world efficacy are limited. We hypothesized that real-time machine learning generated alerts sent directly to front-line providers would reduce escalations. DESIGN: Single-center prospective pragmatic nonrandomized clustered clinical trial. SETTING: Academic tertiary care medical center. PATIENTS: Adult patients admitted to four medical-surgical units. Assignment to intervention or control arms was determined by initial unit admission. INTERVENTIONS: Real-time alerts stratified according to predicted likelihood of deterioration sent either to the primary team or directly to the rapid response team (RRT). Clinical care and interventions were at the providers' discretion. For the control units, alerts were generated but not sent, and standard RRT activation criteria were used. MEASUREMENTS AND MAIN RESULTS: The primary outcome was the rate of escalation per 1000 patient bed days. Secondary outcomes included the frequency of orders for fluids, medications, and diagnostic tests, and combined in-hospital and 30-day mortality. Propensity score modeling with stabilized inverse probability of treatment weight (IPTW) was used to account for differences between groups. Data from 2740 patients enrolled between July 2019 and March 2020 were analyzed (1488 intervention, 1252 control). Average age was 66.3 years and 1428 participants (52%) were female. The rate of escalation was 12.3 vs. 11.3 per 1000 patient bed days (difference, 1.0; 95% CI, -2.8 to 4.7), with an IPTW-adjusted incidence rate ratio of 1.43 (95% CI, 1.16-1.78; p < 0.001). Patients in the intervention group were more likely to receive cardiovascular medication orders (16.1% vs. 11.3%; difference, 4.7%; 95% CI, 2.1-7.4%), with an IPTW-adjusted relative risk (RR) of 1.74 (95% CI, 1.39-2.18; p < 0.001). Combined in-hospital and 30-day mortality was lower in the intervention group (7% vs. 9.3%; difference, -2.4%; 95% CI, -4.5% to -0.2%), with an IPTW-adjusted RR of 0.76 (95% CI, 0.58-0.99; p = 0.045). CONCLUSIONS: Real-time machine learning alerts do not reduce the rate of escalation but may reduce mortality.
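As an illustration of the stabilized inverse probability of treatment weighting mentioned in the abstract above, the sketch below computes weights from propensity scores that are assumed to have been estimated already. The standard stabilized form is used: the marginal probability of the received arm divided by the patient's estimated probability of receiving that arm. The function name and toy values are hypothetical and are not taken from the trial.

```python
def stabilized_iptw(treated, propensity):
    """Stabilized IPTW weights.

    treated:    list of 0/1 indicators (1 = intervention arm)
    propensity: estimated P(treated = 1 | covariates) for each patient,
                assumed to come from a previously fitted propensity model
    """
    p_treated = sum(treated) / len(treated)  # marginal probability of treatment
    weights = []
    for t, ps in zip(treated, propensity):
        if t == 1:
            weights.append(p_treated / ps)
        else:
            weights.append((1.0 - p_treated) / (1.0 - ps))
    return weights

# Invented toy values for illustration only.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
weights = stabilized_iptw(treated, propensity)
```

Patients whose arm assignment was very likely given their covariates receive weights below 1, while those whose assignment was unlikely are up-weighted, which balances covariates across arms in the weighted sample.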
Subject(s)
Clinical Deterioration , Machine Learning , Humans , Female , Male , Prospective Studies , Middle Aged , Aged , Hospital Rapid Response Team/organization & administration , Hospital Rapid Response Team/statistics & numerical data , Hospital Mortality
ABSTRACT
BACKGROUND: Malnutrition is associated with increased morbidity, mortality, and healthcare costs. Early detection is important for timely intervention. This paper assesses the ability of a machine learning screening tool (MUST-Plus) implemented in registered dietitian (RD) workflow to identify malnourished patients early in the hospital stay and to improve the diagnosis and documentation rate of malnutrition. METHODS: This retrospective cohort study was conducted in a large, urban health system in New York City comprising six hospitals serving a diverse patient population. The study included all patients aged ≥ 18 years, who were not admitted for COVID-19 and had a length of stay of ≤ 30 days. RESULTS: Of the 7736 hospitalisations that met the inclusion criteria, 1947 (25.2%) were identified as being malnourished by MUST-Plus-assisted RD evaluations. The lag between admission and diagnosis improved with MUST-Plus implementation. The usability of the tool output by RDs exceeded 90%, showing good acceptance by users. When compared pre-/post-implementation, the rate of both diagnoses and documentation of malnutrition showed improvement. CONCLUSION: MUST-Plus, a machine learning-based screening tool, shows great promise as a malnutrition screening tool for hospitalised patients when used in conjunction with adequate RD staffing and training about the tool. It performed well across multiple measures and settings. Other health systems can use their electronic health record data to develop, test and implement similar machine learning-based processes to improve malnutrition screening and facilitate timely intervention.
Subject(s)
Machine Learning , Malnutrition , Mass Screening , Nutrition Assessment , Humans , Retrospective Studies , Malnutrition/diagnosis , Middle Aged , Male , Female , New York City , Aged , Risk Assessment/methods , Mass Screening/methods , Adult , Hospitalization , Aged, 80 and over
ABSTRACT
BACKGROUND: Early prediction of the need for invasive mechanical ventilation (IMV) in patients hospitalized with COVID-19 symptoms can help in the allocation of resources appropriately and improve patient outcomes by appropriately monitoring and treating patients at the greatest risk of respiratory failure. To help with the complexity of deciding whether a patient needs IMV, machine learning algorithms may help bring more prognostic value in a timely and systematic manner. Chest radiographs (CXRs) and electronic medical records (EMRs), typically obtained early in patients admitted with COVID-19, are the keys to deciding whether they need IMV. OBJECTIVE: We aimed to evaluate the use of a machine learning model to predict the need for intubation within 24 hours by using a combination of CXR and EMR data in an end-to-end automated pipeline. We included historical data from 2481 hospitalizations at The Mount Sinai Hospital in New York City. METHODS: CXRs were first resized, rescaled, and normalized. Then lungs were segmented from the CXRs by using a U-Net algorithm. After splitting them into a training and a test set, the training set images were augmented. The augmented images were used to train an image classifier to predict the probability of intubation with a prediction window of 24 hours by retraining a pretrained DenseNet model by using transfer learning, 10-fold cross-validation, and grid search. Then, in the final fusion model, we trained a random forest algorithm via 10-fold cross-validation by combining the probability score from the image classifier with 41 longitudinal variables in the EMR. Variables in the EMR included clinical and laboratory data routinely collected in the inpatient setting. The final fusion model gave a prediction likelihood for the need of intubation within 24 hours as well. 
RESULTS: At a prediction probability threshold of 0.5, the fusion model provided 78.9% (95% CI 59%-96%) sensitivity, 83% (95% CI 76%-89%) specificity, 0.509 (95% CI 0.34-0.67) F1-score, 0.874 (95% CI 0.80-0.94) area under the receiver operating characteristic curve (AUROC), and 0.497 (95% CI 0.32-0.65) area under the precision recall curve (AUPRC) on the holdout set. Compared to the image classifier alone, which had an AUROC of 0.577 (95% CI 0.44-0.73) and an AUPRC of 0.206 (95% CI 0.08-0.38), the fusion model showed significant improvement (P<.001). The most important predictor variables were respiratory rate, C-reactive protein, oxygen saturation, and lactate dehydrogenase. The imaging probability score ranked 15th in overall feature importance. CONCLUSIONS: We show that, when linked with EMR data, an automated deep learning image classifier improved performance in identifying hospitalized patients with severe COVID-19 at risk for intubation. With additional prospective and external validation, such a model may assist risk assessment and optimize clinical decision-making in choosing the best care plan during the critical stages of COVID-19.
ABSTRACT
BACKGROUND AND AIM: We analyzed an inclusive gradient boosting model to predict hospital admission from the emergency department (ED) at different time points. We compared its results to multiple models built exclusively at each time point. METHODS: This retrospective multisite study utilized ED data from the Mount Sinai Health System, NY, during 2015-2019. Data included tabular clinical features and free-text triage notes represented using bag-of-words. A full gradient boosting model, trained on data available at different time points (30, 60, 90, 120, and 150 min), was compared to single models trained exclusively on data available at each time point. This was conducted by concatenating the rows of data available at each time point into one data matrix for the full model, where each row is considered a separate case. RESULTS: The cohort included 1,043,345 ED visits. The full model showed comparable results to the single models at all time points (AUCs 0.84-0.88 for different time points for both the full and single models). CONCLUSION: A full model trained on data concatenated from different time points showed similar results to single models trained at each time point. An ML-based prediction model can be used for identifying hospital admission.
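The row-concatenation step described in the abstract above can be sketched as follows: each visit contributes one row per time point, containing only the features recorded by that time, and every row is treated as a separate training case. The record layout, field names, and toy values are hypothetical; the abstract does not specify how the study stored its data or whether the time point itself was used as a feature.

```python
def stack_time_points(visits, time_points):
    """Build one row per (visit, time point), keeping only features observed by that time."""
    rows = []
    for visit in visits:
        for t in time_points:
            # Each feature is stored as (minute recorded, value); keep those known by minute t.
            features = {
                name: value
                for name, (minute, value) in visit["features"].items()
                if minute <= t
            }
            rows.append({
                "visit_id": visit["visit_id"],
                "minutes": t,
                "features": features,
                "admitted": visit["admitted"],  # same outcome label on every row of a visit
            })
    return rows

# Invented toy visits for illustration only.
visits = [
    {"visit_id": 1, "admitted": 1,
     "features": {"heart_rate": (10, 112), "lactate": (75, 3.1)}},
    {"visit_id": 2, "admitted": 0,
     "features": {"heart_rate": (5, 80)}},
]
time_points = [30, 60, 90, 120, 150]
rows = stack_time_points(visits, time_points)
```

A single model trained on `rows` sees every visit at every time point, whereas a single-time-point model would be trained only on the rows with a given `minutes` value. Because rows from the same visit are correlated, any train/test split in such a design would normally be made by visit rather than by row.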
ABSTRACT
BACKGROUND AND OBJECTIVES: AKI treated with dialysis initiation is a common complication of coronavirus disease 2019 (COVID-19) among hospitalized patients. However, dialysis supplies and personnel are often limited. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: Using data from adult patients hospitalized with COVID-19 from five hospitals from the Mount Sinai Health System who were admitted between March 10 and December 26, 2020, we developed and validated several models (logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO), random forest, and eXtreme GradientBoosting [XGBoost; with and without imputation]) for predicting treatment with dialysis or death at various time horizons (1, 3, 5, and 7 days) after hospital admission. Patients admitted to the Mount Sinai Hospital were used for internal validation, whereas the other hospitals formed part of the external validation cohort. Features included demographics, comorbidities, and laboratory and vital signs within 12 hours of hospital admission. RESULTS: A total of 6093 patients (2442 in training and 3651 in external validation) were included in the final cohort. Of the different modeling approaches used, XGBoost without imputation had the highest area under the receiver operating characteristic (AUROC) curve on internal validation (range of 0.93-0.98) and area under the precision-recall curve (AUPRC; range of 0.78-0.82) for all time points. XGBoost without imputation also had the highest test parameters on external validation (AUROC range of 0.85-0.87, and AUPRC range of 0.27-0.54) across all time windows. XGBoost without imputation outperformed all models with higher precision and recall (mean difference in AUROC of 0.04; mean difference in AUPRC of 0.15). Features of creatinine, BUN, and red cell distribution width were major drivers of the model's prediction. 
CONCLUSIONS: An XGBoost model without imputation for prediction of a composite outcome of either death or dialysis in patients positive for COVID-19 had the best performance, as compared with standard and other machine learning models. PODCAST: This article contains a podcast at https://www.asn-online.org/media/podcast/CJASN/2021_07_09_CJN17311120.mp3.
Subject(s)
Acute Kidney Injury/therapy , COVID-19/complications , Machine Learning , Renal Dialysis , SARS-CoV-2 , Acute Kidney Injury/mortality , COVID-19/mortality , Hospitalization , Humans
ABSTRACT
Early admission to the neurosciences intensive care unit (NSICU) is associated with improved patient outcomes. Natural language processing offers new possibilities for mining free text in electronic health record data. We sought to develop a machine learning model using both tabular and free text data to identify patients requiring NSICU admission shortly after arrival to the emergency department (ED). We conducted a single-center, retrospective cohort study of adult patients at the Mount Sinai Hospital, an academic medical center in New York City. All patients presenting to our institutional ED between January 2014 and December 2018 were included. Structured (tabular) demographic, clinical, bed movement record data, and free text data from triage notes were extracted from our institutional data warehouse. A machine learning model was trained to predict likelihood of NSICU admission at 30 min from arrival to the ED. We identified 412,858 patients presenting to the ED over the study period, of whom 1900 (0.5%) were admitted to the NSICU. The daily median number of ED presentations was 231 (IQR 200-256) and the median time from ED presentation to the decision for NSICU admission was 169 min (IQR 80-324). A model trained only with text data had an area under the receiver-operating curve (AUC) of 0.90 (95% confidence interval (CI) 0.87-0.91). A structured data-only model had an AUC of 0.92 (95% CI 0.91-0.94). A combined model trained on structured and text data had an AUC of 0.93 (95% CI 0.92-0.95). At a false positive rate of 1:100 (99% specificity), the combined model was 58% sensitive for identifying NSICU admission. A machine learning model using structured and free text data can predict NSICU admission soon after ED arrival. This may potentially improve ED and NSICU resource allocation. Further studies should validate our findings.
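The abstract above reports sensitivity at a fixed false positive rate (99% specificity). One way to obtain such a figure, sketched here with the standard library only, is to pick the score threshold from the negative cases so that the target share of them falls at or below it, then measure the share of positive cases above it. The function name and toy scores are hypothetical, and the study's exact thresholding procedure is not stated in the abstract.

```python
import math

def sensitivity_at_specificity(scores, labels, target_specificity=0.99):
    """Return (threshold, achieved specificity, sensitivity); a case is flagged if score > threshold."""
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    # Number of negatives that must fall at or below the threshold to reach the target.
    k = math.ceil(target_specificity * len(negatives))
    threshold = negatives[k - 1] if k > 0 else float("-inf")
    specificity = sum(s <= threshold for s in negatives) / len(negatives)
    sensitivity = sum(s > threshold for s in positives) / len(positives)
    return threshold, specificity, sensitivity

# Invented toy scores for illustration only (not study data).
scores = [0.1, 0.2, 0.3, 0.9, 0.25, 0.8, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1]
threshold, specificity, sensitivity = sensitivity_at_specificity(
    scores, labels, target_specificity=0.75
)
```

With a rare outcome such as the 0.5% NSICU admission rate reported above, fixing specificity at a high level limits the number of false alerts, at the cost of missing the positive cases whose scores fall below the threshold.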
Subject(s)
Emergency Service, Hospital , Hospitalization , Machine Learning , Natural Language Processing , Nervous System Diseases/diagnosis , Triage , Adult , Female , Humans , Male , Neurosciences , New York City , Retrospective Studies
ABSTRACT
BACKGROUND: Early reports indicate that AKI is common among patients with coronavirus disease 2019 (COVID-19) and associated with worse outcomes. However, AKI among hospitalized patients with COVID-19 in the United States is not well described. METHODS: This retrospective, observational study involved a review of data from electronic health records of patients aged ≥18 years with laboratory-confirmed COVID-19 admitted to the Mount Sinai Health System from February 27 to May 30, 2020. We describe the frequency of AKI and dialysis requirement, AKI recovery, and adjusted odds ratios (aORs) with mortality. RESULTS: Of 3993 hospitalized patients with COVID-19, AKI occurred in 1835 (46%) patients; 347 (19%) of the patients with AKI required dialysis. The proportions with stages 1, 2, or 3 AKI were 39%, 19%, and 42%, respectively. A total of 976 (24%) patients were admitted to intensive care, of whom 745 (76%) experienced AKI. Of the 435 patients with AKI and urine studies, 84% had proteinuria, 81% had hematuria, and 60% had leukocyturia. Independent predictors of severe AKI were CKD, male sex, and higher serum potassium at admission. In-hospital mortality was 50% among patients with AKI versus 8% among those without AKI (aOR, 9.2; 95% confidence interval, 7.5 to 11.3). Of survivors with AKI who were discharged, 35% had not recovered to baseline kidney function by the time of discharge. An additional 28 of 77 (36%) patients who had not recovered kidney function at discharge did so on posthospital follow-up. CONCLUSIONS: AKI is common among patients hospitalized with COVID-19 and is associated with high mortality. Of all patients with AKI, only 30% survived with recovery of kidney function by the time of discharge.
Subject(s)
Acute Kidney Injury/etiology , COVID-19/complications , SARS-CoV-2 , Acute Kidney Injury/epidemiology , Acute Kidney Injury/therapy , Acute Kidney Injury/urine , Aged , Aged, 80 and over , COVID-19/mortality , Female , Hematuria/etiology , Hospital Mortality , Hospitals, Private/statistics & numerical data , Hospitals, Urban/statistics & numerical data , Humans , Incidence , Inpatients , Leukocytes , Male , Middle Aged , New York City/epidemiology , Proteinuria/etiology , Renal Dialysis , Retrospective Studies , Treatment Outcome , Urine/cytology
ABSTRACT
OBJECTIVE: Malnutrition among hospital patients, a frequent, yet under-diagnosed problem is associated with adverse impact on patient outcome and health care costs. Development of highly accurate malnutrition screening tools is, therefore, essential for its timely detection, for providing nutritional care, and for addressing the concerns related to the suboptimal predictive value of the conventional screening tools, such as the Malnutrition Universal Screening Tool (MUST). We aimed to develop a machine learning (ML) based classifier (MUST-Plus) for more accurate prediction of malnutrition. METHOD: A retrospective cohort with inpatient data consisting of anthropometric, lab biochemistry, clinical data, and demographics from adult (≥ 18 years) admissions at a large tertiary health care system between January 2017 and July 2018 was used. The registered dietitian (RD) nutritional assessments were used as the gold standard outcome label. The cohort was randomly split (70:30) into training and test sets. A random forest model was trained using 10-fold cross-validation on training set, and its predictive performance on test set was compared to MUST. RESULTS: In all, 13.3% of admissions were associated with malnutrition in the test cohort. MUST-Plus provided 73.07% (95% confidence interval [CI]: 69.61%-76.33%) sensitivity, 76.89% (95% CI: 75.64%-78.11%) specificity, and 83.5% (95% CI: 82.0%-85.0%) area under the receiver operating curve (AUC). Compared to classic MUST, MUST-Plus demonstrated 30% higher sensitivity, 6% higher specificity, and 17% increased AUC. CONCLUSIONS: ML-based MUST-Plus provided superior performance in identifying malnutrition compared to the classic MUST. The tool can be used for improving the operational efficiency of RDs by timely referrals of high-risk patients.
Subject(s)
Malnutrition , Nutrition Assessment , Adult , Humans , Machine Learning , Malnutrition/diagnosis , Mass Screening , Retrospective Studies
ABSTRACT
OBJECTIVE: The COVID-19 pandemic is a global public health crisis, with over 33 million cases and 999 000 deaths worldwide. Data are needed regarding the clinical course of hospitalised patients, particularly in the USA. We aimed to compare clinical characteristics of patients with COVID-19 who had in-hospital mortality with those who were discharged alive. DESIGN: Demographic, clinical and outcomes data for patients admitted to five Mount Sinai Health System hospitals with confirmed COVID-19 between 27 February and 2 April 2020 were identified through institutional electronic health records. We performed a retrospective comparative analysis of patients who had in-hospital mortality or were discharged alive. SETTING: All patients were admitted to the Mount Sinai Health System, a large quaternary care urban hospital system. PARTICIPANTS: Participants over the age of 18 years were included. PRIMARY OUTCOMES: We investigated in-hospital mortality during the study period. RESULTS: A total of 2199 patients with COVID-19 were hospitalised during the study period. As of 2 April, 1121 (51%) patients remained hospitalised, and 1078 (49%) completed their hospital course. Of the latter, the overall mortality was 29%, and 36% required intensive care. The median age was 65 years overall and 75 years in those who died. Pre-existing conditions were present in 65% of those who died and 46% of those discharged. In those who died, the admission median lymphocyte percentage was 11.7%, D-dimer was 2.4 µg/mL, C reactive protein was 162 mg/L and procalcitonin was 0.44 ng/mL. In those discharged, the admission median lymphocyte percentage was 16.6%, D-dimer was 0.93 µg/mL, C reactive protein was 79 mg/L and procalcitonin was 0.09 ng/mL. CONCLUSIONS: In our cohort of hospitalised patients, requirement of intensive care and mortality were high. Patients who died typically had more pre-existing conditions and greater perturbations in inflammatory markers as compared with those who were discharged.
Subject(s)
COVID-19/blood , Critical Care , Hospital Mortality , Hospitalization , Pandemics , Adolescent , Adult , Aged , Aged, 80 and over , C-Reactive Protein/metabolism , COVID-19/epidemiology , COVID-19/mortality , Comorbidity , Critical Care/statistics & numerical data , Female , Fibrin Fibrinogen Degradation Products/metabolism , Hospitals , Humans , Lymphocytes/metabolism , Male , Middle Aged , New York City/epidemiology , Procalcitonin/blood , Retrospective Studies , Risk Factors , SARS-CoV-2 , Young Adult
ABSTRACT
BACKGROUND: COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking. OBJECTIVE: The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points. METHODS: We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19-positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients from a single hospital (n=1514) before or on May 1, externally validated on patients from four other hospitals (n=2201) before or on May 1, and prospectively validated on all patients after May 1 (n=383). Finally, we established model interpretability to identify and rank variables that drive model predictions. RESULTS: Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. 
In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction. CONCLUSIONS: We externally and prospectively trained and validated machine learning models for mortality and critical events for patients with COVID-19 at different time horizons. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes.
Subject(s)
Coronavirus Infections/diagnosis; Coronavirus Infections/mortality; Machine Learning/standards; Pneumonia, Viral/diagnosis; Pneumonia, Viral/mortality; Acute Kidney Injury/epidemiology; Adolescent; Adult; Aged; Aged, 80 and over; Betacoronavirus; COVID-19; Cohort Studies; Electronic Health Records; Female; Hospital Mortality; Hospitalization/statistics & numerical data; Hospitals; Humans; Male; Middle Aged; New York City/epidemiology; Pandemics; Prognosis; ROC Curve; Risk Assessment/methods; Risk Assessment/standards; SARS-CoV-2; Young Adult
ABSTRACT
OBJECTIVES: To develop and validate a model for predicting near-term in-hospital mortality among patients with COVID-19 by applying a machine learning (ML) algorithm to time-series inpatient data from electronic health records. METHODS: A cohort of 567 patients with COVID-19 at a large acute care healthcare system between 10 February 2020 and 7 April 2020 was observed until either death or discharge. A random forest (RF) model was developed on a randomly drawn 70% of the cohort (the training set), and its performance was evaluated on the remaining 30% (the test set). The outcome variable was in-hospital mortality within 20-84 hours from the time of prediction. Input features included patients' vital signs, laboratory data and ECG results. RESULTS: Patients had a median age of 60.2 years (IQR 26.2 years); 54.1% were men. In-hospital mortality rate was 17.0% and overall median time to death was 6.5 days (range 1.3-23.0 days). In the test set, the RF classifier yielded a sensitivity of 87.8% (95% CI: 78.2% to 94.3%), specificity of 60.6% (95% CI: 55.2% to 65.8%), accuracy of 65.5% (95% CI: 60.7% to 70.0%), area under the receiver operating characteristic curve of 85.5% (95% CI: 80.8% to 90.2%) and area under the precision recall curve of 64.4% (95% CI: 53.5% to 75.3%). CONCLUSIONS: Our ML-based approach can be used to analyse electronic health record data and reliably predict near-term mortality. Using such a model in hospitals could help improve care, thereby better aligning clinical decisions with prognosis in critically ill patients with COVID-19.
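Sensitivity, specificity, and accuracy, as reported above, all derive from the four cells of a confusion matrix. The sketch below shows those formulas in plain Python. The function name and the counts are hypothetical, chosen only to illustrate the arithmetic; the abstract does not report the underlying confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true events that were flagged
    specificity = tn / (tn + fp)   # share of non-events that were not flagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts (not from the study): 50 events, 150 non-events.
metrics = classification_metrics(tp=40, fp=60, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so when events are the minority class it sits closer to specificity, as it does in the reported results.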
ABSTRACT
BACKGROUND: Thromboembolic disease is common in coronavirus disease-2019 (COVID-19). There is limited evidence on the association of in-hospital anticoagulation (AC) with outcomes and postmortem findings. OBJECTIVES: The purpose of this study was to examine the association of AC with in-hospital outcomes and to describe thromboembolic findings on autopsies. METHODS: This retrospective analysis examined the association of AC with mortality, intubation, and major bleeding. Subanalyses were also conducted on the association of therapeutic versus prophylactic AC initiated ≤48 h from admission. Thromboembolic disease was contextualized by premortem AC among consecutive autopsies. RESULTS: Among 4,389 patients, median age was 65 years with 44% women. Compared with no AC (n = 1,530; 34.9%), therapeutic AC (n = 900; 20.5%) and prophylactic AC (n = 1,959; 44.6%) were associated with lower in-hospital mortality (adjusted hazard ratio [aHR]: 0.53; 95% confidence interval [CI]: 0.45 to 0.62 and aHR: 0.50; 95% CI: 0.45 to 0.57, respectively), and intubation (aHR: 0.69; 95% CI: 0.51 to 0.94 and aHR: 0.72; 95% CI: 0.58 to 0.89, respectively). When initiated ≤48 h from admission, there was no statistically significant difference between therapeutic (n = 766) and prophylactic AC (n = 1,860) (aHR: 0.86; 95% CI: 0.73 to 1.02; p = 0.08). Overall, 89 patients (2%) had major bleeding adjudicated by clinician review, with 27 of 900 (3.0%) on therapeutic, 33 of 1,959 (1.7%) on prophylactic, and 29 of 1,530 (1.9%) on no AC. Of 26 autopsies, 11 (42%) had thromboembolic disease not clinically suspected and 3 of 11 (27%) were on therapeutic AC. CONCLUSIONS: AC was associated with lower mortality and intubation among hospitalized COVID-19 patients. Compared with prophylactic AC, therapeutic AC was associated with lower mortality, although the difference was not statistically significant. Autopsies revealed frequent thromboembolic disease. These data may inform trials to determine optimal AC regimens.
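The group shares and crude major-bleeding rates quoted above follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes them in plain Python; the variable names are illustrative, and only the counts stated in the abstract are used.

```python
# Counts as reported in the abstract.
groups = {
    "no AC": {"patients": 1530, "major_bleeds": 29},
    "therapeutic AC": {"patients": 900, "major_bleeds": 27},
    "prophylactic AC": {"patients": 1959, "major_bleeds": 33},
}

total_patients = sum(g["patients"] for g in groups.values())
total_bleeds = sum(g["major_bleeds"] for g in groups.values())

summary = {}
for name, g in groups.items():
    share_pct = 100 * g["patients"] / total_patients        # share of the cohort
    bleed_pct = 100 * g["major_bleeds"] / g["patients"]     # crude bleeding rate
    summary[name] = (round(share_pct, 1), round(bleed_pct, 1))
    print(f"{name}: {share_pct:.1f}% of cohort, {bleed_pct:.1f}% major bleeding")

overall_bleed_pct = round(100 * total_bleeds / total_patients, 1)
print(f"overall: {total_bleeds} of {total_patients} ({overall_bleed_pct}%)")
```

These are unadjusted proportions. The hazard ratios in the abstract are adjusted estimates and cannot be reproduced from these counts alone.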
Subject(s)
Anticoagulants; Autopsy/statistics & numerical data; Coronavirus Infections; Hospitalization/statistics & numerical data; Pandemics; Pneumonia, Viral; Post-Exposure Prophylaxis; Thromboembolism; Aged; Anticoagulants/classification; Anticoagulants/therapeutic use; Betacoronavirus/isolation & purification; Blood Coagulation; COVID-19; Coronavirus Infections/blood; Coronavirus Infections/complications; Coronavirus Infections/mortality; Coronavirus Infections/therapy; Female; Hemorrhage/chemically induced; Hemorrhage/prevention & control; Hospital Mortality; Humans; Male; New York City/epidemiology; Outcome and Process Assessment, Health Care; Pneumonia, Viral/blood; Pneumonia, Viral/complications; Pneumonia, Viral/mortality; Pneumonia, Viral/therapy; Post-Exposure Prophylaxis/methods; Post-Exposure Prophylaxis/statistics & numerical data; Risk Adjustment/methods; SARS-CoV-2; Thromboembolism/drug therapy; Thromboembolism/mortality; Thromboembolism/prevention & control; Thromboembolism/virology
ABSTRACT
IMPORTANCE: Preliminary reports indicate that acute kidney injury (AKI) is common in patients with coronavirus disease 2019 (COVID-19) and is associated with worse outcomes. AKI in hospitalized COVID-19 patients in the United States is not well described. OBJECTIVE: To provide information about frequency, outcomes and recovery associated with AKI and dialysis in hospitalized COVID-19 patients. DESIGN: Observational, retrospective study. SETTING: Admitted to hospital between February 27 and April 15, 2020. PARTICIPANTS: Patients aged ≥18 years with laboratory-confirmed COVID-19. EXPOSURES: AKI (peak serum creatinine increase of 0.3 mg/dL or 50% above baseline). MAIN OUTCOMES AND MEASURES: Frequency of AKI and dialysis requirement, AKI recovery, and adjusted odds ratios (aOR) for mortality. We also trained and tested a machine learning model for predicting dialysis requirement with independent validation. RESULTS: A total of 3,235 hospitalized patients were diagnosed with COVID-19. AKI occurred in 1,406 (46%) patients overall and 280 (20%) with AKI required renal replacement therapy. The incidence of AKI (admission plus new cases) in patients admitted to the intensive care unit was 68% (553 of 815). In the entire cohort, the proportions with stages 1, 2, and 3 AKI were 35%, 20%, and 45%, respectively. In those needing intensive care, the respective proportions were 20%, 17%, and 63%, and 34% received acute renal replacement therapy. Independent predictors of severe AKI were chronic kidney disease, systolic blood pressure, and potassium at baseline. In-hospital mortality in patients with AKI was 41% overall and 52% in intensive care. The aOR for mortality associated with AKI was 9.6 (95% CI 7.4-12.3) overall and 20.9 (95% CI 11.7-37.3) in patients receiving intensive care. Of patients with AKI who were discharged alive, 56% recovered kidney function to baseline.
The area under the curve (AUC) for the machine-learned predictive model using baseline features for dialysis requirement was 0.79 in a validation test. CONCLUSIONS AND RELEVANCE: AKI is common in patients hospitalized with COVID-19, associated with worse mortality, and the majority of patients who survive do not recover kidney function. A machine-learned model using admission features had good performance for dialysis prediction and could be used for resource allocation.
ABSTRACT
BACKGROUND: The coronavirus disease 2019 (Covid-19) pandemic is a global public health crisis, with over 1.6 million cases and 95,000 deaths worldwide. Data are needed regarding the clinical course of hospitalized patients, particularly in the United States. METHODS: Demographic, clinical, and outcomes data for patients admitted to five Mount Sinai Health System hospitals with confirmed Covid-19 between February 27 and April 2, 2020 were identified through institutional electronic health records. We conducted a descriptive study of patients who had in-hospital mortality or were discharged alive. RESULTS: A total of 2,199 patients with Covid-19 were hospitalized during the study period. As of April 2, 1,121 (51%) patients remained hospitalized, and 1,078 (49%) completed their hospital course. Of the latter, the overall mortality was 29%, and 36% required intensive care. The median age was 65 years overall and 75 years in those who died. Pre-existing conditions were present in 65% of those who died and 46% of those discharged. In those who died, the admission median lymphocyte percentage was 11.7%, D-dimer was 2.4 µg/mL, C-reactive protein was 162 mg/L, and procalcitonin was 0.44 ng/mL. In those discharged, the admission median lymphocyte percentage was 16.6%, D-dimer was 0.93 µg/mL, C-reactive protein was 79 mg/L, and procalcitonin was 0.09 ng/mL. CONCLUSIONS: This is the largest and most diverse case series of hospitalized patients with Covid-19 in the United States to date. Requirement of intensive care and mortality were high. Patients who died typically had pre-existing conditions and severe perturbations in inflammatory markers.
ABSTRACT
OBJECTIVES: Approximately 20-30% of patients with COVID-19 require hospitalization, and 5-12% may require critical care in an intensive care unit (ICU). A rapid surge in cases of severe COVID-19 will lead to a corresponding surge in demand for ICU care. Because of constraints on resources, frontline healthcare workers may be unable to provide the frequent monitoring and assessment required for all patients at high risk of clinical deterioration. We developed a machine learning-based risk prioritization tool that predicts ICU transfer within 24 h, seeking to facilitate efficient use of care providers' efforts and help hospitals plan their flow of operations. METHODS: The retrospective cohort comprised non-ICU COVID-19 admissions at a large acute care health system between 26 February and 18 April 2020. Time series data, including vital signs, nursing assessments, laboratory data, and electrocardiograms, were used as input variables for training a random forest (RF) model. The cohort was randomly split (70:30) into training and test sets. The RF model was trained using 10-fold cross-validation on the training set, and its predictive performance on the test set was then evaluated. RESULTS: The cohort consisted of 1987 unique patients diagnosed with COVID-19 and admitted to non-ICU units of the hospital. The median time to ICU transfer was 2.45 days from the time of admission. Compared to actual admissions, the tool had 72.8% (95% CI: 63.2-81.1%) sensitivity, 76.3% (95% CI: 74.7-77.9%) specificity, 76.2% (95% CI: 74.6-77.7%) accuracy, and 79.9% (95% CI: 75.2-84.6%) area under the receiver operating characteristic curve. CONCLUSIONS: An ML-based prediction model can be used as a screening tool to identify patients at risk of imminent ICU transfer within 24 h. This tool could improve the management of hospital resources and patient-throughput planning, thus delivering more effective care to patients hospitalized with COVID-19.
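The evaluation design described in the methods above (a random 70:30 split, then 10-fold cross-validation within the training portion) can be sketched with index bookkeeping alone. The example below is a minimal illustration in plain Python, assuming the split is made per patient, which the abstract does not state. The function name, the seed, and the fold-assignment scheme are hypothetical.

```python
import random


def split_and_folds(n_samples, test_fraction=0.30, n_folds=10, seed=0):
    """Shuffle indices, hold out a test set, and partition the rest into folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    # Fold i takes every n_folds-th training index starting at position i,
    # so fold sizes differ by at most one.
    folds = [train_idx[i::n_folds] for i in range(n_folds)]
    return train_idx, test_idx, folds


# 1987 is the cohort size reported in the abstract.
train_idx, test_idx, folds = split_and_folds(1987)

for k, val_fold in enumerate(folds):
    held_out = set(val_fold)
    fit_idx = [i for i in train_idx if i not in held_out]
    # A model would be fit on fit_idx and scored on val_fold at this point.
    print(f"fold {k}: fit on {len(fit_idx)}, validate on {len(val_fold)}")

print(f"final evaluation on {len(test_idx)} held-out test samples")
```

Keeping the test indices out of every cross-validation fold is what allows the test-set metrics to serve as an estimate of performance on unseen patients.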