Pesquisa | BVS Bolivia

Traditional Machine Learning, Deep Learning, and BERT (Large Language Model) Approaches for Predicting Hospitalizations From Nurse Triage Notes: Comparative Evaluation of Resource Management.

Patel, Dhavalkumar; Timsina, Prem; Gorenstein, Larisa; Glicksberg, Benjamin S; Raut, Ganesh; Cheetirala, Satya Narayan; Santana, Fabio; Tamegue, Jules; Kia, Arash; Zimlichman, Eyal; Levin, Matthew A; Freeman, Robert; Klang, Eyal.

JMIR AI ; 3: e52190, 2024 Aug 27.

Artigo em Inglês | MEDLINE | ID: mdl-39190905

RESUMO

BACKGROUND: Predicting hospitalization from nurse triage notes has the potential to augment care. However, there needs to be careful considerations for which models to choose for this goal. Specifically, health systems will have varying degrees of computational infrastructure available and budget constraints. OBJECTIVE: To this end, we compared the performance of the deep learning, Bidirectional Encoder Representations from Transformers (BERT)-based model, Bio-Clinical-BERT, with a bag-of-words (BOW) logistic regression (LR) model incorporating term frequency-inverse document frequency (TF-IDF). These choices represent different levels of computational requirements. METHODS: A retrospective analysis was conducted using data from 1,391,988 patients who visited emergency departments in the Mount Sinai Health System spanning from 2017 to 2022. The models were trained on 4 hospitals' data and externally validated on a fifth hospital's data. RESULTS: The Bio-Clinical-BERT model achieved higher areas under the receiver operating characteristic curve (0.82, 0.84, and 0.85) compared to the BOW-LR-TF-IDF model (0.81, 0.83, and 0.84) across training sets of 10,000; 100,000; and ~1,000,000 patients, respectively. Notably, both models proved effective at using triage notes for prediction, despite the modest performance gap. CONCLUSIONS: Our findings suggest that simpler machine learning models such as BOW-LR-TF-IDF could serve adequately in resource-limited settings. Given the potential implications for patient care and hospital resource management, further exploration of alternative models and techniques is warranted to enhance predictive performance in this critical domain. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1101/2023.08.07.23293699.

Multimodal fine-tuning of clinical language models for predicting COVID-19 outcomes.

Henriksson, Aron; Pawar, Yash; Hedberg, Pontus; Nauclér, Pontus.

Artif Intell Med ; 146: 102695, 2023 12.

Artigo em Inglês | MEDLINE | ID: mdl-38042595

RESUMO

Clinical prediction models tend only to incorporate structured healthcare data, ignoring information recorded in other data modalities, including free-text clinical notes. Here, we demonstrate how multimodal models that effectively leverage both structured and unstructured data can be developed for predicting COVID-19 outcomes. The models are trained end-to-end using a technique we refer to as multimodal fine-tuning, whereby a pre-trained language model is updated based on both structured and unstructured data. The multimodal models are trained and evaluated using a multicenter cohort of COVID-19 patients encompassing all encounters at the emergency department of six hospitals. Experimental results show that multimodal models, leveraging the notion of multimodal fine-tuning and trained to predict (i) 30-day mortality, (ii) safe discharge and (iii) readmission, outperform unimodal models trained using only structured or unstructured healthcare data on all three outcomes. Sensitivity analyses are performed to better understand how well the multimodal models perform on different patient groups, while an ablation study is conducted to investigate the impact of different types of clinical notes on model performance. We argue that multimodal models that make effective use of routinely collected healthcare data to predict COVID-19 outcomes may facilitate patient management and contribute to the effective use of limited healthcare resources.

Assuntos

COVID-19 , Humanos , COVID-19/epidemiologia , Serviço Hospitalar de Emergência , Hospitais , Idioma , Alta do Paciente , Processamento de Linguagem Natural

Feature engineering from medical notes: A case study of dementia detection.

Ben Miled, Zina; Dexter, Paul R; Grout, Randall W; Boustani, Malaz.

Heliyon ; 9(3): e14636, 2023 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-37020943

RESUMO

Background and objectives: Medical notes are narratives that describe the health of the patient in free text format. These notes can be more informative than structured data such as the history of medications or disease conditions. They are routinely collected and can be used to evaluate the patient's risk for developing chronic diseases such as dementia. This study investigates different methodologies for transforming routine care notes into dementia risk classifiers and evaluates the generalizability of these classifiers to new patients and new health care institutions. Methods: The notes collected over the relevant history of the patient are lengthy. In this study, TF-ICF is used to select keywords with the highest discriminative ability between at risk dementia patients and healthy controls. The medical notes are then summarized in the form of occurrences of the selected keywords. Two different encodings of the summary are compared. The first encoding consists of the average of the vector embedding of each keyword occurrence as produced by the BERT or Clinical BERT pre-trained language models. The second encoding aggregates the keywords according to UMLS concepts and uses each concept as an exposure variable. For both encodings, misspellings of the selected keywords are also considered in an effort to improve the predictive performance of the classifiers. A neural network is developed over the first encoding and a gradient boosted trees model is applied to the second encoding. Patients from a single health care institution are used to develop all the classifiers which are then evaluated on held-out patients from the same health care institution as well as test patients from two other health care institutions. Results: The results indicate that it is possible to identify patients at risk for dementia one year ahead of the onset of the disease using medical notes with an AUC of 75% when a gradient boosted trees model is used in conjunction with exposure variables derived from UMLS concepts. However, this performance is not maintained with an embedded feature space and when the classifier is applied to patients from other health care institutions. Moreover, an analysis of the top predictors of the gradient boosted trees model indicates that different features inform the classification depending on whether or not spelling variants of the keywords are included. Conclusion: The present study demonstrates that medical notes can enable risk prediction models for complex chronic diseases such as dementia. However, additional research efforts are needed to improve the generalizability of these models. These efforts should take into consideration the length and localization of the medical notes; the availability of sufficient training data for each disease condition; and the variabilities resulting from different feature engineering techniques.

RESUMO

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA