Results 1 - 20 of 147
1.
medRxiv ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38562803

ABSTRACT

Rationale: Early detection of clinical deterioration using early warning scores may improve outcomes. However, most implemented scores were developed using logistic regression, only underwent retrospective internal validation, and were not tested in important patient subgroups. Objectives: To develop a gradient boosted machine model (eCARTv5) for identifying clinical deterioration and then validate externally, test prospectively, and evaluate across patient subgroups. Methods: All adult patients hospitalized on the wards in seven hospitals from 2008-2022 were used to develop eCARTv5, with demographics, vital signs, clinician documentation, and laboratory values utilized to predict intensive care unit transfer or death in the next 24 hours. The model was externally validated retrospectively in 21 hospitals from 2009-2023 and prospectively in 10 hospitals from February to May 2023. eCARTv5 was compared to the Modified Early Warning Score (MEWS) and the National Early Warning Score (NEWS) using the area under the receiver operating characteristic curve (AUROC). Measurements and Main Results: The development cohort included 901,491 admissions, the retrospective validation cohort included 1,769,461 admissions, and the prospective validation cohort included 46,330 admissions. In retrospective validation, eCART had the highest AUROC (0.835; 95%CI 0.834, 0.835), followed by NEWS (0.766; 95%CI 0.766, 0.767), and MEWS (0.704; 95%CI 0.703, 0.704). eCART's performance remained high (AUROC ≥0.80) across a range of patient demographics, clinical conditions, and during prospective validation. Conclusions: We developed eCARTv5, which accurately identifies early clinical deterioration in hospitalized ward patients. Our model performed better than the NEWS and MEWS retrospectively, prospectively, and across a range of subgroups.
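
The AUROC used above to compare eCARTv5 with NEWS and MEWS can be read as the probability that a randomly chosen patient who deteriorated receives a higher score than a randomly chosen patient who did not. The following is a minimal sketch of that rank-based definition, not the authors' evaluation code; the function name and the toy scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """Rank-based AUROC: share of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = ICU transfer or death within 24 hours, 0 = neither.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auroc(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of cases, so it is only suitable for illustration; cohorts of the size reported above would call for a sort-based implementation.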

2.
J Am Med Inform Assoc ; 31(6): 1322-1330, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38679906

ABSTRACT

OBJECTIVES: To compare and externally validate popular deep learning model architectures and data transformation methods for variable-length time series data in 3 clinical tasks (clinical deterioration, severe acute kidney injury [AKI], and suspected infection). MATERIALS AND METHODS: This multicenter retrospective study included admissions at 2 medical centers that spanned 2007-2022. Distinct datasets were created for each clinical task, with 1 site used for training and the other for testing. Three feature engineering methods (normalization, standardization, and piece-wise linear encoding with decision trees [PLE-DTs]) and 3 architectures (long short-term memory/gated recurrent unit [LSTM/GRU], temporal convolutional network, and time-distributed wrapper with convolutional neural network [TDW-CNN]) were compared in each clinical task. Model discrimination was evaluated using the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC). RESULTS: The study comprised 373 825 admissions for training and 256 128 admissions for testing. LSTM/GRU models tied with TDW-CNN models, with both obtaining the highest mean AUPRC in 2 tasks, and LSTM/GRU had the highest mean AUROC across all tasks (deterioration: 0.81, AKI: 0.92, infection: 0.87). PLE-DT with LSTM/GRU achieved the highest AUPRC in all tasks. DISCUSSION: When externally validated in 3 clinical tasks, the LSTM/GRU model architecture with PLE-DT transformed data demonstrated the highest AUPRC in all tasks. Multiple models achieved similar performance when evaluated using AUROC. CONCLUSION: The LSTM architecture performs as well as or better than some newer architectures, and PLE-DT may enhance the AUPRC in variable-length time series data for predicting clinical outcomes during external validation.


Subject(s)
Deep Learning; Humans; Retrospective Studies; Acute Kidney Injury; Neural Networks, Computer; ROC Curve; Male; Datasets as Topic; Female; Middle Aged
3.
J Am Med Inform Assoc ; 31(6): 1291-1302, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38587875

ABSTRACT

OBJECTIVE: The timely stratification of trauma injury severity can enhance the quality of trauma care but it requires intense manual annotation from certified trauma coders. The objective of this study is to develop machine learning models for the stratification of trauma injury severity across various body regions using clinical text and structured electronic health records (EHRs) data. MATERIALS AND METHODS: Our study utilized clinical documents and structured EHR variables linked with the trauma registry data to create 2 machine learning models with different approaches to representing text. The first one fuses concept unique identifiers (CUIs) extracted from free text with structured EHR variables, while the second one integrates free text with structured EHR variables. Temporal validation was undertaken to ensure the models' temporal generalizability. Additionally, analyses to assess the variable importance were conducted. RESULTS: Both models demonstrated impressive performance in categorizing leg injuries, achieving high accuracy with macro-F1 scores of over 0.8. Additionally, they showed considerable accuracy, with macro-F1 scores exceeding or near 0.7, in assessing injuries in the areas of the chest and head. We showed in our variable importance analysis that the most important features in the model have strong face validity in determining clinically relevant trauma injuries. DISCUSSION: The CUI-based model achieves comparable performance, if not higher, compared to the free-text-based model, with reduced complexity. Furthermore, integrating structured EHR data improves performance, particularly when the text modalities are insufficiently indicative. CONCLUSIONS: Our multi-modal, multiclass models can provide accurate stratification of trauma injury severity and clinically relevant interpretations.
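
The macro-F1 scores reported above average the per-class F1 with equal weight for every class, so a rare severity level counts as much as a common one. The following is a minimal sketch of that calculation under assumed inputs; the function name and the three severity labels are hypothetical and are not taken from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical severity labels for one body region.
toy_true = ["none", "minor", "severe", "minor", "none", "severe"]
toy_pred = ["none", "minor", "minor", "minor", "none", "severe"]
toy_macro_f1 = macro_f1(toy_true, toy_pred)
```

In the toy data the single misclassified "severe" case lowers both the "severe" recall and the "minor" precision, which is why macro-F1 is a stricter summary than plain accuracy for imbalanced multiclass problems.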


Subject(s)
Electronic Health Records; Machine Learning; Wounds and Injuries; Humans; Wounds and Injuries/classification; Injury Severity Score; Registries; Trauma Severity Indices; Natural Language Processing
5.
medRxiv ; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38585973

ABSTRACT

Natural Language Processing (NLP) is the study of automated processing of text data. Application of NLP in the clinical domain is important due to the rich unstructured information embedded in clinical documents, which often remains inaccessible in structured data. Empowered by recent advances in language models (LMs), there is a growing interest in their application within the clinical domain. When applying NLP methods to a certain domain, benchmark datasets play a crucial role, as they not only guide the selection of best-performing models but also enable assessment of the reliability of the generated outputs. Despite the recent availability of LMs capable of longer context, benchmark datasets targeting long clinical document classification tasks are absent. To address this issue, we propose the LCD benchmark, a benchmark for the task of predicting 30-day out-of-hospital mortality using discharge notes of MIMIC-IV and statewide death data. Our notes have a median word count of 1687 and an interquartile range of 1308 to 2169. We evaluated this benchmark dataset using baseline models, from bag-of-words and CNN to Hierarchical Transformer and an open-source instruction-tuned large language model. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations. We expect the LCD benchmark to become a resource for the development of advanced supervised models, prompting methods, or the foundation models themselves, tailored for clinical text. The benchmark dataset is available at https://github.com/Machine-Learning-for-Medical-Language/long-clinical-doc.

6.
J Clin Med ; 13(5), 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38592057

ABSTRACT

(1) Background: SeptiCyte RAPID is a molecular test for discriminating sepsis from non-infectious systemic inflammation, and for estimating sepsis probabilities. The objective of this study was the clinical validation of SeptiCyte RAPID, based on testing retrospectively banked and prospectively collected patient samples. (2) Methods: The cartridge-based SeptiCyte RAPID test accepts a PAXgene blood RNA sample and provides sample-to-answer processing in ~1 h. The test output (SeptiScore, range 0-15) falls into four interpretation bands, with higher scores indicating higher probabilities of sepsis. Retrospective (N = 356) and prospective (N = 63) samples were tested from adult patients in ICU who either had the systemic inflammatory response syndrome (SIRS), or were suspected of having/diagnosed with sepsis. Patients were clinically evaluated by a panel of three expert physicians blinded to the SeptiCyte test results. Results were interpreted under either the Sepsis-2 or Sepsis-3 framework. (3) Results: Under the Sepsis-2 framework, SeptiCyte RAPID performance for the combined retrospective and prospective cohorts had Areas Under the ROC Curve (AUCs) ranging from 0.82 to 0.85, a negative predictive value of 0.91 (sensitivity 0.94) for SeptiScore Band 1 (score range 0.1-5.0; lowest risk of sepsis), and a positive predictive value of 0.81 (specificity 0.90) for SeptiScore Band 4 (score range 7.4-15; highest risk of sepsis). Performance estimates for the prospective cohort ranged from AUC 0.86-0.95. For physician-adjudicated sepsis cases that were blood culture (+) or blood, urine culture (+)(+), 43/48 (90%) of SeptiCyte scores fell in Bands 3 or 4. In multivariable analysis with up to 14 additional clinical variables, SeptiScore was the most important variable for sepsis diagnosis. 
A comparable performance was obtained for the majority of patients reanalyzed under the Sepsis-3 definition, although a subgroup of 16 patients was identified that was called septic under Sepsis-2 but not under Sepsis-3. (4) Conclusions: This study validates SeptiCyte RAPID for estimating sepsis probability, under both the Sepsis-2 and Sepsis-3 frameworks, for hospitalized patients on their first day of ICU admission.

7.
medRxiv ; 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38562730

ABSTRACT

In the evolving landscape of clinical Natural Language Generation (NLG), assessing abstractive text quality remains challenging, as existing methods often overlook generative task complexities. This work aimed to examine the current state of automated evaluation metrics in NLG in healthcare. To have a robust and well-validated baseline with which to examine the alignment of these metrics, we created a comprehensive human evaluation framework. Employing ChatGPT-3.5-turbo generative output, we correlated human judgments with each metric. None of the metrics demonstrated high alignment; however, the SapBERT score, which is based on the Unified Medical Language System (UMLS), showed the best results. This underscores the importance of incorporating domain-specific knowledge into evaluation efforts. Our work reveals the deficiency in quality evaluations for generated text and introduces our comprehensive human evaluation framework as a baseline. Future efforts should prioritize integrating medical knowledge databases to enhance the alignment of automated metrics, particularly focusing on refining the SapBERT score for improved assessments.

8.
Crit Care Explor ; 6(3): e1066, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38505174

ABSTRACT

OBJECTIVES: Alcohol withdrawal syndrome (AWS) may progress to require high-intensity care. Approaches to identify hospitalized patients with AWS who received higher level of care have not been previously examined. This study aimed to examine the utility of Clinical Institute Withdrawal Assessment Alcohol Revised (CIWA-Ar) for alcohol scale scores and medication doses for alcohol withdrawal management in identifying patients who received high-intensity care. DESIGN: A multicenter observational cohort study of hospitalized adults with alcohol withdrawal. SETTING: University of Chicago Medical Center and University of Wisconsin Hospital. PATIENTS: Inpatient encounters between November 2008 and February 2022 with a CIWA-Ar score greater than 0 and benzodiazepine or barbiturate administered within the first 24 hours. The primary composite outcome was patients who progressed to high-intensity care (intermediate care or ICU). INTERVENTIONS: None. MAIN RESULTS: Among the 8742 patients included in the study, 37.5% (n = 3280) progressed to high-intensity care. The odds ratio for the composite outcome increased above 1.0 when the CIWA-Ar score was 24. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) at this threshold were 0.12 (95% CI, 0.11-0.13), 0.95 (95% CI, 0.94-0.95), 0.58 (95% CI, 0.54-0.61), and 0.64 (95% CI, 0.63-0.65), respectively. The OR increased above 1.0 at a 24-hour lorazepam milligram equivalent dose cutoff of 15 mg. The sensitivity, specificity, PPV, and NPV at this threshold were 0.16 (95% CI, 0.14-0.17), 0.96 (95% CI, 0.95-0.96), 0.68 (95% CI, 0.65-0.72), and 0.65 (95% CI, 0.64-0.66), respectively. CONCLUSIONS: Neither CIWA-Ar scores nor medication dose cutoff points were effective measures for identifying patients with alcohol withdrawal who received high-intensity care. Research studies for examining outcomes in patients who deteriorate with AWS will require better methods for cohort identification.
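
The predictive values reported above follow from sensitivity, specificity, and the share of patients who progressed to high-intensity care (3280 of 8742). The following is a minimal sketch of that relationship via Bayes' rule, using the rounded figures quoted for the CIWA-Ar threshold of 24; the function name is hypothetical, and because the inputs are rounded the results only approximate the published PPV of 0.58 and NPV of 0.64.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures quoted in the abstract (rounded to two decimals there).
prevalence = 3280 / 8742
ciwa_ppv, ciwa_npv = predictive_values(0.12, 0.95, prevalence)
```

With a sensitivity this low, most patients who progressed fall below the threshold, which is why the NPV stays close to the 62.5% of patients who did not progress.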

9.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370788

ABSTRACT

OBJECTIVE: Timely intervention for clinically deteriorating ward patients requires that care teams accurately diagnose and treat their underlying medical conditions. However, the most common diagnoses leading to deterioration and the relevant therapies provided are poorly characterized. Therefore, we aimed to determine the diagnoses responsible for clinical deterioration, the relevant diagnostic tests ordered, and the treatments administered among high-risk ward patients using manual chart review. DESIGN: Multicenter retrospective observational study. SETTING: Inpatient medical-surgical wards at four health systems from 2006-2020. PATIENTS: Randomly selected patients (1,000 from each health system) with clinical deterioration, defined by reaching the 95th percentile of a validated early warning score, electronic Cardiac Arrest Risk Triage (eCART), were included. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: For each patient, clinical deterioration was confirmed by a trained reviewer or marked as a false alarm if no deterioration occurred. For true deterioration events, the condition causing deterioration, relevant diagnostic tests ordered, and treatments provided were collected. Of the 4,000 included patients, 2,484 (62%) had clinical deterioration confirmed by chart review. Sepsis was the most common cause of deterioration (41%; n=1,021), followed by arrhythmia (19%; n=473), while liver failure had the highest in-hospital mortality (41%). The most common diagnostic tests ordered were complete blood counts (47% of events), followed by chest x-rays (42%), and cultures (40%), while the most common medication orders were antimicrobials (46%), followed by fluid boluses (34%), and antiarrhythmics (19%). CONCLUSIONS: We found that sepsis was the most common cause of deterioration, while liver failure had the highest mortality. 
Complete blood counts and chest x-rays were the most common diagnostic tests ordered, and antimicrobials and fluid boluses were the most common medication interventions. These results provide important insights for clinical decision-making at the bedside, training of rapid response teams, and the development of institutional treatment pathways for clinical deterioration. KEY POINTS: Question: What are the most common diagnoses, diagnostic test orders, and treatments for ward patients experiencing clinical deterioration? Findings: In manual chart review of 2,484 encounters with deterioration across four health systems, we found that sepsis was the most common cause of clinical deterioration, followed by arrhythmias, while liver failure had the highest mortality. Complete blood counts and chest x-rays were the most common diagnostic test orders, while antimicrobials and fluid boluses were the most common treatments. Meaning: Our results provide new insights into clinical deterioration events, which can inform institutional treatment pathways, rapid response team training, and patient care.

10.
Addiction ; 119(4): 766-771, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38011858

ABSTRACT

BACKGROUND AND AIMS: Accurate case discovery is critical for disease surveillance, resource allocation and research. International Classification of Disease (ICD) diagnosis codes are commonly used for this purpose. We aimed to determine the sensitivity, specificity and positive predictive value (PPV) of ICD-10 codes for opioid misuse case discovery in the emergency department (ED) setting. DESIGN AND SETTING: Retrospective cohort study of ED encounters from January 2018 to December 2020 at an urban academic hospital in the United States. A sample of ED encounters enriched for opioid misuse was developed by oversampling ED encounters with positive urine opiate screens or pre-existing opioid-related diagnosis codes in addition to other opioid misuse risk factors. CASES: A total of 1200 randomly selected encounters were annotated by research staff for the presence of opioid misuse within health record documentation using a 5-point scale for likelihood of opioid misuse and dichotomized into cohorts of opioid misuse and no opioid misuse. MEASUREMENTS: Using manual annotation as ground truth, the sensitivity and specificity of ICD-10 codes entered during the encounter were determined with PPV adjusted for oversampled data. Metrics were also determined by disposition subgroup: discharged home or admitted. FINDINGS: There were 541 encounters annotated as opioid misuse and 617 with no opioid misuse. The majority were males (54.4%), average age was 47 years and 68.5% were discharged directly from the ED. The sensitivity of ICD-10 codes was 0.56 (95% confidence interval [CI], 0.51-0.60), specificity 0.99 (95% CI, 0.97-0.99) and adjusted PPV 0.78 (95% CI, 0.65-0.92). The sensitivity was higher for patients discharged from the ED (0.65; 95% CI, 0.60-0.69) than those admitted (0.31; 95% CI, 0.24-0.39). 
CONCLUSIONS: International Classification of Disease-10 codes appear to have low sensitivity but high specificity and positive predictive value in detecting opioid misuse among emergency department patients in the United States.


Subject(s)
International Classification of Diseases; Opioid-Related Disorders; Male; Humans; United States/epidemiology; Middle Aged; Female; Retrospective Studies; Opioid-Related Disorders/diagnosis; Opioid-Related Disorders/epidemiology; Predictive Value of Tests; Emergency Service, Hospital
11.
Ann Surg Oncol ; 31(1): 488-498, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37782415

ABSTRACT

BACKGROUND: While lower socioeconomic status has been shown to correlate with worse outcomes in cancer care, data correlating neighborhood-level metrics with outcomes are scarce. We aim to explore the association between neighborhood disadvantage and both short- and long-term postoperative outcomes in patients undergoing pancreatectomy for pancreatic ductal adenocarcinoma (PDAC). PATIENTS AND METHODS: We retrospectively analyzed 243 patients who underwent resection for PDAC at a single institution between 1 January 2010 and 15 September 2021. To measure neighborhood disadvantage, the cohort was divided into tertiles by Area Deprivation Index (ADI). Short-term outcomes of interest were minor complications, major complications, unplanned readmission within 30 days, prolonged hospitalization, and delayed gastric emptying (DGE). The long-term outcome of interest was overall survival. Logistic regression was used to test short-term outcomes; Cox proportional hazards models and Kaplan-Meier method were used for long-term outcomes. RESULTS: The median ADI of the cohort was 49 (IQR 32-64.5). On adjusted analysis, the high-ADI group demonstrated greater odds of suffering a major complication (odds ratio [OR], 2.78; 95% confidence interval [CI], 1.26-6.40; p = 0.01) and of an unplanned readmission (OR, 3.09; 95% CI, 1.16-9.28; p = 0.03) compared with the low-ADI group. There were no significant differences between groups in the odds of minor complications, prolonged hospitalization, or DGE (all p > 0.05). High ADI did not confer an increased hazard of death (p = 0.63). CONCLUSIONS: We found that worse neighborhood disadvantage is associated with a higher risk of major complication and unplanned readmission after pancreatectomy for PDAC.


Subject(s)
Carcinoma, Pancreatic Ductal; Pancreatic Neoplasms; Humans; Pancreatectomy/adverse effects; Pancreatectomy/methods; Retrospective Studies; Pancreatic Neoplasms/pathology; Carcinoma, Pancreatic Ductal/pathology; Neighborhood Characteristics
12.
JAMIA Open ; 6(4): ooad109, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38144168

ABSTRACT

Objectives: To develop and externally validate machine learning models using structured and unstructured electronic health record data to predict postoperative acute kidney injury (AKI) across inpatient settings. Materials and Methods: Data for adult postoperative admissions to the Loyola University Medical Center (2009-2017) were used for model development and admissions to the University of Wisconsin-Madison (2009-2020) were used for validation. Structured features included demographics, vital signs, laboratory results, and nurse-documented scores. Unstructured text from clinical notes was converted into concept unique identifiers (CUIs) using the clinical Text Analysis and Knowledge Extraction System. The primary outcome was the development of Kidney Disease: Improving Global Outcomes stage 2 AKI within 7 days after leaving the operating room. We derived unimodal extreme gradient boosting machines (XGBoost) and elastic net logistic regression (GLMNET) models using structured-only data and multimodal models combining structured data with CUI features. Model comparison was performed using the area under the receiver operating characteristic curve (AUROC), with DeLong's test for statistical differences. Results: The study cohort included 138 389 adult patient admissions (mean [SD] age 58 [16] years; 11 506 [8%] African-American; and 70 826 [51%] female) across the 2 sites. Of those, 2959 (2.1%) developed stage 2 AKI or higher. Across all data types, XGBoost outperformed GLMNET (mean AUROC 0.81 [95% confidence interval (CI), 0.80-0.82] vs 0.78 [95% CI, 0.77-0.79]). The multimodal XGBoost model incorporating CUIs parameterized as term frequency-inverse document frequency (TF-IDF) showed the highest discrimination performance (AUROC 0.82 [95% CI, 0.81-0.83]) over unimodal models (AUROC 0.79 [95% CI, 0.78-0.80]). Discussion: A multimodality approach with structured data and TF-IDF weighting of CUIs increased model performance over structured data-only models. 
Conclusion: These findings highlight the predictive power of CUIs when merged with structured data for clinical prediction models, which may improve the detection of postoperative AKI.
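
TF-IDF weighting, as applied to CUIs above, raises the weight of concepts that are frequent in one note but rare across the corpus. The following is a minimal sketch of one textbook variant (term frequency times the log of inverse document frequency); it is an assumption for illustration, not the study's feature pipeline, and library implementations typically add smoothing and normalization. The function name and the CUI strings are hypothetical placeholders.

```python
import math
from collections import Counter


def tfidf(docs):
    """Weight each concept by (count / note length) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each concept once per note
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            cui: (cnt / len(doc)) * math.log(n_docs / doc_freq[cui])
            for cui, cnt in counts.items()
        })
    return weights


# Hypothetical placeholder CUIs, one list per clinical note.
toy_notes = [
    ["C0000001", "C0000002", "C0000001"],
    ["C0000001", "C0000003"],
    ["C0000001"],
]
toy_weights = tfidf(toy_notes)
```

A concept that appears in every note, such as the first placeholder here, receives zero weight under this variant, so only concepts that distinguish one note from another contribute to the feature vector.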

13.
JAMIA Open ; 6(4): ooad092, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37942470

ABSTRACT

Objectives: Substance misuse is a complex and heterogeneous set of conditions associated with high mortality and regional/demographic variations. Existing data systems are siloed and have been ineffective in curtailing the substance misuse epidemic. Therefore, we aimed to build a novel informatics platform, the Substance Misuse Data Commons (SMDC), by integrating multiple data modalities to provide a unified record of information crucial to improving outcomes in substance misuse patients. Materials and Methods: The SMDC was created by linking electronic health record (EHR) data from adult cases of substance (alcohol, opioid, nonopioid drug) misuse at the University of Wisconsin hospitals to socioeconomic and state agency data. To ensure private and secure data exchange, Privacy-Preserving Record Linkage (PPRL) and Honest Broker services were utilized. The overlap in mortality reporting among the EHR, state Vital Statistics, and a commercial national data source was assessed. Results: The SMDC included data from 36 522 patients experiencing 62 594 healthcare encounters. Over half of patients were linked to the statewide ambulance database and prescription drug monitoring program. Chronic diseases accounted for most underlying causes of death, while drug-related overdoses constituted 8%. Our analysis of mortality revealed a 49.1% overlap across the 3 data sources. Nonoverlapping deaths were associated with poor socioeconomic indicators. Discussion: Through PPRL, the SMDC enabled the longitudinal integration of multimodal data. Combining death data from local, state, and national sources enhanced mortality tracking and exposed disparities. Conclusion: The SMDC provides a comprehensive resource for clinical providers and policymakers to inform interventions targeting substance misuse-related hospitalizations, overdoses, and death.

14.
Proc Conf Assoc Comput Linguist Meet ; 2023: 125-130, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37786810

ABSTRACT

Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs out-perform supervised BERT-based models applied to out-of-domain data. We also find that their strengths are synergistic, so that a simple ensemble technique leads to additional performance gains.

15.
J Am Med Inform Assoc ; 31(1): 89-97, 2023 Dec 22.
Article in English | MEDLINE | ID: mdl-37725927

ABSTRACT

OBJECTIVE: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for 1 institution experience a large drop in accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective," "Objective," "Assessment," and "Plan") framework with improved transferability. MATERIALS AND METHODS: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain-adaptive pretraining and task-adaptive pretraining. We added in-domain annotated samples during fine-tuning and observed model performance over a varying number of annotated samples. Finally, we quantified the impact of continued pretraining as the equivalent number of in-domain annotated samples added. RESULTS: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across 3 datasets. This improvement was equivalent to adding 35 in-domain annotated samples. DISCUSSION: Although considered a straightforward task when performed in-domain, section classification is still a considerably difficult task when performed cross-domain, even using highly sophisticated neural network-based methods. CONCLUSION: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.


Subject(s)
Health Facilities; Information Storage and Retrieval; Natural Language Processing; Neural Networks, Computer; Sample Size
16.
Proc Conf Assoc Comput Linguist Meet ; 2023: 461-467, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37583489

ABSTRACT

The BioNLP Workshop 2023 launched a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system generating relevant and accurate diagnoses will augment the healthcare providers' decision-making process and improve the quality of care for patients. The goal for participants is to develop models that generate a list of diagnoses and problems using input from the daily care notes collected from the hospitalization of critically ill patients. Eight teams submitted their final systems to the shared task leaderboard. In this paper, we describe the tasks, datasets, evaluation metrics, and baseline systems. Additionally, the techniques and results of the evaluation of the different approaches tried by the participating teams are summarized.

17.
Proc Conf Assoc Comput Linguist Meet ; 2023(ClinicalNLP): 78-85, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37492270

ABSTRACT

Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework, comprising six tasks representing key components in clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models as well as multi-task versus single-task training with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically trained language model outperforms its general domain counterpart by a large margin, establishing a new state-of-the-art performance, with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.
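
ROUGE-L, the metric quoted above, scores a generated summary by the longest common subsequence (LCS) of tokens it shares with a reference summary. The following is a minimal sketch of the balanced F-measure form under simplifying assumptions (whitespace tokenization, lowercasing, a single reference); it is not the benchmark's official scorer, and the function names and example strings are hypothetical. The 28.55 in the abstract is presumably reported on a 0-100 scale, whereas this sketch returns a value between 0 and 1.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and generated problem summaries.
toy_score = rouge_l_f1(
    "acute kidney injury due to sepsis",
    "sepsis with acute kidney injury",
)
```

Because the LCS must preserve word order, the candidate above is credited only for "acute kidney injury" and not for the reordered "sepsis", which illustrates why ROUGE-L can undervalue clinically equivalent phrasings.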

18.
Adv Exp Med Biol ; 1426: 395-412, 2023.
Article in English | MEDLINE | ID: mdl-37464130

ABSTRACT

Severe asthma is a spectrum disorder with numerous subsets, many of which are defined by clinical history and a general predisposition for T2 inflammation. Most of the approved therapies for severe asthma have required clinical trial designs with population enrichment for exacerbation frequency and/or elevation of blood eosinophils. Moving beyond this framework will require trial designs that increase efficiency for studying nondominant subsets and continue to improve upon biomarker signatures. In addition to reviewing the current literature on biomarker-informed trials for severe asthma, this chapter will also review the advantages of master protocols and adaptive design methods for establishing the efficacy of new interventions in prospectively defined subsets of patients. The incorporation of methods that allow for data collection outside of traditional study visits at academic centers, called remote decentralized trial design, is a growing trend that may increase diversity in study participation and allow for enhanced resiliency during the COVID-19 pandemic. Finally, reaching the goals of precision medicine in asthma will require increased emphasis on effectiveness studies. Recent advances in real-world data utilization from electronic health records are also discussed with a view toward pragmatic trial designs that could also incorporate the evaluation of biomarker signatures.


Subject(s)
Asthma; COVID-19; Precision Medicine; Humans; Asthma/diagnosis; Asthma/therapy; Biomarkers; Clinical Trials as Topic; COVID-19/therapy; Pandemics
19.
JAMIA Open ; 6(2): ooad038, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37351012

ABSTRACT

Objectives: Introduce the CDS-Sandbox, a cloud-based virtual machine created to facilitate Clinical Decision Support (CDS) developers and implementers in the use of FHIR- and CQL-based open-source tools and technologies for building and testing CDS artifacts. Materials and Methods: The CDS-Sandbox includes components that enable workflows for authoring and testing CDS artifacts. Two workshops at the 2020 and 2021 AMIA Annual Symposia were conducted to demonstrate the use of the open-source CDS tools. Results: The CDS-Sandbox successfully integrated the use of open-source CDS tools. Both workshops were well attended. Participants demonstrated use and understanding of the workshop materials and provided positive feedback after the workshops. Discussion: The CDS-Sandbox and publicly available tutorial materials facilitated an understanding of the leading-edge open-source CDS infrastructure components. Conclusion: The CDS-Sandbox supports integrated use of the key CDS open-source tools that may be used to introduce CDS concepts and practice to the clinical informatics community.

20.
medRxiv ; 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-37162963

ABSTRACT

Objective: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for one institution experience a large drop in accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective", "Objective", "Assessment" and "Plan") framework with improved transferability. Materials and methods: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain adaptive pretraining (DAPT) and task adaptive pretraining (TAPT). We added in-domain annotated samples during fine-tuning and observed model performance over a varying number of annotated samples. Finally, we quantified the impact of continued pretraining as the equivalent number of in-domain annotated samples added. Results: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across three datasets. This improvement was equivalent to adding 50.2 in-domain annotated samples. Discussion: Although considered a straightforward task when performed in-domain, section classification is still a considerably difficult task when performed cross-domain, even using highly sophisticated neural network-based methods. Conclusion: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.
