Results 1 - 20 of 42

1.
J Biomed Inform ; 139: 104302, 2023 03.
Article in English | MEDLINE | ID: mdl-36754129

ABSTRACT

An accurate and detailed account of patient medications, including medication changes within the patient timeline, is essential for healthcare providers to provide appropriate patient care. Healthcare providers or the patients themselves may initiate changes to patient medication. Medication changes take many forms, including prescribed medication and associated dosage modification. These changes provide information about the overall health of the patient and the rationale that led to the current care. Future care can then build on the resulting state of the patient. This work explores the automatic extraction of medication change information from free-text clinical notes. The Contextual Medication Event Dataset (CMED) is a corpus of clinical notes with annotations that characterize medication changes through multiple change-related attributes, including the type of change (start, stop, increase, etc.), initiator of the change, temporality, change likelihood, and negation. Using CMED, we identify medication mentions in clinical text and propose three novel high-performing BERT-based systems that resolve the annotated medication change characteristics. We demonstrate that our proposed systems improve medication change classification performance over the initial work exploring CMED.
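The change-type attribute these systems resolve can be pictured with a toy baseline. The cue lists and labels below are illustrative stand-ins, not CMED's actual annotation scheme or the paper's BERT-based models:

```python
# Hypothetical keyword baseline for medication change-type classification.
# Real systems are BERT-based; this only shows the shape of the task.
CHANGE_CUES = {
    "start": ["start", "begin", "initiate"],
    "stop": ["stop", "discontinue", "hold"],
    "increase": ["increase", "titrate up"],
    "decrease": ["decrease", "reduce", "taper"],
}

def classify_change(sentence: str) -> str:
    """Return a coarse change-type label for a medication mention."""
    text = sentence.lower()
    for label, cues in CHANGE_CUES.items():
        if any(cue in text for cue in cues):
            return label
    return "no_change"
```

A learned model would additionally resolve the initiator, temporality, likelihood, and negation attributes that this heuristic ignores.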


Subject(s)
Language, Natural Language Processing, Humans, Narration
2.
BMC Pulm Med ; 23(1): 292, 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37559024

ABSTRACT

BACKGROUND: Evolving ARDS epidemiology and management during COVID-19 have prompted calls to reexamine the construct validity of Berlin criteria, which have been rarely evaluated in real-world data. We developed a Berlin ARDS definition (EHR-Berlin) computable in electronic health records (EHR) to (1) assess its construct validity, and (2) assess how expanding its criteria affected validity. METHODS: We performed a retrospective cohort study at two tertiary care hospitals with one EHR, among adults hospitalized with COVID-19 February 2020-March 2021. We assessed five candidate definitions for ARDS: the EHR-Berlin definition modeled on Berlin criteria, and four alternatives informed by recent proposals to expand criteria and include patients on high-flow oxygen (EHR-Alternative 1), relax imaging criteria (EHR-Alternatives 2-3), and extend timing windows (EHR-Alternative 4). We evaluated two aspects of construct validity for the EHR-Berlin definition: (1) criterion validity: agreement with manual ARDS classification by experts, available in 175 patients; (2) predictive validity: relationships with hospital mortality, assessed by Pearson r and by area under the receiver operating curve (AUROC). We assessed predictive validity and timing of identification of EHR-Berlin definition compared to alternative definitions. RESULTS: Among 765 patients, mean (SD) age was 57 (18) years and 471 (62%) were male. The EHR-Berlin definition classified 171 (22%) patients as ARDS, which had high agreement with manual classification (kappa 0.85), and was associated with mortality (Pearson r = 0.39; AUROC 0.72, 95% CI 0.68, 0.77). In comparison, EHR-Alternative 1 classified 219 (29%) patients as ARDS, maintained similar relationships to mortality (r = 0.40; AUROC 0.74, 95% CI 0.70, 0.79, Delong test P = 0.14), and identified patients earlier in their hospitalization (median 13 vs. 15 h from admission, Wilcoxon signed-rank test P < 0.001). 
EHR-Alternative 3, which removed imaging criteria, had similar correlation (r = 0.41) but better discrimination for mortality (AUROC 0.76, 95% CI 0.72, 0.80; P = 0.036), and identified patients a median of 2 h (P < 0.001) from admission. CONCLUSIONS: The EHR-Berlin definition can enable ARDS identification with high criterion validity, supporting large-scale study and surveillance. There are opportunities to expand the Berlin criteria that preserve predictive validity and facilitate earlier identification.
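For readers unfamiliar with the agreement and discrimination statistics reported here, both have simple first-principles forms. A minimal stdlib sketch, not the study's analysis code:

```python
def cohens_kappa(a, b):
    """Chance-corrected agreement between two binary raters, e.g. an
    EHR-computable definition vs. manual expert classification."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # expected agreement by chance, from each rater's marginal rates
    pa1, pb1 = sum(a) / n, sum(b) / n
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (po - pe) / (1 - pe)

def auroc(labels, scores):
    """AUROC as the probability that a random positive case outranks a
    random negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In practice these are computed with library routines that also provide confidence intervals (as the 95% CIs above were).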


Subject(s)
COVID-19, Respiratory Distress Syndrome, Humans, Male, Adult, Middle Aged, Female, Retrospective Studies, Electronic Health Records, COVID-19/diagnosis, Respiratory Distress Syndrome/diagnosis, Risk Assessment
3.
BMC Anesthesiol ; 23(1): 296, 2023 09 04.
Article in English | MEDLINE | ID: mdl-37667258

ABSTRACT

BACKGROUND: Electronic health records (EHR) contain large volumes of unstructured free-form text notes that richly describe a patient's health and medical comorbidities. It is unclear if perioperative risk stratification can be performed directly from these notes without manual data extraction. We conduct a feasibility study using natural language processing (NLP) to predict the American Society of Anesthesiologists Physical Status Classification (ASA-PS) as a surrogate measure for perioperative risk. We explore prediction performance using four different model types and compare the use of different note sections versus the whole note. We use Shapley values to explain model predictions and analyze disagreement between model and human anesthesiologist predictions. METHODS: Single-center retrospective cohort analysis of EHR notes from patients undergoing procedures with anesthesia care spanning all procedural specialties during a 5-year period who were not assigned ASA VI and also had a preoperative evaluation note filed within 90 days prior to the procedure. NLP models were trained for each combination of 4 models and 8 text snippets from notes. Model performance was compared using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). Error analysis and model explanation using Shapley values were conducted for the best-performing model. RESULTS: The final dataset includes 38,566 patients undergoing 61,503 procedures with anesthesia care. Prevalence of ASA-PS was 8.81% for ASA I, 31.4% for ASA II, 43.25% for ASA III, and 16.54% for ASA IV-V. The best-performing models were the BioClinicalBERT model on the truncated note task (macro-average AUROC 0.845) and the fastText model on the full note task (macro-average AUROC 0.865). Shapley values reveal human-interpretable model predictions.
Error analysis reveals that some original ASA-PS assignments may be incorrect and the model is making a reasonable prediction in these cases. CONCLUSIONS: Text classification models can accurately predict a patient's illness severity using only free-form text descriptions of patients without any manual data extraction. They can be an additional patient safety tool in the perioperative setting and reduce manual chart review for medical billing. Shapley feature attributions produce explanations that logically support model predictions and are understandable to clinicians.
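Shapley values have an exact form for small feature sets: a feature's attribution is its average marginal contribution over all orders in which features could be added. A stdlib sketch; the value function here is a hypothetical additive risk score, not the paper's models (which require sampling-based approximations):

```python
from itertools import permutations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley attributions: average each feature's marginal
    contribution to value() over every join order."""
    phi = {f: 0.0 for f in features}
    for order in permutations(features):
        seen = set()
        for f in order:
            before = value(frozenset(seen))
            seen.add(f)
            phi[f] += value(frozenset(seen)) - before
    n_orders = factorial(len(features))
    return {f: v / n_orders for f, v in phi.items()}

# Hypothetical additive contributions of two note-derived risk features.
weights = {"on_dialysis": 3.0, "copd": 2.0}
attributions = shapley_values(list(weights),
                              lambda s: sum(weights[f] for f in s))
```

For an additive value function the attributions recover the weights exactly, which is the sanity check behind treating them as explanations.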


Subject(s)
Anesthesia, Anesthesiologists, Humans, Natural Language Processing, Retrospective Studies, United States
4.
J Digit Imaging ; 36(1): 91-104, 2023 02.
Article in English | MEDLINE | ID: mdl-36253581

ABSTRACT

Radiology reports contain a diverse and rich set of clinical abnormalities documented by radiologists during their interpretation of the images. Comprehensive semantic representations of radiological findings would enable a wide range of secondary use applications to support diagnosis, triage, outcomes prediction, and clinical research. In this paper, we present a new corpus of radiology reports annotated with clinical findings. Our annotation schema captures detailed representations of pathologic findings that are observable on imaging ("lesions") and other types of clinical problems ("medical problems"). The schema used an event-based representation to capture fine-grained details, including assertion, anatomy, characteristics, size, and count. Our gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT. We then predicted the linkages between trigger and argument entities (referred to as argument roles) using a BERT-based relation extraction model. We achieved the best extraction performance using a BERT model pre-trained on 3 million radiology reports from our institution: 90.9-93.4% F1 for finding triggers and 72.0-85.6% F1 for argument roles. To assess model generalizability, we used an external validation set randomly sampled from the MIMIC Chest X-ray (MIMIC-CXR) database. The extraction performance on this validation set was 95.6% for finding triggers and 79.1-89.7% for argument roles, demonstrating that the model generalized well to the cross-institutional data with a different imaging modality. We extracted the finding events from all the radiology reports in the MIMIC-CXR database and provided the extractions to the research community.
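The event-based representation can be pictured as a trigger plus typed arguments. A minimal sketch of such a container; the field names are illustrative, not the corpus's exact attribute labels:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FindingEvent:
    """A finding trigger with the fine-grained arguments the schema
    describes: assertion, anatomy, characteristics, size, and count."""
    trigger: str
    assertion: str = "present"
    anatomy: Optional[str] = None
    characteristics: Optional[str] = None
    size: Optional[str] = None
    count: Optional[str] = None

# e.g. "4 mm nodule in the left lower lobe"
lesion = FindingEvent("nodule", anatomy="left lower lobe", size="4 mm")
```

The extraction models predict the trigger span, the argument spans, and the argument roles linking them; this structure is just the assembled output.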


Subject(s)
Radiology, Humans, X-Ray Computed Tomography, Semantics, Research Report, Natural Language Processing
5.
J Biomed Inform ; 117: 103761, 2021 05.
Article in English | MEDLINE | ID: mdl-33781918

ABSTRACT

Coronavirus disease 2019 (COVID-19) is a global pandemic. Although much has been learned about the novel coronavirus since its emergence, there are many open questions related to tracking its spread, describing symptomology, predicting the severity of infection, and forecasting healthcare utilization. Free-text clinical notes contain critical information for resolving these questions. Data-driven, automatic information extraction models are needed to use this text-encoded information in large-scale studies. This work presents a new clinical corpus, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus, which comprises 1,472 notes with detailed annotations characterizing COVID-19 diagnoses, testing, and clinical presentation. We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and symptom events with associated assertion values (0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions). Our span-based event extraction model outperforms an extractor built on MetaMapLite for the identification of symptoms with assertion values. In a secondary use application, we predicted COVID-19 test results using structured patient data (e.g. vital signs and laboratory results) and automatically extracted symptom information, to explore the clinical presentation of COVID-19. Automatically extracted symptoms improve COVID-19 prediction performance, beyond structured data alone.


Subject(s)
COVID-19/diagnosis, Electronic Health Records, Symptom Assessment, Humans, Information Storage and Retrieval, Natural Language Processing
6.
J Biomed Inform ; 113: 103631, 2021 01.
Article in English | MEDLINE | ID: mdl-33290878

ABSTRACT

Social determinants of health (SDOH) affect health outcomes, and knowledge of SDOH can inform clinical decision-making. Automatically extracting SDOH information from clinical text requires data-driven information extraction models trained on annotated corpora that are heterogeneous and frequently include critical SDOH. This work presents a new corpus with SDOH annotations, a novel active learning framework, and the first extraction results on the new corpus. The Social History Annotation Corpus (SHAC) includes 4480 social history sections with detailed annotation for 12 SDOH characterizing the status, extent, and temporal information of 18K distinct events. We introduce a novel active learning framework that selects samples for annotation using a surrogate text classification task as a proxy for a more complex event extraction task. The active learning framework successfully increases the frequency of health risk factors and improves automatic extraction of these events over undirected annotation. An event extraction model trained on SHAC achieves high extraction performance for substance use status (0.82-0.93 F1), employment status (0.81-0.86 F1), and living status type (0.81-0.93 F1) on data from three institutions.
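The selection step of such an active learning loop is often plain uncertainty sampling over a cheap surrogate classifier. A sketch of the idea, not the SHAC framework's actual selector:

```python
def select_for_annotation(pool, k):
    """Rank unlabeled samples by how close a surrogate classifier's
    predicted probability of containing a health risk factor is to 0.5,
    and pick the k most uncertain for annotation.

    pool: list of (sample_id, probability) pairs."""
    ranked = sorted(pool, key=lambda item: abs(item[1] - 0.5))
    return [sample_id for sample_id, _ in ranked[:k]]
```

Selecting uncertain samples from a surrogate task is what lets the framework steer annotation toward rarer risk-factor events without running the full event extractor over the pool.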


Subject(s)
Social Determinants of Health, Information Storage and Retrieval, Natural Language Processing, Risk Factors
7.
J Biomed Inform ; 107: 103425, 2020 07.
Article in English | MEDLINE | ID: mdl-32348850

ABSTRACT

Medical error is a leading cause of patient death in the United States. Among the different types of medical errors, harm to patients caused by doctors missing early signs of deterioration is especially challenging to address due to the heterogeneity of patients' physiological patterns. In this study, we implemented risk prediction models using the gradient boosted tree method to derive risk estimates for acute onset diseases in the near future. The prediction model uses physiological variables as input signals and the time of the administration of outcome-related interventions and discharge diagnoses as labels. We examine four categories of acute onset illness: acute heart failure (AHF), acute lung injury (ALI), acute kidney injury (AKI), and acute liver failure (ALF). To develop and test the model, we consider data from two sources: 23,578 admissions to the Intensive Care Unit (ICU) from the MIMIC-3 dataset (Beth-Israel Hospital) and 16,612 ICU admissions at hospitals affiliated with our institution (University of Washington Medical Center and Harborview Medical Center, the UW-CDR dataset). We systematically identify outcome-related interventions for each acute organ failure, then use them, along with discharge diagnoses, to label proxy events to train gradient boosted trees. The trained models achieve the highest F1 score with a value of 0.6018 when predicting the need for life-saving interventions for ALI within the next 24 h in the MIMIC-3 dataset while showing a median F1 score of 0.3850 from all acute organ failures in both datasets. The approach also achieves the highest F1 score of 0.6301 when classifying a patient's ALI status at the time of discharge from the MIMIC-3 dataset, with a median F1 score of 0.4307 in both datasets. This study shows the potential for using the time of outcome-related intervention administrations and discharge diagnoses as labels to train supervised machine learning models that predict the risk of acute onset illnesses.
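The proxy-event labeling scheme reduces to a windowed lookup: a prediction point is positive if an outcome-related intervention falls inside the look-ahead horizon. A stdlib sketch (the 24 h horizon matches the ALI result quoted above; timestamps are illustrative):

```python
from datetime import datetime, timedelta

def proxy_label(prediction_time, intervention_times, horizon_hours=24):
    """Return 1 if any outcome-related intervention occurs within the
    look-ahead horizon after the prediction point, else 0."""
    horizon = timedelta(hours=horizon_hours)
    return int(any(prediction_time < t <= prediction_time + horizon
                   for t in intervention_times))

t0 = datetime(2021, 3, 1, 8, 0)  # a hypothetical prediction point
```

These windowed labels, combined with discharge diagnoses, are what the gradient boosted trees are trained against in place of hand-annotated outcomes.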


Subject(s)
Acute Kidney Injury, Machine Learning, Acute Kidney Injury/diagnosis, Hospitalization, Humans, Intensive Care Units
10.
J Biomed Inform ; 64: 179-191, 2016 12.
Article in English | MEDLINE | ID: mdl-27729234

ABSTRACT

BACKGROUND: Anaphoric references occur ubiquitously in clinical narrative text. However, the problem remains an open challenge and receives comparatively little attention in clinical text domain applications. Furthermore, existing research on reference resolution is often conducted disjointly from the real-world tasks that motivate it. OBJECTIVE: In this paper, we present our machine-learning system that automatically performs reference resolution and a rule-based system to extract tumor characteristics, with component-based and end-to-end evaluations. Specifically, our goal was to build an algorithm that takes in tumor templates and outputs tumor characteristics, e.g. tumor number and largest tumor sizes, necessary for identifying patient liver cancer stage phenotypes. RESULTS: Our reference resolution system reached a modest performance of 0.66 F1 for the averaged MUC, B-cubed, and CEAF scores for coreference resolution and 0.43 F1 for particularization relations. However, even this modest performance helped increase automatic tumor characteristics annotation substantially over no reference resolution. CONCLUSION: Experiments revealed the benefit of reference resolution even for relatively simple tumor characteristics variables such as largest tumor size. However, we found that different variables had different tolerances to upstream reference resolution errors, highlighting the need to characterize systems by end-to-end evaluations.
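Of the three averaged coreference metrics, B-cubed is the most direct to compute: per-mention precision and recall over cluster overlaps. A stdlib sketch, assuming both clusterings cover the same mention set:

```python
def b_cubed(key, response):
    """B-cubed precision/recall/F1 for two clusterings of the same
    mentions, each given as a mention -> cluster-id mapping."""
    def mention_clusters(assign):
        inv = {}
        for mention, cid in assign.items():
            inv.setdefault(cid, set()).add(mention)
        return {mention: inv[cid] for mention, cid in assign.items()}

    K, R = mention_clusters(key), mention_clusters(response)
    precision = sum(len(R[m] & K[m]) / len(R[m]) for m in R) / len(R)
    recall = sum(len(R[m] & K[m]) / len(K[m]) for m in K) / len(K)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

MUC and CEAF score the same clusterings through link-based and alignment-based views; averaging the three F1s gives a single headline number like the 0.66 reported here.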


Subject(s)
Data Mining, Liver Neoplasms/diagnosis, Natural Language Processing, Algorithms, Electronic Health Records, Humans, Liver Neoplasms/classification, Liver Neoplasms/diagnostic imaging, Semantics
11.
JAMA Surg ; 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38837145

ABSTRACT

Importance: General-domain large language models may be able to perform risk stratification and predict postoperative outcome measures using a description of the procedure and a patient's electronic health record notes. Objective: To examine predictive performance on 8 different tasks: prediction of American Society of Anesthesiologists Physical Status (ASA-PS), hospital admission, intensive care unit (ICU) admission, unplanned admission, hospital mortality, postanesthesia care unit (PACU) phase 1 duration, hospital duration, and ICU duration. Design, Setting, and Participants: This prognostic study included task-specific datasets constructed from 2 years of retrospective electronic health records data collected during routine clinical care. Case and note data were formatted into prompts and given to the large language model GPT-4 Turbo (OpenAI) to generate a prediction and explanation. The setting included a quaternary care center comprising 3 academic hospitals and affiliated clinics in a single metropolitan area. Patients who had a surgery or procedure with anesthesia and at least 1 clinician-written note filed in the electronic health record before surgery were included in the study. Data were analyzed from November to December 2023. Exposures: Compared original notes, note summaries, few-shot prompting, and chain-of-thought prompting strategies. Main Outcomes and Measures: F1 score for binary and categorical outcomes. Mean absolute error for numerical duration outcomes. Results: Study results were measured on task-specific datasets, each with 1000 cases with the exception of unplanned admission, which had 949 cases, and hospital mortality, which had 576 cases. The best results for each task included an F1 score of 0.50 (95% CI, 0.47-0.53) for ASA-PS, 0.64 (95% CI, 0.61-0.67) for hospital admission, 0.81 (95% CI, 0.78-0.83) for ICU admission, 0.61 (95% CI, 0.58-0.64) for unplanned admission, and 0.86 (95% CI, 0.83-0.89) for hospital mortality prediction. 
Performance on duration prediction tasks was universally poor across all prompt strategies for which the large language model achieved a mean absolute error of 49 minutes (95% CI, 46-51 minutes) for PACU phase 1 duration, 4.5 days (95% CI, 4.2-5.0 days) for hospital duration, and 1.1 days (95% CI, 0.9-1.3 days) for ICU duration prediction. Conclusions and Relevance: Current general-domain large language models may assist clinicians in perioperative risk stratification on classification tasks but are inadequate for numerical duration predictions. Their ability to produce high-quality natural language explanations for the predictions may make them useful tools in clinical workflows and may be complementary to traditional risk prediction models.

12.
Article in English | MEDLINE | ID: mdl-38657567

ABSTRACT

OBJECTIVES: Generative large language models (LLMs) are a subset of transformer-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks. Using a few examples (few-shot) or no examples (zero-shot) for prompt-tuning has enabled LLMs to achieve state-of-the-art performance in a broad range of NLP applications. This article by the American Medical Informatics Association (AMIA) NLP Working Group characterizes the opportunities, challenges, and best practices for our community to leverage and advance the integration of LLMs in downstream NLP applications effectively. This can be accomplished through a variety of approaches, including augmented prompting, instruction prompt tuning, and reinforcement learning from human feedback (RLHF). TARGET AUDIENCE: Our focus is on making LLMs accessible to the broader biomedical informatics community, including clinicians and researchers who may be unfamiliar with NLP. Additionally, NLP practitioners may gain insight from the described best practices. SCOPE: We focus on 3 broad categories of NLP tasks, namely natural language understanding, natural language inferencing, and natural language generation. We review the emerging trends in prompt tuning, instruction fine-tuning, and evaluation metrics used for LLMs while drawing attention to several issues that impact biomedical NLP applications, including falsehoods in generated text (confabulation/hallucinations), toxicity, and dataset contamination leading to overfitting. We also review potential approaches to address some of these current challenges in LLMs, such as chain-of-thought prompting, and the phenomenon of emergent capabilities observed in LLMs that can be leveraged to address complex NLP challenges in biomedical applications.
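The few-shot prompting pattern described above is mechanically simple: an instruction, a handful of worked input/output pairs, then the new input. A sketch; the clinical labels and wording are illustrative, not from the article:

```python
def few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt from an instruction, (input, output)
    example pairs, and the new query to be completed by the LLM."""
    parts = [instruction]
    for text, label in examples:
        parts.append(f"Input: {text}\nOutput: {label}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    "Classify the note excerpt as SMOKER or NON-SMOKER.",
    [("Patient smokes 1 pack per day.", "SMOKER"),
     ("Denies any tobacco use.", "NON-SMOKER")],
    "Former smoker, quit 10 years ago.",
)
```

Zero-shot prompting is the same template with the examples list empty; chain-of-thought variants additionally show reasoning steps inside the example outputs.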

13.
J Am Med Inform Assoc ; 30(8): 1367-1378, 2023 07 19.
Article in English | MEDLINE | ID: mdl-36795066

ABSTRACT

OBJECTIVE: The n2c2/UW SDOH Challenge explores the extraction of social determinant of health (SDOH) information from clinical notes. The objectives include the advancement of natural language processing (NLP) information extraction techniques for SDOH and clinical information more broadly. This article presents the shared task, data, participating teams, performance results, and considerations for future work. MATERIALS AND METHODS: The task used the Social History Annotated Corpus (SHAC), which consists of clinical text with detailed event-based annotations for SDOH events, such as alcohol, drug, tobacco, employment, and living situation. Each SDOH event is characterized through attributes related to status, extent, and temporality. The task includes 3 subtasks related to information extraction (Subtask A), generalizability (Subtask B), and learning transfer (Subtask C). In addressing this task, participants utilized a range of techniques, including rules, knowledge bases, n-grams, word embeddings, and pretrained language models (LM). RESULTS: A total of 15 teams participated, and the top teams utilized pretrained deep learning LM. The top team across all subtasks used a sequence-to-sequence approach achieving 0.901 F1 for Subtask A, 0.774 F1 for Subtask B, and 0.889 F1 for Subtask C. CONCLUSIONS: Similar to many NLP tasks and domains, pretrained LM yielded the best performance, including generalizability and learning transfer. An error analysis indicates extraction performance varies by SDOH, with lower performance achieved for conditions, like substance use and homelessness, which increase health risks (risk factors) and higher performance achieved for conditions, like substance abstinence and living with family, which reduce health risks (protective factors).


Subject(s)
Natural Language Processing, Social Determinants of Health, Humans, Information Storage and Retrieval, Electronic Health Records
14.
AMIA Jt Summits Transl Sci Proc ; 2023: 622-631, 2023.
Article in English | MEDLINE | ID: mdl-37350923

ABSTRACT

Symptom information is primarily documented in free-text clinical notes and is not directly accessible for downstream applications. To address this challenge, information extraction approaches that can handle clinical language variation across different institutions and specialties are needed. In this paper, we present domain generalization for symptom extraction using pretraining and fine-tuning data that differs from the target domain in terms of institution and/or specialty and patient population. We extract symptom events using a transformer-based joint entity and relation extraction method. To reduce reliance on domain-specific features, we propose a domain generalization method that dynamically masks frequent symptom words in the source domain. Additionally, we pretrain the transformer language model (LM) on task-related unlabeled texts for better representation. Our experiments indicate that masking and adaptive pretraining methods can significantly improve performance when the source domain is more distant from the target domain.
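The dynamic masking idea can be sketched in a few lines: words that are frequent in the source domain are randomly replaced with a mask token so the model cannot over-rely on domain-specific lexical cues. The threshold, rate, and frequency table below are illustrative, not the paper's settings:

```python
import random

def mask_frequent_symptoms(tokens, source_freq, threshold, rate, seed=13):
    """Replace tokens whose source-domain frequency meets the threshold
    with "[MASK]", each with the given probability."""
    rng = random.Random(seed)  # seeded for reproducibility
    return ["[MASK]" if source_freq.get(t, 0) >= threshold
                        and rng.random() < rate
            else t
            for t in tokens]
```

Applied fresh at each training epoch (hence "dynamically"), different occurrences of the same frequent word get masked in different batches.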

15.
AMIA Annu Symp Proc ; 2023: 923-932, 2023.
Article in English | MEDLINE | ID: mdl-38222433

ABSTRACT

Natural Language Processing (NLP) methods have been broadly applied to clinical tasks. Machine learning and deep learning approaches have been used to improve the performance of clinical NLP. However, these approaches require sufficiently large datasets for training, and trained models have been shown to transfer poorly across sites. These issues have led to the promotion of data collection and integration across different institutions for accurate and portable models. However, this can introduce a form of bias called confounding by provenance. When source-specific data distributions differ at deployment, this may harm model performance. To address this issue, we evaluate the utility of backdoor adjustment for text classification in a multi-site dataset of clinical notes annotated for mentions of substance abuse, using an evaluation framework devised to measure robustness to distributional shifts. Our results indicate that backdoor adjustment can effectively mitigate confounding shift.
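Backdoor adjustment treats the data source as the confounder and marginalizes it out: P(y | do(x)) = Σ_s P(y | x, s) P(s). A minimal sketch; the site names and probabilities are illustrative:

```python
def backdoor_adjusted(p_y_given_x_s, p_s):
    """Backdoor adjustment over provenance s:
    P(y | do(x)) = sum_s P(y | x, s) * P(s).

    p_y_given_x_s: site -> P(y | x, s) for a fixed input x.
    p_s: site -> marginal P(s) from training."""
    return sum(p_y_given_x_s[s] * p_s[s] for s in p_s)

# The same note scored under two sites with different label
# distributions, weighted by the training-time site prior rather
# than whatever site mix appears at deployment.
adjusted = backdoor_adjusted({"site_a": 0.8, "site_b": 0.4},
                             {"site_a": 0.5, "site_b": 0.5})
```

Because the site prior is fixed at training time, the adjusted prediction no longer shifts when the deployment mix of sites changes, which is the robustness property the evaluation framework measures.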


Subject(s)
Electronic Health Records, Substance-Related Disorders, Humans, Data Collection, Machine Learning, Natural Language Processing, Multicenter Studies as Topic
16.
J Am Med Inform Assoc ; 30(8): 1389-1397, 2023 07 19.
Article in English | MEDLINE | ID: mdl-37130345

ABSTRACT

OBJECTIVE: Social determinants of health (SDOH) impact health outcomes and are documented in the electronic health record (EHR) through structured data and unstructured clinical notes. However, clinical notes often contain more comprehensive SDOH information, detailing aspects such as status, severity, and temporality. This work has two primary objectives: (1) develop a natural language processing information extraction model to capture detailed SDOH information and (2) evaluate the information gain achieved by applying the SDOH extractor to clinical narratives and combining the extracted representations with existing structured data. MATERIALS AND METHODS: We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture to characterize SDOH across various dimensions. In an EHR case study, we applied the SDOH extractor to a large clinical data set with 225 089 patients and 430 406 notes with social history sections and compared the extracted SDOH information with existing structured data. RESULTS: The SDOH extractor achieved 0.86 F1 on a withheld test set. In the EHR case study, we found extracted SDOH information complements existing structured data with 32% of homeless patients, 19% of current tobacco users, and 10% of drug users only having these health risk factors documented in the clinical narrative. CONCLUSIONS: Utilizing EHR data to identify SDOH health risk factors and social needs may improve patient care and outcomes. Semantic representations of text-encoded SDOH information can augment existing structured data, and this more comprehensive SDOH representation can assist health systems in identifying and addressing these social needs.


Subject(s)
Electronic Health Records, Social Determinants of Health, Humans, Natural Language Processing, Risk Factors, Information Storage and Retrieval
17.
medRxiv ; 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-37162963

ABSTRACT

Objective: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for one institution experience a large drop of accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective", "Objective", "Assessment" and "Plan") framework with improved transferability. Materials and methods: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain adaptive pretraining (DAPT) and task adaptive pretraining (TAPT). We added in-domain annotated samples during fine-tuning and observed model performance over a range of annotated sample sizes. Finally, we quantified the impact of continued pretraining as the equivalent number of in-domain annotated samples added. Results: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across three datasets. This improvement was equivalent to adding 50.2 in-domain annotated samples. Discussion: Although considered a straightforward task when performed in-domain, section classification is still considerably difficult when performed cross-domain, even using highly sophisticated neural network-based methods. Conclusion: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.

18.
J Am Med Inform Assoc ; 31(1): 89-97, 2023 12 22.
Article in English | MEDLINE | ID: mdl-37725927

ABSTRACT

OBJECTIVE: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for 1 institution experience a large drop of accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective," "Objective," "Assessment," and "Plan") framework with improved transferability. MATERIALS AND METHODS: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain-adaptive pretraining and task-adaptive pretraining. We added in-domain annotated samples during fine-tuning and observed model performance over a range of annotated sample sizes. Finally, we quantified the impact of continued pretraining as the equivalent number of in-domain annotated samples added. RESULTS: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across 3 datasets. This improvement was equivalent to adding 35 in-domain annotated samples. DISCUSSION: Although considered a straightforward task when performed in-domain, section classification is still considerably difficult when performed cross-domain, even using highly sophisticated neural network-based methods. CONCLUSION: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.
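To make the task concrete, a naive in-domain baseline is a header lookup under the SOAP framework; the cue lists here are illustrative, and it is precisely this kind of domain-specific lexical shortcut that fails cross-domain, which the paper's fine-tuned BERT models address:

```python
SOAP_HEADERS = {
    "subjective": ["chief complaint", "hpi", "history of present illness"],
    "objective": ["vitals", "physical exam", "labs"],
    "assessment": ["assessment", "impression"],
    "plan": ["plan", "disposition"],
}

def classify_section(header: str) -> str:
    """Map a section header to a SOAP label by substring lookup."""
    h = header.lower().strip(": ")
    for label, cues in SOAP_HEADERS.items():
        if any(cue in h for cue in cues):
            return label
    return "unknown"
```

A different institution's headers fall straight into "unknown", which is the transferability gap continued pretraining and in-domain samples are shown to narrow.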


Subject(s)
Health Facilities, Information Storage and Retrieval, Natural Language Processing, Neural Networks (Computer), Sample Size
19.
Sci Data ; 10(1): 586, 2023 09 06.
Article in English | MEDLINE | ID: mdl-37673893

ABSTRACT

Recent immense breakthroughs in generative models such as GPT-4 have precipitated re-imagined, ubiquitous usage of these models in all applications. One area that can benefit from improvements in artificial intelligence (AI) is healthcare. The note generation task from doctor-patient encounters, and its associated electronic medical record documentation, is one of the most arduous and time-consuming tasks for physicians. It is also a prime potential beneficiary of advances in generative models. However, with such advances, benchmarking is more critical than ever. Whether studying model weaknesses or developing new evaluation metrics, shared open datasets are an imperative part of understanding the current state of the art. Unfortunately, as clinic encounter conversations are not routinely recorded and are difficult to share ethically due to patient confidentiality, there are no sufficiently large clinic dialogue-note datasets to benchmark this task. Here we present the Ambient Clinical Intelligence Benchmark (ACI-BENCH) corpus, the largest dataset to date tackling the problem of AI-assisted note generation from visit dialogue. We also present the benchmark performances of several common state-of-the-art approaches.
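Benchmarking generated notes against references typically starts from n-gram overlap metrics. A minimal sketch of unigram ROUGE-1 F1, ignoring stemming and tokenization details; the corpus's official evaluation may use additional metrics:

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Unigram-overlap ROUGE-1 F1 between a reference note and a
    model-generated note."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Such surface metrics are known to correlate imperfectly with clinical correctness, which is part of why shared datasets like this one matter for developing better evaluation.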


Subject(s)
Artificial Intelligence, Benchmarking, Health Facilities, Humans, Electronic Health Records
20.
J Am Med Inform Assoc ; 30(12): 1954-1964, 2023 11 17.
Article in English | MEDLINE | ID: mdl-37550244

ABSTRACT

OBJECTIVE: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. MATERIALS AND METHODS: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data-model agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared the capability of LeafAI to a human database programmer to identify patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by generated queries. RESULTS: LeafAI matched a mean 43% of enrolled patients with 27 225 eligible across 8 clinical trials, compared to 27% matched and 14 587 eligible in queries by a human database programmer. The human programmer spent 26 total hours crafting queries compared to several minutes by LeafAI. CONCLUSIONS: Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival an experienced human programmer in finding patients eligible for clinical trials.


Subject(s)
Natural Language Processing, Unified Medical Language System, Humans, Knowledge Bases, Clinical Trials as Topic