Results 1 - 20 of 43
1.
Article in English | MEDLINE | ID: mdl-39225779

ABSTRACT

OBJECTIVE: Clinical notes contain unstructured representations of patient histories, including the relationships between medical problems and prescription drugs. To investigate the relationship between cancer drugs and their associated symptom burden, we extract structured, semantic representations of medical problem and drug information from the clinical narratives of oncology notes. MATERIALS AND METHODS: We present Clinical concept Annotations for Cancer Events and Relations (CACER), a novel corpus with fine-grained annotations for over 48 000 medical problems and drug events and 10 000 drug-problem and problem-problem relations. Leveraging CACER, we develop and evaluate transformer-based information extraction models such as Bidirectional Encoder Representations from Transformers (BERT), Fine-tuned Language Net Text-To-Text Transfer Transformer (Flan-T5), Large Language Model Meta AI (Llama3), and Generative Pre-trained Transformers-4 (GPT-4) using fine-tuning and in-context learning (ICL). RESULTS: In event extraction, the fine-tuned BERT and Llama3 models achieved the highest performance at 88.2-88.0 F1, which is comparable to the inter-annotator agreement (IAA) of 88.4 F1. In relation extraction, the fine-tuned BERT, Flan-T5, and Llama3 achieved the highest performance at 61.8-65.3 F1. GPT-4 with ICL achieved the worst performance across both tasks. DISCUSSION: The fine-tuned models significantly outperformed GPT-4 in ICL, highlighting the importance of annotated training data and model optimization. Furthermore, the BERT models performed similarly to Llama3. For our task, large language models offer no performance advantage over the smaller BERT models. CONCLUSIONS: We introduce CACER, a novel corpus with fine-grained annotations for medical problems, drugs, and their relationships in clinical narratives of oncology notes. State-of-the-art transformer models achieved performance comparable to IAA for several extraction tasks.
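The event extraction described above hinges on turning character-span annotations into token-level labels that a fine-tuned token-classification model (such as BERT or Llama3) can learn from. The sketch below shows that span-to-BIO conversion step only; the tokenization and label names are illustrative assumptions, not CACER's actual schema or pipeline.

```python
def spans_to_bio(tokens, spans):
    """Convert character-level span annotations into BIO token labels.

    tokens: list of (text, start, end) tuples from a tokenizer.
    spans:  list of (start, end, label) annotations, e.g. drug/problem events.
    """
    labels = ["O"] * len(tokens)
    for s_start, s_end, s_label in spans:
        inside = False  # first token in a span gets B-, the rest get I-
        for i, (_, t_start, t_end) in enumerate(tokens):
            if t_start >= s_start and t_end <= s_end:
                labels[i] = ("I-" if inside else "B-") + s_label
                inside = True
    return labels

# Hypothetical example: "oxaliplatin causes nausea"
tokens = [("oxaliplatin", 0, 11), ("causes", 12, 18), ("nausea", 19, 25)]
spans = [(0, 11, "Drug"), (19, 25, "Problem")]
```

In practice, the resulting label sequence is aligned to subword tokens and fed to the model's token-classification head during fine-tuning.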

2.
JAMA Surg ; 159(8): 928-937, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38837145

ABSTRACT

Importance: General-domain large language models may be able to perform risk stratification and predict postoperative outcome measures using a description of the procedure and a patient's electronic health record notes. Objective: To examine predictive performance on 8 different tasks: prediction of American Society of Anesthesiologists Physical Status (ASA-PS), hospital admission, intensive care unit (ICU) admission, unplanned admission, hospital mortality, postanesthesia care unit (PACU) phase 1 duration, hospital duration, and ICU duration. Design, Setting, and Participants: This prognostic study included task-specific datasets constructed from 2 years of retrospective electronic health records data collected during routine clinical care. Case and note data were formatted into prompts and given to the large language model GPT-4 Turbo (OpenAI) to generate a prediction and explanation. The setting included a quaternary care center comprising 3 academic hospitals and affiliated clinics in a single metropolitan area. Patients who had a surgery or procedure with anesthesia and at least 1 clinician-written note filed in the electronic health record before surgery were included in the study. Data were analyzed from November to December 2023. Exposures: Compared original notes, note summaries, few-shot prompting, and chain-of-thought prompting strategies. Main Outcomes and Measures: F1 score for binary and categorical outcomes. Mean absolute error for numerical duration outcomes. Results: Study results were measured on task-specific datasets, each with 1000 cases with the exception of unplanned admission, which had 949 cases, and hospital mortality, which had 576 cases. The best results for each task included an F1 score of 0.50 (95% CI, 0.47-0.53) for ASA-PS, 0.64 (95% CI, 0.61-0.67) for hospital admission, 0.81 (95% CI, 0.78-0.83) for ICU admission, 0.61 (95% CI, 0.58-0.64) for unplanned admission, and 0.86 (95% CI, 0.83-0.89) for hospital mortality prediction. 
Performance on duration prediction tasks was universally poor across all prompt strategies for which the large language model achieved a mean absolute error of 49 minutes (95% CI, 46-51 minutes) for PACU phase 1 duration, 4.5 days (95% CI, 4.2-5.0 days) for hospital duration, and 1.1 days (95% CI, 0.9-1.3 days) for ICU duration prediction. Conclusions and Relevance: Current general-domain large language models may assist clinicians in perioperative risk stratification on classification tasks but are inadequate for numerical duration predictions. Their ability to produce high-quality natural language explanations for the predictions may make them useful tools in clinical workflows and may be complementary to traditional risk prediction models.
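The prompting strategies the study compares (original notes versus summaries, few-shot exemplars, chain-of-thought) can be sketched as a prompt-assembly function. The wording, labels, and format below are hypothetical illustrations, not the study's actual prompts.

```python
def build_fewshot_prompt(procedure, note, exemplars, chain_of_thought=False):
    """Assemble a few-shot prompt for ASA-PS prediction (illustrative format).

    exemplars: list of (example_note, example_label) pairs shown to the model.
    """
    parts = ["You are a perioperative risk assistant. "
             "Predict the ASA Physical Status (I-V) for the case below."]
    # Few-shot: prepend worked examples before the target case.
    for ex_note, ex_label in exemplars:
        parts.append(f"Case: {ex_note}\nASA-PS: {ex_label}")
    task = f"Case: {procedure}. {note}\n"
    if chain_of_thought:
        # Chain-of-thought: ask for step-by-step reasoning before the answer.
        task += "Think step by step, then state the final ASA-PS.\n"
    task += "ASA-PS:"
    parts.append(task)
    return "\n\n".join(parts)
```

The assembled string would then be sent to the model API; the prediction and its natural language explanation are parsed from the completion.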


Subject(s)
Electronic Health Records , Humans , Risk Assessment/methods , Retrospective Studies , Prognosis , Male , Female , Middle Aged , Hospital Mortality , Aged , Intensive Care Units
3.
J Am Med Inform Assoc ; 31(9): 2114-2124, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38657567

ABSTRACT

OBJECTIVES: Generative large language models (LLMs) are a subset of transformer-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks. Using a few examples (few-shot) or no examples (zero-shot) for prompt-tuning has enabled LLMs to achieve state-of-the-art performance in a broad range of NLP applications. This article by the American Medical Informatics Association (AMIA) NLP Working Group characterizes the opportunities, challenges, and best practices for our community to leverage and advance the integration of LLMs in downstream NLP applications effectively. This can be accomplished through a variety of approaches, including augmented prompting, instruction prompt tuning, and reinforcement learning from human feedback (RLHF). TARGET AUDIENCE: Our focus is on making LLMs accessible to the broader biomedical informatics community, including clinicians and researchers who may be unfamiliar with NLP. Additionally, NLP practitioners may gain insight from the described best practices. SCOPE: We focus on 3 broad categories of NLP tasks, namely natural language understanding, natural language inferencing, and natural language generation. We review the emerging trends in prompt tuning, instruction fine-tuning, and evaluation metrics used for LLMs while drawing attention to several issues that impact biomedical NLP applications, including falsehoods in generated text (confabulation/hallucinations), toxicity, and dataset contamination leading to overfitting. We also review potential approaches to address some of these current challenges in LLMs, such as chain-of-thought prompting, and the phenomenon of emergent capabilities observed in LLMs that can be leveraged to address complex NLP challenges in biomedical applications.


Subject(s)
Natural Language Processing , Neural Networks, Computer , Medical Informatics , Humans
5.
BMC Anesthesiol ; 23(1): 296, 2023 09 04.
Article in English | MEDLINE | ID: mdl-37667258

ABSTRACT

BACKGROUND: Electronic health records (EHR) contain large volumes of unstructured free-form text notes that richly describe a patient's health and medical comorbidities. It is unclear if perioperative risk stratification can be performed directly from these notes without manual data extraction. We conduct a feasibility study using natural language processing (NLP) to predict the American Society of Anesthesiologists Physical Status Classification (ASA-PS) as a surrogate measure for perioperative risk. We explore prediction performance using four different model types and compare the use of different note sections versus the whole note. We use Shapley values to explain model predictions and analyze disagreement between model and human anesthesiologist predictions. METHODS: Single-center retrospective cohort analysis of EHR notes from patients undergoing procedures with anesthesia care spanning all procedural specialties during a 5-year period who were not assigned ASA VI and also had a preoperative evaluation note filed within 90 days prior to the procedure. NLP models were trained for each combination of 4 models and 8 text snippets from notes. Model performance was compared using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). Error analysis and model explanation using Shapley values were conducted for the best-performing model. RESULTS: The final dataset includes 38,566 patients undergoing 61,503 procedures with anesthesia care. Prevalence of ASA-PS was 8.81% for ASA I, 31.4% for ASA II, 43.25% for ASA III, and 16.54% for ASA IV-V. The best-performing models were the BioClinicalBERT model on the truncated note task (macro-average AUROC 0.845) and the fastText model on the full note task (macro-average AUROC 0.865). Shapley values reveal human-interpretable model predictions.
Error analysis reveals that some original ASA-PS assignments may be incorrect and the model is making a reasonable prediction in these cases. CONCLUSIONS: Text classification models can accurately predict a patient's illness severity using only free-form text descriptions of patients without any manual data extraction. They can be an additional patient safety tool in the perioperative setting and reduce manual chart review for medical billing. Shapley feature attributions produce explanations that logically support model predictions and are understandable to clinicians.
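The macro-average AUROC used to compare these models can be computed with the rank (Mann-Whitney U) formulation, averaged one-vs-rest over the ASA classes. A minimal from-scratch sketch; the class labels and probabilities are illustrative:

```python
def auroc(y_true, y_score):
    """Binary AUROC via the rank (Mann-Whitney U) formulation."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    # Each positive "wins" against each lower-scored negative; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auroc(y_true, y_prob, classes):
    """One-vs-rest macro-average AUROC over multiclass probabilities."""
    scores = []
    for k, c in enumerate(classes):
        binary = [1 if y == c else 0 for y in y_true]
        scores.append(auroc(binary, [row[k] for row in y_prob]))
    return sum(scores) / len(scores)
```

Macro averaging weights each ASA class equally, which matters here because class prevalence is highly imbalanced (8.81% ASA I vs. 43.25% ASA III).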


Subject(s)
Anesthesia , Anesthesiologists , Humans , Natural Language Processing , Retrospective Studies , United States
6.
Sci Data ; 10(1): 586, 2023 09 06.
Article in English | MEDLINE | ID: mdl-37673893

ABSTRACT

Recent breakthroughs in generative models such as GPT-4 have prompted a re-imagining of how these models can be used ubiquitously across applications. One area that can benefit from improvements in artificial intelligence (AI) is healthcare. Generating notes from doctor-patient encounters, along with the associated electronic medical record documentation, is one of the most arduous and time-consuming tasks for physicians. It is also a natural, prime beneficiary of advances in generative models. However, with such advances, benchmarking is more critical than ever. Whether studying model weaknesses or developing new evaluation metrics, shared open datasets are an imperative part of understanding the current state of the art. Unfortunately, as clinic encounter conversations are not routinely recorded and are difficult to share ethically due to patient confidentiality, there are no sufficiently large clinic dialogue-note datasets to benchmark this task. Here we present the Ambient Clinical Intelligence Benchmark (ACI-BENCH) corpus, the largest dataset to date tackling the problem of AI-assisted note generation from visit dialogue. We also present the benchmark performances of several common state-of-the-art approaches.


Subject(s)
Artificial Intelligence , Benchmarking , Health Facilities , Humans , Electronic Health Records
7.
J Am Med Inform Assoc ; 31(1): 89-97, 2023 12 22.
Article in English | MEDLINE | ID: mdl-37725927

ABSTRACT

OBJECTIVE: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for 1 institution experience a large drop of accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective," "Objective," "Assessment," and "Plan") framework with improved transferability. MATERIALS AND METHODS: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain-adaptive pretraining and task-adaptive pretraining. We added in-domain annotated samples during fine-tuning and observed model performance over varying annotated sample sizes. Finally, we quantified the impact of continued pretraining in terms of the equivalent number of in-domain annotated samples added. RESULTS: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across 3 datasets. This improvement was equivalent to adding 35 in-domain annotated samples. DISCUSSION: Although considered a straightforward task when performed in-domain, section classification is still considerably difficult when performed cross-domain, even using highly sophisticated neural network-based methods. CONCLUSION: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.
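For contrast with the fine-tuned BERT approach above, a naive rule-based SOAP section classifier might look like the sketch below. The header patterns are illustrative assumptions; real section headers vary widely across institutions, which is exactly the transferability problem this study addresses.

```python
import re

# Illustrative header patterns; not an exhaustive or institution-validated list.
SOAP_HEADERS = {
    "Subjective": r"(subjective|chief complaint|history of present illness)",
    "Objective":  r"(objective|physical exam|vital signs|laboratory)",
    "Assessment": r"(assessment|impression)",
    "Plan":       r"(plan|recommendations)",
}

def classify_section(header_line):
    """Map a section header line to a SOAP category, or None if unmatched."""
    text = header_line.lower()
    for label, pattern in SOAP_HEADERS.items():
        if re.search(pattern, text):
            return label
    return None
```

Rules like these achieve high in-domain accuracy but break on unseen header vocabularies, motivating the learned, continued-pretraining approach evaluated in the paper.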


Subject(s)
Health Facilities , Information Storage and Retrieval , Natural Language Processing , Neural Networks, Computer , Sample Size
8.
BMC Pulm Med ; 23(1): 292, 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37559024

ABSTRACT

BACKGROUND: Evolving ARDS epidemiology and management during COVID-19 have prompted calls to reexamine the construct validity of Berlin criteria, which have been rarely evaluated in real-world data. We developed a Berlin ARDS definition (EHR-Berlin) computable in electronic health records (EHR) to (1) assess its construct validity, and (2) assess how expanding its criteria affected validity. METHODS: We performed a retrospective cohort study at two tertiary care hospitals with one EHR, among adults hospitalized with COVID-19 February 2020-March 2021. We assessed five candidate definitions for ARDS: the EHR-Berlin definition modeled on Berlin criteria, and four alternatives informed by recent proposals to expand criteria and include patients on high-flow oxygen (EHR-Alternative 1), relax imaging criteria (EHR-Alternatives 2-3), and extend timing windows (EHR-Alternative 4). We evaluated two aspects of construct validity for the EHR-Berlin definition: (1) criterion validity: agreement with manual ARDS classification by experts, available in 175 patients; (2) predictive validity: relationships with hospital mortality, assessed by Pearson r and by area under the receiver operating curve (AUROC). We assessed predictive validity and timing of identification of EHR-Berlin definition compared to alternative definitions. RESULTS: Among 765 patients, mean (SD) age was 57 (18) years and 471 (62%) were male. The EHR-Berlin definition classified 171 (22%) patients as ARDS, which had high agreement with manual classification (kappa 0.85), and was associated with mortality (Pearson r = 0.39; AUROC 0.72, 95% CI 0.68, 0.77). In comparison, EHR-Alternative 1 classified 219 (29%) patients as ARDS, maintained similar relationships to mortality (r = 0.40; AUROC 0.74, 95% CI 0.70, 0.79, Delong test P = 0.14), and identified patients earlier in their hospitalization (median 13 vs. 15 h from admission, Wilcoxon signed-rank test P < 0.001). 
EHR-Alternative 3, which removed imaging criteria, had similar correlation (r = 0.41) but better discrimination for mortality (AUROC 0.76, 95% CI 0.72, 0.80; P = 0.036), and identified patients median 2 h (P < 0.001) from admission. CONCLUSIONS: The EHR-Berlin definition can enable ARDS identification with high criterion validity, supporting large-scale study and surveillance. There are opportunities to expand the Berlin criteria that preserve predictive validity and facilitate earlier identification.
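At its core, a computable Berlin-style severity assignment reduces to the PaO2/FiO2 ratio under a minimum PEEP. The sketch below uses the standard Berlin thresholds but deliberately omits the timing, imaging, and origin-of-edema criteria that the study's EHR definitions operationalize, so it is a simplification, not the EHR-Berlin definition itself.

```python
def berlin_severity(pao2_mm_hg, fio2_fraction, peep_cm_h2o):
    """Berlin ARDS severity from the PaO2/FiO2 ratio (simplified sketch).

    Returns "mild", "moderate", or "severe", or None if criteria are unmet.
    """
    if peep_cm_h2o < 5:
        return None  # Berlin criteria require PEEP (or CPAP) >= 5 cm H2O
    pf = pao2_mm_hg / fio2_fraction
    if pf <= 100:
        return "severe"
    if pf <= 200:
        return "moderate"
    if pf <= 300:
        return "mild"
    return None
```

The alternative definitions in the study relax exactly the pieces this sketch leaves out, e.g. admitting high-flow oxygen in place of the PEEP requirement (EHR-Alternative 1) or dropping imaging criteria (EHR-Alternative 3).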


Subject(s)
COVID-19 , Respiratory Distress Syndrome , Humans , Male , Adult , Middle Aged , Female , Retrospective Studies , Electronic Health Records , COVID-19/diagnosis , Respiratory Distress Syndrome/diagnosis , Risk Assessment
9.
J Am Med Inform Assoc ; 30(12): 1954-1964, 2023 11 17.
Article in English | MEDLINE | ID: mdl-37550244

ABSTRACT

OBJECTIVE: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. MATERIALS AND METHODS: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data-model agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared the capability of LeafAI to a human database programmer to identify patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by generated queries. RESULTS: LeafAI matched a mean 43% of enrolled patients with 27 225 eligible across 8 clinical trials, compared to 27% matched and 14 587 eligible in queries by a human database programmer. The human programmer spent 26 total hours crafting queries compared to several minutes by LeafAI. CONCLUSIONS: Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival an experienced human programmer in finding patients eligible for clinical trials.


Subject(s)
Natural Language Processing , Unified Medical Language System , Humans , Knowledge Bases , Clinical Trials as Topic
11.
AMIA Jt Summits Transl Sci Proc ; 2023: 622-631, 2023.
Article in English | MEDLINE | ID: mdl-37350923

ABSTRACT

Symptom information is primarily documented in free-text clinical notes and is not directly accessible for downstream applications. To address this challenge, information extraction approaches that can handle clinical language variation across different institutions and specialties are needed. In this paper, we present domain generalization for symptom extraction using pretraining and fine-tuning data that differ from the target domain in terms of institution and/or specialty and patient population. We extract symptom events using a transformer-based joint entity and relation extraction method. To reduce reliance on domain-specific features, we propose a domain generalization method that dynamically masks frequent symptom words in the source domain. Additionally, we pretrain the transformer language model (LM) on task-related unlabeled texts for better representation. Our experiments indicate that masking and adaptive pretraining methods can significantly improve performance when the source domain is more distant from the target domain.
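The dynamic masking idea can be sketched as: find the most frequent source-domain tokens and stochastically replace them with a mask token so the model cannot lean on domain-specific surface forms. The hyperparameters and token lists below are illustrative, not the paper's actual settings.

```python
import random
from collections import Counter

def mask_frequent_tokens(sentences, top_k=2, mask_prob=0.5, seed=0):
    """Stochastically replace the most frequent source-domain tokens with [MASK].

    sentences: list of token lists from the source-domain training data.
    top_k, mask_prob: illustrative hyperparameters controlling how many
    frequent tokens are targeted and how often each occurrence is masked.
    """
    rng = random.Random(seed)
    counts = Counter(tok for sent in sentences for tok in sent)
    frequent = {tok for tok, _ in counts.most_common(top_k)}
    return [[("[MASK]" if tok in frequent and rng.random() < mask_prob else tok)
             for tok in sent] for sent in sentences]
```

During training, the masked copies replace (or augment) the originals, pushing the extractor to rely on context rather than memorized symptom vocabulary.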

12.
medRxiv ; 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-37162963

ABSTRACT

Objective: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for one institution experience a large drop of accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective", "Objective", "Assessment" and "Plan") framework with improved transferability. Materials and methods: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain adaptive pretraining (DAPT) and task adaptive pretraining (TAPT). We added out-of-domain annotated samples during fine-tuning and observed model performance over varying annotated sample sizes. Finally, we quantified the impact of continued pretraining in terms of the equivalent number of in-domain annotated samples added. Results: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across three datasets. This improvement was equivalent to adding 50.2 in-domain annotated samples. Discussion: Although considered a straightforward task when performed in-domain, section classification is still considerably difficult when performed cross-domain, even using highly sophisticated neural network-based methods. Conclusion: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.

13.
J Am Med Inform Assoc ; 30(8): 1389-1397, 2023 07 19.
Article in English | MEDLINE | ID: mdl-37130345

ABSTRACT

OBJECTIVE: Social determinants of health (SDOH) impact health outcomes and are documented in the electronic health record (EHR) through structured data and unstructured clinical notes. However, clinical notes often contain more comprehensive SDOH information, detailing aspects such as status, severity, and temporality. This work has two primary objectives: (1) develop a natural language processing information extraction model to capture detailed SDOH information and (2) evaluate the information gain achieved by applying the SDOH extractor to clinical narratives and combining the extracted representations with existing structured data. MATERIALS AND METHODS: We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture to characterize SDOH across various dimensions. In an EHR case study, we applied the SDOH extractor to a large clinical data set with 225 089 patients and 430 406 notes with social history sections and compared the extracted SDOH information with existing structured data. RESULTS: The SDOH extractor achieved 0.86 F1 on a withheld test set. In the EHR case study, we found extracted SDOH information complements existing structured data with 32% of homeless patients, 19% of current tobacco users, and 10% of drug users only having these health risk factors documented in the clinical narrative. CONCLUSIONS: Utilizing EHR data to identify SDOH health risk factors and social needs may improve patient care and outcomes. Semantic representations of text-encoded SDOH information can augment existing structured data, and this more comprehensive SDOH representation can assist health systems in identifying and addressing these social needs.
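The reported information gain (e.g., 32% of homeless patients documented only in the narrative) amounts to a set comparison between patients flagged in structured fields and patients flagged by the NLP extractor. A simplified sketch of that comparison, with patient IDs standing in for the study's linked records:

```python
def narrative_only_fraction(structured_ids, extracted_ids):
    """Fraction of flagged patients whose risk factor appears only in notes.

    structured_ids: set of patient IDs flagged in structured EHR fields.
    extracted_ids:  set of patient IDs flagged by the NLP SDOH extractor.
    """
    all_flagged = structured_ids | extracted_ids
    only_text = extracted_ids - structured_ids
    return len(only_text) / len(all_flagged)
```

Values near zero would mean the narrative is redundant with structured data; the study's 10-32% figures indicate the notes carry substantial complementary SDOH signal.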


Subject(s)
Electronic Health Records , Social Determinants of Health , Humans , Natural Language Processing , Risk Factors , Information Storage and Retrieval
14.
BMJ Open ; 13(4): e068832, 2023 04 20.
Article in English | MEDLINE | ID: mdl-37080616

ABSTRACT

OBJECTIVE: Lung cancer is the most common cause of cancer-related death in the USA. While most patients are diagnosed following symptomatic presentation, no studies have compared symptoms and physical examination signs at or prior to diagnosis from electronic health records (EHRs) in the USA. We aimed to identify symptoms and signs in patients prior to diagnosis in EHR data. DESIGN: Case-control study. SETTING: Ambulatory care clinics at a large tertiary care academic health centre in the USA. PARTICIPANTS, OUTCOMES: We studied 698 primary lung cancer cases in adults diagnosed between 1 January 2012 and 31 December 2019, and 6841 controls matched by age, sex, smoking status and type of clinic. Coded and free-text data from the EHR were extracted from 2 years prior to diagnosis date for cases and index date for controls. Univariate and multivariable conditional logistic regression were used to identify symptoms and signs associated with lung cancer at time of diagnosis, and 1, 3, 6 and 12 months before the diagnosis/index dates. RESULTS: Eleven symptoms and signs recorded during the study period were associated with a significantly higher chance of being a lung cancer case in multivariable analyses. Of these, seven were significantly associated with lung cancer 6 months prior to diagnosis: haemoptysis (OR 3.2, 95% CI 1.9 to 5.3), cough (OR 3.1, 95% CI 2.4 to 4.0), chest crackles or wheeze (OR 3.1, 95% CI 2.3 to 4.1), bone pain (OR 2.7, 95% CI 2.1 to 3.6), back pain (OR 2.5, 95% CI 1.9 to 3.2), weight loss (OR 2.1, 95% CI 1.5 to 2.8) and fatigue (OR 1.6, 95% CI 1.3 to 2.1). CONCLUSIONS: Patients diagnosed with lung cancer appear to have symptoms and signs recorded in the EHR that distinguish them from similar matched patients in ambulatory care, often 6 months or more before diagnosis. These findings suggest opportunities to improve the diagnostic process for lung cancer.
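The odds ratios above come from conditional logistic regression on matched case-control sets; the quantity itself can be illustrated with an unadjusted 2x2-table version and a Wald confidence interval. This is a simplification of the study's matched analysis, with made-up counts:

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald 95% CI from a 2x2 table."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) from the reciprocal cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

Conditional logistic regression generalizes this by estimating the OR within each matched set, which is why the study can adjust for age, sex, smoking status, and clinic type.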


Subject(s)
Electronic Health Records , Lung Neoplasms , Adult , Humans , Case-Control Studies , Tertiary Care Centers , Lung Neoplasms/diagnosis , Ambulatory Care
15.
J Biomed Inform ; 139: 104302, 2023 03.
Article in English | MEDLINE | ID: mdl-36754129

ABSTRACT

An accurate and detailed account of patient medications, including medication changes within the patient timeline, is essential for healthcare providers to provide appropriate patient care. Healthcare providers or the patients themselves may initiate changes to patient medication. Medication changes take many forms, including prescribed medication and associated dosage modification. These changes provide information about the overall health of the patient and the rationale that led to the current care. Future care can then build on the resulting state of the patient. This work explores the automatic extraction of medication change information from free-text clinical notes. The Contextual Medication Event Dataset (CMED) is a corpus of clinical notes with annotations that characterize medication changes through multiple change-related attributes, including the type of change (start, stop, increase, etc.), initiator of the change, temporality, change likelihood, and negation. Using CMED, we identify medication mentions in clinical text and propose three novel high-performing BERT-based systems that resolve the annotated medication change characteristics. We demonstrate that our proposed systems improve medication change classification performance over the initial work exploring CMED.
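The change-related attributes CMED annotates can be modeled as a simple record type. The field names follow the attributes listed in the abstract (change type, initiator, temporality, likelihood, negation); the value sets shown in comments are illustrative, not CMED's exact label inventory.

```python
from dataclasses import dataclass

@dataclass
class MedicationChange:
    """One annotated medication-change event (illustrative schema)."""
    mention: str        # the medication text span in the note
    change_type: str    # e.g. "start", "stop", "increase", "decrease"
    initiator: str      # e.g. "physician", "patient"
    temporality: str    # e.g. "past", "present", "future"
    likelihood: str     # e.g. "certain", "hypothetical"
    negated: bool = False
```

A classifier over CMED then resolves each of these attributes per medication mention, which is the multi-attribute task the BERT-based systems in the paper address.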


Subject(s)
Language , Natural Language Processing , Humans , Narration
16.
J Am Med Inform Assoc ; 30(8): 1367-1378, 2023 07 19.
Article in English | MEDLINE | ID: mdl-36795066

ABSTRACT

OBJECTIVE: The n2c2/UW SDOH Challenge explores the extraction of social determinant of health (SDOH) information from clinical notes. The objectives include the advancement of natural language processing (NLP) information extraction techniques for SDOH and clinical information more broadly. This article presents the shared task, data, participating teams, performance results, and considerations for future work. MATERIALS AND METHODS: The task used the Social History Annotated Corpus (SHAC), which consists of clinical text with detailed event-based annotations for SDOH events, such as alcohol, drug, tobacco, employment, and living situation. Each SDOH event is characterized through attributes related to status, extent, and temporality. The task includes 3 subtasks related to information extraction (Subtask A), generalizability (Subtask B), and learning transfer (Subtask C). In addressing this task, participants utilized a range of techniques, including rules, knowledge bases, n-grams, word embeddings, and pretrained language models (LM). RESULTS: A total of 15 teams participated, and the top teams utilized pretrained deep learning LM. The top team across all subtasks used a sequence-to-sequence approach achieving 0.901 F1 for Subtask A, 0.774 F1 Subtask B, and 0.889 F1 for Subtask C. CONCLUSIONS: Similar to many NLP tasks and domains, pretrained LM yielded the best performance, including generalizability and learning transfer. An error analysis indicates extraction performance varies by SDOH, with lower performance achieved for conditions, like substance use and homelessness, which increase health risks (risk factors) and higher performance achieved for conditions, like substance abstinence and living with family, which reduce health risks (protective factors).


Subject(s)
Natural Language Processing , Social Determinants of Health , Humans , Information Storage and Retrieval , Electronic Health Records
17.
J Digit Imaging ; 36(1): 91-104, 2023 02.
Article in English | MEDLINE | ID: mdl-36253581

ABSTRACT

Radiology reports contain a diverse and rich set of clinical abnormalities documented by radiologists during their interpretation of the images. Comprehensive semantic representations of radiological findings would enable a wide range of secondary use applications to support diagnosis, triage, outcomes prediction, and clinical research. In this paper, we present a new corpus of radiology reports annotated with clinical findings. Our annotation schema captures detailed representations of pathologic findings that are observable on imaging ("lesions") and other types of clinical problems ("medical problems"). The schema used an event-based representation to capture fine-grained details, including assertion, anatomy, characteristics, size, and count. Our gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT. We then predicted the linkages between trigger and argument entities (referred to as argument roles) using a BERT-based relation extraction model. We achieved the best extraction performance using a BERT model pre-trained on 3 million radiology reports from our institution: 90.9-93.4% F1 for finding triggers and 72.0-85.6% F1 for argument roles. To assess model generalizability, we used an external validation set randomly sampled from the MIMIC Chest X-ray (MIMIC-CXR) database. The extraction performance on this validation set was 95.6% for finding triggers and 79.1-89.7% for argument roles, demonstrating that the model generalized well to the cross-institutional data with a different imaging modality. We extracted the finding events from all the radiology reports in the MIMIC-CXR database and provided the extractions to the research community.


Subject(s)
Radiology , Humans , Tomography, X-Ray Computed , Semantics , Research Report , Natural Language Processing
18.
AMIA Annu Symp Proc ; 2023: 923-932, 2023.
Article in English | MEDLINE | ID: mdl-38222433

ABSTRACT

Natural Language Processing (NLP) methods have been broadly applied to clinical tasks. Machine learning and deep learning approaches have been used to improve the performance of clinical NLP. However, these approaches require sufficiently large datasets for training, and trained models have been shown to transfer poorly across sites. These issues have led to the promotion of data collection and integration across different institutions for accurate and portable models. However, this can introduce a form of bias called confounding by provenance: when source-specific data distributions differ at deployment, model performance may suffer. To address this issue, we evaluate the utility of backdoor adjustment for text classification in a multi-site dataset of clinical notes annotated for mentions of substance abuse, using an evaluation framework devised to measure robustness to distributional shifts. Our results indicate that backdoor adjustment can effectively mitigate confounding shift.
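Backdoor adjustment over a discrete confounder such as the data source reduces to marginalizing source-conditional predictions against the source's marginal distribution: P(y | do(x)) = sum over s of P(y | x, s) P(s). A minimal numeric sketch (how the paper incorporates this into a text classifier is more involved):

```python
def backdoor_adjust(p_y_given_x_s, p_s):
    """Backdoor adjustment over a discrete confounder (data source/site):

        P(y | do(x)) = sum_s P(y | x, s) * P(s)

    p_y_given_x_s: dict mapping site -> P(y | x, site) from a source-aware model.
    p_s:           dict mapping site -> marginal P(site).
    """
    return sum(p_y_given_x_s[s] * p_s[s] for s in p_s)
```

Intuitively, the adjusted prediction no longer depends on which site a deployment-time note happens to come from, which is what yields robustness to provenance shift.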


Subject(s)
Electronic Health Records , Substance-Related Disorders , Humans , Data Collection , Machine Learning , Natural Language Processing , Multicenter Studies as Topic
19.
Cancers (Basel) ; 14(23)2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36497238

ABSTRACT

The diagnosis of lung cancer in ambulatory settings is often challenging due to non-specific clinical presentation, but there are currently no clinical quality measures (CQMs) in the United States used to identify areas for practice improvement in diagnosis. We describe the pre-diagnostic time intervals among a retrospective cohort of 711 patients identified with primary lung cancer from 2012-2019 from ambulatory care clinics in Seattle, Washington, USA. Electronic health record data were extracted for two years prior to diagnosis, and Natural Language Processing (NLP) was applied to identify symptoms/signs from free-text clinical fields. Time points were defined for initial symptomatic presentation, chest imaging, specialist consultation, diagnostic confirmation, and treatment initiation. Medians and interquartile ranges (IQR) were calculated for intervals spanning these time points. The mean age of the cohort was 67.3 years, 54.1% had Stage III or IV disease, and the majority were diagnosed after clinical presentation (94.5%) rather than screening (5.5%). The median interval from first recorded symptoms/signs to diagnosis was 570 days (IQR 273-691); from chest CT or chest X-ray imaging to diagnosis, 43 days (IQR 11-240); from specialist consultation to diagnosis, 72 days (IQR 13-456); and from diagnosis to treatment initiation, 7 days (IQR 0-36). Symptoms/signs associated with lung cancer can be identified over a year prior to diagnosis using NLP, highlighting the need for CQMs to improve timeliness of diagnosis.
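The interval statistics reported above (median with IQR) can be computed directly from a list of per-patient interval lengths; a minimal sketch using Python's standard statistics module, with made-up day counts:

```python
import statistics

def median_iqr(interval_days):
    """Median and (Q1, Q3) interquartile range for a list of interval lengths.

    Uses the default "exclusive" quantile method of statistics.quantiles.
    """
    q1, med, q3 = statistics.quantiles(interval_days, n=4)
    return med, (q1, q3)
```

In the study, one such list would be built per interval type (e.g., first symptom to diagnosis) from the NLP-derived time points, then summarized with this statistic.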
