Pesquisa | BVS - MINISTÉRIO DA SAÚDE

1.

Large language models for biomedicine: foundations, opportunities, challenges, and best practices.

Sahoo, Satya S; Plasek, Joseph M; Xu, Hua; Uzuner, Özlem; Cohen, Trevor; Yetisgen, Meliha; Liu, Hongfang; Meystre, Stéphane; Wang, Yanshan.

J Am Med Inform Assoc ; 2024 Apr 24.

Artigo em Inglês | MEDLINE | ID: mdl-38657567

RESUMO

OBJECTIVES: Generative large language models (LLMs) are a subset of transformers-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks. Using a few examples (few-shot) or no examples (zero-shot) for prompt-tuning has enabled LLMs to achieve state-of-the-art performance in a broad range of NLP applications. This article by the American Medical Informatics Association (AMIA) NLP Working Group characterizes the opportunities, challenges, and best practices for our community to leverage and advance the integration of LLMs in downstream NLP applications effectively. This can be accomplished through a variety of approaches, including augmented prompting, instruction prompt tuning, and reinforcement learning from human feedback (RLHF). TARGET AUDIENCE: Our focus is on making LLMs accessible to the broader biomedical informatics community, including clinicians and researchers who may be unfamiliar with NLP. Additionally, NLP practitioners may gain insight from the described best practices. SCOPE: We focus on 3 broad categories of NLP tasks, namely natural language understanding, natural language inferencing, and natural language generation. We review the emerging trends in prompt tuning, instruction fine-tuning, and evaluation metrics used for LLMs while drawing attention to several issues that impact biomedical NLP applications, including falsehoods in generated text (confabulation/hallucinations), toxicity, and dataset contamination leading to overfitting. We also review potential approaches to address some of these current challenges in LLMs, such as chain of thought prompting, and the phenomena of emergent capabilities observed in LLMs that can be leveraged to address complex NLP challenge in biomedical applications.

2.

Clinical natural language processing for secondary uses.

Gao, Yanjun; Mahajan, Diwakar; Uzuner, Özlem; Yetisgen, Meliha.

J Biomed Inform ; 150: 104596, 2024 02.

Artigo em Inglês | MEDLINE | ID: mdl-38278312

Assuntos

Processamento de Linguagem Natural

3.

Improving model transferability for clinical note section classification models using continued pretraining.

Zhou, Weipeng; Yetisgen, Meliha; Afshar, Majid; Gao, Yanjun; Savova, Guergana; Miller, Timothy A.

J Am Med Inform Assoc ; 31(1): 89-97, 2023 12 22.

Artigo em Inglês | MEDLINE | ID: mdl-37725927

RESUMO

OBJECTIVE: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for 1 institution experience a large drop of accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective," "Object," "Assessment," and "Plan") framework with improved transferability. MATERIALS AND METHODS: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain-adaptive pretraining and task-adaptive pretraining. We added in-domain annotated samples during fine-tuning and observed model performance over a varying number of annotated sample size. Finally, we quantified the impact of continued pretraining in equivalence of the number of in-domain annotated samples added. RESULTS: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across 3 datasets. This improvement was equivalent to adding 35 in-domain annotated samples. DISCUSSION: Although considered a straightforward task when performing in-domain, section classification is still a considerably difficult task when performing cross-domain, even using highly sophisticated neural network-based methods. CONCLUSION: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.

Assuntos

Instalações de Saúde , Armazenamento e Recuperação da Informação , Processamento de Linguagem Natural , Redes Neurais de Computação , Tamanho da Amostra

4.

Prediction of American Society of Anesthesiologists Physical Status Classification from preoperative clinical text narratives using natural language processing.

Chung, Philip; Fong, Christine T; Walters, Andrew M; Yetisgen, Meliha; O'Reilly-Shah, Vikas N.

BMC Anesthesiol ; 23(1): 296, 2023 09 04.

Artigo em Inglês | MEDLINE | ID: mdl-37667258

RESUMO

BACKGROUND: Electronic health records (EHR) contain large volumes of unstructured free-form text notes that richly describe a patient's health and medical comorbidities. It is unclear if perioperative risk stratification can be performed directly from these notes without manual data extraction. We conduct a feasibility study using natural language processing (NLP) to predict the American Society of Anesthesiologists Physical Status Classification (ASA-PS) as a surrogate measure for perioperative risk. We explore prediction performance using four different model types and compare the use of different note sections versus the whole note. We use Shapley values to explain model predictions and analyze disagreement between model and human anesthesiologist predictions. METHODS: Single-center retrospective cohort analysis of EHR notes from patients undergoing procedures with anesthesia care spanning all procedural specialties during a 5 year period who were not assigned ASA VI and also had a preoperative evaluation note filed within 90 days prior to the procedure. NLP models were trained for each combination of 4 models and 8 text snippets from notes. Model performance was compared using area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC). Shapley values were used to explain model predictions. Error analysis and model explanation using Shapley values was conducted for the best performing model. RESULTS: Final dataset includes 38,566 patients undergoing 61,503 procedures with anesthesia care. Prevalence of ASA-PS was 8.81% for ASA I, 31.4% for ASA II, 43.25% for ASA III, and 16.54% for ASA IV-V. The best performing models were the BioClinicalBERT model on the truncated note task (macro-average AUROC 0.845) and the fastText model on the full note task (macro-average AUROC 0.865). Shapley values reveal human-interpretable model predictions. Error analysis reveals that some original ASA-PS assignments may be incorrect and the model is making a reasonable prediction in these cases. CONCLUSIONS: Text classification models can accurately predict a patient's illness severity using only free-form text descriptions of patients without any manual data extraction. They can be an additional patient safety tool in the perioperative setting and reduce manual chart review for medical billing. Shapley feature attributions produce explanations that logically support model predictions and are understandable to clinicians.

Assuntos

Anestesia , Anestesiologistas , Humanos , Processamento de Linguagem Natural , Estudos Retrospectivos , Estados Unidos

5.

Aci-bench: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation.

Yim, Wen-Wai; Fu, Yujuan; Ben Abacha, Asma; Snider, Neal; Lin, Thomas; Yetisgen, Meliha.

Sci Data ; 10(1): 586, 2023 09 06.

Artigo em Inglês | MEDLINE | ID: mdl-37673893

RESUMO

Recent immense breakthroughs in generative models such as in GPT4 have precipitated re-imagined ubiquitous usage of these models in all applications. One area that can benefit by improvements in artificial intelligence (AI) is healthcare. The note generation task from doctor-patient encounters, and its associated electronic medical record documentation, is one of the most arduous time-consuming tasks for physicians. It is also a natural prime potential beneficiary to advances in generative models. However with such advances, benchmarking is more critical than ever. Whether studying model weaknesses or developing new evaluation metrics, shared open datasets are an imperative part of understanding the current state-of-the-art. Unfortunately as clinic encounter conversations are not routinely recorded and are difficult to ethically share due to patient confidentiality, there are no sufficiently large clinic dialogue-note datasets to benchmark this task. Here we present the Ambient Clinical Intelligence Benchmark (ACI-BENCH) corpus, the largest dataset to date tackling the problem of AI-assisted note generation from visit dialogue. We also present the benchmark performances of several common state-of-the-art approaches.

Assuntos

Inteligência Artificial , Benchmarking , Instalações de Saúde , Humanos , Registros Eletrônicos de Saúde

6.

LeafAI: query generator for clinical cohort discovery rivaling a human programmer.

Dobbins, Nicholas J; Han, Bin; Zhou, Weipeng; Lan, Kristine F; Kim, H Nina; Harrington, Robert; Uzuner, Özlem; Yetisgen, Meliha.

J Am Med Inform Assoc ; 30(12): 1954-1964, 2023 11 17.

Artigo em Inglês | MEDLINE | ID: mdl-37550244

RESUMO

OBJECTIVE: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. MATERIALS AND METHODS: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data-model agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared the capability of LeafAI to a human database programmer to identify patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by generated queries. RESULTS: LeafAI matched a mean 43% of enrolled patients with 27â225 eligible across 8 clinical trials, compared to 27% matched and 14â587 eligible in queries by a human database programmer. The human programmer spent 26 total hours crafting queries compared to several minutes by LeafAI. CONCLUSIONS: Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival an experienced human programmer in finding patients eligible for clinical trials.

Assuntos

Processamento de Linguagem Natural , Unified Medical Language System , Humanos , Bases de Conhecimento , Ensaios Clínicos como Assunto

7.

Evaluating construct validity of computable acute respiratory distress syndrome definitions in adults hospitalized with COVID-19: an electronic health records based approach.

Sathe, Neha A; Xian, Su; Mabrey, F Linzee; Crosslin, David R; Mooney, Sean D; Morrell, Eric D; Lybarger, Kevin; Yetisgen, Meliha; Jarvik, Gail P; Bhatraju, Pavan K; Wurfel, Mark M.

BMC Pulm Med ; 23(1): 292, 2023 Aug 09.

Artigo em Inglês | MEDLINE | ID: mdl-37559024

RESUMO

BACKGROUND: Evolving ARDS epidemiology and management during COVID-19 have prompted calls to reexamine the construct validity of Berlin criteria, which have been rarely evaluated in real-world data. We developed a Berlin ARDS definition (EHR-Berlin) computable in electronic health records (EHR) to (1) assess its construct validity, and (2) assess how expanding its criteria affected validity. METHODS: We performed a retrospective cohort study at two tertiary care hospitals with one EHR, among adults hospitalized with COVID-19 February 2020-March 2021. We assessed five candidate definitions for ARDS: the EHR-Berlin definition modeled on Berlin criteria, and four alternatives informed by recent proposals to expand criteria and include patients on high-flow oxygen (EHR-Alternative 1), relax imaging criteria (EHR-Alternatives 2-3), and extend timing windows (EHR-Alternative 4). We evaluated two aspects of construct validity for the EHR-Berlin definition: (1) criterion validity: agreement with manual ARDS classification by experts, available in 175 patients; (2) predictive validity: relationships with hospital mortality, assessed by Pearson r and by area under the receiver operating curve (AUROC). We assessed predictive validity and timing of identification of EHR-Berlin definition compared to alternative definitions. RESULTS: Among 765 patients, mean (SD) age was 57 (18) years and 471 (62%) were male. The EHR-Berlin definition classified 171 (22%) patients as ARDS, which had high agreement with manual classification (kappa 0.85), and was associated with mortality (Pearson r = 0.39; AUROC 0.72, 95% CI 0.68, 0.77). In comparison, EHR-Alternative 1 classified 219 (29%) patients as ARDS, maintained similar relationships to mortality (r = 0.40; AUROC 0.74, 95% CI 0.70, 0.79, Delong test P = 0.14), and identified patients earlier in their hospitalization (median 13 vs. 15 h from admission, Wilcoxon signed-rank test P < 0.001). EHR-Alternative 3, which removed imaging criteria, had similar correlation (r = 0.41) but better discrimination for mortality (AUROC 0.76, 95% CI 0.72, 0.80; P = 0.036), and identified patients median 2 h (P < 0.001) from admission. CONCLUSIONS: The EHR-Berlin definition can enable ARDS identification with high criterion validity, supporting large-scale study and surveillance. There are opportunities to expand the Berlin criteria that preserve predictive validity and facilitate earlier identification.

Assuntos

COVID-19 , Síndrome do Desconforto Respiratório , Humanos , Masculino , Adulto , Pessoa de Meia-Idade , Feminino , Estudos Retrospectivos , Registros Eletrônicos de Saúde , COVID-19/diagnóstico , Síndrome do Desconforto Respiratório/diagnóstico , Medição de Risco

8.

Advancements in extracting social determinants of health information from narrative text.

Lybarger, Kevin; Bear Don't Walk, Oliver J; Yetisgen, Meliha; Uzuner, Özlem.

J Am Med Inform Assoc ; 30(8): 1363-1366, 2023 07 19.

Artigo em Inglês | MEDLINE | ID: mdl-37466054

Assuntos

Determinantes Sociais da Saúde , Envio de Mensagens de Texto , Narração

9.

Generalizing through Forgetting - Domain Generalization for Symptom Event Extraction in Clinical Notes.

Zhou, Sitong; Lybarger, Kevin; Yetisgen, Meliha; Ostendorf, Mari.

AMIA Jt Summits Transl Sci Proc ; 2023: 622-631, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-37350923

RESUMO

Symptom information is primarily documented in free-text clinical notes and is not directly accessible for downstream applications. To address this challenge, information extraction approaches that can handle clinical language variation across different institutions and specialties are needed. In this paper, we present domain generalization for symptom extraction using pretraining and fine-tuning data that differs from the target domain in terms of institution and/or specialty and patient population. We extract symptom events using a transformer-based joint entity and relation extraction method. To reduce reliance on domain-specific features, we propose a domain generalization method that dynamically masks frequent symptoms words in the source domain. Additionally, we pretrain the transformer language model (LM) on task-related unlabeled texts for better representation. Our experiments indicate that masking and adaptive pretraining methods can significantly improve performance when the source domain is more distant from the target domain.

10.

Improving Model Transferability for Clinical Note Section Classification Models Using Continued Pretraining.

Zhou, Weipeng; Yetisgen, Meliha; Afshar, Majid; Gao, Yanjun; Savova, Guergana; Miller, Timothy A.

medRxiv ; 2023 Apr 24.

Artigo em Inglês | MEDLINE | ID: mdl-37162963

RESUMO

Objective: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for one institution experience a large drop of accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective", "Object", "Assessment" and "Plan") framework with improved transferability. Materials and methods: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain adaptive pretraining (DAPT) and task adaptive pretraining (TAPT). We added out-of-domain annotated samples during fine-tuning and observed model performance over a varying number of annotated sample size. Finally, we quantified the impact of continued pretraining in equivalence of the number of in-domain annotated samples added. Results: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across three datasets. This improvement was equivalent to adding 50.2 in-domain annotated samples. Discussion: Although considered a straightforward task when performing in-domain, section classification is still a considerably difficult task when performing cross-domain, even using highly sophisticated neural network-based methods. Conclusion: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small amount of in-domain labeled samples.

11.

Leveraging natural language processing to augment structured social determinants of health data in the electronic health record.

Lybarger, Kevin; Dobbins, Nicholas J; Long, Ritche; Singh, Angad; Wedgeworth, Patrick; Uzuner, Özlem; Yetisgen, Meliha.

J Am Med Inform Assoc ; 30(8): 1389-1397, 2023 07 19.

Artigo em Inglês | MEDLINE | ID: mdl-37130345

RESUMO

OBJECTIVE: Social determinants of health (SDOH) impact health outcomes and are documented in the electronic health record (EHR) through structured data and unstructured clinical notes. However, clinical notes often contain more comprehensive SDOH information, detailing aspects such as status, severity, and temporality. This work has two primary objectives: (1) develop a natural language processing information extraction model to capture detailed SDOH information and (2) evaluate the information gain achieved by applying the SDOH extractor to clinical narratives and combining the extracted representations with existing structured data. MATERIALS AND METHODS: We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture to characterize SDOH across various dimensions. In an EHR case study, we applied the SDOH extractor to a large clinical data set with 225 089 patients and 430 406 notes with social history sections and compared the extracted SDOH information with existing structured data. RESULTS: The SDOH extractor achieved 0.86 F1 on a withheld test set. In the EHR case study, we found extracted SDOH information complements existing structured data with 32% of homeless patients, 19% of current tobacco users, and 10% of drug users only having these health risk factors documented in the clinical narrative. CONCLUSIONS: Utilizing EHR data to identify SDOH health risk factors and social needs may improve patient care and outcomes. Semantic representations of text-encoded SDOH information can augment existing structured data, and this more comprehensive SDOH representation can assist health systems in identifying and addressing these social needs.

Assuntos

Registros Eletrônicos de Saúde , Determinantes Sociais da Saúde , Humanos , Processamento de Linguagem Natural , Fatores de Risco , Armazenamento e Recuperação da Informação

12.

Symptoms and signs of lung cancer prior to diagnosis: case-control study using electronic health records from ambulatory care within a large US-based tertiary care centre.

Prado, Maria G; Kessler, Larry G; Au, Margaret A; Burkhardt, Hannah A; Zigman Suchsland, Monica; Kowalski, Lesleigh; Stephens, Kari A; Yetisgen, Meliha; Walter, Fiona M; Neal, Richard D; Lybarger, Kevin; Thompson, Caroline A; Al Achkar, Morhaf; Sarma, Elizabeth A; Turner, Grace; Farjah, Farhood; Thompson, Matthew J.

BMJ Open ; 13(4): e068832, 2023 04 20.

Artigo em Inglês | MEDLINE | ID: mdl-37080616

RESUMO

OBJECTIVE: Lung cancer is the most common cause of cancer-related death in the USA. While most patients are diagnosed following symptomatic presentation, no studies have compared symptoms and physical examination signs at or prior to diagnosis from electronic health records (EHRs) in the USA. We aimed to identify symptoms and signs in patients prior to diagnosis in EHR data. DESIGN: Case-control study. SETTING: Ambulatory care clinics at a large tertiary care academic health centre in the USA. PARTICIPANTS, OUTCOMES: We studied 698 primary lung cancer cases in adults diagnosed between 1 January 2012 and 31 December 2019, and 6841 controls matched by age, sex, smoking status and type of clinic. Coded and free-text data from the EHR were extracted from 2 years prior to diagnosis date for cases and index date for controls. Univariate and multivariable conditional logistic regression were used to identify symptoms and signs associated with lung cancer at time of diagnosis, and 1, 3, 6 and 12 months before the diagnosis/index dates. RESULTS: Eleven symptoms and signs recorded during the study period were associated with a significantly higher chance of being a lung cancer case in multivariable analyses. Of these, seven were significantly associated with lung cancer 6 months prior to diagnosis: haemoptysis (OR 3.2, 95% CI 1.9 to 5.3), cough (OR 3.1, 95% CI 2.4 to 4.0), chest crackles or wheeze (OR 3.1, 95% CI 2.3 to 4.1), bone pain (OR 2.7, 95% CI 2.1 to 3.6), back pain (OR 2.5, 95% CI 1.9 to 3.2), weight loss (OR 2.1, 95% CI 1.5 to 2.8) and fatigue (OR 1.6, 95% CI 1.3 to 2.1). CONCLUSIONS: Patients diagnosed with lung cancer appear to have symptoms and signs recorded in the EHR that distinguish them from similar matched patients in ambulatory care, often 6 months or more before diagnosis. These findings suggest opportunities to improve the diagnostic process for lung cancer.

Assuntos

Registros Eletrônicos de Saúde , Neoplasias Pulmonares , Adulto , Humanos , Estudos de Casos e Controles , Centros de Atenção Terciária , Neoplasias Pulmonares/diagnóstico , Assistência Ambulatorial

13.

The 2022 n2c2/UW shared task on extracting social determinants of health.

Lybarger, Kevin; Yetisgen, Meliha; Uzuner, Özlem.

J Am Med Inform Assoc ; 30(8): 1367-1378, 2023 07 19.

Artigo em Inglês | MEDLINE | ID: mdl-36795066

RESUMO

OBJECTIVE: The n2c2/UW SDOH Challenge explores the extraction of social determinant of health (SDOH) information from clinical notes. The objectives include the advancement of natural language processing (NLP) information extraction techniques for SDOH and clinical information more broadly. This article presents the shared task, data, participating teams, performance results, and considerations for future work. MATERIALS AND METHODS: The task used the Social History Annotated Corpus (SHAC), which consists of clinical text with detailed event-based annotations for SDOH events, such as alcohol, drug, tobacco, employment, and living situation. Each SDOH event is characterized through attributes related to status, extent, and temporality. The task includes 3 subtasks related to information extraction (Subtask A), generalizability (Subtask B), and learning transfer (Subtask C). In addressing this task, participants utilized a range of techniques, including rules, knowledge bases, n-grams, word embeddings, and pretrained language models (LM). RESULTS: A total of 15 teams participated, and the top teams utilized pretrained deep learning LM. The top team across all subtasks used a sequence-to-sequence approach achieving 0.901 F1 for Subtask A, 0.774 F1 Subtask B, and 0.889 F1 for Subtask C. CONCLUSIONS: Similar to many NLP tasks and domains, pretrained LM yielded the best performance, including generalizability and learning transfer. An error analysis indicates extraction performance varies by SDOH, with lower performance achieved for conditions, like substance use and homelessness, which increase health risks (risk factors) and higher performance achieved for conditions, like substance abstinence and living with family, which reduce health risks (protective factors).

Assuntos

Processamento de Linguagem Natural , Determinantes Sociais da Saúde , Humanos , Armazenamento e Recuperação da Informação , Registros Eletrônicos de Saúde

14.

Extracting medication changes in clinical narratives using pre-trained language models.

Ramachandran, Giridhar Kaushik; Lybarger, Kevin; Liu, Yaya; Mahajan, Diwakar; Liang, Jennifer J; Tsou, Ching-Huei; Yetisgen, Meliha; Uzuner, Özlem.

J Biomed Inform ; 139: 104302, 2023 03.

Artigo em Inglês | MEDLINE | ID: mdl-36754129

RESUMO

An accurate and detailed account of patient medications, including medication changes within the patient timeline, is essential for healthcare providers to provide appropriate patient care. Healthcare providers or the patients themselves may initiate changes to patient medication. Medication changes take many forms, including prescribed medication and associated dosage modification. These changes provide information about the overall health of the patient and the rationale that led to the current care. Future care can then build on the resulting state of the patient. This work explores the automatic extraction of medication change information from free-text clinical notes. The Contextual Medication Event Dataset (CMED) is a corpus of clinical notes with annotations that characterize medication changes through multiple change-related attributes, including the type of change (start, stop, increase, etc.), initiator of the change, temporality, change likelihood, and negation. Using CMED, we identify medication mentions in clinical text and propose three novel high-performing BERT-based systems that resolve the annotated medication change characteristics. We demonstrate that our proposed systems improve medication change classification performance over the initial work exploring CMED.

Assuntos

Idioma , Processamento de Linguagem Natural , Humanos , Narração

15.

Event-Based Clinical Finding Extraction from Radiology Reports with Pre-trained Language Model.

Lau, Wilson; Lybarger, Kevin; Gunn, Martin L; Yetisgen, Meliha.

J Digit Imaging ; 36(1): 91-104, 2023 02.

Artigo em Inglês | MEDLINE | ID: mdl-36253581

RESUMO

Radiology reports contain a diverse and rich set of clinical abnormalities documented by radiologists during their interpretation of the images. Comprehensive semantic representations of radiological findings would enable a wide range of secondary use applications to support diagnosis, triage, outcomes prediction, and clinical research. In this paper, we present a new corpus of radiology reports annotated with clinical findings. Our annotation schema captures detailed representations of pathologic findings that are observable on imaging ("lesions") and other types of clinical problems ("medical problems"). The schema used an event-based representation to capture fine-grained details, including assertion, anatomy, characteristics, size, and count. Our gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT. We then predicted the linkages between trigger and argument entities (referred to as argument roles) using a BERT-based relation extraction model. We achieved the best extraction performance using a BERT model pre-trained on 3 million radiology reports from our institution: 90.9-93.4% F1 for finding triggers and 72.0-85.6% F1 for argument roles. To assess model generalizability, we used an external validation set randomly sampled from the MIMIC Chest X-ray (MIMIC-CXR) database. The extraction performance on this validation set was 95.6% for finding triggers and 79.1-89.7% for argument roles, demonstrating that the model generalized well to the cross-institutional data with a different imaging modality. We extracted the finding events from all the radiology reports in the MIMIC-CXR database and provided the extractions to the research community.

Assuntos

Radiologia , Humanos , Tomografia Computadorizada por Raios X , Semântica , Relatório de Pesquisa , Processamento de Linguagem Natural

16.

Backdoor Adjustment of Confounding by Provenance for Robust Text Classification of Multi-institutional Clinical Notes.

Ding, Xiruo; Sheng, Zhecheng; Yetisgen, Meliha; Pakhomov, Serguei; Cohen, Trevor.

AMIA Annu Symp Proc ; 2023: 923-932, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-38222433

RESUMO

Natural Language Processing (NLP) methods have been broadly applied to clinical tasks. Machine learning and deep learning approaches have been used to improve the performance of clinical NLP. However, these approaches require sufficiently large datasets for training, and trained models have been shown to transfer poorly across sites. These issues have led to the promotion of data collection and integration across different institutions for accurate and portable models. However, this can introduce a form of bias called confounding by provenance. When source-specific data distributions differ at deployment, this may harm model performance. To address this issue, we evaluate the utility of backdoor adjustment for text classification in a multi-site dataset of clinical notes annotated for mentions of substance abuse. Using an evaluation framework devised to measure robustness to distributional shifts, we assess the utility of backdoor adjustment. Our results indicate that backdoor adjustment can effectively mitigate for confounding shift.

Assuntos

Registros Eletrônicos de Saúde , Transtornos Relacionados ao Uso de Substâncias , Humanos , Coleta de Dados , Aprendizado de Máquina , Processamento de Linguagem Natural , Estudos Multicêntricos como Assunto

17.

How Timely Is Diagnosis of Lung Cancer? Cohort Study of Individuals with Lung Cancer Presenting in Ambulatory Care in the United States.

Zigman Suchsland, Monica; Kowalski, Lesleigh; Burkhardt, Hannah A; Prado, Maria G; Kessler, Larry G; Yetisgen, Meliha; Au, Maggie A; Stephens, Kari A; Farjah, Farhood; Schleyer, Anneliese M; Walter, Fiona M; Neal, Richard D; Lybarger, Kevin; Thompson, Caroline A; Achkar, Morhaf Al; Sarma, Elizabeth A; Turner, Grace; Thompson, Matthew.

Cancers (Basel) ; 14(23)2022 Nov 23.

Artigo em Inglês | MEDLINE | ID: mdl-36497238

RESUMO

The diagnosis of lung cancer in ambulatory settings is often challenging due to non-specific clinical presentation, but there are currently no clinical quality measures (CQMs) in the United States used to identify areas for practice improvement in diagnosis. We describe the pre-diagnostic time intervals among a retrospective cohort of 711 patients identified with primary lung cancer from 2012-2019 from ambulatory care clinics in Seattle, Washington USA. Electronic health record data were extracted for two years prior to diagnosis, and Natural Language Processing (NLP) applied to identify symptoms/signs from free text clinical fields. Time points were defined for initial symptomatic presentation, chest imaging, specialist consultation, diagnostic confirmation, and treatment initiation. Median and interquartile ranges (IQR) were calculated for intervals spanning these time points. The mean age of the cohort was 67.3 years, 54.1% had Stage III or IV disease and the majority were diagnosed after clinical presentation (94.5%) rather than screening (5.5%). Median intervals from first recorded symptoms/signs to diagnosis was 570 days (IQR 273-691), from chest CT or chest X-ray imaging to diagnosis 43 days (IQR 11-240), specialist consultation to diagnosis 72 days (IQR 13-456), and from diagnosis to treatment initiation 7 days (IQR 0-36). Symptoms/signs associated with lung cancer can be identified over a year prior to diagnosis using NLP, highlighting the need for CQMs to improve timeliness of diagnosis.

18.

The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria.

Dobbins, Nicholas J; Mullen, Tony; Uzuner, Özlem; Yetisgen, Meliha.

Sci Data ; 9(1): 490, 2022 08 11.

Artigo em Inglês | MEDLINE | ID: mdl-35953524

RESUMO

Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free-text, using language familiar to clinicians and researchers. In order to identify potential participants at scale, these criteria must first be translated into queries on clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of such conversion into database queries automatically. However they must first be trained and evaluated using corpora which capture clinical trials criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work.

Assuntos

Ensaios Clínicos como Assunto , Processamento de Linguagem Natural , Seleção de Pacientes , Ensaios Clínicos como Assunto/normas , Bases de Dados Factuais , Humanos , Armazenamento e Recuperação da Informação

19.

Call for papers: Special issue on clinical natural language processing for secondary use applications.

Yetisgen, Meliha; Uzuner, Ozlem; Gao, Yanjun; Mahajan, Diwakar.

J Biomed Inform ; 133: 104152, 2022 09.

Artigo em Inglês | MEDLINE | ID: mdl-35985622

Assuntos

Processamento de Linguagem Natural

20.

Extracting Radiological Findings With Normalized Anatomical Information Using a Span-Based BERT Relation Extraction Model.

Lybarger, Kevin; Damani, Aashka; Gunn, Martin; Uzuner, O Zlem; Yetisgen, Meliha.

AMIA Jt Summits Transl Sci Proc ; 2022: 339-348, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35854739

RESUMO

Medical imaging is critical to the diagnosis and treatment of numerous medical problems, including many forms of cancer. Medical imaging reports distill the findings and observations of radiologists, creating an unstructured textual representation of unstructured medical images. Large-scale use of this text-encoded information requires converting the unstructured text to a structured, semantic representation. We explore the extraction and normalization of anatomical information in radiology reports that is associated with radiological findings. We investigate this extraction and normalization task using a span-based relation extraction model that jointly extracts entities and relations using BERT. This work examines the factors that influence extraction and normalization performance, including the body part/organ system, frequency of occurrence, span length, and span diversity. It discusses approaches for improving performance and creating high-quality semantic representations of radiological phenomena.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA