Rechercher | Portail Régional BVS

1.

Family history as the strongest predictor of aortic and peripheral aneurysms in patients with intracranial aneurysms.

Lai, Pui Man Rosalind; Akama-Garren, Elliot; Can, Anil; Tirado, Selena-Rae; Castro, Victor M; Dligach, Dmitriy; Finan, Sean; Gainer, Vivian S; Shadick, Nancy A; Savova, Guergana; Murphy, Shawn N; Cai, Tianxi; Weiss, Scott T; Du, Rose.

J Clin Neurosci ; 126: 128-134, 2024 Jun 12.

Article de Anglais | MEDLINE | ID: mdl-38870642

RÉSUMÉ

OBJECTIVE: Intracranial aneurysms (IA) and aortic aneurysms (AA) are both abnormal dilations of arteries with familial predisposition and have been proposed to share co-prevalence and pathophysiology. Associations of IA and non-aortic peripheral aneurysms are less well-studied. The goal of the study was to understand the patterns of aortic and peripheral (extracranial) aneurysms in patients with IA, and risk factors associated with the development of these aneurysms. METHODS: 4701 patients were included in our retrospective analysis of all patients with intracranial aneurysms at our institution over the past 26 years. Patient demographics, comorbidities, and aneurysmal locations were analyzed. Univariate and multivariate analyses were performed to study associations with and without extracranial aneurysms. RESULTS: A total of 3.4% of patients (161 of 4701) with IA had at least one extracranial aneurysm. 2.8% had thoracic or abdominal aortic aneurysms. Age, male sex, hypertension, coronary artery disease, history of ischemic cerebral infarction, connective tissues disease, and family history of extracranial aneurysms in a 1st degree relative were associated with the presence of extracranial aneurysms and a higher number of extracranial aneurysms. In addition, family history of extracranial aneurysms in a second degree relative is associated with the presence of extracranial aneurysms and atrial fibrillation is associated with a higher number of extracranial aneurysms. CONCLUSION: Significant comorbidities are associated with extracranial aneurysms in patients with IA. Family history of extracranial aneurysms has the strongest association and suggests that IA patients with a family history of extracranial aneurysms may benefit from screening.

2.

LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction for Language Models.

Yoon, WonJin; Chen, Shan; Gao, Yanjun; Zhao, Zhanzhan; Dligach, Dmitriy; Bitterman, Danielle S; Afshar, Majid; Miller, Timothy.

medRxiv ; 2024 Jul 02.

Article de Anglais | MEDLINE | ID: mdl-38585973

RÉSUMÉ

Objective: The application of Natural Language Processing (NLP) in the clinical domain is important due to the rich unstructured information in clinical documents, which often remains inaccessible in structured data. When applying NLP methods to a certain domain, the role of benchmark datasets is crucial as benchmark datasets not only guide the selection of best-performing models but also enable the assessment of the reliability of the generated outputs. Despite the recent availability of language models (LMs) capable of longer context, benchmark datasets targeting long clinical document classification tasks are absent. Materials and Methods: To address this issue, we propose LCD benchmark, a benchmark for the task of predicting 30-day out-of-hospital mortality using discharge notes of MIMIC-IV and statewide death data. We evaluated this benchmark dataset using baseline models, from bag-of-words and CNN to instruction-tuned large language models. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations. Results and Discussion: Baseline models showed 28.9% for best-performing supervised models and 32.2% for GPT-4 in F1-metrics. Notes in our dataset have a median word count of 1687. Our analysis of the model outputs showed that our dataset is challenging for both models and human experts, but the models can find meaningful signals from the text. Conclusion: We expect our LCD benchmark to be a resource for the development of advanced supervised models, or prompting methods, tailored for clinical text. The benchmark dataset is available at https://github.com/Machine-Learning-for-Medical-Language/long-clinical-doc.

3.

Development of a Human Evaluation Framework and Correlation with Automated Metrics for Natural Language Generation of Medical Diagnoses.

Croxford, Emma; Gao, Yanjun; Patterson, Brian; To, Daniel; Tesch, Samuel; Dligach, Dmitriy; Mayampurath, Anoop; Churpek, Matthew M; Afshar, Majid.

medRxiv ; 2024 Apr 09.

Article de Anglais | MEDLINE | ID: mdl-38562730

RÉSUMÉ

In the evolving landscape of clinical Natural Language Generation (NLG), assessing abstractive text quality remains challenging, as existing methods often overlook generative task complexities. This work aimed to examine the current state of automated evaluation metrics in NLG in healthcare. To have a robust and well-validated baseline with which to examine the alignment of these metrics, we created a comprehensive human evaluation framework. Employing ChatGPT-3.5-turbo generative output, we correlated human judgments with each metric. None of the metrics demonstrated high alignment; however, the SapBERT score-a Unified Medical Language System (UMLS)- showed the best results. This underscores the importance of incorporating domain-specific knowledge into evaluation efforts. Our work reveals the deficiency in quality evaluations for generated text and introduces our comprehensive human evaluation framework as a baseline. Future efforts should prioritize integrating medical knowledge databases to enhance the alignment of automated metrics, particularly focusing on refining the SapBERT score for improved assessments.

4.

Automated stratification of trauma injury severity across multiple body regions using multi-modal, multi-class machine learning models.

Gao, Jifan; Chen, Guanhua; O'Rourke, Ann P; Caskey, John; Carey, Kyle A; Oguss, Madeline; Stey, Anne; Dligach, Dmitriy; Miller, Timothy; Mayampurath, Anoop; Churpek, Matthew M; Afshar, Majid.

J Am Med Inform Assoc ; 31(6): 1291-1302, 2024 May 20.

Article de Anglais | MEDLINE | ID: mdl-38587875

RÉSUMÉ

OBJECTIVE: The timely stratification of trauma injury severity can enhance the quality of trauma care but it requires intense manual annotation from certified trauma coders. The objective of this study is to develop machine learning models for the stratification of trauma injury severity across various body regions using clinical text and structured electronic health records (EHRs) data. MATERIALS AND METHODS: Our study utilized clinical documents and structured EHR variables linked with the trauma registry data to create 2 machine learning models with different approaches to representing text. The first one fuses concept unique identifiers (CUIs) extracted from free text with structured EHR variables, while the second one integrates free text with structured EHR variables. Temporal validation was undertaken to ensure the models' temporal generalizability. Additionally, analyses to assess the variable importance were conducted. RESULTS: Both models demonstrated impressive performance in categorizing leg injuries, achieving high accuracy with macro-F1 scores of over 0.8. Additionally, they showed considerable accuracy, with macro-F1 scores exceeding or near 0.7, in assessing injuries in the areas of the chest and head. We showed in our variable importance analysis that the most important features in the model have strong face validity in determining clinically relevant trauma injuries. DISCUSSION: The CUI-based model achieves comparable performance, if not higher, compared to the free-text-based model, with reduced complexity. Furthermore, integrating structured EHR data improves performance, particularly when the text modalities are insufficiently indicative. CONCLUSIONS: Our multi-modal, multiclass models can provide accurate stratification of trauma injury severity and clinically relevant interpretations.

Sujet(s)

Dossiers médicaux électroniques , Apprentissage machine , Plaies et blessures , Humains , Plaies et blessures/classification , Score de gravité des lésions traumatiques , Enregistrements , Indices de gravité des traumatismes , Traitement du langage naturel

5.

Development and external validation of multimodal postoperative acute kidney injury risk machine learning models.

Karway, George K; Koyner, Jay L; Caskey, John; Spicer, Alexandra B; Carey, Kyle A; Gilbert, Emily R; Dligach, Dmitriy; Mayampurath, Anoop; Afshar, Majid; Churpek, Matthew M.

JAMIA Open ; 6(4): ooad109, 2023 Dec.

Article de Anglais | MEDLINE | ID: mdl-38144168

RÉSUMÉ

Objectives: To develop and externally validate machine learning models using structured and unstructured electronic health record data to predict postoperative acute kidney injury (AKI) across inpatient settings. Materials and Methods: Data for adult postoperative admissions to the Loyola University Medical Center (2009-2017) were used for model development and admissions to the University of Wisconsin-Madison (2009-2020) were used for validation. Structured features included demographics, vital signs, laboratory results, and nurse-documented scores. Unstructured text from clinical notes were converted into concept unique identifiers (CUIs) using the clinical Text Analysis and Knowledge Extraction System. The primary outcome was the development of Kidney Disease Improvement Global Outcomes stage 2 AKI within 7 days after leaving the operating room. We derived unimodal extreme gradient boosting machines (XGBoost) and elastic net logistic regression (GLMNET) models using structured-only data and multimodal models combining structured data with CUI features. Model comparison was performed using the receiver operating characteristic curve (AUROC), with Delong's test for statistical differences. Results: The study cohort included 138â389 adult patient admissions (mean [SD] age 58 [16] years; 11â506 [8%] African-American; and 70â826 [51%] female) across the 2 sites. Of those, 2959 (2.1%) developed stage 2 AKI or higher. Across all data types, XGBoost outperformed GLMNET (mean AUROC 0.81 [95% confidence interval (CI), 0.80-0.82] vs 0.78 [95% CI, 0.77-0.79]). The multimodal XGBoost model incorporating CUIs parameterized as term frequency-inverse document frequency (TF-IDF) showed the highest discrimination performance (AUROC 0.82 [95% CI, 0.81-0.83]) over unimodal models (AUROC 0.79 [95% CI, 0.78-0.80]). Discussion: A multimodality approach with structured data and TF-IDF weighting of CUIs increased model performance over structured data-only models. Conclusion: These findings highlight the predictive power of CUIs when merged with structured data for clinical prediction models, which may improve the detection of postoperative AKI.

6.

End-to-end clinical temporal information extraction with multi-head attention.

Miller, Timothy; Bethard, Steven; Dligach, Dmitriy; Savova, Guergana.

Proc Conf Assoc Comput Linguist Meet ; 2023: 313-319, 2023 Jul.

Article de Anglais | MEDLINE | ID: mdl-37780680

RÉSUMÉ

Understanding temporal relationships in text from electronic health records can be valuable for many important downstream clinical applications. Since Clinical TempEval 2017, there has been little work on end-to-end systems for temporal relation extraction, with most work focused on the setting where gold standard events and time expressions are given. In this work, we make use of a novel multi-headed attention mechanism on top of a pre-trained transformer encoder to allow the learning process to attend to multiple aspects of the contextualized embeddings. Our system achieves state of the art results on the THYME corpus by a wide margin, in both the in-domain and cross-domain settings.

7.

Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles.

Zhou, Weipeng; Dligach, Dmitriy; Afshar, Majid; Gao, Yanjun; Miller, Timothy A.

Proc Conf Assoc Comput Linguist Meet ; 2023: 125-130, 2023 Jul.

Article de Anglais | MEDLINE | ID: mdl-37786810

RÉSUMÉ

Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs out-perform supervised BERT-based models applied to out-of-domain data. We also find that their strengths are synergistic, so that a simple ensemble technique leads to additional performance gains.

8.

Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes.

Gao, Yanjun; Dligach, Dmitriy; Miller, Timothy; Churpek, Matthew M; Afshar, Majid.

Proc Conf Assoc Comput Linguist Meet ; 2023: 461-467, 2023 Jul.

Article de Anglais | MEDLINE | ID: mdl-37583489

RÉSUMÉ

The BioNLP Workshop 2023 initiated the launch of a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system generating relevant and accurate diagnoses will augment the healthcare providers' decision-making process and improve the quality of care for patients. The goal for participants is to develop models that generated a list of diagnoses and problems using input from the daily care notes collected from the hospitalization of critically ill patients. Eight teams submitted their final systems to the shared task leaderboard. In this paper, we describe the tasks, datasets, evaluation metrics, and baseline systems. Additionally, the techniques and results of the evaluation of the different approaches tried by the participating teams are summarized.

9.

Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning.

Sharma, Brihat; Gao, Yanjun; Miller, Timothy; Churpek, Matthew M; Afshar, Majid; Dligach, Dmitriy.

Proc Conf Assoc Comput Linguist Meet ; 2023(ClinicalNLP): 78-85, 2023 Jul.

Article de Anglais | MEDLINE | ID: mdl-37492270

RÉSUMÉ

Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework, comprised of six tasks representing key components in clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models as well as multi-task versus single task training with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically-trained language model outperforms its general domain counterpart by a large margin, establishing a new state-of-the-art performance, with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.

10.

Deployment of Real-time Natural Language Processing and Deep Learning Clinical Decision Support in the Electronic Health Record: Pipeline Implementation for an Opioid Misuse Screener in Hospitalized Adults.

Afshar, Majid; Adelaine, Sabrina; Resnik, Felice; Mundt, Marlon P; Long, John; Leaf, Margaret; Ampian, Theodore; Wills, Graham J; Schnapp, Benjamin; Chao, Michael; Brown, Randy; Joyce, Cara; Sharma, Brihat; Dligach, Dmitriy; Burnside, Elizabeth S; Mahoney, Jane; Churpek, Matthew M; Patterson, Brian W; Liao, Frank.

JMIR Med Inform ; 11: e44977, 2023 Apr 20.

Article de Anglais | MEDLINE | ID: mdl-37079367

RÉSUMÉ

BACKGROUND: The clinical narrative in electronic health records (EHRs) carries valuable information for predictive analytics; however, its free-text form is difficult to mine and analyze for clinical decision support (CDS). Large-scale clinical natural language processing (NLP) pipelines have focused on data warehouse applications for retrospective research efforts. There remains a paucity of evidence for implementing NLP pipelines at the bedside for health care delivery. OBJECTIVE: We aimed to detail a hospital-wide, operational pipeline to implement a real-time NLP-driven CDS tool and describe a protocol for an implementation framework with a user-centered design of the CDS tool. METHODS: The pipeline integrated a previously trained open-source convolutional neural network model for screening opioid misuse that leveraged EHR notes mapped to standardized medical vocabularies in the Unified Medical Language System. A sample of 100 adult encounters were reviewed by a physician informaticist for silent testing of the deep learning algorithm before deployment. An end user interview survey was developed to examine the user acceptability of a best practice alert (BPA) to provide the screening results with recommendations. The planned implementation also included a human-centered design with user feedback on the BPA, an implementation framework with cost-effectiveness, and a noninferiority patient outcome analysis plan. RESULTS: The pipeline was a reproducible workflow with a shared pseudocode for a cloud service to ingest, process, and store clinical notes as Health Level 7 messages from a major EHR vendor in an elastic cloud computing environment. Feature engineering of the notes used an open-source NLP engine, and the features were fed into the deep learning algorithm, with the results returned as a BPA in the EHR. On-site silent testing of the deep learning algorithm demonstrated a sensitivity of 93% (95% CI 66%-99%) and specificity of 92% (95% CI 84%-96%), similar to published validation studies. Before deployment, approvals were received across hospital committees for inpatient operations. Five interviews were conducted; they informed the development of an educational flyer and further modified the BPA to exclude certain patients and allow the refusal of recommendations. The longest delay in pipeline development was because of cybersecurity approvals, especially because of the exchange of protected health information between the Microsoft (Microsoft Corp) and Epic (Epic Systems Corp) cloud vendors. In silent testing, the resultant pipeline provided a BPA to the bedside within minutes of a provider entering a note in the EHR. CONCLUSIONS: The components of the real-time NLP pipeline were detailed with open-source tools and pseudocode for other health systems to benchmark. The deployment of medical artificial intelligence systems in routine clinical care presents an important yet unfulfilled opportunity, and our protocol aimed to close the gap in the implementation of artificial intelligence-driven CDS. TRIAL REGISTRATION: ClinicalTrials.gov NCT05745480; https://www.clinicaltrials.gov/ct2/show/NCT05745480.

11.

Progress Note Understanding - Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 shared task.

Gao, Yanjun; Dligach, Dmitriy; Miller, Timothy; Churpek, Matthew M; Uzuner, Ozlem; Afshar, Majid.

J Biomed Inform ; 142: 104346, 2023 06.

Article de Anglais | MEDLINE | ID: mdl-37061012

RÉSUMÉ

Daily progress notes are a common note type in the electronic health record (EHR) where healthcare providers document the patient's daily progress and treatment plans. The EHR is designed to document all the care provided to patients, but it also enables note bloat with extraneous information that distracts from the diagnoses and treatment plans. Applications of natural language processing (NLP) in the EHR is a growing field with the majority of methods in information extraction. Few tasks use NLP methods for downstream diagnostic decision support. We introduced the 2022 National NLP Clinical Challenge (N2C2) Track 3: Progress Note Understanding - Assessment and Plan Reasoning as one step towards a new suite of tasks. The Assessment and Plan Reasoning task focuses on the most critical components of progress notes, Assessment and Plan subsections where health problems and diagnoses are contained. The goal of the task was to develop and evaluate NLP systems that automatically predict causal relations between the overall status of the patient contained in the Assessment section and its relation to each component of the Plan section which contains the diagnoses and treatment plans. The goal of the task was to identify and prioritize diagnoses as the first steps in diagnostic decision support to find the most relevant information in long documents like daily progress notes. We present the results of the 2022 N2C2 Track 3 and provide a description of the data, evaluation, participation and system performance.

Sujet(s)

Dossiers médicaux électroniques , Mémorisation et recherche des informations , Humains , Traitement du langage naturel , Personnel de santé

12.

DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing.

Gao, Yanjun; Dligach, Dmitriy; Miller, Timothy; Caskey, John; Sharma, Brihat; Churpek, Matthew M; Afshar, Majid.

J Biomed Inform ; 138: 104286, 2023 02.

Article de Anglais | MEDLINE | ID: mdl-36706848

RÉSUMÉ

The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce the cognitive burden so fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgement that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans with forward reasoning from data to diagnosis and potentially reduce cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks coined as Diagnostic Reasoning Benchmarks, Dr.Bench, as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models for diagnostic reasoning. The goal of DR. BENCH is to advance the science in cNLP to support downstream applications in computerized diagnostic decision support and improve the efficiency and accuracy of healthcare providers during patient care. We fine-tune and evaluate the state-of-the-art generative models on DR.BENCH. Experiments show that with domain adaptation pre-training on medical knowledge, the model demonstrated opportunities for improvement when evaluated in DR. BENCH. We share DR. BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community. We also discuss the carbon footprint produced during the experiments and encourage future work on DR.BENCH to report the carbon footprint.

Sujet(s)

Intelligence artificielle , Traitement du langage naturel , Humains , Référenciation , Résolution de problème , Mémorisation et recherche des informations

13.

The Evaluation of a Clinical Decision Support Tool Using Natural Language Processing to Screen Hospitalized Adults for Unhealthy Substance Use: Protocol for a Quasi-Experimental Design.

Joyce, Cara; Markossian, Talar W; Nikolaides, Jenna; Ramsey, Elisabeth; Thompson, Hale M; Rojas, Juan C; Sharma, Brihat; Dligach, Dmitriy; Oguss, Madeline K; Cooper, Richard S; Afshar, Majid.

JMIR Res Protoc ; 11(12): e42971, 2022 Dec 19.

Article de Anglais | MEDLINE | ID: mdl-36534461

RÉSUMÉ

BACKGROUND: Automated and data-driven methods for screening using natural language processing (NLP) and machine learning may replace resource-intensive manual approaches in the usual care of patients hospitalized with conditions related to unhealthy substance use. The rigorous evaluation of tools that use artificial intelligence (AI) is necessary to demonstrate effectiveness before system-wide implementation. An NLP tool to use routinely collected data in the electronic health record was previously validated for diagnostic accuracy in a retrospective study for screening unhealthy substance use. Our next step is a noninferiority design incorporated into a research protocol for clinical implementation with prospective evaluation of clinical effectiveness in a large health system. OBJECTIVE: This study aims to provide a study protocol to evaluate health outcomes and the costs and benefits of an AI-driven automated screener compared to manual human screening for unhealthy substance use. METHODS: A pre-post design is proposed to evaluate 12 months of manual screening followed by 12 months of automated screening across surgical and medical wards at a single medical center. The preintervention period consists of usual care with manual screening by nurses and social workers and referrals to a multidisciplinary Substance Use Intervention Team (SUIT). Facilitated by a NLP pipeline in the postintervention period, clinical notes from the first 24 hours of hospitalization will be processed and scored by a machine learning model, and the SUIT will be similarly alerted to patients who flagged positive for substance misuse. Flowsheets within the electronic health record have been updated to capture rates of interventions for the primary outcome (brief intervention/motivational interviewing, medication-assisted treatment, naloxone dispensing, and referral to outpatient care). Effectiveness in terms of patient outcomes will be determined by noninferior rates of interventions (primary outcome), as well as rates of readmission within 6 months, average time to consult, and discharge rates against medical advice (secondary outcomes) in the postintervention period by a SUIT compared to the preintervention period. A separate analysis will be performed to assess the costs and benefits to the health system by using automated screening. Changes from the pre- to postintervention period will be assessed in covariate-adjusted generalized linear mixed-effects models. RESULTS: The study will begin in September 2022. Monthly data monitoring and Data Safety Monitoring Board reporting are scheduled every 6 months throughout the study period. We anticipate reporting final results by June 2025. CONCLUSIONS: The use of augmented intelligence for clinical decision support is growing with an increasing number of AI tools. We provide a research protocol for prospective evaluation of an automated NLP system for screening unhealthy substance use using a noninferiority design to demonstrate comprehensive screening that may be as effective as manual screening but less costly via automated solutions. TRIAL REGISTRATION: ClinicalTrials.gov NCT03833804; https://clinicaltrials.gov/ct2/show/NCT03833804. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/42971.

14.

Summarizing Patients' Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models.

Gao, Yanjun; Miller, Timothy; Xu, Dongfang; Dligach, Dmitriy; Churpek, Matthew M; Afshar, Majid.

Proc Int Conf Comput Ling ; 2022: 2979-2991, 2022 Oct.

Article de Anglais | MEDLINE | ID: mdl-36268128

RÉSUMÉ

Automatically summarizing patients' main problems from daily progress notes using natural language processing methods helps to battle against information and cognitive overload in hospital settings and potentially assists providers with computerized diagnostic decision support. Problem list summarization requires a model to understand, abstract, and generate clinical documentation. In this work, we propose a new NLP task that aims to generate a list of problems in a patient's daily care plan using input from the provider's progress notes during hospitalization. We investigate the performance of T5 and BART, two state-of-the-art seq2seq transformer architectures, in solving this problem. We provide a corpus built on top of progress notes from publicly available electronic health record progress notes in the Medical Information Mart for Intensive Care (MIMIC)-III. T5 and BART are trained on general domain text, and we experiment with a data augmentation method and a domain adaptation pre-training method to increase exposure to medical vocabulary and knowledge. Evaluation methods include ROUGE, BERTScore, cosine similarity on sentence embedding, and F-score on medical concepts. Results show that T5 with domain adaptive pre-training achieves significant performance gains compared to a rule-based system and general domain pre-trained language models, indicating a promising direction for tackling the problem summarization task.

15.

A scoping review of publicly available language tasks in clinical natural language processing.

Gao, Yanjun; Dligach, Dmitriy; Christensen, Leslie; Tesch, Samuel; Laffin, Ryan; Xu, Dongfang; Miller, Timothy; Uzuner, Ozlem; Churpek, Matthew M; Afshar, Majid.

J Am Med Inform Assoc ; 29(10): 1797-1806, 2022 09 12.

Article de Anglais | MEDLINE | ID: mdl-35923088

RÉSUMÉ

OBJECTIVE: To provide a scoping review of papers on clinical natural language processing (NLP) shared tasks that use publicly available electronic health record data from a cohort of patients. MATERIALS AND METHODS: We searched 6 databases, including biomedical research and computer science literature databases. A round of title/abstract screening and full-text screening were conducted by 2 reviewers. Our method followed the PRISMA-ScR guidelines. RESULTS: A total of 35 papers with 48 clinical NLP tasks met inclusion criteria between 2007 and 2021. We categorized the tasks by the type of NLP problems, including named entity recognition, summarization, and other NLP tasks. Some tasks were introduced as potential clinical decision support applications, such as substance abuse detection, and phenotyping. We summarized the tasks by publication venue and dataset type. DISCUSSION: The breadth of clinical NLP tasks continues to grow as the field of NLP evolves with advancements in language systems. However, gaps exist with divergent interests between the general domain NLP community and the clinical informatics community for task motivation and design, and in generalizability of the data sources. We also identified issues in data preparation. CONCLUSION: The existing clinical NLP tasks cover a wide range of topics and the field is expected to grow and attract more attention from both general domain NLP and clinical informatics community. We encourage future work to incorporate multidisciplinary collaboration, reporting transparency, and standardization in data preparation. We provide a listing of all the shared task papers and datasets from this review in a GitLab repository.

Sujet(s)

Dossiers médicaux électroniques , Traitement du langage naturel , Collecte de données , Gestion des données , Humains , Mémorisation et recherche des informations

16.

Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding.

Gao, Yanjun; Dligach, Dmitriy; Miller, Timothy; Tesch, Samuel; Laffin, Ryan; Churpek, Matthew M; Afshar, Majid.

LREC Int Conf Lang Resour Eval ; 2022: 5484-5493, 2022 Jun.

Article de Anglais | MEDLINE | ID: mdl-35939277

RÉSUMÉ

Applying methods in natural language processing on electronic health records (EHR) data is a growing field. Existing corpus and annotation focus on modeling textual features and relation prediction. However, there is a paucity of annotated corpus built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction and reasoning. This work introduces a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization. We created an annotated corpus based on an extensive collection of publicly available daily progress notes, a type of EHR documentation that is collected in time series in a problem-oriented format. The conventional format for a progress note follows a Subjective, Objective, Assessment and Plan heading (SOAP). We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages. The novel suite of tasks was designed to train and evaluate future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization.

17.

Development and multimodal validation of a substance misuse algorithm for referral to treatment using artificial intelligence (SMART-AI): a retrospective deep learning study.

Afshar, Majid; Sharma, Brihat; Dligach, Dmitriy; Oguss, Madeline; Brown, Randall; Chhabra, Neeraj; Thompson, Hale M; Markossian, Talar; Joyce, Cara; Churpek, Matthew M; Karnik, Niranjan S.

Lancet Digit Health ; 4(6): e426-e435, 2022 06.

Article de Anglais | MEDLINE | ID: mdl-35623797

RÉSUMÉ

BACKGROUND: Substance misuse is a heterogeneous and complex set of behavioural conditions that are highly prevalent in hospital settings and frequently co-occur. Few hospital-wide solutions exist to comprehensively and reliably identify these conditions to prioritise care and guide treatment. The aim of this study was to apply natural language processing (NLP) to clinical notes collected in the electronic health record (EHR) to accurately screen for substance misuse. METHODS: The model was trained and developed on a reference dataset derived from a hospital-wide programme at Rush University Medical Center (RUMC), Chicago, IL, USA, that used structured diagnostic interviews to manually screen admitted patients over 27 months (between Oct 1, 2017, and Dec 31, 2019; n=54â915). The Alcohol Use Disorder Identification Test and Drug Abuse Screening Tool served as reference standards. The first 24 h of notes in the EHR were mapped to standardised medical vocabulary and fed into single-label, multilabel, and multilabel with auxillary-task neural network models. Temporal validation of the model was done using data from the subsequent 12 months on a subset of RUMC patients (n=16â917). External validation was done using data from Loyola University Medical Center, Chicago, IL, USA between Jan 1, 2007, and Sept 30, 2017 (n=1991 adult patients). The primary outcome was discrimination for alcohol misuse, opioid misuse, or non-opioid drug misuse. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). Calibration slope and intercept were measured with the unreliability index. Bias assessments were performed across demographic subgroups. FINDINGS: The model was trained on a cohort that had 3·5% misuse (n=1â921) with any type of substance. 220 (11%) of 1921 patients with substance misuse had more than one type of misuse. The multilabel convolutional neural network classifier had a mean AUROC of 0·97 (95% CI 0·96-0·98) during temporal validation for all types of substance misuse. The model was well calibrated and showed good face validity with model features containing explicit mentions of aberrant drug-taking behaviour. A false-negative rate of 0·18-0·19 and a false-positive rate of 0·03 between non-Hispanic Black and non-Hispanic White groups occurred. In external validation, the AUROCs for alcohol and opioid misuse were 0·88 (95% CI 0·86-0·90) and 0·94 (0·92-0·95), respectively. INTERPRETATION: We developed a novel and accurate approach to leveraging the first 24 h of EHR notes for screening multiple types of substance misuse. FUNDING: National Institute On Drug Abuse, National Institutes of Health.

Sujet(s)

Alcoolisme , Apprentissage profond , Troubles liés aux opiacés , Adulte , Alcoolisme/complications , Alcoolisme/diagnostic , Alcoolisme/thérapie , Intelligence artificielle , Humains , Orientation vers un spécialiste , Études rétrospectives , États-Unis

18.

Correction: Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline.

Caskey, John; McConnell, Iain L; Oguss, Madeline; Dligach, Dmitriy; Kulikoff, Rachel; Grogan, Brittany; Gibson, Crystal; Wimmer, Elizabeth; DeSalvo, Traci E; Nyakoe-Nyasani, Edwin E; Churpek, Matthew M; Afshar, Majid.

JMIR Public Health Surveill ; 8(3): e37893, 2022 Mar 24.

Article de Anglais | MEDLINE | ID: mdl-35324453

RÉSUMÉ

[This corrects the article DOI: 10.2196/36119.].

19.

Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline.

Caskey, John; McConnell, Iain L; Oguss, Madeline; Dligach, Dmitriy; Kulikoff, Rachel; Grogan, Brittany; Gibson, Crystal; Wimmer, Elizabeth; DeSalvo, Traci E; Nyakoe-Nyasani, Edwin E; Churpek, Matthew M; Afshar, Majid.

JMIR Public Health Surveill ; 8(3): e36119, 2022 03 08.

Article de Anglais | MEDLINE | ID: mdl-35144241

RÉSUMÉ

BACKGROUND: In Wisconsin, COVID-19 case interview forms contain free-text fields that need to be mined to identify potential outbreaks for targeted policy making. We developed an automated pipeline to ingest the free text into a pretrained neural language model to identify businesses and facilities as outbreaks. OBJECTIVE: We aimed to examine the precision and recall of our natural language processing pipeline against existing outbreaks and potentially new clusters. METHODS: Data on cases of COVID-19 were extracted from the Wisconsin Electronic Disease Surveillance System (WEDSS) for Dane County between July 1, 2020, and June 30, 2021. Features from the case interview forms were fed into a Bidirectional Encoder Representations from Transformers (BERT) model that was fine-tuned for named entity recognition (NER). We also developed a novel location-mapping tool to provide addresses for relevant NER. Precision and recall were measured against manually verified outbreaks and valid addresses in WEDSS. RESULTS: There were 46,798 cases of COVID-19, with 4,183,273 total BERT tokens and 15,051 unique tokens. The recall and precision of the NER tool were 0.67 (95% CI 0.66-0.68) and 0.55 (95% CI 0.54-0.57), respectively. For the location-mapping tool, the recall and precision were 0.93 (95% CI 0.92-0.95) and 0.93 (95% CI 0.92-0.95), respectively. Across monthly intervals, the NER tool identified more potential clusters than were verified in WEDSS. CONCLUSIONS: We developed a novel pipeline of tools that identified existing outbreaks and novel clusters with associated addresses. Our pipeline ingests data from a statewide database and may be deployed to assist local health departments for targeted interventions.

Sujet(s)

COVID-19 , Traitement du langage naturel , COVID-19/épidémiologie , Traçage des contacts , Épidémies de maladies , Humains , Santé publique , SARS-CoV-2

20.

Geometric Features Associated with Middle Cerebral Artery Bifurcation Aneurysm Formation: A Matched Case-Control Study.

Zhang, Jian; Can, Anil; Lai, Pui Man Rosalind; Mukundan, Srinivasan; Castro, Victor M; Dligach, Dmitriy; Finan, Sean; Gainer, Vivian S; Shadick, Nancy A; Savova, Guergana; Murphy, Shawn N; Cai, Tianxi; Weiss, Scott T; Du, Rose.

J Stroke Cerebrovasc Dis ; 31(3): 106268, 2022 Mar.

Article de Anglais | MEDLINE | ID: mdl-34974241

RÉSUMÉ

OBJECTIVES: The pathogenesis of intracranial aneurysms is multifactorial and includes genetic, environmental, and anatomic influences. We aimed to identify image-based morphological parameters that were associated with middle cerebral artery (MCA) bifurcation aneurysms. MATERIALS AND METHODS: We evaluated three-dimensional morphological parameters obtained from CT angiography (CTA) or digital subtraction angiography (DSA) from 317 patients with unilateral MCA bifurcation aneurysms diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016. We chose the contralateral unaffected MCA bifurcation as the control group, in order to control for genetic and environmental risk factors. Diameters and angles of surrounding parent and daughter vessels of 634 MCAs were examined. RESULTS: Univariable and multivariable statistical analyses were performed to determine statistical significance. Sensitivity analyses with smaller (≤ 3 mm) aneurysms only and with angles excluded, were also performed. In a multivariable conditional logistic regression model we showed that smaller diameter size ratio (OR 0.0004, 95% CI 0.0001-0.15), larger daughter-daughter angles (OR 1.08, 95% CI 1.06-1.11) and larger parent-daughter angle ratios (OR 4.24, 95% CI 1.77-10.16) were significantly associated with MCA aneurysm presence after correcting for other variables. In order to account for possible changes to the vasculature by the aneurysm, a subgroup analysis of small aneurysms (≤ 3 mm) was performed and showed that the results were similar. CONCLUSIONS: Easily measurable morphological parameters of the surrounding vasculature of the MCA may provide objective metrics to assess MCA aneurysm formation risk in high-risk patients.

Sujet(s)

Anévrysme intracrânien , Artère cérébrale moyenne , Études cas-témoins , Angiographie par tomodensitométrie , Femelle , Humains , Anévrysme intracrânien/imagerie diagnostique , Artère cérébrale moyenne/imagerie diagnostique

RÉSUMÉ

RÉSUMÉ

RÉSUMÉ

RÉSUMÉ

Sujet(s)

RÉSUMÉ

RÉSUMÉ

RÉSUMÉ

RÉSUMÉ

RÉSUMÉ

RÉSUMÉ

RÉSUMÉ

Sujet(s)

RÉSUMÉ

Sujet(s)

RÉSUMÉ

RÉSUMÉ

RÉSUMÉ

Sujet(s)

RÉSUMÉ

RÉSUMÉ

Sujet(s)

RÉSUMÉ

RÉSUMÉ

Sujet(s)

RÉSUMÉ

Sujet(s)

ENVOYER À:

SÉLECTION CITATIONS

DÉTAIL DE RECHERCHE