Search | BVS (Virtual Health Library) Regional Portal

1.

Automated stratification of trauma injury severity across multiple body regions using multi-modal, multi-class machine learning models.

Gao, Jifan; Chen, Guanhua; O'Rourke, Ann P; Caskey, John; Carey, Kyle A; Oguss, Madeline; Stey, Anne; Dligach, Dmitriy; Miller, Timothy; Mayampurath, Anoop; Churpek, Matthew M; Afshar, Majid.

J Am Med Inform Assoc ; 31(6): 1291-1302, 2024 May 20.

Article in English | MEDLINE | ID: mdl-38587875

ABSTRACT

OBJECTIVE: The timely stratification of trauma injury severity can enhance the quality of trauma care but it requires intense manual annotation from certified trauma coders. The objective of this study is to develop machine learning models for the stratification of trauma injury severity across various body regions using clinical text and structured electronic health records (EHRs) data. MATERIALS AND METHODS: Our study utilized clinical documents and structured EHR variables linked with the trauma registry data to create 2 machine learning models with different approaches to representing text. The first one fuses concept unique identifiers (CUIs) extracted from free text with structured EHR variables, while the second one integrates free text with structured EHR variables. Temporal validation was undertaken to ensure the models' temporal generalizability. Additionally, analyses to assess the variable importance were conducted. RESULTS: Both models demonstrated impressive performance in categorizing leg injuries, achieving high accuracy with macro-F1 scores of over 0.8. Additionally, they showed considerable accuracy, with macro-F1 scores exceeding or near 0.7, in assessing injuries in the areas of the chest and head. We showed in our variable importance analysis that the most important features in the model have strong face validity in determining clinically relevant trauma injuries. DISCUSSION: The CUI-based model achieves comparable performance, if not higher, compared to the free-text-based model, with reduced complexity. Furthermore, integrating structured EHR data improves performance, particularly when the text modalities are insufficiently indicative. CONCLUSIONS: Our multi-modal, multiclass models can provide accurate stratification of trauma injury severity and clinically relevant interpretations.
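The macro-F1 scores reported above average the per-class F1 without weighting by class frequency, so rare severity classes count as much as common ones. A minimal sketch of that metric follows; the class labels in the usage note are hypothetical and are not taken from the study.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `macro_f1(["minor", "minor", "serious", "severe"], ["minor", "serious", "serious", "severe"])` averages F1 values of 2/3, 2/3, and 1 over the three illustrative classes.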

Subjects

Electronic Health Records , Machine Learning , Wounds and Injuries , Humans , Wounds and Injuries/classification , Injury Severity Score , Registries , Trauma Severity Indices , Natural Language Processing

2.

LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction.

Yoon, WonJin; Chen, Shan; Gao, Yanjun; Dligach, Dmitriy; Bitterman, Danielle S; Afshar, Majid; Miller, Timothy.

medRxiv ; 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38585973

ABSTRACT

Natural Language Processing (NLP) is the study of automated processing of text data. Applying NLP in the clinical domain is important because of the rich unstructured information embedded in clinical documents, which often remains inaccessible in structured data. With recent advances in language models (LMs), there is growing interest in their application within the clinical domain. When applying NLP methods to a given domain, benchmark datasets are crucial: they not only guide the selection of best-performing models but also enable assessment of the reliability of the generated outputs. Despite the recent availability of LMs capable of handling longer context, benchmark datasets targeting long clinical document classification tasks are absent. To address this issue, we propose the LCD benchmark, a benchmark for the task of predicting 30-day out-of-hospital mortality using discharge notes of MIMIC-IV and statewide death data. Our notes have a median word count of 1687 and an interquartile range of 1308 to 2169. We evaluated this benchmark dataset using baseline models, from bag-of-words and CNN to Hierarchical Transformer and an open-source instruction-tuned large language model. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations. We expect the LCD benchmark to become a resource for the development of advanced supervised models, prompting methods, or the foundation models themselves, tailored for clinical text. The benchmark dataset is available at https://github.com/Machine-Learning-for-Medical-Language/long-clinical-doc.
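The document-length statistics quoted above (median and interquartile range of word counts) can be reproduced for any note collection along the following lines. This is a sketch under two assumptions not stated in the abstract: words are split on whitespace, and quartiles use the "inclusive" interpolation method of Python's `statistics` module. The function name is illustrative.

```python
import statistics


def word_count_summary(notes):
    """Return the median and interquartile range of whitespace-token counts."""
    counts = [len(note.split()) for note in notes]
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {"median": median, "iqr": (q1, q3)}
```

A different tokenizer or quartile convention would shift the numbers slightly, so reported values should name both.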

3.

Development of a Human Evaluation Framework and Correlation with Automated Metrics for Natural Language Generation of Medical Diagnoses.

Croxford, Emma; Gao, Yanjun; Patterson, Brian; To, Daniel; Tesch, Samuel; Dligach, Dmitriy; Mayampurath, Anoop; Churpek, Matthew M; Afshar, Majid.

medRxiv ; 2024 Apr 09.

Article in English | MEDLINE | ID: mdl-38562730

ABSTRACT

In the evolving landscape of clinical Natural Language Generation (NLG), assessing abstractive text quality remains challenging, as existing methods often overlook generative task complexities. This work aimed to examine the current state of automated evaluation metrics in NLG in healthcare. To have a robust and well-validated baseline with which to examine the alignment of these metrics, we created a comprehensive human evaluation framework. Employing ChatGPT-3.5-turbo generative output, we correlated human judgments with each metric. None of the metrics demonstrated high alignment; however, the SapBERT score, a Unified Medical Language System (UMLS)-based metric, showed the best results. This underscores the importance of incorporating domain-specific knowledge into evaluation efforts. Our work reveals the deficiency in quality evaluations for generated text and introduces our comprehensive human evaluation framework as a baseline. Future efforts should prioritize integrating medical knowledge databases to enhance the alignment of automated metrics, particularly focusing on refining the SapBERT score for improved assessments.

4.

Development and external validation of multimodal postoperative acute kidney injury risk machine learning models.

Karway, George K; Koyner, Jay L; Caskey, John; Spicer, Alexandra B; Carey, Kyle A; Gilbert, Emily R; Dligach, Dmitriy; Mayampurath, Anoop; Afshar, Majid; Churpek, Matthew M.

JAMIA Open ; 6(4): ooad109, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38144168

ABSTRACT

Objectives: To develop and externally validate machine learning models using structured and unstructured electronic health record data to predict postoperative acute kidney injury (AKI) across inpatient settings. Materials and Methods: Data for adult postoperative admissions to the Loyola University Medical Center (2009-2017) were used for model development and admissions to the University of Wisconsin-Madison (2009-2020) were used for validation. Structured features included demographics, vital signs, laboratory results, and nurse-documented scores. Unstructured text from clinical notes was converted into concept unique identifiers (CUIs) using the clinical Text Analysis and Knowledge Extraction System. The primary outcome was the development of Kidney Disease: Improving Global Outcomes stage 2 AKI within 7 days after leaving the operating room. We derived unimodal extreme gradient boosting machines (XGBoost) and elastic net logistic regression (GLMNET) models using structured-only data and multimodal models combining structured data with CUI features. Model comparison was performed using the area under the receiver operating characteristic curve (AUROC), with DeLong's test for statistical differences. Results: The study cohort included 138,389 adult patient admissions (mean [SD] age 58 [16] years; 11,506 [8%] African-American; and 70,826 [51%] female) across the 2 sites. Of those, 2959 (2.1%) developed stage 2 AKI or higher. Across all data types, XGBoost outperformed GLMNET (mean AUROC 0.81 [95% confidence interval (CI), 0.80-0.82] vs 0.78 [95% CI, 0.77-0.79]). The multimodal XGBoost model incorporating CUIs parameterized as term frequency-inverse document frequency (TF-IDF) showed the highest discrimination performance (AUROC 0.82 [95% CI, 0.81-0.83]) over unimodal models (AUROC 0.79 [95% CI, 0.78-0.80]). Discussion: A multimodality approach with structured data and TF-IDF weighting of CUIs increased model performance over structured data-only models. Conclusion: These findings highlight the predictive power of CUIs when merged with structured data for clinical prediction models, which may improve the detection of postoperative AKI.
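To make the multimodal feature construction concrete, the sketch below weights CUI counts by TF-IDF and concatenates them with structured features. It is an illustration only: the abstract does not specify the TF-IDF variant, so a smoothed IDF common in open-source toolkits is assumed here, and the function names and any CUI strings passed in are placeholders rather than the study's actual pipeline.

```python
import math
from collections import Counter


def tfidf_cui_features(docs):
    """docs: one list of CUI strings per admission. Returns (vocab, rows)."""
    n_docs = len(docs)
    # Document frequency: number of admissions in which each CUI appears
    doc_freq = Counter(cui for doc in docs for cui in set(doc))
    vocab = sorted(doc_freq)
    # Assumed variant: smoothed IDF = ln((1 + N) / (1 + df)) + 1
    idf = {c: math.log((1 + n_docs) / (1 + doc_freq[c])) + 1.0 for c in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[c] * idf[c] for c in vocab])
    return vocab, rows


def fuse(structured_rows, cui_rows):
    """Concatenate structured features with TF-IDF CUI features per admission."""
    return [s + c for s, c in zip(structured_rows, cui_rows)]
```

The fused rows would then be passed to a classifier such as a gradient-boosted tree model; that training step is not shown here.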

5.

End-to-end clinical temporal information extraction with multi-head attention.

Miller, Timothy; Bethard, Steven; Dligach, Dmitriy; Savova, Guergana.

Proc Conf Assoc Comput Linguist Meet ; 2023: 313-319, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37780680

ABSTRACT

Understanding temporal relationships in text from electronic health records can be valuable for many important downstream clinical applications. Since Clinical TempEval 2017, there has been little work on end-to-end systems for temporal relation extraction, with most work focused on the setting where gold standard events and time expressions are given. In this work, we make use of a novel multi-headed attention mechanism on top of a pre-trained transformer encoder to allow the learning process to attend to multiple aspects of the contextualized embeddings. Our system achieves state-of-the-art results on the THYME corpus by a wide margin, in both the in-domain and cross-domain settings.

6.

Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles.

Zhou, Weipeng; Dligach, Dmitriy; Afshar, Majid; Gao, Yanjun; Miller, Timothy A.

Proc Conf Assoc Comput Linguist Meet ; 2023: 125-130, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37786810

ABSTRACT

Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs outperform supervised BERT-based models applied to out-of-domain data. We also find that their strengths are synergistic, so that a simple ensemble technique leads to additional performance gains.

7.

Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes.

Gao, Yanjun; Dligach, Dmitriy; Miller, Timothy; Churpek, Matthew M; Afshar, Majid.

Proc Conf Assoc Comput Linguist Meet ; 2023: 461-467, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37583489

ABSTRACT

The BioNLP Workshop 2023 launched a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system generating relevant and accurate diagnoses will augment the healthcare providers' decision-making process and improve the quality of care for patients. The goal for participants is to develop models that generate a list of diagnoses and problems using input from the daily care notes collected from the hospitalization of critically ill patients. Eight teams submitted their final systems to the shared task leaderboard. In this paper, we describe the tasks, datasets, evaluation metrics, and baseline systems. Additionally, we summarize the techniques and evaluation results of the different approaches tried by the participating teams.

8.

Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning.

Sharma, Brihat; Gao, Yanjun; Miller, Timothy; Churpek, Matthew M; Afshar, Majid; Dligach, Dmitriy.

Proc Conf Assoc Comput Linguist Meet ; 2023(ClinicalNLP): 78-85, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37492270

ABSTRACT

Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework comprising six tasks that represent key components in clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models as well as multi-task versus single-task training with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically trained language model outperforms its general domain counterpart by a large margin, establishing a new state-of-the-art performance, with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.
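ROUGE-L, the metric quoted above, scores a generated summary by the longest common subsequence (LCS) of tokens it shares with a reference. The following is a simplified sketch of the F1 form of that idea, assuming lowercased whitespace tokens and no stemming; published scores typically come from a standard ROUGE package whose preprocessing may differ, and are often reported on a 0-100 scale.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, else carry the best neighbour
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """LCS-based F1 between a reference and a candidate summary (0 to 1)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because the LCS respects token order, a candidate listing the right problems in a different order is penalized, which is one reason overlap metrics can understate the quality of problem-list summaries.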

9.

Deployment of Real-time Natural Language Processing and Deep Learning Clinical Decision Support in the Electronic Health Record: Pipeline Implementation for an Opioid Misuse Screener in Hospitalized Adults.

Afshar, Majid; Adelaine, Sabrina; Resnik, Felice; Mundt, Marlon P; Long, John; Leaf, Margaret; Ampian, Theodore; Wills, Graham J; Schnapp, Benjamin; Chao, Michael; Brown, Randy; Joyce, Cara; Sharma, Brihat; Dligach, Dmitriy; Burnside, Elizabeth S; Mahoney, Jane; Churpek, Matthew M; Patterson, Brian W; Liao, Frank.

JMIR Med Inform ; 11: e44977, 2023 Apr 20.

Article in English | MEDLINE | ID: mdl-37079367

ABSTRACT

BACKGROUND: The clinical narrative in electronic health records (EHRs) carries valuable information for predictive analytics; however, its free-text form is difficult to mine and analyze for clinical decision support (CDS). Large-scale clinical natural language processing (NLP) pipelines have focused on data warehouse applications for retrospective research efforts. There remains a paucity of evidence for implementing NLP pipelines at the bedside for health care delivery. OBJECTIVE: We aimed to detail a hospital-wide, operational pipeline to implement a real-time NLP-driven CDS tool and describe a protocol for an implementation framework with a user-centered design of the CDS tool. METHODS: The pipeline integrated a previously trained open-source convolutional neural network model for screening opioid misuse that leveraged EHR notes mapped to standardized medical vocabularies in the Unified Medical Language System. A sample of 100 adult encounters was reviewed by a physician informaticist for silent testing of the deep learning algorithm before deployment. An end user interview survey was developed to examine the user acceptability of a best practice alert (BPA) to provide the screening results with recommendations. The planned implementation also included a human-centered design with user feedback on the BPA, an implementation framework with cost-effectiveness, and a noninferiority patient outcome analysis plan. RESULTS: The pipeline was a reproducible workflow with a shared pseudocode for a cloud service to ingest, process, and store clinical notes as Health Level 7 messages from a major EHR vendor in an elastic cloud computing environment. Feature engineering of the notes used an open-source NLP engine, and the features were fed into the deep learning algorithm, with the results returned as a BPA in the EHR. On-site silent testing of the deep learning algorithm demonstrated a sensitivity of 93% (95% CI 66%-99%) and specificity of 92% (95% CI 84%-96%), similar to published validation studies. Before deployment, approvals were received across hospital committees for inpatient operations. Five interviews were conducted; they informed the development of an educational flyer and further modified the BPA to exclude certain patients and allow the refusal of recommendations. The longest delay in pipeline development was due to cybersecurity approvals, especially those concerning the exchange of protected health information between the Microsoft (Microsoft Corp) and Epic (Epic Systems Corp) cloud vendors. In silent testing, the resultant pipeline provided a BPA to the bedside within minutes of a provider entering a note in the EHR. CONCLUSIONS: The components of the real-time NLP pipeline were detailed with open-source tools and pseudocode for other health systems to benchmark. The deployment of medical artificial intelligence systems in routine clinical care presents an important yet unfulfilled opportunity, and our protocol aimed to close the gap in the implementation of artificial intelligence-driven CDS. TRIAL REGISTRATION: ClinicalTrials.gov NCT05745480; https://www.clinicaltrials.gov/ct2/show/NCT05745480.

10.

Progress Note Understanding - Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 shared task.

Gao, Yanjun; Dligach, Dmitriy; Miller, Timothy; Churpek, Matthew M; Uzuner, Ozlem; Afshar, Majid.

J Biomed Inform ; 142: 104346, 2023 06.

Article in English | MEDLINE | ID: mdl-37061012

ABSTRACT

Daily progress notes are a common note type in the electronic health record (EHR) where healthcare providers document the patient's daily progress and treatment plans. The EHR is designed to document all the care provided to patients, but it also enables note bloat with extraneous information that distracts from the diagnoses and treatment plans. The application of natural language processing (NLP) in the EHR is a growing field, with the majority of methods in information extraction. Few tasks use NLP methods for downstream diagnostic decision support. We introduced the 2022 National NLP Clinical Challenge (N2C2) Track 3: Progress Note Understanding - Assessment and Plan Reasoning as one step towards a new suite of tasks. The Assessment and Plan Reasoning task focuses on the most critical components of progress notes, the Assessment and Plan subsections, where health problems and diagnoses are contained. The goal of the task was to develop and evaluate NLP systems that automatically predict causal relations between the overall status of the patient contained in the Assessment section and each component of the Plan section, which contains the diagnoses and treatment plans. The task was also intended to identify and prioritize diagnoses as a first step in diagnostic decision support, finding the most relevant information in long documents like daily progress notes. We present the results of the 2022 N2C2 Track 3 and provide a description of the data, evaluation, participation and system performance.

Subjects

Electronic Health Records , Information Storage and Retrieval , Humans , Natural Language Processing , Health Personnel

11.

DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing.

Gao, Yanjun; Dligach, Dmitriy; Miller, Timothy; Caskey, John; Sharma, Brihat; Churpek, Matthew M; Afshar, Majid.

J Biomed Inform ; 138: 104286, 2023 02.

Article in English | MEDLINE | ID: mdl-36706848

ABSTRACT

The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce the cognitive burden so fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgement that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans with forward reasoning from data to diagnosis and potentially reduce cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks coined as Diagnostic Reasoning Benchmarks, DR.BENCH, as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models for diagnostic reasoning. The goal of DR.BENCH is to advance the science in cNLP to support downstream applications in computerized diagnostic decision support and improve the efficiency and accuracy of healthcare providers during patient care. We fine-tune and evaluate the state-of-the-art generative models on DR.BENCH. Experiments show that with domain adaptation pre-training on medical knowledge, the model demonstrated opportunities for improvement when evaluated in DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community. We also discuss the carbon footprint produced during the experiments and encourage future work on DR.BENCH to report the carbon footprint.

Subjects

Artificial Intelligence , Natural Language Processing , Humans , Benchmarking , Problem Solving , Information Storage and Retrieval

12.

The Evaluation of a Clinical Decision Support Tool Using Natural Language Processing to Screen Hospitalized Adults for Unhealthy Substance Use: Protocol for a Quasi-Experimental Design.

Joyce, Cara; Markossian, Talar W; Nikolaides, Jenna; Ramsey, Elisabeth; Thompson, Hale M; Rojas, Juan C; Sharma, Brihat; Dligach, Dmitriy; Oguss, Madeline K; Cooper, Richard S; Afshar, Majid.

JMIR Res Protoc ; 11(12): e42971, 2022 Dec 19.

Article in English | MEDLINE | ID: mdl-36534461

ABSTRACT

BACKGROUND: Automated and data-driven methods for screening using natural language processing (NLP) and machine learning may replace resource-intensive manual approaches in the usual care of patients hospitalized with conditions related to unhealthy substance use. The rigorous evaluation of tools that use artificial intelligence (AI) is necessary to demonstrate effectiveness before system-wide implementation. An NLP tool to use routinely collected data in the electronic health record was previously validated for diagnostic accuracy in a retrospective study for screening unhealthy substance use. Our next step is a noninferiority design incorporated into a research protocol for clinical implementation with prospective evaluation of clinical effectiveness in a large health system. OBJECTIVE: This study aims to provide a study protocol to evaluate health outcomes and the costs and benefits of an AI-driven automated screener compared to manual human screening for unhealthy substance use. METHODS: A pre-post design is proposed to evaluate 12 months of manual screening followed by 12 months of automated screening across surgical and medical wards at a single medical center. The preintervention period consists of usual care with manual screening by nurses and social workers and referrals to a multidisciplinary Substance Use Intervention Team (SUIT). Facilitated by an NLP pipeline in the postintervention period, clinical notes from the first 24 hours of hospitalization will be processed and scored by a machine learning model, and the SUIT will be similarly alerted to patients who flagged positive for substance misuse. Flowsheets within the electronic health record have been updated to capture rates of interventions for the primary outcome (brief intervention/motivational interviewing, medication-assisted treatment, naloxone dispensing, and referral to outpatient care). Effectiveness in terms of patient outcomes will be determined by noninferior rates of interventions (primary outcome), as well as rates of readmission within 6 months, average time to consult, and discharge rates against medical advice (secondary outcomes) in the postintervention period by a SUIT compared to the preintervention period. A separate analysis will be performed to assess the costs and benefits to the health system by using automated screening. Changes from the pre- to postintervention period will be assessed in covariate-adjusted generalized linear mixed-effects models. RESULTS: The study will begin in September 2022. Monthly data monitoring and Data Safety Monitoring Board reporting are scheduled every 6 months throughout the study period. We anticipate reporting final results by June 2025. CONCLUSIONS: The use of augmented intelligence for clinical decision support is growing with an increasing number of AI tools. We provide a research protocol for prospective evaluation of an automated NLP system for screening unhealthy substance use using a noninferiority design to demonstrate comprehensive screening that may be as effective as manual screening but less costly via automated solutions. TRIAL REGISTRATION: ClinicalTrials.gov NCT03833804; https://clinicaltrials.gov/ct2/show/NCT03833804. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/42971.

13.

Summarizing Patients' Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models.

Gao, Yanjun; Miller, Timothy; Xu, Dongfang; Dligach, Dmitriy; Churpek, Matthew M; Afshar, Majid.

Proc Int Conf Comput Ling ; 2022: 2979-2991, 2022 Oct.

Article in English | MEDLINE | ID: mdl-36268128

ABSTRACT

Automatically summarizing patients' main problems from daily progress notes using natural language processing methods helps to battle against information and cognitive overload in hospital settings and potentially assists providers with computerized diagnostic decision support. Problem list summarization requires a model to understand, abstract, and generate clinical documentation. In this work, we propose a new NLP task that aims to generate a list of problems in a patient's daily care plan using input from the provider's progress notes during hospitalization. We investigate the performance of T5 and BART, two state-of-the-art seq2seq transformer architectures, in solving this problem. We provide a corpus built on top of progress notes from publicly available electronic health record progress notes in the Medical Information Mart for Intensive Care (MIMIC)-III. T5 and BART are trained on general domain text, and we experiment with a data augmentation method and a domain adaptation pre-training method to increase exposure to medical vocabulary and knowledge. Evaluation methods include ROUGE, BERTScore, cosine similarity on sentence embedding, and F-score on medical concepts. Results show that T5 with domain adaptive pre-training achieves significant performance gains compared to a rule-based system and general domain pre-trained language models, indicating a promising direction for tackling the problem summarization task.

14.

Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding.

Gao, Yanjun; Dligach, Dmitriy; Miller, Timothy; Tesch, Samuel; Laffin, Ryan; Churpek, Matthew M; Afshar, Majid.

LREC Int Conf Lang Resour Eval ; 2022: 5484-5493, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35939277

ABSTRACT

Applying natural language processing methods to electronic health record (EHR) data is a growing field. Existing corpora and annotations focus on modeling textual features and relation prediction. However, there is a paucity of annotated corpora built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction and reasoning. This work introduces a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization. We created an annotated corpus based on an extensive collection of publicly available daily progress notes, a type of EHR documentation that is collected in time series in a problem-oriented format. The conventional format for a progress note follows a Subjective, Objective, Assessment and Plan heading (SOAP). We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages. The novel suite of tasks was designed to train and evaluate future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization.

15.

A scoping review of publicly available language tasks in clinical natural language processing.

Gao, Yanjun; Dligach, Dmitriy; Christensen, Leslie; Tesch, Samuel; Laffin, Ryan; Xu, Dongfang; Miller, Timothy; Uzuner, Ozlem; Churpek, Matthew M; Afshar, Majid.

J Am Med Inform Assoc ; 29(10): 1797-1806, 2022 09 12.

Article in English | MEDLINE | ID: mdl-35923088

ABSTRACT

OBJECTIVE: To provide a scoping review of papers on clinical natural language processing (NLP) shared tasks that use publicly available electronic health record data from a cohort of patients. MATERIALS AND METHODS: We searched 6 databases, including biomedical research and computer science literature databases. A round of title/abstract screening and full-text screening were conducted by 2 reviewers. Our method followed the PRISMA-ScR guidelines. RESULTS: A total of 35 papers with 48 clinical NLP tasks met inclusion criteria between 2007 and 2021. We categorized the tasks by the type of NLP problems, including named entity recognition, summarization, and other NLP tasks. Some tasks were introduced as potential clinical decision support applications, such as substance abuse detection, and phenotyping. We summarized the tasks by publication venue and dataset type. DISCUSSION: The breadth of clinical NLP tasks continues to grow as the field of NLP evolves with advancements in language systems. However, gaps exist with divergent interests between the general domain NLP community and the clinical informatics community for task motivation and design, and in generalizability of the data sources. We also identified issues in data preparation. CONCLUSION: The existing clinical NLP tasks cover a wide range of topics and the field is expected to grow and attract more attention from both general domain NLP and clinical informatics community. We encourage future work to incorporate multidisciplinary collaboration, reporting transparency, and standardization in data preparation. We provide a listing of all the shared task papers and datasets from this review in a GitLab repository.

Subjects

Electronic Health Records , Natural Language Processing , Data Collection , Data Management , Humans , Information Storage and Retrieval

16.

Development and multimodal validation of a substance misuse algorithm for referral to treatment using artificial intelligence (SMART-AI): a retrospective deep learning study.

Afshar, Majid; Sharma, Brihat; Dligach, Dmitriy; Oguss, Madeline; Brown, Randall; Chhabra, Neeraj; Thompson, Hale M; Markossian, Talar; Joyce, Cara; Churpek, Matthew M; Karnik, Niranjan S.

Lancet Digit Health ; 4(6): e426-e435, 2022 06.

Article in English | MEDLINE | ID: mdl-35623797

ABSTRACT

BACKGROUND: Substance misuse is a heterogeneous and complex set of behavioural conditions that are highly prevalent in hospital settings and frequently co-occur. Few hospital-wide solutions exist to comprehensively and reliably identify these conditions to prioritise care and guide treatment. The aim of this study was to apply natural language processing (NLP) to clinical notes collected in the electronic health record (EHR) to accurately screen for substance misuse. METHODS: The model was trained and developed on a reference dataset derived from a hospital-wide programme at Rush University Medical Center (RUMC), Chicago, IL, USA, that used structured diagnostic interviews to manually screen admitted patients over 27 months (between Oct 1, 2017, and Dec 31, 2019; n=54,915). The Alcohol Use Disorder Identification Test and Drug Abuse Screening Tool served as reference standards. The first 24 h of notes in the EHR were mapped to standardised medical vocabulary and fed into single-label, multilabel, and multilabel with auxiliary-task neural network models. Temporal validation of the model was done using data from the subsequent 12 months on a subset of RUMC patients (n=16,917). External validation was done using data from Loyola University Medical Center, Chicago, IL, USA between Jan 1, 2007, and Sept 30, 2017 (n=1991 adult patients). The primary outcome was discrimination for alcohol misuse, opioid misuse, or non-opioid drug misuse. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). Calibration slope and intercept were measured with the unreliability index. Bias assessments were performed across demographic subgroups. FINDINGS: The model was trained on a cohort that had 3·5% misuse (n=1921) with any type of substance. 220 (11%) of 1921 patients with substance misuse had more than one type of misuse. The multilabel convolutional neural network classifier had a mean AUROC of 0·97 (95% CI 0·96-0·98) during temporal validation for all types of substance misuse. The model was well calibrated and showed good face validity with model features containing explicit mentions of aberrant drug-taking behaviour. A false-negative rate of 0·18-0·19 and a false-positive rate of 0·03 between non-Hispanic Black and non-Hispanic White groups occurred. In external validation, the AUROCs for alcohol and opioid misuse were 0·88 (95% CI 0·86-0·90) and 0·94 (0·92-0·95), respectively. INTERPRETATION: We developed a novel and accurate approach to leveraging the first 24 h of EHR notes for screening multiple types of substance misuse. FUNDING: National Institute on Drug Abuse, National Institutes of Health.

Subjects

Alcoholism , Deep Learning , Opioid-Related Disorders , Adult , Alcoholism/complications , Alcoholism/diagnosis , Alcoholism/therapy , Artificial Intelligence , Humans , Referral and Consultation , Retrospective Studies , United States

17.

Correction: Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline.

Caskey, John; McConnell, Iain L; Oguss, Madeline; Dligach, Dmitriy; Kulikoff, Rachel; Grogan, Brittany; Gibson, Crystal; Wimmer, Elizabeth; DeSalvo, Traci E; Nyakoe-Nyasani, Edwin E; Churpek, Matthew M; Afshar, Majid.

JMIR Public Health Surveill ; 8(3): e37893, 2022 Mar 24.

Article in English | MEDLINE | ID: mdl-35324453

ABSTRACT

[This corrects the article DOI: 10.2196/36119.].

18.

Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline.

Caskey, John; McConnell, Iain L; Oguss, Madeline; Dligach, Dmitriy; Kulikoff, Rachel; Grogan, Brittany; Gibson, Crystal; Wimmer, Elizabeth; DeSalvo, Traci E; Nyakoe-Nyasani, Edwin E; Churpek, Matthew M; Afshar, Majid.

JMIR Public Health Surveill ; 8(3): e36119, 2022 03 08.

Article in English | MEDLINE | ID: mdl-35144241

ABSTRACT

BACKGROUND: In Wisconsin, COVID-19 case interview forms contain free-text fields that need to be mined to identify potential outbreaks for targeted policy making. We developed an automated pipeline to ingest the free text into a pretrained neural language model to identify businesses and facilities as outbreaks. OBJECTIVE: We aimed to examine the precision and recall of our natural language processing pipeline against existing outbreaks and potentially new clusters. METHODS: Data on cases of COVID-19 were extracted from the Wisconsin Electronic Disease Surveillance System (WEDSS) for Dane County between July 1, 2020, and June 30, 2021. Features from the case interview forms were fed into a Bidirectional Encoder Representations from Transformers (BERT) model that was fine-tuned for named entity recognition (NER). We also developed a novel location-mapping tool to provide addresses for relevant NER. Precision and recall were measured against manually verified outbreaks and valid addresses in WEDSS. RESULTS: There were 46,798 cases of COVID-19, with 4,183,273 total BERT tokens and 15,051 unique tokens. The recall and precision of the NER tool were 0.67 (95% CI 0.66-0.68) and 0.55 (95% CI 0.54-0.57), respectively. For the location-mapping tool, the recall and precision were 0.93 (95% CI 0.92-0.95) and 0.93 (95% CI 0.92-0.95), respectively. Across monthly intervals, the NER tool identified more potential clusters than were verified in WEDSS. CONCLUSIONS: We developed a novel pipeline of tools that identified existing outbreaks and novel clusters with associated addresses. Our pipeline ingests data from a statewide database and may be deployed to assist local health departments for targeted interventions.
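
The precision and recall figures reported above compare extracted entities against a verified reference set. As a minimal sketch of that kind of comparison, assuming entities are matched by exact string equality (the abstract does not state the matching rule), the following uses a hypothetical helper `precision_recall` and made-up facility names rather than any data from the study:

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of entity strings.

    Matching is by exact string equality, which is an assumption of this
    sketch and not necessarily the rule used in the study.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical facility names, not data from the study.
predicted = {"Acme Diner", "Lakeside Gym", "Elm Street School", "Corner Pharmacy"}
gold = {"Acme Diner", "Lakeside Gym", "Northside Clinic"}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four predicted names are in the reference set, so precision is 2/4, and two of the three reference names were found, so recall is 2/3.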

Subjects

COVID-19 , Natural Language Processing , COVID-19/epidemiology , Contact Tracing , Disease Outbreaks , Humans , Public Health , SARS-CoV-2

19.

Geometric Features Associated with Middle Cerebral Artery Bifurcation Aneurysm Formation: A Matched Case-Control Study.

Zhang, Jian; Can, Anil; Lai, Pui Man Rosalind; Mukundan, Srinivasan; Castro, Victor M; Dligach, Dmitriy; Finan, Sean; Gainer, Vivian S; Shadick, Nancy A; Savova, Guergana; Murphy, Shawn N; Cai, Tianxi; Weiss, Scott T; Du, Rose.

J Stroke Cerebrovasc Dis ; 31(3): 106268, 2022 Mar.

Article in English | MEDLINE | ID: mdl-34974241

ABSTRACT

OBJECTIVES: The pathogenesis of intracranial aneurysms is multifactorial and includes genetic, environmental, and anatomic influences. We aimed to identify image-based morphological parameters that were associated with middle cerebral artery (MCA) bifurcation aneurysms. MATERIALS AND METHODS: We evaluated three-dimensional morphological parameters obtained from CT angiography (CTA) or digital subtraction angiography (DSA) from 317 patients with unilateral MCA bifurcation aneurysms diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016. We chose the contralateral unaffected MCA bifurcation as the control group, in order to control for genetic and environmental risk factors. Diameters and angles of surrounding parent and daughter vessels of 634 MCAs were examined. RESULTS: Univariable and multivariable statistical analyses were performed to determine statistical significance. Sensitivity analyses with smaller (≤ 3 mm) aneurysms only and with angles excluded, were also performed. In a multivariable conditional logistic regression model we showed that smaller diameter size ratio (OR 0.0004, 95% CI 0.0001-0.15), larger daughter-daughter angles (OR 1.08, 95% CI 1.06-1.11) and larger parent-daughter angle ratios (OR 4.24, 95% CI 1.77-10.16) were significantly associated with MCA aneurysm presence after correcting for other variables. In order to account for possible changes to the vasculature by the aneurysm, a subgroup analysis of small aneurysms (≤ 3 mm) was performed and showed that the results were similar. CONCLUSIONS: Easily measurable morphological parameters of the surrounding vasculature of the MCA may provide objective metrics to assess MCA aneurysm formation risk in high-risk patients.

Subjects

Intracranial Aneurysm , Middle Cerebral Artery , Case-Control Studies , Computed Tomography Angiography , Female , Humans , Intracranial Aneurysm/diagnostic imaging , Middle Cerebral Artery/diagnostic imaging

20.

Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups.

Thompson, Hale M; Sharma, Brihat; Bhalla, Sameer; Boley, Randy; McCluskey, Connor; Dligach, Dmitriy; Churpek, Matthew M; Karnik, Niranjan S; Afshar, Majid.

J Am Med Inform Assoc ; 28(11): 2393-2403, 2021 10 12.

Article in English | MEDLINE | ID: mdl-34383925

ABSTRACT

OBJECTIVES: To assess fairness and bias of a previously validated machine learning opioid misuse classifier. MATERIALS & METHODS: Two experiments were conducted with the classifier's original (n = 1000) and external validation (n = 53 974) datasets from 2 health systems. Bias was assessed via testing for differences in type II error rates across racial/ethnic subgroups (Black, Hispanic/Latinx, White, Other) using bootstrapped 95% confidence intervals. A local surrogate model was estimated to interpret the classifier's predictions by race and averaged globally from the datasets. Subgroup analyses and post-hoc recalibrations were conducted to attempt to mitigate biased metrics. RESULTS: We identified bias in the false negative rate (FNR = 0.32) of the Black subgroup compared to the FNR (0.17) of the White subgroup. Top features included "heroin" and "substance abuse" across subgroups. Post-hoc recalibrations eliminated bias in FNR with minimal changes in other subgroup error metrics. The Black FNR subgroup had higher risk scores for readmission and mortality than the White FNR subgroup, and a higher mortality risk score than the Black true positive subgroup (P < .05). DISCUSSION: The Black FNR subgroup had the greatest severity of disease and risk for poor outcomes. Similar features were present between subgroups for predicting opioid misuse, but inequities were present. Post-hoc mitigation techniques mitigated bias in type II error rate without creating substantial type I error rates. From model design through deployment, bias and data disadvantages should be systematically addressed. CONCLUSION: Standardized, transparent bias assessments are needed to improve trustworthiness in clinical machine learning models.
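
The abstract describes comparing type II error rates across subgroups with bootstrapped 95% confidence intervals. One common way to obtain such an interval is a percentile bootstrap over patients; the sketch below assumes that approach, which may differ from the authors' exact procedure. The function names, the resample count, and the toy labels are all hypothetical and are not taken from the study.

```python
import random


def false_negative_rate(labels, predictions):
    """FNR = missed positives / all true positives (NaN if no positives)."""
    positive_preds = [p for y, p in zip(labels, predictions) if y == 1]
    if not positive_preds:
        return float("nan")
    return sum(1 for p in positive_preds if p == 0) / len(positive_preds)


def bootstrap_fnr_ci(labels, predictions, n_resamples=2000, seed=0):
    """Percentile-bootstrap 95% interval for the FNR of one subgroup."""
    rng = random.Random(seed)
    pairs = list(zip(labels, predictions))
    estimates = []
    for _ in range(n_resamples):
        # Resample patients with replacement, keeping label and prediction paired.
        sample = [rng.choice(pairs) for _ in pairs]
        ys, ps = zip(*sample)
        fnr = false_negative_rate(ys, ps)
        if fnr == fnr:  # drop NaN from resamples that contain no positives
            estimates.append(fnr)
    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return lower, upper


# Toy subgroup: 50 true positives (16 missed by the classifier) and 50 negatives.
labels = [1] * 50 + [0] * 50
predictions = [0] * 16 + [1] * 34 + [0] * 50

point_estimate = false_negative_rate(labels, predictions)
lower, upper = bootstrap_fnr_ci(labels, predictions, n_resamples=500)
print(f"FNR={point_estimate:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

Running the same function on each subgroup and checking whether the intervals overlap gives a rough screen for disparities; a bootstrap of the difference in FNR between two subgroups would be a more direct test.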

Subjects

Natural Language Processing , Opioid-Related Disorders , Electronic Health Records , Hispanic or Latino , Humans , Machine Learning
