Results 1 - 19 of 19
1.
J Biomed Inform ; 138: 104286, 2023 02.
Article in English | MEDLINE | ID: mdl-36706848

ABSTRACT

The meaningful use of electronic health records (EHRs) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving the provider experience is to overcome information overload and reduce cognitive burden so that fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgement that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans with forward reasoning from data to diagnosis, and thereby reduce cognitive burden and medical error, has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks, coined the Diagnostic Reasoning Benchmark (DR.BENCH), for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed as a natural language generation framework for evaluating pre-trained language models on diagnostic reasoning. The goal of DR.BENCH is to advance the science in cNLP to support downstream applications in computerized diagnostic decision support and improve the efficiency and accuracy of healthcare providers during patient care. We fine-tune and evaluate state-of-the-art generative models on DR.BENCH. Experiments show that even with domain-adaptive pre-training on medical knowledge, models leave room for improvement when evaluated on DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to loading and evaluating models for the cNLP community. We also discuss the carbon footprint produced during the experiments and encourage future work on DR.BENCH to report its carbon footprint.


Subject(s)
Artificial Intelligence , Natural Language Processing , Humans , Benchmarking , Problem Solving , Information Storage and Retrieval
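The benchmark-and-evaluate loop described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the repository's actual API: the task name, toy examples, toy model, and exact-match scorer are all invented for clarity (DR.BENCH itself evaluates free-text generation with richer metrics).

```python
# Hypothetical sketch of a benchmark harness for generative cNLP tasks.
# Task names, examples, and the toy model are illustrative only.
def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the generated text matches the reference exactly."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0

def evaluate(model, tasks):
    """Run a generative model over each task's examples and average the scores."""
    results = {}
    for name, examples in tasks.items():
        scores = [exact_match(model(inp), ref) for inp, ref in examples]
        results[name] = sum(scores) / len(scores)
    return results

# Toy task and model for illustration only.
toy_tasks = {
    "diagnosis_generation": [
        ("cough, fever, infiltrate on chest x-ray", "pneumonia"),
        ("polyuria, polydipsia, elevated glucose", "diabetes mellitus"),
    ],
}
toy_model = lambda text: "pneumonia" if "infiltrate" in text else "diabetes mellitus"
print(evaluate(toy_model, toy_tasks))  # {'diagnosis_generation': 1.0}
```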
2.
BMC Med Inform Decis Mak ; 20(1): 79, 2020 04 29.
Article in English | MEDLINE | ID: mdl-32349766

ABSTRACT

BACKGROUND: Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose the following solutions: (1) Mapping the corpus of documents to standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System) thus eliminating PHI as inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary containing input words/n-grams. We aim to test the performance of models with and without PHI in a use-case for an opioid misuse classifier. METHODS: An observational cohort sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. A case-control stratified sampling (n = 1000) was performed to build an annotated dataset for a reference standard of cases and non-cases of opioid misuse. Models for training and testing included CUI codes, character-based, and n-gram features. Models applied were machine learning with neural network and logistic regression as well as expert consensus with a rule-based model for opioid misuse. The area under the receiver operating characteristic curves (AUROC) were compared between models for discrimination. The Hosmer-Lemeshow test and visual plots measured model fit and calibration. RESULTS: Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top performing models with AUROCs > 0.90 included CUI codes as inputs to a convolutional neural network, max pooling network, and logistic regression model. The top calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network. 
The top-weighted CUI codes in the logistic regression had the related terms 'Heroin' and 'Victim of abuse'. CONCLUSIONS: We demonstrate good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark against to overcome privacy and security concerns.


Subject(s)
Machine Learning , Natural Language Processing , Opioid-Related Disorders/diagnosis , Adult , Electronic Health Records , Humans , Inpatients , Medical Records , Unified Medical Language System
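The CUI-mapping idea behind solution (1) can be sketched as follows. The term-to-CUI dictionary here is a toy stand-in (the CUI strings are illustrative, not verified UMLS identifiers); in the study the mapping comes from the Unified Medical Language System, but the PHI-removal effect is the same: any token without a concept mapping, including names, is simply dropped before the model sees it.

```python
# Toy sketch of mapping note text to standardized CUI codes so that PHI
# (names, identifiers) never reaches the model. CUIs below are hypothetical.
CUI_MAP = {
    "heroin": "C0011892",
    "abuse": "C0013146",
    "opioid": "C0242402",
}

def note_to_cuis(note: str) -> list[str]:
    """Map each token to its CUI; unmapped tokens (including PHI) are dropped."""
    tokens = note.lower().replace(",", " ").split()
    return [CUI_MAP[t] for t in tokens if t in CUI_MAP]

note = "John Smith admitted with heroin and opioid abuse"
print(note_to_cuis(note))  # ['C0011892', 'C0242402', 'C0013146'] -- no PHI left
```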
3.
NPJ Digit Med ; 7(1): 227, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39251868

ABSTRACT

Transferring and replicating predictive algorithms across healthcare systems constitutes a unique yet crucial challenge that needs to be addressed to enable the widespread adoption of machine learning in healthcare. In this study, we explored the impact of important differences across healthcare systems and the associated Electronic Health Records (EHRs) on machine-learning algorithms that predict mental health crises up to 28 days in advance. We evaluated both the transferability and replicability of such machine learning models. For this purpose, we trained six models using features and methods developed on EHR data from the Birmingham and Solihull Mental Health NHS Foundation Trust in the UK. These machine learning models were then used to predict the mental health crises of 2907 patients seen at the Rush University System for Health in the US between 2018 and 2020. The best-performing model was trained on a combination of US-specific structured features and frequency features from anonymized patient notes and achieved an AUROC of 0.837. A model with comparable performance, originally trained using UK structured data, was transferred and then tuned using US data, achieving an AUROC of 0.826. Our findings establish the feasibility of transferring and replicating machine learning models to predict mental health crises across diverse hospital systems.
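The transferred models above are compared by AUROC. The metric can be computed directly from its rank interpretation, the probability that a randomly chosen positive is scored above a randomly chosen negative; a minimal sketch on toy data:

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    random positive outscores a random negative (ties count as half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted scores for illustration.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.3, 0.2]
print(auroc(labels, scores))  # 5 of 6 positive-negative pairs are ranked correctly
```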

4.
Addiction ; 119(4): 766-771, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38011858

ABSTRACT

BACKGROUND AND AIMS: Accurate case discovery is critical for disease surveillance, resource allocation and research. International Classification of Diseases (ICD) diagnosis codes are commonly used for this purpose. We aimed to determine the sensitivity, specificity and positive predictive value (PPV) of ICD-10 codes for opioid misuse case discovery in the emergency department (ED) setting. DESIGN AND SETTING: Retrospective cohort study of ED encounters from January 2018 to December 2020 at an urban academic hospital in the United States. A sample of ED encounters enriched for opioid misuse was developed by oversampling ED encounters with positive urine opiate screens or pre-existing opioid-related diagnosis codes in addition to other opioid misuse risk factors. CASES: A total of 1200 randomly selected encounters were annotated by research staff for the presence of opioid misuse within health record documentation using a 5-point scale for likelihood of opioid misuse and dichotomized into cohorts of opioid misuse and no opioid misuse. MEASUREMENTS: Using manual annotation as ground truth, the sensitivity and specificity of ICD-10 codes entered during the encounter were determined with PPV adjusted for oversampled data. Metrics were also determined by disposition subgroup: discharged home or admitted. FINDINGS: There were 541 encounters annotated as opioid misuse and 617 with no opioid misuse. The majority were males (54.4%), average age was 47 years and 68.5% were discharged directly from the ED. The sensitivity of ICD-10 codes was 0.56 (95% confidence interval [CI], 0.51-0.60), specificity 0.99 (95% CI, 0.97-0.99) and adjusted PPV 0.78 (95% CI, 0.65-0.92). The sensitivity was higher for patients discharged from the ED (0.65; 95% CI, 0.60-0.69) than those admitted (0.31; 95% CI, 0.24-0.39).
CONCLUSIONS: International Classification of Diseases, 10th revision (ICD-10) codes appear to have low sensitivity but high specificity and positive predictive value for detecting opioid misuse among emergency department patients in the United States.


Subject(s)
International Classification of Diseases , Opioid-Related Disorders , Male , Humans , United States/epidemiology , Middle Aged , Female , Retrospective Studies , Opioid-Related Disorders/diagnosis , Opioid-Related Disorders/epidemiology , Predictive Value of Tests , Emergency Service, Hospital
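The adjusted PPV reported above can be recovered from sensitivity, specificity, and the target population's prevalence via Bayes' rule, which is one standard way to correct for case oversampling. A sketch with illustrative counts and an assumed 5% prevalence (not the study's actual figures):

```python
def screening_metrics(tp, fp, tn, fn):
    """Sensitivity and specificity straight from the 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)

def adjusted_ppv(sens, spec, prevalence):
    """PPV re-weighted to the target population's prevalence (Bayes' rule),
    as needed when cases were oversampled for annotation."""
    tp_rate = sens * prevalence
    fp_rate = (1 - spec) * (1 - prevalence)
    return tp_rate / (tp_rate + fp_rate)

# Illustrative counts only, not the study's data.
sens, spec = screening_metrics(tp=56, fp=6, tn=611, fn=44)
print(round(sens, 2), round(spec, 2))            # 0.56 0.99
print(round(adjusted_ppv(sens, spec, 0.05), 2))  # 0.75
```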
5.
Proc Conf Assoc Comput Linguist Meet ; 2023(ClinicalNLP): 78-85, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37492270

ABSTRACT

Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework comprising six tasks that represent key components of clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models, as well as multi-task versus single-task training, with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically trained language model outperforms its general-domain counterpart by a large margin, establishing a new state-of-the-art performance with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.
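ROUGE-L, the metric reported above, scores a generated summary by the longest common subsequence (LCS) it shares with the reference. A minimal sketch of the F1 variant:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# LCS is "acute kidney injury" (3 tokens), so precision = recall = 3/5.
print(round(rouge_l("acute kidney injury on ckd", "acute on chronic kidney injury"), 2))  # 0.6
```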

6.
JMIR Med Inform ; 11: e44977, 2023 Apr 20.
Article in English | MEDLINE | ID: mdl-37079367

ABSTRACT

BACKGROUND: The clinical narrative in electronic health records (EHRs) carries valuable information for predictive analytics; however, its free-text form is difficult to mine and analyze for clinical decision support (CDS). Large-scale clinical natural language processing (NLP) pipelines have focused on data warehouse applications for retrospective research efforts. There remains a paucity of evidence for implementing NLP pipelines at the bedside for health care delivery. OBJECTIVE: We aimed to detail a hospital-wide, operational pipeline to implement a real-time NLP-driven CDS tool and describe a protocol for an implementation framework with a user-centered design of the CDS tool. METHODS: The pipeline integrated a previously trained open-source convolutional neural network model for screening opioid misuse that leveraged EHR notes mapped to standardized medical vocabularies in the Unified Medical Language System. A sample of 100 adult encounters was reviewed by a physician informaticist for silent testing of the deep learning algorithm before deployment. An end user interview survey was developed to examine the user acceptability of a best practice alert (BPA) to provide the screening results with recommendations. The planned implementation also included a human-centered design with user feedback on the BPA, an implementation framework with cost-effectiveness, and a noninferiority patient outcome analysis plan. RESULTS: The pipeline was a reproducible workflow with a shared pseudocode for a cloud service to ingest, process, and store clinical notes as Health Level 7 messages from a major EHR vendor in an elastic cloud computing environment. Feature engineering of the notes used an open-source NLP engine, and the features were fed into the deep learning algorithm, with the results returned as a BPA in the EHR.
On-site silent testing of the deep learning algorithm demonstrated a sensitivity of 93% (95% CI 66%-99%) and specificity of 92% (95% CI 84%-96%), similar to published validation studies. Before deployment, approvals were received across hospital committees for inpatient operations. Five interviews were conducted; they informed the development of an educational flyer and further modified the BPA to exclude certain patients and allow the refusal of recommendations. The longest delay in pipeline development was because of cybersecurity approvals, especially because of the exchange of protected health information between the Microsoft (Microsoft Corp) and Epic (Epic Systems Corp) cloud vendors. In silent testing, the resultant pipeline provided a BPA to the bedside within minutes of a provider entering a note in the EHR. CONCLUSIONS: The components of the real-time NLP pipeline were detailed with open-source tools and pseudocode for other health systems to benchmark. The deployment of medical artificial intelligence systems in routine clinical care presents an important yet unfulfilled opportunity, and our protocol aimed to close the gap in the implementation of artificial intelligence-driven CDS. TRIAL REGISTRATION: ClinicalTrials.gov NCT05745480; https://www.clinicaltrials.gov/ct2/show/NCT05745480.
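The ingest, process, score, and alert flow described above can be sketched as composed stages. Everything here is a hypothetical stand-in: the toy vocabulary, the scoring function, and the alert threshold are invented, and a real deployment parses proper HL7 messages rather than a dict:

```python
# Hypothetical end-to-end sketch of a note-to-alert pipeline.
# Function names, the toy vocabulary, and the threshold are illustrative.
VOCAB = {"heroin": "C1", "withdrawal": "C2", "naloxone": "C3"}

def ingest(message: dict) -> str:
    """Pull the free-text note out of an already-parsed message."""
    return message["note_text"]

def featurize(note: str) -> list[str]:
    """Map note tokens to vocabulary codes; everything else is dropped."""
    return [VOCAB[t] for t in note.lower().split() if t in VOCAB]

def score(features: list[str]) -> float:
    """Stand-in for the trained model: fraction of vocabulary hits."""
    return min(1.0, len(features) / 3)

def to_bpa(prob: float, threshold: float = 0.5) -> str:
    """Turn a model probability into a best practice alert decision."""
    return "FIRE_BPA" if prob >= threshold else "NO_ALERT"

msg = {"note_text": "Patient endorses heroin use naloxone given for withdrawal"}
print(to_bpa(score(featurize(ingest(msg)))))  # FIRE_BPA
```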

7.
Article in English | MEDLINE | ID: mdl-35886733

ABSTRACT

The emergency department (ED) is a critical setting for the treatment of patients with opioid misuse. Detecting relevant clinical profiles allows for tailored treatment approaches. We sought to identify and characterize subphenotypes of ED patients with opioid-related encounters. A latent class analysis was conducted using 14,057,302 opioid-related encounters from 2016 through 2017 using the National Emergency Department Sample (NEDS), the largest all-payer ED database in the United States. The optimal model was determined by face validity and information criteria-based metrics. A three-step approach assessed class structure, assigned individuals to classes, and examined characteristics between classes. Class associations were determined for hospitalization, in-hospital death, and ED charges. The final five-class model consisted of the following subphenotypes: Chronic pain (class 1); Alcohol use (class 2); Depression and pain (class 3); Psychosis, liver disease, and polysubstance use (class 4); and Pregnancy (class 5). Using class 1 as the reference, the greatest odds for hospitalization occurred in classes 3 and 4 (ORs 5.24 and 5.33, p < 0.001) and for in-hospital death in class 4 (OR 3.44, p < 0.001). Median ED charges ranged from USD 2177 (class 1) to USD 2881 (class 4). These subphenotypes provide a basis for examining patient-tailored approaches for this patient population.


Subject(s)
Analgesics, Opioid , Emergency Service, Hospital , Analgesics, Opioid/therapeutic use , Hospital Mortality , Humans , Latent Class Analysis , Outcome Assessment, Health Care , United States
8.
Addiction ; 117(4): 925-933, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34729829

ABSTRACT

BACKGROUND AND AIMS: Unhealthy alcohol use (UAU) is one of the leading causes of global morbidity. A machine learning approach to alcohol screening could accelerate best practices when integrated into electronic health record (EHR) systems. This study aimed to validate externally a natural language processing (NLP) classifier developed at an independent medical center. DESIGN: Retrospective cohort study. SETTING: The site for validation was a midwestern United States tertiary-care, urban medical center that has an inpatient structured universal screening model for unhealthy substance use and an active addiction consult service. PARTICIPANTS/CASES: Unplanned admissions of adult patients between October 23, 2017 and December 31, 2019, with EHR documentation of manual alcohol screening were included in the cohort (n = 57 605). MEASUREMENTS: The Alcohol Use Disorders Identification Test (AUDIT) served as the reference standard. AUDIT scores ≥5 for females and ≥8 for males served as cases for UAU. To examine error in manual screening or under-reporting, a post hoc error analysis was conducted, reviewing discordance between the NLP classifier and AUDIT-derived reference. All clinical notes excluding the manual screening and AUDIT documentation from the EHR were included in the NLP analysis. FINDINGS: Using clinical notes from the first 24 hours of each encounter, the NLP classifier demonstrated an area under the receiver operating characteristic curve (AUCROC) and precision-recall area under the curve (PRAUC) of 0.91 (95% CI = 0.89-0.92) and 0.56 (95% CI = 0.53-0.60), respectively. At the optimal cut point of 0.5, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 0.66 (95% CI = 0.62-0.69), 0.98 (95% CI = 0.98-0.98), 0.35 (95% CI = 0.33-0.38), and 1.0 (95% CI = 1.0-1.0), respectively. 
CONCLUSIONS: External validation of a publicly available alcohol misuse classifier demonstrates adequate sensitivity and specificity for routine clinical use as an automated screening tool for identifying at-risk patients.


Subject(s)
Alcoholism , Adult , Alcohol Drinking , Alcoholism/diagnosis , Ethanol , Female , Humans , Machine Learning , Male , Natural Language Processing , Retrospective Studies
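The sex-specific AUDIT dichotomization that defines the reference standard above is simple to express directly: scores of 5 or more for females and 8 or more for males count as unhealthy alcohol use (UAU).

```python
def is_uau(audit_score: int, sex: str) -> bool:
    """Dichotomize an AUDIT score into unhealthy alcohol use (UAU),
    using the sex-specific cutoffs from the study (>=5 female, >=8 male)."""
    cutoff = 5 if sex == "female" else 8
    return audit_score >= cutoff

print(is_uau(6, "female"), is_uau(6, "male"))  # True False
```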
9.
Lancet Digit Health ; 4(6): e426-e435, 2022 06.
Article in English | MEDLINE | ID: mdl-35623797

ABSTRACT

BACKGROUND: Substance misuse is a heterogeneous and complex set of behavioural conditions that are highly prevalent in hospital settings and frequently co-occur. Few hospital-wide solutions exist to comprehensively and reliably identify these conditions to prioritise care and guide treatment. The aim of this study was to apply natural language processing (NLP) to clinical notes collected in the electronic health record (EHR) to accurately screen for substance misuse. METHODS: The model was trained and developed on a reference dataset derived from a hospital-wide programme at Rush University Medical Center (RUMC), Chicago, IL, USA, that used structured diagnostic interviews to manually screen admitted patients over 27 months (between Oct 1, 2017, and Dec 31, 2019; n=54 915). The Alcohol Use Disorder Identification Test and Drug Abuse Screening Tool served as reference standards. The first 24 h of notes in the EHR were mapped to standardised medical vocabulary and fed into single-label, multilabel, and multilabel with auxiliary-task neural network models. Temporal validation of the model was done using data from the subsequent 12 months on a subset of RUMC patients (n=16 917). External validation was done using data from Loyola University Medical Center, Chicago, IL, USA between Jan 1, 2007, and Sept 30, 2017 (n=1991 adult patients). The primary outcome was discrimination for alcohol misuse, opioid misuse, or non-opioid drug misuse. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). Calibration slope and intercept were measured with the unreliability index. Bias assessments were performed across demographic subgroups. FINDINGS: The model was trained on a cohort that had 3·5% misuse (n=1 921) with any type of substance. 220 (11%) of 1921 patients with substance misuse had more than one type of misuse.
The multilabel convolutional neural network classifier had a mean AUROC of 0·97 (95% CI 0·96-0·98) during temporal validation for all types of substance misuse. The model was well calibrated and showed good face validity with model features containing explicit mentions of aberrant drug-taking behaviour. A false-negative rate of 0·18-0·19 and a false-positive rate of 0·03 between non-Hispanic Black and non-Hispanic White groups occurred. In external validation, the AUROCs for alcohol and opioid misuse were 0·88 (95% CI 0·86-0·90) and 0·94 (0·92-0·95), respectively. INTERPRETATION: We developed a novel and accurate approach to leveraging the first 24 h of EHR notes for screening multiple types of substance misuse. FUNDING: National Institute On Drug Abuse, National Institutes of Health.


Subject(s)
Alcoholism , Deep Learning , Opioid-Related Disorders , Adult , Alcoholism/complications , Alcoholism/diagnosis , Alcoholism/therapy , Artificial Intelligence , Humans , Referral and Consultation , Retrospective Studies , United States
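The bias assessment across demographic subgroups reported above amounts to comparing error rates, such as the false-negative rate, between groups. A sketch on toy records (the subgroup labels and data are invented for illustration):

```python
from collections import defaultdict

def fnr_by_group(records):
    """False-negative rate per subgroup: missed positives / all positives."""
    fn = defaultdict(int)
    pos = defaultdict(int)
    for group, label, pred in records:
        if label == 1:
            pos[group] += 1
            if pred == 0:
                fn[group] += 1
    return {g: fn[g] / pos[g] for g in pos}

# Toy records: (subgroup, true label, model prediction).
records = [("A", 1, 1), ("A", 1, 0), ("A", 1, 1), ("A", 1, 1),
           ("B", 1, 1), ("B", 1, 1), ("B", 1, 0), ("B", 1, 0)]
print(fnr_by_group(records))  # {'A': 0.25, 'B': 0.5}
```

A large gap between groups would flag the model for recalibration or retraining before deployment.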
10.
JMIR Res Protoc ; 11(12): e42971, 2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36534461

ABSTRACT

BACKGROUND: Automated and data-driven methods for screening using natural language processing (NLP) and machine learning may replace resource-intensive manual approaches in the usual care of patients hospitalized with conditions related to unhealthy substance use. The rigorous evaluation of tools that use artificial intelligence (AI) is necessary to demonstrate effectiveness before system-wide implementation. An NLP tool to use routinely collected data in the electronic health record was previously validated for diagnostic accuracy in a retrospective study for screening unhealthy substance use. Our next step is a noninferiority design incorporated into a research protocol for clinical implementation with prospective evaluation of clinical effectiveness in a large health system. OBJECTIVE: This study aims to provide a study protocol to evaluate health outcomes and the costs and benefits of an AI-driven automated screener compared to manual human screening for unhealthy substance use. METHODS: A pre-post design is proposed to evaluate 12 months of manual screening followed by 12 months of automated screening across surgical and medical wards at a single medical center. The preintervention period consists of usual care with manual screening by nurses and social workers and referrals to a multidisciplinary Substance Use Intervention Team (SUIT). Facilitated by an NLP pipeline in the postintervention period, clinical notes from the first 24 hours of hospitalization will be processed and scored by a machine learning model, and the SUIT will be similarly alerted to patients who flagged positive for substance misuse. Flowsheets within the electronic health record have been updated to capture rates of interventions for the primary outcome (brief intervention/motivational interviewing, medication-assisted treatment, naloxone dispensing, and referral to outpatient care).
Effectiveness in terms of patient outcomes will be determined by noninferior rates of interventions (primary outcome), as well as rates of readmission within 6 months, average time to consult, and discharge rates against medical advice (secondary outcomes) in the postintervention period by a SUIT compared to the preintervention period. A separate analysis will be performed to assess the costs and benefits to the health system by using automated screening. Changes from the pre- to postintervention period will be assessed in covariate-adjusted generalized linear mixed-effects models. RESULTS: The study will begin in September 2022. Monthly data monitoring and Data Safety Monitoring Board reporting are scheduled every 6 months throughout the study period. We anticipate reporting final results by June 2025. CONCLUSIONS: The use of augmented intelligence for clinical decision support is growing with an increasing number of AI tools. We provide a research protocol for prospective evaluation of an automated NLP system for screening unhealthy substance use using a noninferiority design to demonstrate comprehensive screening that may be as effective as manual screening but less costly via automated solutions. TRIAL REGISTRATION: ClinicalTrials.gov NCT03833804; https://clinicaltrials.gov/ct2/show/NCT03833804. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/42971.
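The noninferiority comparison at the heart of the protocol can be sketched as a rate-difference test: the automated period is noninferior if the lower confidence bound of (automated rate minus manual rate) stays above the negative margin. The 10% margin and the counts below are illustrative, not the trial's prespecified values:

```python
import math

def noninferior(success_new, n_new, success_old, n_old, margin=0.10, z=1.96):
    """Declare noninferiority when the lower bound of the 95% Wald CI for
    (new rate - old rate) stays above -margin. Margin and z are illustrative."""
    p_new, p_old = success_new / n_new, success_old / n_old
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    lower = (p_new - p_old) - z * se
    return lower > -margin

# Illustrative counts: automated screening triggers interventions at a rate
# close to manual screening's, so noninferiority holds at the 10% margin.
print(noninferior(success_new=430, n_new=1000, success_old=450, n_old=1000))  # True
```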
