Search | BVS MTCI Américas

1.

Identification of recurrent atrial fibrillation using natural language processing applied to electronic health records.

Zheng, Chengyi; Lee, Ming-Sum; Bansal, Nisha; Go, Alan S; Chen, Cheng; Harrison, Teresa N; Fan, Dongjie; Allen, Amanda; Garcia, Elisha; Lidgard, Ben; Singer, Daniel; An, Jaejin.

Eur Heart J Qual Care Clin Outcomes ; 10(1): 77-88, 2024 Jan 12.

Article in English | MEDLINE | ID: mdl-36997334

ABSTRACT

AIMS: This study aimed to develop and apply natural language processing (NLP) algorithms to identify recurrent atrial fibrillation (AF) episodes following rhythm control therapy initiation using electronic health records (EHRs). METHODS AND RESULTS: We included adults with new-onset AF who initiated rhythm control therapies (ablation, cardioversion, or antiarrhythmic medication) within two US integrated healthcare delivery systems. A code-based algorithm identified potential AF recurrence using diagnosis and procedure codes. An automated NLP algorithm was developed and validated to capture AF recurrence from electrocardiograms, cardiac monitor reports, and clinical notes. Compared with the reference standard cases confirmed by physicians' adjudication, the F-scores, sensitivity, and specificity were all above 0.90 for the NLP algorithms at both sites. We applied the NLP and code-based algorithms to patients with incident AF (n = 22 970) during the 12 months after initiating rhythm control therapy. Applying the NLP algorithms, the percentages of patients with AF recurrence for sites 1 and 2 were 60.7% and 69.9% (ablation), 64.5% and 73.7% (cardioversion), and 49.6% and 55.5% (antiarrhythmic medication), respectively. In comparison, the percentages of patients with code-identified AF recurrence for sites 1 and 2 were 20.2% and 23.7% for ablation, 25.6% and 28.4% for cardioversion, and 20.0% and 27.5% for antiarrhythmic medication, respectively. CONCLUSION: When compared with a code-based approach alone, this study's high-performing automated NLP method identified significantly more patients with recurrent AF. The NLP algorithms could enable efficient evaluation of treatment effectiveness of AF therapies in large populations and help develop tailored interventions.

Subject(s)

Atrial Fibrillation , Electronic Health Records , Adult , Humans , Atrial Fibrillation/epidemiology , Atrial Fibrillation/therapy , Natural Language Processing , Treatment Outcome , Algorithms

2.

Complementary and Integrative Health Information in the literature: its lexicon and named entity recognition.

Zhou, Huixue; Austin, Robin; Lu, Sheng-Chieh; Silverman, Greg Marc; Zhou, Yuqi; Kilicoglu, Halil; Xu, Hua; Zhang, Rui.

J Am Med Inform Assoc ; 31(2): 426-434, 2024 Jan 18.

Article in English | MEDLINE | ID: mdl-37952122

ABSTRACT

OBJECTIVE: To construct an exhaustive Complementary and Integrative Health (CIH) Lexicon (CIHLex) to help better represent the often underrepresented physical and psychological CIH approaches in standard terminologies, and to also apply state-of-the-art natural language processing (NLP) techniques to help recognize them in the biomedical literature. MATERIALS AND METHODS: We constructed the CIHLex by integrating various resources, compiling and integrating data from biomedical literature and relevant sources of knowledge. The Lexicon encompasses 724 unique concepts with 885 corresponding unique terms. We matched these concepts to the Unified Medical Language System (UMLS), and we developed and utilized BERT models comparing their efficiency in CIH named entity recognition to well-established models including MetaMap and CLAMP, as well as the large language model GPT3.5-turbo. RESULTS: Of the 724 unique concepts in CIHLex, 27.2% could be matched to at least one term in the UMLS. About 74.9% of the mapped UMLS Concept Unique Identifiers were categorized as "Therapeutic or Preventive Procedure." Among the models applied to CIH named entity recognition, BLUEBERT delivered the highest macro-average F1-score of 0.91, surpassing other models. CONCLUSION: Our CIHLex significantly augments representation of CIH approaches in biomedical literature. Demonstrating the utility of advanced NLP models, BERT notably excelled in CIH entity recognition. These results highlight promising strategies for enhancing standardization and recognition of CIH terminology in biomedical contexts.

Subject(s)

Algorithms , Unified Medical Language System , Natural Language Processing , Language

3.

Role play with large language models.

Shanahan, Murray; McDonell, Kyle; Reynolds, Laria.

Nature ; 623(7987): 493-498, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37938776

ABSTRACT

As dialogue agents become increasingly human-like in their performance, we must develop effective ways to describe their behaviour in high-level terms without falling into the trap of anthropomorphism. Here we foreground the concept of role play. Casting dialogue-agent behaviour in terms of role play allows us to draw on familiar folk psychological terms, without ascribing human characteristics to language models that they in fact lack. Two important cases of dialogue-agent behaviour are addressed this way, namely, (apparent) deception and (apparent) self-awareness.

Subject(s)

Imitative Behavior , Natural Language Processing , Terminology as Topic , Humans , Deception , Self-Assessment

4.

Using natural language processing to characterize and predict homeopathic product-associated adverse events in consumer reviews: comparison to reports to FDA Adverse Event Reporting System (FAERS).

Konkel, Karen; Oner, Nurettin; Ahmed, Abdulaziz; Jones, S Christopher; Berner, Eta S; Zengul, Ferhat D.

J Am Med Inform Assoc ; 31(1): 70-78, 2023 12 22.

Article in English | MEDLINE | ID: mdl-37847653

ABSTRACT

OBJECTIVE: Apply natural language processing (NLP) to Amazon consumer reviews to identify adverse events (AEs) associated with unapproved over the counter (OTC) homeopathic drugs and compare findings with reports to the US Food and Drug Administration Adverse Event Reporting System (FAERS). MATERIALS AND METHODS: Data were extracted from publicly available Amazon reviews and analyzed using JMP 16 Pro Text Explorer. Topic modeling identified themes. Sentiment analysis (SA) explored consumer perceptions. A machine learning model optimized prediction of AEs in reviews. Reports for the same time interval and product class were obtained from the FAERS public dashboard and analyzed. RESULTS: Homeopathic cough/cold products were the largest category common to both data sources (Amazon = 616, FAERS = 445) and were analyzed further. Oral symptoms and unpleasant taste were described in both datasets. Amazon reviews describing an AE had lower Amazon ratings (χ² = 224.28, P < .0001). The optimal model for predicting AEs was Neural Boosted 5-fold combining topic modeling and Amazon ratings as predictors (mean AUC = 0.927). DISCUSSION: Topic modeling and SA of Amazon reviews provided information about consumers' perceptions and opinions of homeopathic OTC cough and cold products. Amazon ratings appear to be a good indicator of the presence or absence of AEs, and identified events were similar to FAERS. CONCLUSION: Amazon reviews may complement traditional data sources to identify AEs associated with unapproved OTC homeopathic products. This study is the first to use NLP in this context and lays the groundwork for future larger scale efforts.

Subject(s)

Adverse Drug Reaction Reporting Systems , Drug-Related Side Effects and Adverse Reactions , United States , Humans , Natural Language Processing , Software , United States Food and Drug Administration , Cough

5.

Development of a novel drug information provision system for Kampo medicine using natural language processing technology.

Maeda-Minami, Ayako; Yoshino, Tetsuhiro; Yumoto, Tetsuro; Sato, Kayoko; Sagara, Atsunobu; Inaba, Kenjiro; Kominato, Hidenori; Kimura, Takao; Takishita, Tetsuya; Watanabe, Gen; Nakamura, Tomonori; Mano, Yasunari; Horiba, Yuko; Watanabe, Kenji; Kamei, Junzo.

BMC Med Inform Decis Mak ; 23(1): 119, 2023 07 13.

Article in English | MEDLINE | ID: mdl-37442993

ABSTRACT

BACKGROUND: Kampo medicine is widely used in Japan; however, most physicians and pharmacists have insufficient knowledge and experience in it. Although a chatbot-style system using machine learning and natural language processing has been used in some clinical settings and proven useful, the system developed specifically for the Japanese language using this method has not been validated by research. The purpose of this study is to develop a novel drug information provision system for Kampo medicines using a natural language classifier® (NLC®) based on IBM Watson. METHODS: The target Kampo formulas were 33 formulas listed in the 17th revision of the Japanese Pharmacopoeia. The information included in the system comes from the package inserts of Kampo medicines, Manuals for Management of Individual Serious Adverse Drug Reactions, and data on off-label usage. The system developed in this study classifies questions about the drug information of Kampo formulas input by natural language into preset questions and outputs preset answers for the questions. The system uses morphological analysis, synonym conversion by thesaurus, and NLC®. We fine-tuned the information registered into NLC® and increased the thesaurus. To validate the system, 900 validation questions were provided by six pharmacists who were classified into high or low levels of knowledge and experience of Kampo medicines and three pharmacy students. RESULTS: The precision, recall, and F-measure of the system performance were 0.986, 0.915, and 0.949, respectively. The results were stable even with differences in the amount of expertise of the question authors. CONCLUSIONS: We developed a system using natural language classification that can give appropriate answers to most of the validation questions.
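The reported F-measure agrees with the balanced F1 score, the harmonic mean of precision and recall. A minimal Python check, assuming the system reports the unweighted (β = 1) F-measure, which the abstract does not state explicitly:

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # conventional value when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the Kampo drug information system
print(round(f1_score(0.986, 0.915), 3))  # 0.949
```

Because the harmonic mean is pulled toward the smaller input, the F-measure sits closer to the recall (0.915) than to the precision (0.986).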

Subject(s)

Kampo Medicine , Physicians , Humans , Natural Language Processing , Pharmacists , Technology , Japan

6.

Using Natural Language Processing and Machine Learning to Identify Internal Medicine-Pediatrics Residency Values in Applications.

Drum, Benjamin; Shi, Jianlin; Peterson, Bennet; Lamb, Sara; Hurdle, John F; Gradick, Casey.

Acad Med ; 98(11): 1278-1282, 2023 11 01.

Article in English | MEDLINE | ID: mdl-37506388

ABSTRACT

PROBLEM: Although holistic review has been used successfully in some residency programs to decrease bias, such review is time-consuming and unsustainable for many programs without initial prescreening. The unstructured qualitative data in residency applications, including notable experiences, letters of recommendation, personal statement, and medical student performance evaluations, require extensive time, resources, and metrics to evaluate; therefore, previous applicant screening relied heavily on quantitative metrics, which can be socioeconomically and racially biased. APPROACH: Using residency applications to the University of Utah internal medicine-pediatrics program from 2015 to 2019, the authors extracted relevant snippets of text from the narrative sections of applications. Expert reviewers annotated these snippets into specific values (academic strength; intellectual curiosity; compassion; communication; work ethic; teamwork; leadership; self-awareness; diversity, equity, and inclusion; professionalism; and adaptability) previously identified as associated with resident success. The authors prospectively applied a machine learning model (MLM) to snippets from applications from 2023, and output was compared with a manual holistic review performed without knowledge of MLM results. OUTCOMES: Overall, the MLM had a sensitivity of 0.64, specificity of 0.97, positive predictive value of 0.62, negative predictive value of 0.97, and F1 score of 0.63. The mean (SD) total number of annotations per application was significantly correlated with invited for interview status (invited: 208.6 [59.1]; not invited: 145.2 [57.2]; P < .001). In addition, 8 of the 10 individual values were significantly predictive of an applicant's invited for interview status. NEXT STEPS: The authors created an MLM that can identify several values important for resident success in internal medicine-pediatrics programs with moderate sensitivity and high specificity. The authors will continue to refine the MLM by increasing the number of annotations, exploring parameter tuning and feature engineering options, and identifying which application sections have the highest correlation with invited for interview status.
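The five outcome metrics above all derive from a single confusion matrix. The Python sketch below shows the standard definitions; the counts are hypothetical (not taken from the study) and were picked only so that the resulting rates resemble the reported ones:

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def screening_metrics(c: ConfusionCounts) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = c.tp / (c.tp + c.fn)  # also called recall
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)  # positive predictive value, also called precision
    npv = c.tn / (c.tn + c.fn)  # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}


# Hypothetical counts for illustration only; not the study's data
m = screening_metrics(ConfusionCounts(tp=64, fp=39, tn=1261, fn=36))
print({name: round(value, 2) for name, value in m.items()})
# → {'sensitivity': 0.64, 'specificity': 0.97, 'ppv': 0.62, 'npv': 0.97, 'f1': 0.63}
```

When negatives greatly outnumber positives, specificity and NPV stay high even with many false positives, which is why the F1 score (built only from PPV and sensitivity) is the more demanding summary here.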

Subject(s)

Internship and Residency , Humans , Child , Natural Language Processing , Internal Medicine/education , Professionalism , Communication

7.

Patient Dietary Supplements Use: Do Results from Natural Language Processing of Clinical Notes Agree with Survey Data?

Redd, Douglas; Workman, Terri Elizabeth; Shao, Yijun; Cheng, Yan; Tekle, Senait; Garvin, Jennifer H; Brandt, Cynthia A; Zeng-Treitler, Qing.

Med Sci (Basel) ; 11(2), 2023 05 23.

Article in English | MEDLINE | ID: mdl-37367736

ABSTRACT

There is widespread use of dietary supplements, some prescribed but many taken without a physician's guidance. There are many potential interactions between supplements and both over-the-counter and prescription medications in ways that are unknown to patients. Structured medical records do not adequately document supplement use; however, unstructured clinical notes often contain extra information on supplements. We studied a group of 377 patients from three healthcare facilities and developed a natural language processing (NLP) tool to detect supplement use. Using surveys of these patients, we investigated the correlation between self-reported supplement use and NLP extractions from the clinical notes. Our model achieved an F1 score of 0.914 for detecting all supplements. Individual supplement detection had a variable correlation with survey responses, ranging from an F1 of 0.83 for calcium to an F1 of 0.39 for folic acid. Our study demonstrated good NLP performance while also finding that self-reported supplement use is not always consistent with the documented use in clinical records.

Subject(s)

Electronic Health Records , Natural Language Processing , Humans , Dietary Supplements , Self Report

8.

TRSRD: a database for research on risky substances in tea using natural language processing and knowledge graph-based techniques.

Wang, Yongmei; Wang, Peng; Zhang, Yongheng; Yao, Siyi; Xu, Zhipeng; Zhang, Youhua.

Database (Oxford) ; 2023, 2023 05 09.

Article in English | MEDLINE | ID: mdl-37159240

ABSTRACT

During the production and processing of tea, harmful substances are often introduced. However, they have never been systematically integrated, and it is impossible to understand the harmful substances that may be introduced during tea production and their related relationships when searching for papers. To address these issues, a database on tea risk substances and their research relationships was constructed. These data were correlated by knowledge mapping techniques, and a Neo4j graph database centered on tea risk substance research was constructed, containing 4189 nodes and 9400 correlations (e.g. research category-PMID, risk substance category-PMID, and risk substance-PMID). This is the first knowledge-based graph database that is specifically designed for integrating and analyzing risk substances in tea and related research, containing nine main types of tea risk substances (including a comprehensive discussion of inclusion pollutants, heavy metals, pesticides, environmental pollutants, mycotoxins, microorganisms, radioactive isotopes, plant growth regulators, and others) and six types of tea research papers (including reviews, safety evaluations/risk assessments, prevention and control measures, detection methods, residual/pollution situations, and data analysis/data measurement). It is an essential reference for exploring the causes of the formation of risk substances in tea and the safety standards of tea in the future. Database URL http://trsrd.wpengxs.cn.

Subject(s)

Natural Language Processing , Pattern Recognition, Automated , Databases, Factual , Knowledge , Tea

9.

Extracting Pain Care Quality Indicators from U.S. Veterans Health Administration Chiropractic Care Using Natural Language Processing.

Coleman, Brian C; Finch, Dezon; Wang, Rixin; Luther, Stephen L; Heapy, Alicia; Brandt, Cynthia; Lisi, Anthony J.

Appl Clin Inform ; 14(3): 600-608, 2023 05.

Article in English | MEDLINE | ID: mdl-37164327

ABSTRACT

BACKGROUND: Musculoskeletal pain is common in the Veterans Health Administration (VHA), and there is growing national use of chiropractic services within the VHA. Rapid expansion requires scalable and autonomous solutions, such as natural language processing (NLP), to monitor care quality. Previous work has defined indicators of pain care quality that represent essential elements of guideline-concordant, comprehensive pain assessment, treatment planning, and reassessment. OBJECTIVE: Our purpose was to identify pain care quality indicators and assess patterns across different clinic visit types using NLP on VHA chiropractic clinic documentation. METHODS: Notes from ambulatory or in-hospital chiropractic care visits from October 1, 2018 to September 30, 2019 for patients in the Women Veterans Cohort Study were included in the corpus, with visits identified as consultation visits and/or evaluation and management (E&M) visits. Descriptive statistics of pain care quality indicator classes were calculated and compared across visit types. RESULTS: There were 11,752 patients who received any chiropractic care during FY2019, with 63,812 notes included in the corpus. Consultation notes had more than twice the total number of annotations per note (87.9) as follow-up visit notes (34.7). The mean number of total classes documented per note across the entire corpus was 9.4 (standard deviation [SD] = 1.5). More total indicator classes were documented during consultation visits with (mean = 14.8, SD = 0.9) or without E&M (mean = 13.9, SD = 1.2) compared to follow-up visits with (mean = 9.1, SD = 1.4) or without E&M (mean = 8.6, SD = 1.5). Co-occurrence of pain care quality indicators describing pain assessment was high. CONCLUSION: VHA chiropractors frequently document pain care quality indicators, identifiable using NLP, with variability across different visit types.

Subject(s)

Chiropractic , Humans , Female , Quality Indicators, Health Care , Veterans Health , Natural Language Processing , Cohort Studies , Quality of Health Care , Pain

10.

quEHRy: a question answering system to query electronic health records.

Soni, Sarvesh; Datta, Surabhi; Roberts, Kirk.

J Am Med Inform Assoc ; 30(6): 1091-1102, 2023 05 19.

Article in English | MEDLINE | ID: mdl-37087111

ABSTRACT

OBJECTIVE: We propose a system, quEHRy, to retrieve precise, interpretable answers to natural language questions from structured data in electronic health records (EHRs). MATERIALS AND METHODS: We develop/synthesize the main components of quEHRy: concept normalization (MetaMap), time frame classification (new), semantic parsing (existing), visualization with question understanding (new), and query module for FHIR mapping/processing (new). We evaluate quEHRy on 2 clinical question answering (QA) datasets. We evaluate each component separately as well as holistically to gain deeper insights. We also conduct a thorough error analysis for a crucial subcomponent, medical concept normalization. RESULTS: Using gold concepts, the precision of quEHRy is 98.33% and 90.91% for the 2 datasets, while the overall accuracy was 97.41% and 87.75%. Precision was 94.03% and 87.79% even after employing an automated medical concept extraction system (MetaMap). Most incorrectly predicted medical concepts were broader in nature than gold-annotated concepts (representative of the ones present in EHRs), eg, Diabetes versus Diabetes Mellitus, Non-Insulin-Dependent. DISCUSSION: The primary performance barrier to deployment of the system is due to errors in medical concept extraction (a component not studied in this article), which affects the downstream generation of correct logical structures. This indicates the need to build QA-specific clinical concept normalizers that understand EHR context to extract the "relevant" medical concepts from questions. CONCLUSION: We present an end-to-end QA system that allows information access from EHRs using natural language and returns an exact, verifiable answer. Our proposed system is high-precision and interpretable, checking off the requirements for clinical use.

Subject(s)

Electronic Health Records , Natural Language Processing , Semantics , Access to Information , Gold

11.

A New Tool for Holistic Residency Application Review: Using Natural Language Processing of Applicant Experiences to Predict Interview Invitation.

Mahtani, Arun Umesh; Reinstein, Ilan; Marin, Marina; Burk-Rafel, Jesse.

Acad Med ; 98(9): 1018-1021, 2023 09 01.

Article in English | MEDLINE | ID: mdl-36940395

ABSTRACT

PROBLEM: Reviewing residency application narrative components is time intensive and has contributed to nearly half of applications not receiving holistic review. The authors developed a natural language processing (NLP)-based tool to automate review of applicants' narrative experience entries and predict interview invitation. APPROACH: Experience entries (n = 188,500) were extracted from 6,403 residency applications across 3 application cycles (2017-2019) at 1 internal medicine program, combined at the applicant level, and paired with the interview invitation decision (n = 1,224 invitations). NLP identified important words (or word pairs) with term frequency-inverse document frequency, which were used to predict interview invitation using logistic regression with L1 regularization. Terms remaining in the model were analyzed thematically. Logistic regression models were also built using structured application data and a combination of NLP and structured data. Model performance was evaluated on never-before-seen data using area under the receiver operating characteristic and precision-recall curves (AUROC, AUPRC). OUTCOMES: The NLP model had an AUROC of 0.80 (vs chance decision of 0.50) and AUPRC of 0.49 (vs chance decision of 0.19), showing moderate predictive strength. Phrases indicating active leadership, research, or work in social justice and health disparities were associated with interview invitation. The model's detection of these key selection factors demonstrated face validity. Adding structured data to the model significantly improved prediction (AUROC 0.92, AUPRC 0.73), as expected given reliance on such metrics for interview invitation. NEXT STEPS: This model represents a first step in using NLP-based artificial intelligence tools to promote holistic residency application review. The authors are assessing the practical utility of using this model to identify applicants screened out using traditional metrics. Generalizability must be determined through model retraining and evaluation at other programs. Work is ongoing to thwart model "gaming," improve prediction, and remove unwanted biases introduced during model training.
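The chance baselines quoted in the outcomes follow from how the two metrics behave for an uninformative model: AUROC is 0.50, while AUPRC falls to the positive prevalence. The small pure-Python sketch below (the `auroc` helper is illustrative, not the authors' code) computes AUROC as a Mann-Whitney pairwise comparison and checks that 1,224 invitations out of 6,403 applications yields the stated 0.19, assuming that ratio is the baseline the authors used:

```python
from itertools import product


def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# A constant (uninformative) score gives the chance AUROC of 0.5
print(auroc([0.3, 0.3, 0.3, 0.3], [1, 0, 0, 1]))  # 0.5

# The chance AUPRC equals the positive prevalence: invitations / applications
print(round(1224 / 6403, 2))  # 0.19
```

Because the AUPRC baseline depends on prevalence, an AUPRC of 0.49 against a 0.19 baseline is a larger relative gain than the raw number suggests.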

Subject(s)

Internship and Residency , Humans , Natural Language Processing , Artificial Intelligence , Personnel Selection , Leadership

12.

Automatic Classification of Tumor Response From Radiology Reports With Rule-Based Natural Language Processing Integrated Into the Clinical Oncology Workflow.

Laurent, Gery; Craynest, Franck; Thobois, Maxime; Hajjaji, Nawale.

JCO Clin Cancer Inform ; 7: e2200139, 2023 01.

Article in English | MEDLINE | ID: mdl-36780606

ABSTRACT

PURPOSE: Imaging reports in oncology provide critical information about disease evolution that should be shared in a timely manner to tailor clinical decision making and care coordination for patients with advanced cancer. However, tumor response remains unstructured in free text and underexploited. Natural language processing (NLP) methods can help bring this critical information into the electronic health records (EHR) in real time to assist health care workers. METHODS: A rule-based algorithm was developed using SAS tools to automatically extract and categorize tumor response within progression or no progression categories. 2,970 magnetic resonance imaging, computed tomography scan, and positron emission tomography French reports were extracted from the EHR of a large comprehensive cancer center to build a 2,637-document training set and a 603-document validation set. The model was also tested on 189 imaging reports from 46 different radiology centers. A tumor dashboard was created in the EHR using the Timeline tool of the vis.js JavaScript library. RESULTS: An NLP methodology was applied to create an ontology of radiographic terms defining tumor response, mapping text to five main concepts and applying decision rules based on the RECIST guidelines used in clinical practice. The model achieved an overall accuracy of 0.88 (ranging from 0.87 to 0.94), with similar performance on both progression and no progression classification. The overall accuracy was 0.82 on reports from different radiology centers. Data were visualized and organized in a dynamic tumor response timeline. This tool was deployed successfully at our institution both retrospectively and prospectively as part of an automatic pipeline to screen reports and classify tumor response in real time for all metastatic patients. CONCLUSION: Our approach provides an NLP-based framework to structure and classify tumor response from the EHR and integrate tumor response classification into the clinical oncology workflow.
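A rule-based pipeline of this kind first maps report text to response concepts and then applies a decision rule to the detected concepts. The Python sketch below is hypothetical throughout: the French patterns, the concept names, and the rule are illustrative stand-ins, because the abstract does not list the study's ontology or its RECIST-based rules, and the original system was built with SAS tools rather than Python:

```python
import re

# Hypothetical French lexicon mapping report phrases to response concepts;
# the study's five concepts and terms are not given in the abstract.
LEXICON = {
    "progression": [r"progression", r"augmentation de taille", r"majoration"],
    "new_lesion": [r"nouvelles? l[ée]sions?", r"apparition"],
    "stable": [r"stabilit[ée]", r"stables?"],
    "partial_response": [r"diminution", r"r[ée]gression partielle"],
    "complete_response": [r"r[ée]ponse compl[èe]te", r"disparition"],
}


def extract_concepts(report: str) -> set:
    """Return every concept whose patterns occur in the (lowercased) report."""
    text = report.lower()
    return {concept for concept, patterns in LEXICON.items()
            if any(re.search(p, text) for p in patterns)}


def classify_response(report: str) -> str:
    """Hypothetical decision rule: any progression or new-lesion concept
    yields 'progression'; everything else yields 'no progression'."""
    if extract_concepts(report) & {"progression", "new_lesion"}:
        return "progression"
    return "no progression"


print(classify_response("Apparition d'une nouvelle lésion hépatique."))  # progression
print(classify_response("Maladie stable."))  # no progression
```

The sketch ignores negation (for example, "absence de progression" would be flagged as progression), which a production rule set would need to handle.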

Subject(s)

Neoplasms , Radiology , Humans , Retrospective Studies , Natural Language Processing , Workflow , Neoplasms/diagnostic imaging , Neoplasms/therapy , Medical Oncology

13.

A Natural Language Processing and Machine Learning Approach to Identification of Incidental Radiology Findings in Trauma Patients Discharged from the Emergency Department.

Evans, Christopher S; Dorris, Hugh D; Kane, Michael T; Mervak, Benjamin; Brice, Jane H; Gray, Benjamin; Moore, Carlton.

Ann Emerg Med ; 81(3): 262-269, 2023 03.

Article in English | MEDLINE | ID: mdl-36328850

ABSTRACT

STUDY OBJECTIVE: Patients undergoing diagnostic imaging studies in the emergency department (ED) commonly have incidental findings, which may represent unrecognized serious medical conditions, including cancer. Recognition of incidental findings frequently relies on manual review of textual radiology reports and can be overlooked in a busy clinical environment. Our study aimed to develop and validate a supervised machine learning model using natural language processing to automate the recognition of incidental findings in radiology reports of patients discharged from the ED. METHODS: We performed a retrospective analysis of computed tomography (CT) reports from trauma patients discharged home across an integrated health system in 2019. Two independent annotators manually labeled CT reports for the presence of an incidental finding as a reference standard. We used regular expressions to derive and validate a random forest model using open-source and machine learning software. Final model performance was assessed across different ED types. RESULTS: The study CT reports were divided into derivation (690 reports) and validation (282 reports) sets, with a prevalence of incidental findings of 22.3% and 22.7%, respectively. The random forest model had an area under the curve of 0.88 (95% confidence interval [CI], 0.84 to 0.92) on the derivation set and 0.92 (95% CI, 0.88 to 0.96) on the validation set. The final model was found to have a sensitivity of 92.2%, a specificity of 79.4%, and a negative predictive value of 97.2%. Similarly, strong model performance was found when stratified to a dedicated trauma center, high-volume, and low-volume community EDs. CONCLUSION: Machine learning and natural language processing can classify incidental findings in CT reports of ED patients with high sensitivity and high negative predictive value across a broad range of ED settings. These findings suggest the utility of natural language processing in automating the review of free-text reports to identify incidental findings and may facilitate interventions to improve timely follow-up.
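The methods describe regular expressions feeding a random forest. A Python sketch of the feature-extraction step follows; the patterns and the sample report are hypothetical, since the abstract does not publish the study's expressions, and the resulting counts would then be passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`), which is not shown:

```python
import re

# Hypothetical patterns; the study's actual regular expressions are not published here.
PATTERNS = {
    "incidental": re.compile(r"\bincidental(ly)?\b", re.IGNORECASE),
    "nodule": re.compile(r"\bnodules?\b", re.IGNORECASE),
    "mass": re.compile(r"\bmass(es)?\b", re.IGNORECASE),
    "follow_up": re.compile(r"\bfollow[- ]?up\b", re.IGNORECASE),
    "negated": re.compile(r"\bno (evidence of|acute)\b", re.IGNORECASE),
}


def regex_features(report: str) -> dict:
    """Count matches of each pattern; one such dict per report forms a feature row."""
    return {name: len(pattern.findall(report)) for name, pattern in PATTERNS.items()}


report = ("No acute fracture. Incidental 6 mm pulmonary nodule; "
          "recommend follow-up CT in 12 months.")
print(regex_features(report))
# → {'incidental': 1, 'nodule': 1, 'mass': 0, 'follow_up': 1, 'negated': 1}
```

Counting matches rather than recording presence lets a tree ensemble weigh repeated mentions, and keeping the patterns in one table makes it easy to audit which phrases drive a prediction.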

Subject(s)

Natural Language Processing , Radiology , Humans , Retrospective Studies , Patient Discharge , Machine Learning , Emergency Service, Hospital , Incidental Findings

14.

Improving Methods of Identifying Anaphylaxis for Medical Product Safety Surveillance Using Natural Language Processing and Machine Learning.

Carrell, David S; Gruber, Susan; Floyd, James S; Bann, Maralyssa A; Cushing-Haugen, Kara L; Johnson, Ron L; Graham, Vina; Cronkite, David J; Hazlehurst, Brian L; Felcher, Andrew H; Bejan, Cosmin A; Kennedy, Adee; Shinde, Mayura U; Karami, Sara; Ma, Yong; Stojanovic, Danijela; Zhao, Yueqin; Ball, Robert; Nelson, Jennifer C.

Am J Epidemiol ; 192(2): 283-295, 2023 02 01.

Article in English | MEDLINE | ID: mdl-36331289

ABSTRACT

We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve performance of automated health-care claims-based algorithms to identify anaphylaxis events using data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015-2019 in 2 integrated health-care institutions in the Northwest United States. We used one site's manually reviewed gold-standard outcomes data for model development and the other's for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development and 0.67 (0.63, 0.71) in external validation data. A classification threshold with cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.

Subject(s)

Anaphylaxis , Natural Language Processing , Humans , Anaphylaxis/diagnosis , Anaphylaxis/epidemiology , Machine Learning , Algorithms , Emergency Service, Hospital , Electronic Health Records

15.

How do others cope? Extracting coping strategies for adverse drug events from social media.

Dirkson, Anne; Verberne, Suzan; van Oortmerssen, Gerard; Gelderblom, Hans; Kraaij, Wessel.

J Biomed Inform ; 139: 104228, 2023 03.

Article in English | MEDLINE | ID: mdl-36309197

ABSTRACT

Patients advise their peers on how to cope with their illness in daily life on online support groups. To date, no efforts have been made to automatically extract recommended coping strategies from online patient discussion groups. We introduce this new task, which poses a number of challenges including complex, long entities, a large long-tailed label space, and cross-document relations. We present an initial ontology for coping strategies as a starting point for future research on coping strategies, and the first end-to-end pipeline for extracting coping strategies for side effects. We also compared two possible computational solutions for this novel and highly challenging task: multi-label classification and named entity recognition (NER) with entity linking (EL). We evaluated our methods on the discussion forum from the Facebook group of the worldwide patient support organization 'GIST support international' (GSI); GIST support international donated the data to us. We found that coping strategy extraction is difficult and both methods attain limited performance (measured with F1 score) on held out test sets; multi-label classification outperforms NER+EL (F1=0.220 vs F1=0.155). An inspection of the multi-label classification output revealed that for some of the incorrect predictions, the reference label is close to the predicted label in the ontology (e.g. the predicted label 'juice' instead of the more specific reference label 'grapefruit juice'). Performance increased to F1=0.498 when we evaluated at a coarser level of the ontology. We conclude that our pipeline can be used in a semi-automatic setting, in interaction with domain experts to discover coping strategies for side effects from a patient forum. For example, we found that patients recommend ginger tea for nausea and magnesium and potassium supplements for cramps. This information can be used as input for patient surveys or clinical studies.
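Evaluating at a coarser ontology level can be pictured as mapping each label to its parent before scoring, so that predicting 'juice' for 'grapefruit juice' becomes a match. A minimal Python sketch, assuming micro-averaged F1 over per-post label sets (the abstract does not specify the averaging) and a hypothetical two-level ontology fragment:

```python
# Hypothetical ontology fragment: child label -> parent label.
PARENT = {
    "grapefruit juice": "juice",
    "orange juice": "juice",
    "ginger tea": "tea",
}


def coarsen(labels):
    """Map each label to its parent; labels without a parent map to themselves."""
    return {PARENT.get(label, label) for label in labels}


def micro_f1(gold, pred):
    """Micro-averaged F1 over parallel lists of per-post label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in pred)
    recall = tp / sum(len(g) for g in gold)
    return 2 * precision * recall / (precision + recall)


gold = [{"grapefruit juice"}, {"ginger tea"}]
pred = [{"juice"}, {"ginger tea"}]
print(micro_f1(gold, pred))  # 0.5 at the fine-grained level
print(micro_f1([coarsen(g) for g in gold], [coarsen(p) for p in pred]))  # 1.0 at the coarse level
```

A near-miss such as 'juice' versus 'grapefruit juice' counts as fully wrong at the fine level but fully right one level up, which is the kind of effect that can lift the reported score when evaluation moves to a coarser level.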

Subject(s)

Drug-Related Side Effects and Adverse Reactions , Gastrointestinal Stromal Tumors , Social Media , Humans , Natural Language Processing
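The abstract reports that F1 rose from 0.220 to 0.498 when predictions were scored at a coarser level of the ontology. The sketch below illustrates that idea on toy data only. It assumes micro-averaged F1 over per-post label sets and a hypothetical two-level ontology (the `PARENT` map); neither is taken from the paper.

```python
# Hypothetical toy ontology: child label -> parent label (not the GSI coping-strategy ontology).
PARENT = {
    "grapefruit juice": "juice",
    "orange juice": "juice",
    "ginger tea": "tea",
    "magnesium supplement": "supplement",
    "potassium supplement": "supplement",
}


def to_coarse(labels):
    """Map each label to its parent; labels without a parent map to themselves."""
    return {PARENT.get(label, label) for label in labels}


def micro_f1(gold, pred):
    """Micro-averaged F1 over a list of (gold label set, predicted label set) pairs."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy posts: the first prediction is a near miss ('juice' vs 'grapefruit juice').
gold = [{"grapefruit juice"}, {"ginger tea"}, {"magnesium supplement"}]
pred = [{"juice"}, {"ginger tea"}, {"potassium supplement"}]

fine = micro_f1(gold, pred)
coarse = micro_f1([to_coarse(g) for g in gold], [to_coarse(p) for p in pred])
print(f"fine F1={fine:.3f}, coarse F1={coarse:.3f}")  # fine F1=0.333, coarse F1=1.000
```

Scoring at the parent level credits predictions that land on the right branch of the ontology, which is why a coarser evaluation can raise F1 substantially without any change to the model.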

16.

Natural Language Processing for Improved Characterization of COVID-19 Symptoms: Observational Study of 350,000 Patients in a Large Integrated Health Care System.

Malden, Deborah E; Tartof, Sara Y; Ackerson, Bradley K; Hong, Vennis; Skarbinski, Jacek; Yau, Vincent; Qian, Lei; Fischer, Heidi; Shaw, Sally F; Caparosa, Susan; Xie, Fagen.

JMIR Public Health Surveill ; 8(12): e41529, 2022 12 30.

Article in English | MEDLINE | ID: mdl-36446133

ABSTRACT

BACKGROUND: Natural language processing (NLP) of unstructured text from electronic medical records (EMR) can improve the characterization of COVID-19 signs and symptoms, but large-scale studies demonstrating the real-world application and validation of NLP for this purpose are limited. OBJECTIVE: The aim of this paper is to assess the contribution of NLP when identifying COVID-19 signs and symptoms from EMR. METHODS: This study was conducted in Kaiser Permanente Southern California, a large integrated health care system, using data from all patients with positive SARS-CoV-2 laboratory tests from March 2020 to May 2021. An NLP algorithm was developed to extract information from free-text EMR on 12 established signs and symptoms of COVID-19: fever, cough, headache, fatigue, dyspnea, chills, sore throat, myalgia, anosmia, diarrhea, vomiting or nausea, and abdominal pain. The proportion of patients reporting each symptom and the corresponding onset dates were described before and after supplementing structured EMR data with NLP-extracted signs and symptoms. A random sample of 100 chart-reviewed and adjudicated SARS-CoV-2-positive cases was used to validate the algorithm's performance. RESULTS: A total of 359,938 patients (mean age 40.4 [SD 19.2] years; 191,630/359,938, 53% female) with confirmed SARS-CoV-2 infection were identified over the study period. The most common signs and symptoms identified through NLP-supplemented analyses were cough (220,631/359,938, 61%), fever (185,618/359,938, 52%), myalgia (153,042/359,938, 43%), and headache (144,705/359,938, 40%). The NLP algorithm identified an additional 55,568 (15%) symptomatic cases that were previously defined as asymptomatic using structured data alone. The proportion of additional cases identified in the NLP-supplemented analysis varied across the selected symptoms, from 29% (63,742/220,631) of all records for cough to 64% (38,884/60,865) of all records with nausea or vomiting. 
Of the 295,305 symptomatic patients, the median time from symptom onset to testing was 3 days using structured data alone, whereas the NLP algorithm identified signs or symptoms approximately 1 day earlier. When validated against chart-reviewed cases, the NLP algorithm successfully identified signs and symptoms with consistently high sensitivity (ranging from 87% to 100%) and specificity (94% to 100%). CONCLUSIONS: These findings demonstrate that NLP can identify and characterize a broad set of COVID-19 signs and symptoms from unstructured EMR data with enhanced detail and timeliness compared with structured data alone.

Subject(s)

COVID-19 , Humans , Female , Adult , Male , SARS-CoV-2 , Natural Language Processing , Myalgia , Cough/etiology , Headache/etiology , Fever/etiology
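A minimal sketch of what supplementing structured EMR data with NLP-extracted signs and symptoms can look like, assuming each source yields a per-patient map from symptom to earliest onset date. The record layout, patient IDs, and dates are hypothetical. The sketch unions the two sources, keeps the earliest onset per symptom, and flags patients who were asymptomatic on structured data alone; the last function gives the standard sensitivity and specificity formulas used when validating against chart review.

```python
from datetime import date

# Hypothetical per-patient records: symptom -> earliest onset date, one map per source.
structured = {
    "p1": {"cough": date(2020, 6, 3)},
    "p2": {},
    "p3": {"fever": date(2020, 7, 10)},
}
nlp = {
    "p1": {"cough": date(2020, 6, 2), "fever": date(2020, 6, 2)},
    "p2": {"nausea": date(2020, 6, 15)},
    "p3": {},
}


def supplement(structured, nlp):
    """Union both sources per patient, keeping the earliest onset date per symptom."""
    merged = {}
    for pid in structured.keys() | nlp.keys():
        combined = dict(structured.get(pid, {}))
        for symptom, onset in nlp.get(pid, {}).items():
            if symptom not in combined or onset < combined[symptom]:
                combined[symptom] = onset
        merged[pid] = combined
    return merged


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


merged = supplement(structured, nlp)
# Patients with no symptoms in structured data who gain at least one symptom from NLP.
newly_symptomatic = sorted(
    pid for pid in merged if merged[pid] and not structured.get(pid)
)
print(newly_symptomatic)  # ['p2']
```

Taking the minimum onset date across sources is one simple way the NLP-supplemented data can move symptom onset earlier, as the abstract reports for the median time to testing.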

17.

Deep Mobile Linguistic Therapy for Patients with ASD.

Ortiz Castellanos, Ari Ernesto; Liu, Chuan-Ming; Shi, Chongyang.

Int J Environ Res Public Health ; 19(19)2022 10 07.

Article in English | MEDLINE | ID: mdl-36232157

ABSTRACT

Autistic spectrum disorder (ASD) is one of the most complex groups of neurobehavioral and developmental conditions because it involves impairments in three different domains: social interaction, communication, and restricted, repetitive behaviors. Some children with ASD may not be able to communicate using language or speech. Many experts propose that continued therapy in the form of software training in this area might help bring improvement. In this work, we propose the design of a software speech therapy system for ASD. We combined different devices, technologies, and features with home rehabilitation techniques. We used TensorFlow for image classification, ArKit for text-to-speech, a cloud database, binary search, natural language processing, a dataset of sentences, and a dataset of images, on two different operating systems for smart mobile devices used in daily life. This software combines different deep learning technologies and makes human-computer interaction therapy very easy to conduct. We also explain in detail how these components were connected, the architecture of the software, and how each component works within an integrated therapy system. Finally, the system allows the patient with ASD to perform the therapy anytime and anywhere, and it transmits information to a medical specialist.

Subject(s)

Autism Spectrum Disorder , Child Development Disorders, Pervasive , Autism Spectrum Disorder/therapy , Child , Humans , Language , Linguistics , Natural Language Processing
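The abstract lists binary search among the system's components without saying how it is used. One plausible use, assumed here purely for illustration, is looking up phrases in a sorted sentence dataset, for example after an image classifier has produced a label. The sentence list and function names are hypothetical and do not come from the paper.

```python
from bisect import bisect_left

# Hypothetical sentence dataset, kept sorted so it can be binary-searched.
SENTENCES = sorted([
    "I am hungry",
    "I want to drink water",
    "I want to play",
    "Good morning",
    "Thank you",
])


def find_sentence(sorted_sentences, query):
    """Return the index of query in the sorted list, or -1 if absent (O(log n))."""
    i = bisect_left(sorted_sentences, query)
    if i < len(sorted_sentences) and sorted_sentences[i] == query:
        return i
    return -1


def sentences_with_prefix(sorted_sentences, prefix):
    """Return all sentences starting with prefix, located with two binary searches.

    Appending U+FFFF gives an upper bound for any string that starts with the
    prefix, assuming the dataset contains no characters at or above U+FFFF.
    """
    lo = bisect_left(sorted_sentences, prefix)
    hi = bisect_left(sorted_sentences, prefix + "\uffff")
    return sorted_sentences[lo:hi]


print(find_sentence(SENTENCES, "Thank you"))      # 4
print(sentences_with_prefix(SENTENCES, "I want"))  # ['I want to drink water', 'I want to play']
```

Binary search only pays off if the dataset is sorted once up front; for a small, static phrase list on a mobile device, that one-time sort is cheap.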

18.

Leveraging an Informatics Approach to Identify an Unmet Clinical Need for BRCA1/2 Testing Among Patients With Ovarian Cancer.

Gray, Stacy W; Ottesen, Rebecca A; Currey, Madeline; Cristea, Mihaela; Nikowitz, Janet; Shehayeb, Susan; Lozano, Vanessa; Hom, Julie; Kilburn, Julie; Lopez, Lisa N; Wing, Sam; Sosa, Ernesto; Shen, Jenny; Morris, Michael; Dilsizian, Bedros; Joseph, Thomas; Shen, James; Adeimy, Camille; Phillips, Tanyanika; Bahadini, Bahareh; Niland, Joyce C.

JCO Clin Cancer Inform ; 6: e2200034, 2022 09.

Article in English | MEDLINE | ID: mdl-36049148

ABSTRACT

PURPOSE: Although BRCA1/2 testing in ovarian cancer improves outcomes, it is vastly underutilized. Scalable approaches are urgently needed to improve genomically guided care. METHODS: We developed a Natural Language Processing (NLP) pipeline that extracts electronic medical record information to identify recipients of BRCA testing. We applied the NLP pipeline to assess testing status in 308 patients with ovarian cancer receiving care at a National Cancer Institute Comprehensive Cancer Center (main campus [MC] and five affiliated clinical network sites [CNS]) from 2017 to 2019. We compared (1) the characteristics of patients who had and had not received testing and (2) testing utilization by site. RESULTS: We found high uptake of BRCA testing (approximately 78%) from 2017 to 2019, with no significant differences between the MC and CNS. We observed an increase in testing over time (67%-85%), higher uptake of testing among younger patients (mean age tested = 61 years v untested = 65 years, P = .01), and higher testing among Hispanic (84%) compared with White, Non-Hispanic (78%), and Asian (75%) patients (P = .006). Documentation of referral for an internal genetics consultation for BRCA pathogenic variant carriers was higher at the MC than at the CNS (94% v 31%). CONCLUSION: We successfully used a novel NLP pipeline to assess the use of BRCA testing among patients with ovarian cancer. Despite relatively high levels of BRCA testing at our institution, 22% of patients had no documentation of genetic testing, and documentation of referral to genetics among BRCA carriers in the CNS was low. Given the success of the NLP pipeline, such an informatics-based approach holds promise as a scalable solution to identify gaps in genetic testing and ensure optimal treatment interventions in a timely manner.

Subject(s)

BRCA2 Protein , Consumer Health Informatics , Ovarian Neoplasms , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Consumer Health Informatics/methods , Female , Genetic Testing , Humans , Middle Aged , Natural Language Processing , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Referral and Consultation
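The abstract does not describe how the NLP pipeline recognizes BRCA testing. As a rough illustration only, the sketch below uses a simple rule-based approach (regular expressions with crude sentence-level negation handling) to flag notes that document BRCA testing, then computes testing uptake as the share of flagged patients. The patterns, patient IDs, and notes are hypothetical and far simpler than a production pipeline would need to be.

```python
import re

# Hypothetical rules: a BRCA mention followed by a testing/result term in the same sentence...
TEST_RE = re.compile(
    r"\bBRCA[12/]*\b.*?\b(?:tested|testing|test|results?)\b", re.IGNORECASE
)
# ...unless a negation cue precedes the BRCA mention within that sentence.
NEG_RE = re.compile(r"\b(?:no|not|never|declined)\b[^.]*?\bBRCA", re.IGNORECASE)


def documents_brca_testing(note: str) -> bool:
    """True if any sentence mentions BRCA testing without a preceding negation cue."""
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if TEST_RE.search(sentence) and not NEG_RE.search(sentence):
            return True
    return False


# Hypothetical notes keyed by patient ID.
notes = {
    "pt1": "BRCA1/2 testing negative. Follow up in 3 months.",
    "pt2": "Patient declined BRCA testing.",
    "pt3": "Germline BRCA2 pathogenic variant identified on genetic testing.",
    "pt4": "Discussed chemotherapy options.",
}

tested = {pid for pid, note in notes.items() if documents_brca_testing(note)}
uptake = len(tested) / len(notes)
print(sorted(tested), uptake)  # ['pt1', 'pt3'] 0.5
```

A negative result still counts as documented testing here, which matches the question the abstract asks (was the patient tested?) rather than what the test found.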

19.

A conversational agent system for dietary supplements use.

Singh, Esha; Bompelli, Anu; Wan, Ruyuan; Bian, Jiang; Pakhomov, Serguei; Zhang, Rui.

BMC Med Inform Decis Mak ; 22(Suppl 1): 153, 2022 07 07.

Article in English | MEDLINE | ID: mdl-35799177

ABSTRACT

BACKGROUND: Dietary supplements (DS) are widely used by consumers, but information on the efficacy and safety of DS is disparate or incomplete, which creates barriers for consumers trying to find information effectively. Conversational agent (CA) systems have been applied in the healthcare domain, but despite the widespread use of DS, no such system exists to answer consumers' questions about DS use. In this study, we develop the first CA system for DS use. METHODS: Our CA system for DS use, developed on the MindMeld framework, consists of three components: question understanding, a DS knowledge base, and answer generation. We collected and annotated 1509 questions to develop a natural language understanding module (e.g., question type classifier, named entity recognizer), which was then integrated into the MindMeld framework. The CA then queries the DS knowledge base (i.e., iDISK) and generates answers using rule-based slot filling techniques. We evaluated the algorithms of each component and the CA system as a whole. RESULTS: A CNN is the best question classifier, with an F1 score of 0.81, and a CRF is the best named entity recognizer, with an F1 score of 0.87. The system achieves an overall accuracy of 81% and an average score of 1.82, with a succ@3+ score of 76.2% and a succ@2+ score of approximately 66%. CONCLUSION: This study develops the first CA system for DS use, using the MindMeld framework and the iDISK domain knowledge base.

Subject(s)

Algorithms , Natural Language Processing , Dietary Supplements , Humans , Language
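The three-component architecture (question understanding, knowledge-base lookup, rule-based slot filling) can be sketched in plain Python. This is not the MindMeld API: keyword matching stands in for the CNN question classifier, dictionary lookup stands in for the CRF entity recognizer, and the `KB` dictionary is a hypothetical placeholder for iDISK holding dummy values rather than real supplement information.

```python
# Hypothetical knowledge base standing in for iDISK; values are placeholders, not facts.
KB = {
    "melatonin": {"usage": "[usage text]", "safety": "[safety text]"},
    "turmeric": {"usage": "[usage text]", "safety": "[safety text]"},
}

# Keyword rules standing in for a trained question-type classifier.
QUESTION_KEYWORDS = {
    "safety": ("safe", "side effect", "risk"),
    "usage": ("used for", "what is", "benefit"),
}

# Answer templates whose slots are filled from the recognized entity and KB fact.
TEMPLATES = {
    "safety": "Regarding the safety of {supplement}: {fact}",
    "usage": "Common uses of {supplement}: {fact}",
}


def classify_question(question):
    """Return the first question type whose keywords appear in the question."""
    q = question.lower()
    for qtype, keywords in QUESTION_KEYWORDS.items():
        if any(k in q for k in keywords):
            return qtype
    return None


def recognize_supplement(question):
    """Dictionary lookup standing in for a named entity recognizer."""
    q = question.lower()
    for name in KB:
        if name in q:
            return name
    return None


def answer(question):
    """Understand the question, query the KB, and fill the answer template."""
    qtype = classify_question(question)
    supplement = recognize_supplement(question)
    if qtype is None or supplement is None:
        return "Sorry, I could not understand the question."
    fact = KB[supplement][qtype]
    return TEMPLATES[qtype].format(supplement=supplement, fact=fact)


print(answer("Is melatonin safe to take?"))
# Regarding the safety of melatonin: [safety text]
```

Keeping classification, lookup, and generation as separate functions mirrors the modular design described in the abstract: each stand-in can be swapped for a trained model without touching the others.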

20.

Discovering novel drug-supplement interactions using SuppKG generated from the biomedical literature.

Schutte, Dalton; Vasilakes, Jake; Bompelli, Anu; Zhou, Yuqi; Fiszman, Marcelo; Xu, Hua; Kilicoglu, Halil; Bishop, Jeffrey R; Adam, Terrence; Zhang, Rui.

J Biomed Inform ; 131: 104120, 2022 07.

Article in English | MEDLINE | ID: mdl-35709900

ABSTRACT

OBJECTIVE: To develop a novel methodology for creating a comprehensive knowledge graph (SuppKG) that represents a domain with limited coverage in the Unified Medical Language System (UMLS), specifically dietary supplement (DS) information for discovering drug-supplement interactions (DSI), by leveraging biomedical natural language processing (NLP) technologies and a DS domain terminology. MATERIALS AND METHODS: We created SemRepDS (an extension of the NLP tool SemRep), which is capable of extracting semantic relations from abstracts by leveraging a DS-specific terminology (iDISK) containing 28,884 DS terms not found in the UMLS. PubMed abstracts were processed using SemRepDS to generate semantic relations, which were then filtered using a PubMedBERT model to remove incorrect relations before generating SuppKG. Two discovery pathways were applied to SuppKG to identify potential DSIs, which were then compared with an existing DSI database and evaluated by medical professionals for mechanistic plausibility. RESULTS: SemRepDS returned 158.5% more DS entities and 206.9% more DS relations than SemRep. The fine-tuned PubMedBERT model, which significantly outperformed other machine learning and BERT models, obtained an F1 score of 0.8605 and removed 43.86% of semantic relations, improving the precision of the relations by 26.4% over pre-filtering. SuppKG consists of 56,635 nodes and 595,222 directed edges, with 2,928 DS-specific nodes and 164,738 edges. Manual review of findings identified 182 of 250 (72.8%) proposed DS-Gene-Drug and 77 of 100 (77%) proposed DS-Gene1-Function-Gene2-Drug pathways as mechanistically plausible. DISCUSSION: With DS terminology added to the UMLS, SemRepDS can find more DS-specific semantic relationships in PubMed than SemRep. The utility of the resulting SuppKG was demonstrated by using discovery patterns to find novel DSIs. 
CONCLUSION: For domains with limited coverage in traditional terminologies (e.g., UMLS), we demonstrated an approach that leverages domain terminology and improves existing NLP tools to generate a more comprehensive knowledge graph for the downstream task. Although this study focuses on DSI, the method may be adapted to other domains.

Subject(s)

Natural Language Processing , Unified Medical Language System , Dietary Supplements , PubMed , Semantics
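The DS-Gene-Drug discovery pattern can be illustrated as a two-hop path search over subject-predicate-object triples. This sketch assumes a simplified setting: edge direction and predicate type are ignored, node names are placeholders rather than SuppKG content, and `KNOWN_DSI` is a hypothetical stand-in for the existing DSI database used to separate already-known pairs from novel candidates.

```python
from collections import defaultdict

# Hypothetical node typing; names are placeholders, not SuppKG entities.
NODE_TYPE = {
    "supplement_A": "DS", "supplement_B": "DS",
    "GENE_X": "Gene", "GENE_Y": "Gene",
    "drug_1": "Drug", "drug_2": "Drug",
}

# Hypothetical (subject, predicate, object) triples, as a relation extractor might emit.
TRIPLES = [
    ("supplement_A", "INHIBITS", "GENE_X"),
    ("drug_1", "INTERACTS_WITH", "GENE_X"),
    ("drug_2", "INHIBITS", "GENE_X"),
    ("supplement_B", "STIMULATES", "GENE_Y"),
    ("GENE_Y", "INTERACTS_WITH", "drug_2"),
    ("supplement_A", "INTERACTS_WITH", "drug_2"),
]

# Pairs already recorded in the (hypothetical) existing DSI database.
KNOWN_DSI = {("supplement_A", "drug_2")}


def build_neighbors(triples):
    """Undirected adjacency sets; predicates and directions are deliberately ignored."""
    neighbors = defaultdict(set)
    for subj, _pred, obj in triples:
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
    return neighbors


def ds_gene_drug_candidates(triples, node_type, known):
    """Return sorted (DS, Gene, Drug) paths whose DS-Drug pair is not already known."""
    neighbors = build_neighbors(triples)
    candidates = set()
    for ds in (n for n, t in node_type.items() if t == "DS"):
        for gene in neighbors[ds]:
            if node_type.get(gene) != "Gene":
                continue
            for drug in neighbors[gene]:
                if node_type.get(drug) == "Drug" and (ds, drug) not in known:
                    candidates.add((ds, gene, drug))
    return sorted(candidates)


print(ds_gene_drug_candidates(TRIPLES, NODE_TYPE, KNOWN_DSI))
# [('supplement_A', 'GENE_X', 'drug_1'), ('supplement_B', 'GENE_Y', 'drug_2')]
```

The path supplement_A, GENE_X, drug_2 exists in the toy graph but is filtered out because that pair is already known; the remaining candidates are the kind of output that would then go to expert review for mechanistic plausibility. The longer DS-Gene1-Function-Gene2-Drug pattern extends the same search by two more hops.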
