Results 1 - 20 of 5,936
1.
Front Public Health ; 11: 1063466, 2023.
Article in English | MEDLINE | ID: mdl-36860378

ABSTRACT

Purpose: The COVID-19 pandemic has drastically disrupted global healthcare systems. With the higher demand for healthcare and misinformation related to COVID-19, there is a need to explore alternative models to improve communication. Artificial Intelligence (AI) and Natural Language Processing (NLP) have emerged as promising solutions to improve healthcare delivery. Chatbots could fill a pivotal role in the dissemination and easy accessibility of accurate information in a pandemic. In this study, we developed a multi-lingual NLP-based AI chatbot, DR-COVID, which responds accurately to open-ended, COVID-19 related questions. This was used to facilitate pandemic education and healthcare delivery. Methods: First, we developed DR-COVID with an ensemble NLP model on the Telegram platform (https://t.me/drcovid_nlp_chatbot). Second, we evaluated various performance metrics. Third, we evaluated multi-lingual text-to-text translation to Chinese, Malay, Tamil, Filipino, Thai, Japanese, French, Spanish, and Portuguese. We utilized 2,728 training questions and 821 test questions in English. Primary outcome measurements were (A) overall and top 3 accuracies; (B) Area Under the Curve (AUC), precision, recall, and F1 score. Overall accuracy referred to a correct response for the top answer, whereas top 3 accuracy referred to an appropriate response for any one answer amongst the top 3 answers. AUC and its related metrics were obtained from the Receiver Operating Characteristic (ROC) curve. Secondary outcomes were (A) multi-lingual accuracy; (B) comparison to enterprise-grade chatbot systems. The sharing of training and testing datasets on an open-source platform will also contribute to existing data. Results: Our NLP model, utilizing the ensemble architecture, achieved overall and top 3 accuracies of 0.838 [95% confidence interval (CI): 0.826-0.851] and 0.922 [95% CI: 0.913-0.932] respectively. 
For overall and top 3 results, AUC scores of 0.917 [95% CI: 0.911-0.925] and 0.960 [95% CI: 0.955-0.964] were achieved respectively. DR-COVID supported nine non-English languages, with Portuguese performing the best overall at 0.900. Lastly, DR-COVID generated answers more accurately and quickly than other chatbots, within 1.12-2.15 s across three devices tested. Conclusion: DR-COVID is a clinically effective NLP-based conversational AI chatbot, and a promising solution for healthcare delivery in the pandemic era.
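
The difference between overall (top 1) and top 3 accuracy, as defined in the abstract above, can be illustrated with a minimal sketch. The function name, answer IDs, and toy data below are hypothetical and are not taken from the DR-COVID study; the sketch only assumes that each question has one gold answer and a ranked list of candidate answers.

```python
def top_k_accuracy(ranked_predictions, gold_labels, k):
    """Fraction of questions whose gold answer appears among the top-k ranked answers."""
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("predictions and labels must have the same length")
    hits = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_labels)
        if gold in ranked[:k]
    )
    return hits / len(gold_labels)


# Hypothetical toy data: ranked answer IDs for four questions.
ranked = [
    ["a1", "a2", "a3"],
    ["a5", "a4", "a9"],
    ["a7", "a8", "a6"],
    ["a2", "a1", "a3"],
]
gold = ["a1", "a4", "a6", "a9"]

overall_accuracy = top_k_accuracy(ranked, gold, k=1)  # only the top answer counts
top3_accuracy = top_k_accuracy(ranked, gold, k=3)     # any of the top 3 answers counts
print(overall_accuracy, top3_accuracy)
```

Top 3 accuracy can never be lower than overall accuracy on the same data, which is consistent with the 0.838 and 0.922 figures reported above.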


Subjects
COVID-19; Deep Learning; Humans; Natural Language Processing; Artificial Intelligence; Pandemics; India
3.
Am J Clin Nutr ; 117(3): 553-563, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36872019

ABSTRACT

BACKGROUND: Food categorization and nutrient profiling are labor intensive, time consuming, and costly tasks, given the number of products and labels in large food composition databases and the dynamic food supply. OBJECTIVES: This study used a pretrained language model and supervised machine learning to automate food category classification and nutrition quality score prediction based on manually coded and validated data, and compared prediction results with models using bag-of-words and structured nutrition facts as inputs for predictions. METHODS: Food product information from University of Toronto Food Label Information and Price Database 2017 (n = 17,448) and University of Toronto Food Label Information and Price Database 2020 (n = 74,445) databases were used. Health Canada's Table of Reference Amounts (TRA) (24 categories and 172 subcategories) was used for food categorization and the Food Standards of Australia and New Zealand (FSANZ) nutrient profiling system was used for nutrition quality score evaluation. TRA categories and FSANZ scores were manually coded and validated by trained nutrition researchers. A modified pretrained sentence-Bidirectional Encoder Representations from Transformers model was used to encode unstructured text from food labels into lower-dimensional vector representations, followed by supervised machine learning algorithms (i.e., elastic net, k-Nearest Neighbors, and XGBoost) for multiclass classification and regression tasks. RESULTS: Pretrained language model representations utilized by the XGBoost multiclass classification algorithm reached overall accuracy scores of 0.98 and 0.96 in predicting food TRA major and subcategories, outperforming bag-of-words methods. 
For FSANZ score prediction, our proposed method reached a similar prediction accuracy (R2: 0.87 and MSE: 14.4) compared with bag-of-words methods (R2: 0.72-0.84; MSE: 30.3-17.6), whereas structured nutrition facts machine learning model performed the best (R2: 0.98; MSE: 2.5). The pretrained language model had a higher generalizable ability on the external test datasets than bag-of-words methods. CONCLUSIONS: Our automation achieved high accuracy in classifying food categories and predicting nutrition quality scores using text information found on food labels. This approach is effective and generalizable in a dynamic food environment, where large amounts of food label data can be obtained from websites.
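
The two regression metrics quoted above (R2 and MSE) can be computed as in the following sketch. The score values are invented for illustration and are not from the Food Label Information and Price Database; the function names are also hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical nutrient profiling scores and model predictions.
true_scores = [10.0, 20.0, 30.0]
predicted_scores = [12.0, 18.0, 33.0]

mse = mean_squared_error(true_scores, predicted_scores)
r2 = r_squared(true_scores, predicted_scores)
print(f"MSE={mse:.3f}, R2={r2:.3f}")
```

A lower MSE and a higher R2 both indicate a better fit, which is why the structured nutrition facts model (R2: 0.98; MSE: 2.5) is described as performing best.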


Subjects
Food; Natural Language Processing; Humans; Nutritive Value; Machine Learning; Nutritional Status
4.
JCO Clin Cancer Inform ; 7: e2200158, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36888934

ABSTRACT

PURPOSE: Patients who represent the negative biomarker population, those tested for a biomarker but found to be negative, are a critical component of the growing molecular data repository. Many next-generation sequencing (NGS)-based tumor sequencing panels test hundreds of genes, but most laboratories do not provide explicit negative results on test reports nor in their structured data. However, the need for a complete picture of the testing landscape is significant. Syapse has created an internal ingestion and data transformation pipeline that uses the power of natural language processing (NLP), terminology management, and internal rulesets to semantically align data and infer negative results not explicitly stated. PATIENTS AND METHODS: Patients within the learning health network with a cancer diagnosis and at least one NGS-based molecular report were included. To obtain this critical negative result data, laboratory gene panel information was extracted and transformed using NLP techniques into a semistructured format for analysis. A normalization ontology was created in tandem. With this approach, we were able to successfully leverage positive biomarker data to derive negative data and create a comprehensive data set for molecular testing paradigms. RESULTS: The application of this process resulted in a drastic improvement in data completeness and clarity, especially when compared with other similar data sets. CONCLUSION: The ability to accurately determine positivity and testing rates among patient populations is imperative. With only positive results, it is impossible to draw conclusions about the entire tested population or the characteristics of the subgroup who are negative for the biomarker in question. We leverage these values to perform quality checks on ingested data, and end users can easily monitor their adherence to testing recommendations.
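
The core idea of deriving negative results from positive ones can be sketched as a set difference between the genes a panel tests and the genes reported positive. This is only the simplest possible reading of the approach; the abstract does not describe Syapse's actual rulesets, and the function name and the four-gene panel below are hypothetical.

```python
def infer_negative_results(panel_genes, positive_genes):
    """Return genes that were tested on the panel but not reported as positive.

    Assumes every gene on the panel was successfully assayed, which a real
    pipeline would have to verify (for example, failed or partial runs).
    """
    panel = {gene.upper() for gene in panel_genes}
    positives = {gene.upper() for gene in positive_genes}
    unexpected = positives - panel
    if unexpected:
        raise ValueError(f"positive genes not on panel: {sorted(unexpected)}")
    return sorted(panel - positives)


# Hypothetical panel and report content.
panel = ["EGFR", "KRAS", "BRAF", "ALK"]
reported_positive = ["kras"]

inferred_negative = infer_negative_results(panel, reported_positive)
print(inferred_negative)
```

Normalizing gene symbols before comparison stands in, very roughly, for the terminology management and normalization ontology mentioned in the abstract.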


Subjects
Neoplasms; Humans; Neoplasms/diagnosis; Neoplasms/genetics; Natural Language Processing; High-Throughput Nucleotide Sequencing/methods; Molecular Diagnostic Techniques
5.
Artif Intell Med ; 137: 102487, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36868684

ABSTRACT

Electronic systems are increasingly present in the healthcare system and are often related to improved medical care. However, the widespread use of these technologies has created a dependence that can disrupt the doctor-patient relationship. In this context, digital scribes are automated clinical documentation systems that capture the physician-patient conversation and then generate the documentation for the appointment, enabling the physician to engage fully with the patient. We have performed a systematic literature review on intelligent solutions for automatic speech recognition (ASR) with automatic documentation during a medical interview. The scope included only original research on systems that could detect speech and transcribe it in a natural and structured fashion simultaneously with the doctor-patient interaction, excluding speech-to-text-only technologies. The search resulted in a total of 1995 titles, with eight articles remaining after filtering for the inclusion and exclusion criteria. The intelligent models mainly consisted of an ASR system with natural language processing capability, a medical lexicon, and structured text output. None of the articles described a commercially available product at the time of publication, and all reported limited real-life experience. So far, none of the applications has been prospectively validated and tested in large-scale clinical studies. Nonetheless, these first reports suggest that automatic speech recognition may be a valuable tool in the future to facilitate medical registration in a faster and more reliable manner. Improving transparency, accuracy, and empathy could drastically change how patients and doctors experience a medical visit. Unfortunately, clinical data on the usability and benefits of such applications are almost non-existent. We believe that future work in this area is necessary.


Subjects
Physician-Patient Relations; Physicians; Humans; Communication; Documentation; Natural Language Processing
6.
JAMA Netw Open ; 6(3): e231204, 2023 03 01.
Article in English | MEDLINE | ID: mdl-36862411

ABSTRACT

Importance: Many clinical trial outcomes are documented in free-text electronic health records (EHRs), making manual data collection costly and infeasible at scale. Natural language processing (NLP) is a promising approach for measuring such outcomes efficiently, but ignoring NLP-related misclassification may lead to underpowered studies. Objective: To evaluate the performance, feasibility, and power implications of using NLP to measure the primary outcome of EHR-documented goals-of-care discussions in a pragmatic randomized clinical trial of a communication intervention. Design, Setting, and Participants: This diagnostic study compared the performance, feasibility, and power implications of measuring EHR-documented goals-of-care discussions using 3 approaches: (1) deep-learning NLP, (2) NLP-screened human abstraction (manual verification of NLP-positive records), and (3) conventional manual abstraction. The study included hospitalized patients aged 55 years or older with serious illness enrolled between April 23, 2020, and March 26, 2021, in a pragmatic randomized clinical trial of a communication intervention in a multihospital US academic health system. Main Outcomes and Measures: Main outcomes were natural language processing performance characteristics, human abstractor-hours, and misclassification-adjusted statistical power of methods of measuring clinician-documented goals-of-care discussions. Performance of NLP was evaluated with receiver operating characteristic (ROC) curves and precision-recall (PR) analyses, and the effects of misclassification on power were examined using mathematical substitution and Monte Carlo simulation. Results: A total of 2512 trial participants (mean [SD] age, 71.7 [10.8] years; 1456 [58%] female) amassed 44 324 clinical notes during 30-day follow-up. 
In a validation sample of 159 participants, deep-learning NLP trained on a separate training data set identified patients with documented goals-of-care discussions with moderate accuracy (maximal F1 score, 0.82; area under the ROC curve, 0.924; area under the PR curve, 0.879). Manual abstraction of the outcome from the trial data set would require an estimated 2000 abstractor-hours and would power the trial to detect a risk difference of 5.4% (assuming 33.5% control-arm prevalence, 80% power, and 2-sided α = .05). Measuring the outcome by NLP alone would power the trial to detect a risk difference of 7.6%. Measuring the outcome by NLP-screened human abstraction would require 34.3 abstractor-hours to achieve estimated sensitivity of 92.6% and would power the trial to detect a risk difference of 5.7%. Monte Carlo simulations corroborated misclassification-adjusted power calculations. Conclusions and Relevance: In this diagnostic study, deep-learning NLP and NLP-screened human abstraction had favorable characteristics for measuring an EHR outcome at scale. Adjusted power calculations accurately quantified power loss from NLP-related misclassification, suggesting that incorporation of this approach into the design of studies using NLP would be beneficial.
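
Why outcome misclassification reduces power can be sketched with a standard textbook argument; this is not the authors' own calculation. Under nondifferential misclassification with sensitivity Se and specificity Sp, a true prevalence p is observed as Se*p + (1 - Sp)*(1 - p), so a true risk difference is attenuated by the factor Se + Sp - 1. The sketch below combines that substitution with a normal-approximation power formula for two proportions. The equal allocation of the 2512 participants and the Se and Sp values of 0.90 are assumptions made only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def observed_prevalence(p, sensitivity, specificity):
    """Prevalence as measured by an imperfect classifier (nondifferential error)."""
    return sensitivity * p + (1 - specificity) * (1 - p)


def power_two_proportions(p0, p1, n_per_arm, alpha=0.05):
    """Approximate power of a 2-sided z-test comparing two proportions.

    Uses the unpooled normal approximation and ignores the opposite tail.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    se_diff = sqrt(p0 * (1 - p0) / n_per_arm + p1 * (1 - p1) / n_per_arm)
    return z.cdf(abs(p1 - p0) / se_diff - z_alpha)


# From the abstract: 33.5% control-arm prevalence, 5.4% risk difference.
p_control, true_rd = 0.335, 0.054
p_treated = p_control + true_rd
n_per_arm = 2512 // 2          # assumption: equal allocation
se, sp = 0.90, 0.90            # hypothetical classifier performance

power_perfect = power_two_proportions(p_control, p_treated, n_per_arm)

obs_control = observed_prevalence(p_control, se, sp)
obs_treated = observed_prevalence(p_treated, se, sp)
power_misclassified = power_two_proportions(obs_control, obs_treated, n_per_arm)

print(f"observed risk difference: {obs_treated - obs_control:.4f}")
print(f"power, perfect measurement: {power_perfect:.3f}")
print(f"power, with misclassification: {power_misclassified:.3f}")
```

With a perfectly measured outcome, the approximation gives power close to the 80% quoted in the abstract for a 5.4% risk difference; any classifier with Se + Sp below 2 shrinks the observed difference and therefore the power.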


Subjects
Clinical Trials as Topic; Data Collection; Electronic Health Records; Natural Language Processing; Patient Care Planning; Aged; Female; Humans; Male; Computer Simulation; Feasibility Studies; Deep Learning; Data Collection/methods; Middle Aged; Hospitalization
7.
J Med Internet Res ; 25: e41100, 2023 03 08.
Article in English | MEDLINE | ID: mdl-36884281

ABSTRACT

BACKGROUND: Drug-induced suicide has been debated as a crucial issue in both clinical and public health research. Published research articles contain valuable data on the drugs associated with suicidal adverse events. An automated process that extracts such information and rapidly detects drugs related to suicide risk is essential but has not been well established. Moreover, few data sets are available for training and validating classification models on drug-induced suicide. OBJECTIVE: This study aimed to build a corpus of drug-suicide relations containing annotated entities for drugs, suicidal adverse events, and their relations. To confirm the effectiveness of the drug-suicide relation corpus, we evaluated the performance of a relation classification model using the corpus in conjunction with various embeddings. METHODS: We collected the abstracts and titles of research articles associated with drugs and suicide from PubMed and manually annotated them along with their relations at the sentence level (adverse drug events, treatment, suicide means, or miscellaneous). To reduce the manual annotation effort, we preliminarily selected sentences with a pretrained zero-shot classifier or sentences containing only drug and suicide keywords. We trained a relation classification model using various Bidirectional Encoder Representations from Transformer embeddings with the proposed corpus. We then compared the performances of the model with different Bidirectional Encoder Representations from Transformer-based embeddings and selected the most suitable embedding for our corpus. RESULTS: Our corpus comprised 11,894 sentences extracted from the titles and abstracts of the PubMed research articles. Each sentence was annotated with drug and suicide entities and the relationship between these 2 entities (adverse drug events, treatment, means, and miscellaneous). 
All of the tested relation classification models that were fine-tuned on the corpus accurately detected sentences of suicidal adverse events regardless of their pretrained type and data set properties. CONCLUSIONS: To our knowledge, this is the first and most extensive corpus of drug-suicide relations.


Subjects
Drug-Related Side Effects and Adverse Reactions; Suicide; Humans; PubMed; Language; Natural Language Processing
8.
JAMA Netw Open ; 6(3): e233079, 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36920391

ABSTRACT

Importance: Social determinants of health (SDOHs) are known to be associated with increased risk of suicidal behaviors, but few studies use SDOHs from unstructured electronic health record notes. Objective: To investigate associations between veterans' death by suicide and recent SDOHs, identified using structured and unstructured data. Design, Setting, and Participants: This nested case-control study included veterans who received care under the US Veterans Health Administration from October 1, 2010, to September 30, 2015. A natural language processing (NLP) system was developed to extract SDOHs from unstructured clinical notes. Structured data yielded 6 SDOHs (ie, social or familial problems, employment or financial problems, housing instability, legal problems, violence, and nonspecific psychosocial needs), NLP on unstructured data yielded 8 SDOHs (social isolation, job or financial insecurity, housing instability, legal problems, barriers to care, violence, transition of care, and food insecurity), and combining them yielded 9 SDOHs. Data were analyzed in May 2022. Exposures: Occurrence of SDOHs over a maximum span of 2 years compared with no occurrence of SDOH. Main Outcomes and Measures: Cases of suicide death were matched with 4 controls on birth year, cohort entry date, sex, and duration of follow-up. Suicide was ascertained by National Death Index, and patients were followed up for up to 2 years after cohort entry with a study end date of September 30, 2015. Adjusted odds ratios (aORs) and 95% CIs were estimated using conditional logistic regression. Results: Of 6 122 785 veterans, 8821 committed suicide during 23 725 382 person-years of follow-up (incidence rate 37.18 per 100 000 person-years). These 8821 veterans were matched with 35 284 control participants. The cohort was mostly male (42 540 [96.45%]) and White (34 930 [79.20%]), with 6227 (14.12%) Black veterans. The mean (SD) age was 58.64 (17.41) years. 
Across the 5 common SDOHs, NLP-extracted SDOH, on average, retained 49.92% of structured SDOHs and covered 80.03% of all SDOH occurrences. SDOHs, obtained by structured data and/or NLP, were significantly associated with increased risk of suicide. The 3 SDOHs with the largest effect sizes were legal problems (aOR, 2.66; 95% CI, 2.46-2.89), violence (aOR, 2.12; 95% CI, 1.98-2.27), and nonspecific psychosocial needs (aOR, 2.07; 95% CI, 1.92-2.23), when obtained by combining structured data and NLP. Conclusions and Relevance: In this study, NLP-extracted SDOHs, with and without structured SDOHs, were associated with increased risk of suicide among veterans, suggesting the potential utility of NLP in public health studies.
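
The incidence rate and the 1:4 matching reported in the Results can be checked with simple arithmetic. The sketch below uses only figures stated in the abstract; the function name is hypothetical.

```python
def incidence_rate_per_100k(events, person_years):
    """Events per 100 000 person-years of follow-up."""
    return events / person_years * 100_000


suicide_deaths = 8821
person_years = 23_725_382
controls_per_case = 4

rate = incidence_rate_per_100k(suicide_deaths, person_years)
matched_controls = suicide_deaths * controls_per_case

print(f"incidence rate: {rate:.2f} per 100 000 person-years")
print(f"matched controls: {matched_controls}")
```

Both values agree with the abstract: 37.18 per 100 000 person-years and 35 284 control participants.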


Subjects
Suicide; Veterans; Humans; Male; Middle Aged; Female; Veterans/psychology; Case-Control Studies; Natural Language Processing; Social Determinants of Health; Suicide/psychology
9.
Health Informatics J ; 29(1): 14604582221136712, 2023.
Article in English | MEDLINE | ID: mdl-36857033

ABSTRACT

Drugs have the potential of causing adverse reactions or side effects and prior knowledge of these reactions can help prevent hospitalizations and premature deaths. Public databases of common adverse drug reactions (ADRs) depend on individual reports from drug manufacturers and health professionals. However, this passive approach to ADR surveillance has been shown to suffer from severe under-reporting. Social media, such as online health forums where patients across the globe willingly share their drug intake experience, is a viable and rich source for detecting unreported ADRs. In this paper, we design an ADR Detection Framework (ADF) using Natural Language Processing techniques to identify ADRs in drug reviews mined from social media. We demonstrate the applicability of ADF in the domain of Diabetes by identifying ADRs associated with diabetes drugs using data extracted from three online patient-based health forums: askapatient.com, webmd.com, and iodine.com. Next, we analyze and visualize the ADRs identified and present valuable insights including prevalent and less prevalent ADRs, age and gender differences in ADRs detected, as well as the previously unknown ADRs detected by our framework. Our work could promote active (real-time) ADR surveillance and also advance pharmacovigilance research.


Subjects
Drug-Related Side Effects and Adverse Reactions; Social Media; Humans; Natural Language Processing; Databases, Factual; Health Personnel
10.
Genome Med ; 15(1): 18, 2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36927505

ABSTRACT

BACKGROUND: Rapidly and efficiently identifying critically ill infants for whole genome sequencing (WGS) is a costly and challenging task currently performed by scarce, highly trained experts and is a major bottleneck for application of WGS in the NICU. There is a dire need for automated means to prioritize patients for WGS. METHODS: Institutional databases of electronic health records (EHRs) are logical starting points for identifying patients with undiagnosed Mendelian diseases. We have developed automated means to prioritize patients for rapid and whole genome sequencing (rWGS and WGS) directly from clinical notes. Our approach combines a clinical natural language processing (CNLP) workflow with a machine learning-based prioritization tool named Mendelian Phenotype Search Engine (MPSE). RESULTS: MPSE accurately and robustly identified NICU patients selected for WGS by clinical experts from Rady Children's Hospital in San Diego (AUC 0.86) and the University of Utah (AUC 0.85). In addition to effectively identifying patients for WGS, MPSE scores also strongly prioritize diagnostic cases over non-diagnostic cases, with projected diagnostic yields exceeding 50% throughout the first and second quartiles of score-ranked patients. CONCLUSIONS: Our results indicate that an automated pipeline for selecting acutely ill infants in neonatal intensive care units (NICU) for WGS can meet or exceed diagnostic yields obtained through current selection procedures, which require time-consuming manual review of clinical notes and histories by specialized personnel.
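
The AUC values quoted above measure how well a score ranks selected patients above non-selected ones. A minimal sketch of that metric follows, computed as the fraction of positive and negative pairs that are ordered correctly. The scores and labels are invented and the function name is hypothetical; the sketch says nothing about how MPSE itself produces its scores.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical prioritization scores; label 1 = selected for WGS by experts.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is fine for a small illustration; a rank-based implementation would be preferable for large cohorts.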


Subjects
Intensive Care Units, Neonatal; Natural Language Processing; Humans; Infant, Newborn; Whole Genome Sequencing/methods; Phenotype; Machine Learning
11.
Sensors (Basel) ; 23(5)2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36905052

ABSTRACT

The comprehension of spoken language is a crucial aspect of dialogue systems, encompassing two fundamental tasks: intent classification and slot filling. Currently, the joint modeling approach for these two tasks has emerged as the dominant method in spoken language understanding modeling. However, the existing joint models have limitations in terms of their relevancy and utilization of contextual semantic features between the multiple tasks. To address these limitations, a joint model based on BERT and semantic fusion (JMBSF) is proposed. The model employs pre-trained BERT to extract semantic features and utilizes semantic fusion to associate and integrate this information. The results of experiments on two benchmark datasets, ATIS and Snips, in spoken language comprehension demonstrate that the proposed JMBSF model attains 98.80% and 99.71% intent classification accuracy, 98.25% and 97.24% slot-filling F1-score, and 93.40% and 93.57% sentence accuracy, respectively. These results reveal a significant improvement compared to other joint models. Furthermore, comprehensive ablation studies affirm the effectiveness of each component in the design of JMBSF.
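
The relationship between intent accuracy and sentence accuracy can be illustrated with a short sketch. It assumes the common definition of sentence accuracy in joint intent and slot models: an utterance counts as correct only if its intent and every slot tag are predicted correctly. The abstract does not spell out its definition, and the labels below are invented rather than taken from ATIS or Snips.

```python
def intent_accuracy(pred_intents, gold_intents):
    """Fraction of utterances with the correct intent label."""
    correct = sum(p == g for p, g in zip(pred_intents, gold_intents))
    return correct / len(gold_intents)


def sentence_accuracy(pred_intents, gold_intents, pred_slots, gold_slots):
    """Fraction of utterances where the intent and all slot tags are correct."""
    correct = sum(
        pi == gi and ps == gs
        for pi, gi, ps, gs in zip(pred_intents, gold_intents, pred_slots, gold_slots)
    )
    return correct / len(gold_intents)


# Hypothetical predictions for three utterances.
gold_intents = ["flight", "airfare", "flight"]
pred_intents = ["flight", "airfare", "airfare"]
gold_slots = [["O", "B-to", "O"], ["B-from", "O"], ["O", "B-date"]]
pred_slots = [["O", "B-to", "O"], ["O", "O"], ["O", "B-date"]]

intent_acc = intent_accuracy(pred_intents, gold_intents)
sentence_acc = sentence_accuracy(pred_intents, gold_intents, pred_slots, gold_slots)
print(intent_acc, sentence_acc)
```

Because a single wrong slot tag or a wrong intent fails the whole utterance, sentence accuracy is the strictest of the three reported metrics, which matches its lower values in the abstract (93.40% and 93.57%).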


Subjects
Language; Semantics; Natural Language Processing; Intention; Acoustic Stimulation
12.
Med (N Y) ; 4(3): 139-140, 2023 Mar 10.
Article in English | MEDLINE | ID: mdl-36905924

ABSTRACT

Goodman et al. discuss how AI technologies like the natural language processing model Chat-GPT could potentially transform healthcare through knowledge dissemination and personalized patient education. Before these tools can be safely integrated into healthcare, research and development of robust oversight mechanisms are necessary to ensure their accuracy and reliability.


Subjects
Artificial Intelligence; Natural Language Processing; Humans; Reproducibility of Results; Delivery of Health Care; Health Facilities
13.
Sensors (Basel) ; 23(3)2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36772095

ABSTRACT

Auxiliary clinical diagnosis has been researched to solve unevenly and insufficiently distributed clinical resources. However, auxiliary diagnosis is still dominated by human physicians, and how to make intelligent systems more involved in the diagnosis process is gradually becoming a concern. An interactive automated clinical diagnosis with a question-answering system and a question generation system can capture a patient's conditions from multiple perspectives with less physician involvement by asking different questions to drive and guide the diagnosis. This clinical diagnosis process requires diverse information to evaluate a patient from different perspectives to obtain an accurate diagnosis. Recently proposed medical question generation systems have not considered diversity. Thus, we propose a diversity learning-based visual question generation model using a multi-latent space to generate informative question sets from medical images. The proposed method generates various questions by embedding visual and language information in different latent spaces, whose diversity is trained by our newly proposed loss. We have also added control over the categories of generated questions, making the generated questions directional. Furthermore, we use a new metric named similarity to accurately evaluate the proposed model's performance. The experimental results on the Slake and VQA-RAD datasets demonstrate that the proposed method can generate questions with diverse information. Our model works with an answering model for interactive automated clinical diagnosis and generates datasets to replace the process of annotation that incurs huge labor costs.


Subjects
Natural Language Processing; Semantics; Humans; Language
14.
Sensors (Basel) ; 23(3)2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36772767

ABSTRACT

Digital twins have revolutionized manufacturing and maintenance, allowing us to interact with virtual yet realistic representations of the physical world in simulations to identify potential problems or opportunities for improvement. However, traditional digital twins do not have the ability to communicate with humans using natural language, which limits their potential usefulness. Although conventional natural language processing methods have proven to be effective in solving certain tasks, neuro-symbolic AI offers a new approach that leads to more robust and versatile solutions. In this paper, we propose neuro-symbolic reasoning (NSR), a fundamental method for interacting with 3D digital twins using natural language. The method understands user requests and contexts to manipulate 3D components of digital twins and is able to read maintenance manuals and implement installations and removal procedures autonomously. A practical neuro-symbolic dataset of machine-understandable manuals, 3D models, and user queries is collected to train the neuro-symbolic reasoning interaction mechanism. The evaluation demonstrates that NSR can execute user commands accurately, achieving 96.2% accuracy on test data. The proposed method has industrial importance since it provides the technology to perform maintenance procedures, request information from manuals, and serve as a tool to interact with complex virtual machinery using natural language.


Subjects
Artificial Intelligence; Language; Humans; Problem Solving; Natural Language Processing; Technology
15.
Sensors (Basel) ; 23(4)2023 Feb 06.
Article in English | MEDLINE | ID: mdl-36850400

ABSTRACT

COVID-19 forced a number of changes in many areas of life, which resulted in an increase in human activity in cyberspace. Furthermore, the number of cyberattacks has increased. In such circumstances, detection, accurate prioritisation, and timely removal of critical vulnerabilities is of key importance for ensuring the security of various organisations. One of the most-commonly used vulnerability assessment standards is the Common Vulnerability Scoring System (CVSS), which allows for assessing the degree of vulnerability criticality on a scale from 0 to 10. Unfortunately, not all detected vulnerabilities have defined CVSS base scores, or if they do, they are not always expressed using the latest standard (CVSS 3.x). In this work, we propose using machine learning algorithms to convert the CVSS vector from Version 2.0 to 3.x. We discuss in detail the individual steps of the conversion procedure, starting from data acquisition using vulnerability databases and Natural Language Processing (NLP) algorithms, to the vector mapping process based on the optimisation of ML algorithm parameters, and finally, the application of machine learning to calculate the CVSS 3.x vector components. The calculated example results showed the effectiveness of the proposed method for the conversion of the CVSS 2.0 vector to the CVSS 3.x standard.
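
As a sketch of the input side of such a conversion, the snippet below parses a CVSS 2.0 base vector string and turns it into a one-hot feature list of the kind a classifier could consume. The metric names and allowed values follow the CVSS 2.0 base vector format; the function names and the one-hot encoding are assumptions for illustration and are not the feature representation used in the paper.

```python
# CVSS 2.0 base metrics and their allowed values.
CVSS2_BASE_METRICS = {
    "AV": ("L", "A", "N"),  # Access Vector: Local, Adjacent network, Network
    "AC": ("H", "M", "L"),  # Access Complexity: High, Medium, Low
    "Au": ("M", "S", "N"),  # Authentication: Multiple, Single, None
    "C": ("N", "P", "C"),   # Confidentiality impact: None, Partial, Complete
    "I": ("N", "P", "C"),   # Integrity impact
    "A": ("N", "P", "C"),   # Availability impact
}


def parse_cvss2_vector(vector):
    """Parse a string such as 'AV:N/AC:L/Au:N/C:P/I:P/A:P' into a dict."""
    parsed = {}
    for part in vector.strip().split("/"):
        name, sep, value = part.partition(":")
        if not sep or name not in CVSS2_BASE_METRICS:
            raise ValueError(f"unknown or malformed metric: {part!r}")
        if value not in CVSS2_BASE_METRICS[name]:
            raise ValueError(f"invalid value {value!r} for metric {name}")
        parsed[name] = value
    missing = set(CVSS2_BASE_METRICS) - set(parsed)
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    return parsed


def one_hot_features(parsed):
    """Encode each metric value as a one-hot group (hypothetical feature layout)."""
    return [
        1 if parsed[metric] == value else 0
        for metric, values in CVSS2_BASE_METRICS.items()
        for value in values
    ]


parsed_vector = parse_cvss2_vector("AV:N/AC:L/Au:N/C:P/I:P/A:P")
features = one_hot_features(parsed_vector)
print(parsed_vector)
print(features)
```

Predicting the CVSS 3.x components from these features, possibly together with text features from the vulnerability description, is the part the abstract assigns to machine learning and is not attempted here.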


Subjects
COVID-19; Humans; Algorithms; Databases, Factual; Machine Learning; Natural Language Processing
16.
Child Abuse Negl ; 138: 106090, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36758373

ABSTRACT

BACKGROUND: Rates of child maltreatment (CM) obtained from electronic health records are much lower than national child welfare prevalence rates indicate. There is a need to understand how CM is documented to improve reporting and surveillance. OBJECTIVES: To examine whether using natural language processing (NLP) in outpatient chart notes can identify cases of CM not documented by ICD diagnosis code, the overlap between the coding of child maltreatment by ICD and NLP, and any differences by age, gender, or race/ethnicity. METHODS: Outpatient chart notes of children aged 0-18 years within Kaiser Permanente Washington (KPWA) 2018-2020 were used to examine a selected set of maltreatment-related terms categorized into concept unique identifiers (CUI). Manual review of text snippets for each CUI was completed to flag validated cases and retrain the NLP algorithm. RESULTS: The NLP results indicated a crude rate of 1.55% to 2.36% (2018-2020) of notes with reference to CM. The rate of CM identified by ICD code was 3.32 per 1000 children, whereas the rate identified by NLP was 37.38 per 1000 children. The groups that increased the most in identification of maltreatment from ICD to NLP were adolescents (13-18 years old), females, Native American children, and those on Medicaid. Of note, all subgroups had substantially higher rates of maltreatment when using NLP. CONCLUSIONS: Use of NLP substantially increased the estimated number of children who have been impacted by CM. Accurately capturing this population will improve identification of vulnerable youth at high risk for mental health symptoms.


Subjects
Child Abuse; Natural Language Processing; Female; Adolescent; Child; Humans; Infant, Newborn; Infant; Child, Preschool; International Classification of Diseases; Washington/epidemiology; Electronic Health Records
17.
J Biomed Semantics ; 14(1): 1, 2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36721225

ABSTRACT

BACKGROUND: Information pertaining to mechanisms, management and treatment of disease-causing pathogens including viruses and bacteria is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, important for understanding of the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health. OBJECTIVE: In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen in an experimental context. There are no manually annotated pathogen corpora available for this purpose, while such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap, producing a large data set automatically from MEDLINE under some simplifying assumptions for the task definition, and using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications. METHODS: We developed a pathogen mention characterisation literature data set -READBiomed-Pathogens- automatically using NCBI resources, which we make available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens, more specifically using MeSH to link to MEDLINE citations including titles and abstracts with experimentally researched pathogens. 
We experiment with several machine learning-based natural language processing (NLP) algorithms that use this data set as training data to model the task of detecting papers that specifically describe experimental study of a pathogen. RESULTS: We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents. CONCLUSIONS: We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. The pathogen mention identification and characterisation algorithms were additionally evaluated on a small manually annotated data set; the results show that the data set we generated allows characterising pathogens of interest. TRIAL REGISTRATION: N/A.


Subjects
Algorithms , Natural Language Processing , Genetic Databases , MEDLINE , Machine Learning
18.
PLoS One ; 18(2): e0281147, 2023.
Article in English | MEDLINE | ID: mdl-36724184

ABSTRACT

The ongoing COVID-19 pandemic produced far-reaching effects throughout society, and science is no exception. The scale, speed, and breadth of the scientific community's COVID-19 response led to the emergence of new research at the remarkable rate of more than 250 papers published per day. This posed a challenge for the scientific community, as traditional methods of engagement with the literature were strained by the volume of new research being produced. Meanwhile, the urgency of the response led to an increasingly prominent role for preprint servers and a diffusion of relevant research through many channels simultaneously. These factors created a need for new tools to change the way scientific literature is organized and found by researchers. With this challenge in mind, we present an overview of COVIDScholar (https://covidscholar.org), an automated knowledge portal that uses natural language processing (NLP) and was built to meet these urgent needs. The search interface for this corpus of more than 260,000 research articles, patents, and clinical trials served more than 33,000 users at an average of 2,000 monthly active users and a peak of more than 8,600 weekly active users in the summer of 2020. Additionally, we include an analysis of trends in COVID-19 research over the course of the pandemic, with a particular focus on the first 10 months, which represent a unique period of rapid worldwide shift in scientific attention.


Subjects
COVID-19 , Humans , Pandemics , Publications , Natural Language Processing
19.
J Biomed Semantics ; 14(1): 2, 2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36732862

ABSTRACT

BACKGROUND: Medical lexicons enable the natural language processing (NLP) of health texts. Lexicons gather terms and concepts from thesauri and ontologies, and linguistic data for part-of-speech (PoS) tagging, lemmatization or natural language generation. To date, there is no resource of this type for Spanish. CONSTRUCTION AND CONTENT: This article describes a unified medical lexicon for Medical Natural Language Processing in Spanish. MedLexSp includes terms and inflected word forms with PoS information and Unified Medical Language System (UMLS) semantic types, groups and Concept Unique Identifiers (CUIs). To create it, we used NLP techniques and domain corpora (e.g. MedlinePlus). We also collected terms from the Dictionary of Medical Terms from the Spanish Royal Academy of Medicine, the Medical Subject Headings (MeSH), the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), the Medical Dictionary for Regulatory Activities Terminology (MedDRA), the International Classification of Diseases version 10, the Anatomical Therapeutic Chemical Classification, the National Cancer Institute (NCI) Dictionary, the Online Mendelian Inheritance in Man (OMIM) and OrphaData. Terms related to COVID-19 were assembled by applying a similarity-based approach with word embeddings trained on a large corpus. MedLexSp includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs, and number/gender variants), and 42 958 UMLS CUIs. We report two use cases of MedLexSp: first, applying the lexicon to pre-annotate a corpus of 1200 texts related to clinical trials; second, PoS tagging and lemmatizing texts about clinical cases. MedLexSp improved the scores for PoS tagging and lemmatization compared to the default spaCy and Stanza Python libraries.
CONCLUSIONS: The lexicon is distributed in a delimiter-separated value file; an XML file with the Lexical Markup Framework; a lemmatizer module for the spaCy and Stanza libraries; and complementary Lexical Record (LR) files. The embeddings and code to extract COVID-19 terms, and the spaCy and Stanza lemmatizers enriched with medical terms, are provided in a public repository.
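The similarity-based term collection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.8 threshold, and the three-dimensional toy vectors are all assumptions made for illustration, and real embeddings would be loaded from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_terms(seed, embeddings, threshold=0.8):
    """Return (term, score) pairs whose similarity to the seed term
    is at least `threshold`, most similar first.
    The threshold value is a hypothetical choice, not from the paper."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed
    ]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for this example; they stand in for trained embeddings.
TOY_EMBEDDINGS = {
    "covid-19": [0.9, 0.1, 0.0],
    "sars-cov-2": [0.8, 0.2, 0.1],
    "coronavirus": [0.7, 0.3, 0.0],
    "fractura": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for term, score in expand_terms("covid-19", TOY_EMBEDDINGS):
        print(f"{term}\t{score:.3f}")
```

In practice the candidate list produced this way would still need manual review before terms enter a lexicon, since a similarity threshold alone cannot separate synonyms from merely related words.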


Subjects
COVID-19 , Natural Language Processing , Humans , Language , Controlled Vocabulary , Unified Medical Language System , Semantics
20.
JCO Clin Cancer Inform ; 7: e2200139, 2023 01.
Article in English | MEDLINE | ID: mdl-36780606

ABSTRACT

PURPOSE: Imaging reports in oncology provide critical information about disease evolution that should be shared promptly to tailor the clinical decision making and care coordination of patients with advanced cancer. However, tumor response remains unstructured in free text and is underexploited. Natural language processing (NLP) methods can help deliver this critical information to the electronic health record (EHR) in real time to assist health care workers. METHODS: A rule-based algorithm was developed using SAS tools to automatically extract tumor response and categorize it as progression or no progression. A total of 2,970 magnetic resonance imaging, computed tomography scan, and positron emission tomography reports in French were extracted from the EHR of a large comprehensive cancer center to build a 2,637-document training set and a 603-document validation set. The model was also tested on 189 imaging reports from 46 different radiology centers. A tumor dashboard was created in the EHR using the Timeline tool of the vis.js JavaScript library. RESULTS: An NLP methodology was applied to create an ontology of radiographic terms defining tumor response, mapping text to five main concepts and applying decision rules on the basis of clinical practice RECIST guidelines. The model achieved an overall accuracy of 0.88 (ranging from 0.87 to 0.94), with similar performance on both progression and no progression classification. The overall accuracy was 0.82 on reports from different radiology centers. Data were visualized and organized in a dynamic tumor response timeline. This tool was deployed successfully at our institution both retrospectively and prospectively as part of an automatic pipeline to screen reports and classify tumor response in real time for all metastatic patients.
CONCLUSION: Our approach provides an NLP-based framework to structure and classify tumor response from the EHR and integrate tumor response classification into the clinical oncology workflow.
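A rule-based classification of this kind (map report text to a few concepts, then apply decision rules) can be sketched as follows. The study used SAS tools on French reports and its ontology is not given in the abstract, so everything here is an assumption for illustration: the English keyword patterns, the four concept names, the negation rule, and the "unclassified" fallback are hypothetical and are not the authors' rules.

```python
import re

# Hypothetical negation rule: "no evidence of progression", "without new lesions".
NEGATED_PROGRESSION = re.compile(
    r"\b(?:no|without)\s+(?:evidence\s+of\s+|sign\s+of\s+)?"
    r"(?:progression|new\s+lesions?)\b",
    re.IGNORECASE,
)

# Hypothetical keyword patterns, one per illustrative concept.
CONCEPT_PATTERNS = {
    "progressive_disease": re.compile(
        r"\b(?:progression|new\s+lesions?|increase\s+in\s+size)\b", re.IGNORECASE
    ),
    "stable_disease": re.compile(r"\b(?:stable|unchanged)\b", re.IGNORECASE),
    "partial_response": re.compile(
        r"\b(?:partial\s+response|decrease\s+in\s+size)\b", re.IGNORECASE
    ),
    "complete_response": re.compile(
        r"\b(?:complete\s+response|no\s+residual)\b", re.IGNORECASE
    ),
}


def extract_concepts(report):
    """Return the set of concept names whose patterns match the report."""
    concepts = set()
    if NEGATED_PROGRESSION.search(report):
        concepts.add("negated_progression")
    # Remove negated mentions so they do not count as progression.
    cleaned = NEGATED_PROGRESSION.sub(" ", report)
    for name, pattern in CONCEPT_PATTERNS.items():
        if pattern.search(cleaned):
            concepts.add(name)
    return concepts


def classify(report):
    """Decision rule: any progression concept wins; any other concept
    means no progression; no concept at all stays unclassified."""
    concepts = extract_concepts(report)
    if "progressive_disease" in concepts:
        return "progression"
    if concepts:
        return "no progression"
    return "unclassified"


if __name__ == "__main__":
    print(classify("Two new lesions in the liver."))
    print(classify("No evidence of progression. Target lesions are stable."))
```

Letting any progression concept override the others is a conservative design choice for a screening pipeline; a production system would need a much richer vocabulary and validation against annotated reports.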


Subjects
Neoplasms , Radiology , Humans , Retrospective Studies , Natural Language Processing , Workflow , Neoplasms/diagnostic imaging , Neoplasms/therapy , Medical Oncology