Results 1 - 20 of 5,477
1.
Front Public Health ; 10: 772592, 2022.
Article in English | MEDLINE | ID: mdl-35493375

ABSTRACT

Alzheimer's disease (AD) is a neurodegenerative disease that is difficult to detect with convenient and reliable methods. Language change in patients with AD is an important signal of cognitive status and can potentially support early diagnosis. In this study, we developed a transfer learning model based on speech and natural language processing (NLP) technology for the early diagnosis of AD. The lack of large datasets limits the use of complex neural network models without feature engineering; transfer learning can effectively mitigate this problem. The transfer learning model is first pre-trained on large text datasets to obtain a pre-trained language model; an AD classification model is then trained on small training sets on top of it. Concretely, a distilled bidirectional encoder representation (distilBert) embedding, combined with a logistic regression classifier, is used to distinguish AD from normal controls. The model was evaluated on the 2020 Alzheimer's dementia recognition through spontaneous speech datasets, comprising a balanced sample of 78 healthy controls (HC) and 78 patients with AD. The accuracy of the proposed model is 0.88, almost equivalent to the champion score in the challenge and a considerable improvement over the 75% baseline established by the challenge organizers. The transfer learning method in this study thus improves AD prediction, not only reducing the need for feature engineering but also addressing the lack of sufficiently large datasets.
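The classifier stage of such a pipeline can be sketched in a few lines. In this sketch the distilBert sentence encoder is stubbed out with a hypothetical embed() helper returning toy vectors (an assumption, not the paper's code); only the frozen-embedding-plus-logistic-regression idea is illustrated.

```python
import numpy as np

# Second stage of a transfer-learning pipeline: logistic regression over
# fixed sentence embeddings. embed() is a stand-in for a frozen encoder.
rng = np.random.default_rng(0)

def embed(label):  # hypothetical stub for a distilBert sentence embedding
    center = np.ones(4) if label == 1 else -np.ones(4)
    return center + 0.1 * rng.standard_normal(4)

X = np.array([embed(y) for y in [0, 1] * 50])
y = np.array([0, 1] * 50)

# Plain gradient-descent logistic regression on the frozen embeddings.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid probabilities
    w -= 0.5 * (X.T @ (p - y) / len(y))      # gradient step on weights
    b -= 0.5 * float(np.mean(p - y))         # gradient step on bias

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = float(np.mean(pred == y))
print(accuracy)
```

On this well-separated toy data the classifier reaches perfect training accuracy; real AD-versus-control embeddings are far less separable.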


Subjects
Alzheimer Disease, Neurodegenerative Diseases, Alzheimer Disease/diagnosis, Humans, Machine Learning, Natural Language Processing, Speech
2.
J Hosp Med ; 17(1): 11-18, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35504534

ABSTRACT

BACKGROUND: Diagnostic codes can retrospectively identify samples of febrile infants, but sensitivity is low, resulting in many febrile infants eluding detection. To ensure study samples are representative, an improved approach is needed. OBJECTIVE: To derive and internally validate a natural language processing algorithm to identify febrile infants and compare its performance to diagnostic codes. METHODS: This cross-sectional study consisted of infants aged 0-90 days brought to one pediatric emergency department from January 2016 to December 2017. We aimed to identify infants with fever, defined as a documented temperature ≥38°C. We used 2017 clinical notes to develop two rule-based algorithms to identify infants with fever and tested them on data from 2016. Using manual abstraction as the gold standard, we compared performance of the two rule-based algorithms (Models 1, 2) to four previously published diagnostic code groups (Models 5-8) using area under the receiver-operating characteristics curve (AUC), sensitivity, and specificity. RESULTS: For the test set (n = 1190 infants), 184 infants were febrile (15.5%). The AUCs (0.92-0.95) and sensitivities (86%-92%) of Models 1 and 2 were significantly greater than Models 5-8 (0.67-0.74; 20%-74%) with similar specificities (93%-99%). In contrast to Models 5-8, samples from Models 1 and 2 demonstrated similar characteristics to the gold standard, including fever prevalence, median age, and rates of bacterial infections, hospitalizations, and severe outcomes. CONCLUSIONS: Findings suggest rule-based algorithms can accurately identify febrile infants with greater sensitivity while preserving specificity compared to diagnostic codes. If externally validated, rule-based algorithms may be important tools to create representative study samples, thereby improving generalizability of findings.
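The core of a rule-based febrile-infant detector is a pattern that pulls documented temperatures out of free text and applies the ≥38°C threshold. The sketch below is a minimal illustration of that idea; the paper's Models 1 and 2 use more elaborate patterns and handling than this single regex.

```python
import re

FEVER_THRESHOLD_C = 38.0

def is_febrile(note: str) -> bool:
    """Rule-based check: any documented temperature >= 38.0 C.
    A simplified stand-in for the kind of rule the study's algorithms encode."""
    temps = [
        float(m)
        for m in re.findall(
            r"(?:temp(?:erature)?|T)\s*[:=]?\s*(\d{2}(?:\.\d)?)\s*(?:°\s*)?C",
            note,
            flags=re.IGNORECASE,
        )
    ]
    return any(t >= FEVER_THRESHOLD_C for t in temps)

print(is_febrile("Triage temp: 38.5 C, fussy but consolable"))  # True
print(is_febrile("Afebrile, temperature 37.2 C at home"))       # False
```

A production version would also need negation handling ("no fever at home, temp 38.2 C reported by grandmother") and unit conversion for Fahrenheit values.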


Subjects
Fever, Natural Language Processing, Algorithms, Child, Cross-Sectional Studies, Fever/diagnosis, Humans, Infant, Retrospective Studies
3.
PLoS One ; 17(5): e0267901, 2022.
Article in English | MEDLINE | ID: mdl-35507636

ABSTRACT

Early detection and management of adverse drug reactions (ADRs) is crucial for improving patients' quality of life. Hand-foot syndrome (HFS) is one of the most problematic ADRs for cancer patients. Recently, an increasing number of patients post their daily experiences to internet communities, for example in blogs, where potential ADR signals not captured through routine clinic visits can be described. Therefore, this study aimed to identify patients with potential ADRs, focusing on HFS, from internet blogs by using natural language processing (NLP) deep-learning methods. From 10,646 blog posts written in Japanese by cancer patients, 149 HFS-positive sentences were extracted after pre-processing, annotation and scrutiny by a certified oncology pharmacist. The HFS-positive sentences contained not only typical HFS expressions like "pain" or "spoon nail" but also unique patient-derived expressions such as onomatopoeia. The dataset was divided at a 4 to 1 ratio and used to train and evaluate three NLP deep-learning models: long short-term memory (LSTM), bidirectional LSTM and bidirectional encoder representations from transformers (BERT). The BERT model gave the best performance, with a precision of 0.63, recall of 0.82 and F1 score of 0.71 on the HFS user identification task. Our results demonstrate that this NLP deep-learning model can successfully identify patients with potential HFS from blog posts, where patients describe symptoms and the impact on their daily lives in their own words. Thus, it should be feasible to utilize patient-generated text data to improve ADR management for individual patients.
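The reported F1 score follows directly from the precision and recall figures, since F1 is their harmonic mean. A quick check:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reproduce the BERT result reported in the abstract.
f1 = f1_score(0.63, 0.82)
print(round(f1, 2))  # 0.71
```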


Subjects
Deep Learning, Drug-Related Side Effects and Adverse Reactions, Hand-Foot Syndrome, Neoplasms, Hand-Foot Syndrome/diagnosis, Hand-Foot Syndrome/etiology, Humans, Natural Language Processing, Quality of Life
4.
Artif Intell Med ; 128: 102311, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35534148

ABSTRACT

BACKGROUND: The development of electronic health records has provided a large volume of unstructured biomedical information. Extracting patient characteristics from these data has become a major challenge, especially in languages other than English. METHODS: Inspired by the French Text Mining Challenge (DEFT 2021) [1] in which we participated, our study proposes a multilabel classification of clinical narratives, allowing us to automatically extract the main features of a patient report. Our system is an end-to-end pipeline from raw text to labels with two main steps: named entity recognition and multilabel classification. Both steps rely on a transformer-based neural network architecture. To train our final classifier, we extended the dataset with all English and French Unified Medical Language System (UMLS) vocabularies related to human diseases. We focus our study on the multilingualism of training resources and models, with experiments combining French and English in different ways (multilingual embeddings or translation). RESULTS: We obtained an overall average micro-F1 score of 0.811 for the multilingual version, 0.807 for the French-only version and 0.797 for the translated version. CONCLUSION: Our study proposes an original multilabel classification of French clinical notes for patient phenotyping. We show that a multilingual algorithm trained on annotated real clinical notes and UMLS vocabularies leads to the best results.
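The micro-F1 metric used to report these results pools true positives, false positives and false negatives over all labels before computing precision and recall. A minimal sketch with invented label sets:

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for multilabel outputs: pool per-document
    TP/FP/FN counts over all labels, then take the harmonic mean of
    the pooled precision and recall."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical phenotype labels for two patient reports.
gold = [{"diabetes", "hypertension"}, {"asthma"}]
pred = [{"diabetes"}, {"asthma", "obesity"}]
score = micro_f1(gold, pred)
print(round(score, 3))  # 0.667
```

Unlike macro-F1, this weighting favors frequent labels, which is why multilabel clinical benchmarks usually report it.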


Subjects
Multilingualism, Natural Language Processing, Data Mining, Humans, Language, Unified Medical Language System
5.
J Healthc Eng ; 2022: 4072563, 2022.
Article in English | MEDLINE | ID: mdl-35529541

ABSTRACT

Multitask learning (MTL) is an open and challenging problem in various real-world applications, such as recommendation systems, natural language processing, and computer vision. The typical way of conducting multitask learning is to establish a global parameter-sharing mechanism among all tasks or to assign each task an individual set of parameters with cross-connections between tasks. However, most existing approaches abstract the raw features step by step and mine semantic information from the input space without introducing matching-relation features into the model. To solve these problems, we propose a novel MMOE-match network to model the matches between medical cases and syndrome elements, introducing recommendation algorithms into traditional Chinese medicine (TCM) study. Accurate medical record recommendation is significant for intelligent medical treatment, and ranking algorithms can be applied in many TCM scenarios, such as syndrome element recommendation, symptom recommendation, and drug prescription recommendation. The recommendation system includes two main stages, recall and ranking, whose core is a two-tower matching network combined with multitask learning. MMOE-match combines the advantages of the recall and ranking models in a new network design. Further, we take the matching network output as the input of multitask learning and compare it with manually designed matching features. The data show that our model brings significant positive benefits.
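The two-tower matching idea at the core of such recall-and-rank recommenders can be sketched compactly: one tower encodes the medical case, the other encodes each candidate syndrome element, and the match score is a dot product. Both towers are stubbed here as fixed random matrices (an assumption for illustration; the paper's towers are learned networks).

```python
import numpy as np

rng = np.random.default_rng(1)
case_tower = rng.standard_normal((8, 4))     # stubbed case embeddings, dim 4
element_tower = rng.standard_normal((5, 4))  # 5 candidate syndrome elements

def rank_elements(case_vec, element_vecs, k=3):
    """Score every candidate by dot product with the case embedding
    and return the indices of the top-k matches."""
    scores = element_vecs @ case_vec
    return [int(i) for i in np.argsort(-scores)[:k]]

case_vec = case_tower[0]
top = rank_elements(case_vec, element_tower)
print(top)
```

In production the recall stage scores a huge candidate set cheaply this way, and the ranking stage re-scores the shortlist with a heavier (here, multitask) model.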


Subjects
Medicine, Chinese Traditional, Natural Language Processing, Algorithms, Humans, Semantics
6.
J Biomed Semantics ; 13(1): 13, 2022 May 08.
Article in English | MEDLINE | ID: mdl-35527259

ABSTRACT

BACKGROUND: The high volume of research focusing on extracting patient information from electronic health records (EHRs) has led to an increase in the demand for annotated corpora, which are a precious resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multipurpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field. METHODS: In this study, a semantically annotated corpus was developed using clinical text from multiple medical specialties, document types, and institutions. In addition, we present (1) a survey listing common aspects, differences, and lessons learned from previous research, (2) a fine-grained annotation schema that can be replicated to guide other annotation initiatives, (3) a web-based annotation tool centered on an annotation suggestion feature, and (4) both intrinsic and extrinsic evaluation of the annotations. RESULTS: This study resulted in SemClinBr, a corpus of 1000 clinical notes labeled with 65,117 entities and 11,263 relations. In addition, both negation cues and a medical abbreviation dictionary were generated from the annotations. The average annotator agreement score varied from 0.71 (strict match) to 0.92 (relaxed match, accepting partial overlaps and hierarchically related semantic types). The extrinsic evaluation, applying the corpus to two downstream NLP tasks, demonstrated the reliability and usefulness of the annotations, with the systems achieving results consistent with the agreement scores. CONCLUSION: The SemClinBr corpus and the other resources produced in this work can support clinical NLP studies, providing a common development and evaluation resource for the research community and boosting the utilization of EHRs in both clinical practice and biomedical research. To the best of our knowledge, SemClinBr is the first available Portuguese clinical corpus.
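The gap between the strict and relaxed agreement scores comes down to the matching criterion applied to entity spans. A minimal sketch (ignoring the hierarchical-semantic-type relaxation the paper also allows, and using invented spans):

```python
def agreement(spans_a, spans_b, relaxed=False):
    """Fraction of annotator A's (start, end, type) spans matched by
    annotator B. Strict: identical offsets and type. Relaxed: any
    character overlap with the same type."""
    def matches(a, b):
        if a[2] != b[2]:
            return False
        if relaxed:
            return a[0] < b[1] and b[0] < a[1]  # intervals overlap
        return a[:2] == b[:2]
    hit = sum(any(matches(a, b) for b in spans_b) for a in spans_a)
    return hit / len(spans_a)

a = [(0, 8, "Disease"), (15, 22, "Drug")]
b = [(0, 8, "Disease"), (16, 24, "Drug")]
print(agreement(a, b))                # 0.5: only the first span is exact
print(agreement(a, b, relaxed=True))  # 1.0: both spans overlap
```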


Subjects
Medicine, Natural Language Processing, Electronic Health Records, Humans, Portugal, Reproducibility of Results
7.
Comput Intell Neurosci ; 2022: 6344571, 2022.
Article in English | MEDLINE | ID: mdl-35528369

ABSTRACT

Feature extraction and Chinese translation of Internet-of-Things (IoT) English terms underlie many natural language processing tasks. The main purpose is to extract rich semantic information from unstructured text so that computers can further process it for different NLP-based tasks. However, most current methods use simple neural network models to count word frequencies or probabilities in the text, and they struggle to accurately understand and translate IoT English terms. In response, this study proposes an LSTM-based neural network for feature extraction and Chinese translation of IoT English terms, which can not only correctly extract and translate IoT English vocabulary but also model the feature correspondence between English and Chinese. The proposed network was trained and tested on multiple datasets; it largely fulfills the requirements of feature extraction and Chinese translation of English IoT terms and shows great potential for follow-up work.


Subjects
Language, Natural Language Processing, China, Humans, Internet, Translations
8.
Artif Intell Med ; 127: 102264, 2022 May.
Article in English | MEDLINE | ID: mdl-35430035

ABSTRACT

In a number of circumstances, obtaining health-related information from a patient is time-consuming, whereas a chatbot interacting efficiently with that patient might save health care professionals' time and better assist the patient. Making a chatbot understand patients' answers uses Natural Language Understanding (NLU) technology that relies on 'intent' and 'slot' predictions. Over the last few years, language models (such as BERT) pre-trained on huge amounts of data have achieved state-of-the-art intent and slot predictions by connecting a neural network architecture (e.g., linear, recurrent, long short-term memory, or bidirectional long short-term memory) and fine-tuning all language model and neural network parameters end-to-end. Currently, two language models are specialized in the French language: FlauBERT and CamemBERT. This study was designed to find out which combination of language model and neural network architecture was the best for intent and slot prediction by a chatbot from a French corpus of clinical cases. The comparisons showed that FlauBERT performed better than CamemBERT whatever the network architecture used and that complex architectures did not significantly improve performance over simple ones whatever the language model. Thus, in the medical field, the results support recommending FlauBERT with a simple linear network architecture.
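Slot prediction in NLU systems of this kind is conventionally cast as BIO sequence tagging over tokens. The sketch below shows that target representation; the tokens, slot names, and spans are invented for illustration, not taken from the study's corpus.

```python
def bio_tags(tokens, slots):
    """Convert token-range slot annotations into the BIO tag sequence
    that a slot-filling model is trained to predict.
    `slots` maps a slot name to a half-open (start, end) token range."""
    tags = ["O"] * len(tokens)
    for name, (start, end) in slots.items():
        tags[start] = f"B-{name}"          # Begin tag on the first token
        for i in range(start + 1, end):
            tags[i] = f"I-{name}"          # Inside tags on the rest
    return tags

tokens = ["j'ai", "mal", "à", "la", "tête", "depuis", "hier"]
tags = bio_tags(tokens, {"symptom": (1, 5), "onset": (5, 7)})
print(tags)
# ['O', 'B-symptom', 'I-symptom', 'I-symptom', 'I-symptom', 'B-onset', 'I-onset']
```

The intent, by contrast, is a single label for the whole utterance, which is why the two predictions are usually trained as joint heads on one encoder.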


Subjects
Language, Natural Language Processing, Humans, Intention, Neural Networks, Computer, Software
9.
Artif Intell Med ; 127: 102282, 2022 May.
Article in English | MEDLINE | ID: mdl-35430042

ABSTRACT

Clinical named entity recognition (CNER) is a fundamental step for many clinical Natural Language Processing (NLP) systems, which aims to recognize and classify clinical entities such as diseases, symptoms, exams, body parts and treatments in clinical free texts. In recent years, with the development of deep learning technology, deep neural networks (DNNs) have been widely used in Chinese clinical named entity recognition and many other clinical NLP tasks. However, these state-of-the-art models fail to make full use of the global information and multi-level semantic features in clinical texts. To address this, we design an improved character-level representation approach that integrates the character embedding and the character-label embedding to enhance the specificity and diversity of feature representations. Then, a multi-head self-attention based Bi-directional Long Short-Term Memory Conditional Random Field (MUSA-BiLSTM-CRF) model is proposed. By introducing multi-head self-attention and incorporating a medical dictionary, the model can more effectively capture the weight relationships between characters and multi-level semantic feature information, which is expected to greatly improve the performance of Chinese clinical named entity recognition. We evaluate our model on two CCKS challenge (CCKS2017 Task 2 and CCKS2018 Task 1) benchmark datasets, and the experimental results show that our proposed model achieves the best performance among competing state-of-the-art DNN-based methods.
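The CRF layer on top of the BiLSTM decodes the best label sequence with the Viterbi algorithm. A self-contained sketch with a toy three-label scheme (O, B-Disease, I-Disease) and hand-picked scores; the emission and transition values are invented, not the model's:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best label path given per-character emission scores (T x L)
    and label-transition scores (L x L), as decoded by a CRF layer."""
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = np.argmax(total, axis=0)   # best previous label per label
        score = np.max(total, axis=0)
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):            # backtrack
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Labels: 0 = O, 1 = B-Disease, 2 = I-Disease; O -> I is forbidden.
trans = np.array([[0.0, 0.0, -10.0],
                  [0.0, 0.0,   1.0],
                  [0.0, 0.0,   1.0]])
emis = np.array([[0.1, 2.0, 0.0],
                 [0.0, 0.1, 1.0],
                 [2.0, 0.0, 0.1]])
print(viterbi(emis, trans))  # [1, 2, 0]: B-Disease, I-Disease, O
```

This is what distinguishes a CRF head from per-character softmax: illegal transitions like O followed by I-Disease are scored out globally rather than fixed up afterwards.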


Assuntos
Registros Eletrônicos de Saúde , Processamento de Linguagem Natural , China , Idioma , Redes Neurais de Computação
10.
Artif Intell Med ; 127: 102284, 2022 May.
Article in English | MEDLINE | ID: mdl-35430043

ABSTRACT

The medical domain is often subject to information overload. The digitization of healthcare, constant updates to online medical repositories, and increasing availability of biomedical datasets make it challenging to effectively analyze the data. This creates additional workload for medical professionals, who depend heavily on medical data to conduct their research and consult with their patients. This paper shows how different text highlighting techniques can capture relevant medical context, reducing doctors' cognitive load and response time to patients by helping them make faster decisions, thus improving the overall quality of online medical services. Three different word-level text highlighting methodologies are implemented and evaluated. The first method uses Term Frequency - Inverse Document Frequency (TF-IDF) scores directly to highlight important parts of the text. The second method combines TF-IDF scores, Word2Vec and the application of Local Interpretable Model-Agnostic Explanations to classification models. The third method uses neural networks directly to predict whether or not a word should be highlighted. Our numerical study shows that the neural network approach successfully highlights medically relevant terms, and its performance improves as the size of the input segment increases.
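The first methodology (direct TF-IDF highlighting) can be sketched in pure Python: score each word in a document by term frequency times inverse document frequency and mark the top scorers. The toy corpus below is invented for illustration.

```python
import math
from collections import Counter

def tfidf_highlight(docs, doc_idx, top_k=2):
    """Return the top-k words of docs[doc_idx] by TF-IDF score,
    i.e. the words a direct TF-IDF highlighter would mark."""
    N = len(docs)
    df = Counter()                        # document frequency per word
    for d in docs:
        df.update(set(d.split()))
    tf = Counter(docs[doc_idx].split())   # term frequency in this document
    total = sum(tf.values())
    scores = {w: (c / total) * math.log(N / df[w]) for w, c in tf.items()}
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

docs = [
    "patient reports chest pain and shortness of breath",
    "patient reports mild headache",
    "follow up visit patient stable",
]
print(tfidf_highlight(docs, 0))
```

Words appearing in every note ("patient") get an IDF of zero and are never highlighted, which is exactly the behavior one wants from this baseline.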


Assuntos
Redes Neurais de Computação , Telemedicina , Humanos , Processamento de Linguagem Natural
11.
BMC Bioinformatics ; 23(1): 136, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35428175

ABSTRACT

BACKGROUND: Medical information has rapidly increased on the internet and has become one of the main targets of search engine use. However, medical information on the internet suffers from problems of quality and accessibility, so ordinary users are unable to obtain answers to their medical questions conveniently. As a solution, researchers build medical question answering (QA) systems. However, research on medical QA in the Chinese language lags behind work on English-based systems, mainly because of the difficulty of constructing a high-quality knowledge base and the underutilization of medical corpora in the Chinese language. RESULTS: This study developed an end-to-end solution to implement a medical QA system for the Chinese language at low cost and in little time. First, we created a high-quality medical knowledge graph from hospital data (electronic health/medical records) in a nearly automatic manner, training a supervised model on data labeled using bootstrapping techniques. Then, we designed a QA system based on a memory-based neural network and an attention mechanism. Finally, we trained the system to generate answers from the knowledge base and a QA corpus on the internet. CONCLUSIONS: Bootstrapping and deep neural network techniques can construct a knowledge graph from electronic health/medical records with satisfactory precision and coverage. Our proposed context bridge mechanisms perform training with a variety of language features. Our QA system achieves state-of-the-art quality in answering medical questions with constrained topics. In our evaluation, complex Chinese language processing techniques, such as segmentation and parsing, were not necessary in practice, nor were complex architectures needed to build the QA system. Lastly, we created an application using our method for internet QA usage.


Subjects
Language, Neural Networks, Computer, China, Electronic Health Records, Natural Language Processing
12.
PLoS One ; 17(4): e0265621, 2022.
Article in English | MEDLINE | ID: mdl-35436295

ABSTRACT

Given a pre-trained BERT, how can we compress it to a fast and lightweight one while maintaining its accuracy? Pre-trained language models, such as BERT, are effective for improving the performance of natural language processing (NLP) tasks. However, heavy models like BERT suffer from large memory cost and long inference time. In this paper, we propose SensiMix (Sensitivity-Aware Mixed Precision Quantization), a novel quantization-based BERT compression method that considers the sensitivity of different modules of BERT. SensiMix effectively applies 8-bit index quantization and 1-bit value quantization to the sensitive and insensitive parts of BERT, maximizing the compression rate while minimizing the accuracy drop. We also propose three novel 1-bit training methods to minimize the accuracy drop: Absolute Binary Weight Regularization, Prioritized Training, and Inverse Layer-wise Fine-tuning. Moreover, for fast inference, we apply FP16 general matrix multiplication (GEMM) and XNOR-Count GEMM for the 8-bit and 1-bit quantization parts of the model, respectively. Experiments on four GLUE downstream tasks show that SensiMix compresses the original BERT model to an equally effective but lightweight one, reducing the model size by a factor of 8× and shrinking the inference time by around 80% without noticeable accuracy drop.
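The two precision regimes can be illustrated on a toy weight matrix. This sketch is not SensiMix's exact quantizer: it shows generic 8-bit linear quantization for a "sensitive" block and 1-bit sign quantization with a mean-magnitude scale (the classic binary-weight-network trick) for an "insensitive" block, which is the general idea being mixed.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)).astype(np.float32)

# 8-bit: map [-max|W|, +max|W|] linearly onto integers in [-127, 127].
scale = np.abs(W).max() / 127.0
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_deq8 = W_int8.astype(np.float32) * scale

# 1-bit: keep only the sign of each weight plus one scalar per matrix.
alpha = float(np.abs(W).mean())
W_bin = np.sign(W) * alpha

err8 = float(np.abs(W - W_deq8).max())   # small: rounding error only
err1 = float(np.abs(W - W_bin).max())    # large: all magnitudes collapsed
print(err8, err1)
```

The error gap is why sensitivity matters: 1-bit weights are 8x smaller again than int8 but distort far more, so they are only affordable in the parts of the model that tolerate it.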


Assuntos
Compressão de Dados , Processamento de Linguagem Natural , Idioma
13.
BMC Bioinformatics ; 23(1): 144, 2022 Apr 21.
Article in English | MEDLINE | ID: mdl-35448946

ABSTRACT

BACKGROUND: The abundance of biomedical text data coupled with advances in natural language processing (NLP) is resulting in novel biomedical NLP (BioNLP) applications. These NLP applications, or tasks, rely on the availability of domain-specific language models (LMs) trained on massive amounts of data. Most existing domain-specific LMs adopted the bidirectional encoder representations from transformers (BERT) architecture, which has limitations, and their generalizability is unproven given the absence of baseline results across common BioNLP tasks. RESULTS: We present 8 variants of BioALBERT, a domain-specific adaptation of a lite bidirectional encoder representations from transformers (ALBERT), trained on biomedical (PubMed and PubMed Central) and clinical (MIMIC-III) corpora and fine-tuned for 6 different tasks across 20 benchmark datasets. Experiments show that a large variant of BioALBERT trained on PubMed outperforms the state-of-the-art on named-entity recognition (+ 11.09% BLURB score improvement), relation extraction (+ 0.80% BLURB score), sentence similarity (+ 1.05% BLURB score), document classification (+ 0.62% F1-score), and question answering (+ 2.83% BLURB score). It represents a new state-of-the-art in 5 out of 6 benchmark BioNLP tasks. CONCLUSIONS: The large variant of BioALBERT trained on PubMed achieved a higher BLURB score than previous state-of-the-art models on 5 of the 6 benchmark BioNLP tasks. Depending on the task, 5 different variants of BioALBERT outperformed previous state-of-the-art models on 17 of the 20 benchmark datasets, showing that our model is robust and generalizable across common BioNLP tasks. We have made BioALBERT freely available, which will help the BioNLP community avoid the computational cost of training and establish a new set of baselines for future efforts across a broad range of BioNLP tasks.


Assuntos
Benchmarking , Processamento de Linguagem Natural , Fontes de Energia Elétrica , Idioma , PubMed
14.
Sci Rep ; 12(1): 5436, 2022 03 31.
Article in English | MEDLINE | ID: mdl-35361890

ABSTRACT

Sentiment analysis (SA) is an important task because of its vital role in analyzing people's opinions. However, existing research is largely based on the English language, with limited work on low-resource languages. This study introduces a new multi-class Urdu dataset based on user reviews for sentiment analysis. The dataset is gathered from various domains such as food and beverages, movies and plays, software and apps, politics, and sports. It contains 9312 reviews manually annotated by human experts into three classes: positive, negative and neutral. The main goal of this research is to create a manually annotated dataset for Urdu sentiment analysis and to set baseline results using rule-based, machine learning (SVM, NB, AdaBoost, MLP, LR and RF) and deep learning (CNN-1D, LSTM, Bi-LSTM, GRU and Bi-GRU) techniques. Additionally, we fine-tuned Multilingual BERT (mBERT) for Urdu sentiment analysis. We used four text representations to train our classifiers: word n-grams, char n-grams, pre-trained fastText embeddings and BERT word embeddings. We trained these models on two different datasets for evaluation purposes. Findings show that the proposed mBERT model with BERT pre-trained word embeddings outperformed deep learning, machine learning and rule-based classifiers and achieved an F1 score of 81.49%.
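Character n-grams, one of the four text representations above, are a common choice for morphologically rich, low-resource languages because they survive spelling variation. A minimal sketch of the feature extraction (the classifier on top is omitted):

```python
def char_ngrams(text, n_min=2, n_max=3):
    """All character n-grams of length n_min..n_max, the kind of
    features fed to the baseline classifiers alongside word n-grams."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]

feats = char_ngrams("acha")  # "good" in romanized Urdu
print(feats)  # ['ac', 'ch', 'ha', 'ach', 'cha']
```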


Assuntos
Idioma , Multilinguismo , Humanos , Aprendizado de Máquina , Processamento de Linguagem Natural
15.
Neurosurg Focus ; 52(4): E7, 2022 04.
Article in English | MEDLINE | ID: mdl-35364584

ABSTRACT

OBJECTIVE: The purpose of this study was to develop natural language processing (NLP)-based machine learning algorithms to automatically differentiate lumbar disc herniation (LDH) and lumbar spinal stenosis (LSS) based on positive symptoms in free-text admission notes. The secondary purpose was to compare the performance of the deep learning algorithm with the ensemble model on the current task. METHODS: In total, 1921 patients whose principal diagnosis was LDH or LSS between June 2013 and June 2020 at Zhongda Hospital, affiliated with Southeast University, were retrospectively analyzed. The data set was randomly divided into a training set and testing set at a 7:3 ratio. Long Short-Term Memory (LSTM) and extreme gradient boosting (XGBoost) models were developed in this study. NLP algorithms were assessed on the testing set by the following metrics: receiver operating characteristic (ROC) curve, area under the curve (AUC), accuracy score, recall score, F1 score, and precision score. RESULTS: In the testing set, the LSTM model achieved an AUC of 0.8487, accuracy score of 0.7818, recall score of 0.9045, F1 score of 0.8108, and precision score of 0.7347. In comparison, the XGBoost model achieved an AUC of 0.7565, accuracy score of 0.6961, recall score of 0.7387, F1 score of 0.7153, and precision score of 0.6934. CONCLUSIONS: NLP-based machine learning algorithms were a promising auxiliary to the electronic health record in spine disease diagnosis. LSTM, the deep learning model, showed better capacity compared with the widely used ensemble model, XGBoost, in differentiation of LDH and LSS using positive symptoms. This study presents a proof of concept for the application of NLP in prediagnosis of spine disease.
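The AUC values reported here have a simple probabilistic reading: the chance that a randomly chosen positive case is scored above a randomly chosen negative one. A dependency-free sketch of that rank-statistic formulation, on invented labels and scores:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank statistic: the probability
    that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.7, 0.2]
print(auc(labels, scores))  # 3.5 wins out of 6 pairs
```

An AUC of 0.85, like the LSTM's, means a randomly drawn LDH case outranks a randomly drawn LSS case about 85% of the time.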


Subjects
Intervertebral Disc Displacement, Spinal Stenosis, Humans, Intervertebral Disc Displacement/diagnosis, Machine Learning, Natural Language Processing, Retrospective Studies, Spinal Stenosis/diagnosis
16.
Front Public Health ; 10: 778463, 2022.
Article in English | MEDLINE | ID: mdl-35419333

ABSTRACT

Social determinants of health (SDoH) are important factors associated with cancer risk and treatment outcomes. There is increasing interest in exploring SDoH captured in electronic health records (EHRs) to assess cancer risk and outcomes; however, most SDoH are only captured in free-text clinical narratives, such as physicians' notes, that are not readily accessible. In this study, we applied a natural language processing (NLP) system to identify 15 categories of SDoH from a total of 10,855 lung cancer patients at the University of Florida Health. We aggregated the SDoH concepts to the patient level and assessed how each of the 15 categories of SDoH was documented in cancer patients' notes. To the best of our knowledge, this is one of the first studies to examine the documentation of SDoH in clinical narratives from a real-world lung cancer patient cohort. This study could guide future work to better utilize SDoH information documented in clinical narratives.


Assuntos
Neoplasias Pulmonares , Determinantes Sociais da Saúde , Documentação , Registros Eletrônicos de Saúde , Humanos , Processamento de Linguagem Natural
17.
BMC Bioinformatics ; 23(1): 120, 2022 Apr 04.
Article in English | MEDLINE | ID: mdl-35379166

ABSTRACT

BACKGROUND: Automatically extracting biomedical relations has recently become a significant subject in biomedical research due to the rapid growth of biomedical literature. Since their adaptation to the biomedical domain, transformer-based BERT models have produced leading results on many biomedical natural language processing tasks. In this work, we explore approaches to improve the BERT model for relation extraction tasks in both the pre-training and fine-tuning stages. In the pre-training stage, we add another level of BERT adaptation on sub-domain data to bridge the gap between domain knowledge and task-specific knowledge. We also propose methods to incorporate knowledge otherwise ignored in the last layer of BERT to improve its fine-tuning. RESULTS: The experimental results demonstrate that our approaches to pre-training and fine-tuning can improve BERT model performance. Combining the two proposed techniques, our approach outperforms the original BERT models with an average F1 score improvement of 2.1% on relation extraction tasks. Moreover, our approach achieves state-of-the-art performance on three relation extraction benchmark datasets. CONCLUSIONS: The extra pre-training step on sub-domain data can help the BERT model generalize to specific tasks, and our proposed fine-tuning mechanism can utilize the knowledge in the last layer of BERT to boost model performance. Furthermore, the combination of these two approaches further improves the performance of the BERT model on relation extraction tasks.


Assuntos
Pesquisa Biomédica , Mineração de Dados , Pesquisa Biomédica/métodos , Mineração de Dados/métodos , Fontes de Energia Elétrica , Processamento de Linguagem Natural , Publicações
18.
Comput Intell Neurosci ; 2022: 2455160, 2022.
Article in English | MEDLINE | ID: mdl-35432519

ABSTRACT

Artificial Intelligence has driven technological progress in recent years, and machine learning in particular has developed rapidly, with growing academic study and high demand from industry. Alongside this steady advance of technology, the pandemic, which has been part of our lives since early 2020, has led social media to occupy a larger place in individuals' lives. Social media posts have therefore become an excellent data source for sentiment analysis. The main contribution of this study lies in Natural Language Processing (NLP), one of the machine learning topics in the literature. Sentiment classification is a representative machine learning task in human-machine interaction: it requires classifiers that enable a computer to recognize people's emotional states. Turkish-language studies are scarce in the literature. Turkish has linguistic features that differ from English, and because it is an agglutinative language, sentiment analysis in Turkish is challenging. This paper evaluates several machine learning algorithms for sentiment analysis on Turkish-language datasets collected from Twitter. Besides using the public dataset of Beyaz (2021) to obtain more general results, a custom dataset, SentimentSet (Balli 2021), was created to gauge the impact of the pandemic on people and to learn about public opinion; it consists of Turkish tweets filtered with words such as "pandemic" and "corona" and manually labeled as positive, negative, or neutral. SentimentSet could also serve as a benchmark dataset in future research.
Results show classification accuracy of up to ~87% on test data held out from both datasets with the trained models, and up to ~84% on a small "Sample Test Data" set generated by the same methods as the SentimentSet dataset. These results indicate that sentiment analysis in Turkish depends on language-specific characteristics.
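The classification pipeline this abstract describes (supervised classifiers over labeled tweets) can be sketched with a minimal bag-of-words Naive Bayes classifier. This is an illustrative toy, not the study's actual models or the SentimentSet data; the example tweets are invented, and a real Turkish pipeline would add morphological analysis, since surface-form tokens are a crude fit for an agglutinative language.

```python
from collections import Counter, defaultdict
import math

def train(samples):
    """samples: list of (tokens, label). Returns class counts, per-class word counts, vocab."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(tokens, class_counts, word_counts, vocab):
    """Pick the class maximizing log P(class) + sum log P(token|class), Laplace-smoothed."""
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label, count in class_counts.items():
        lp = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            lp += math.log((word_counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Invented placeholder tweets, hand-labeled as in SentimentSet.
train_data = [
    ("harika bir gün".split(), "positive"),
    ("çok güzel haber".split(), "positive"),
    ("korkunç bir durum".split(), "negative"),
    ("çok kötü haber".split(), "negative"),
]
model = train(train_data)
print(predict("güzel bir gün".split(), *model))  # → positive
```

In practice, the per-class word counts would be replaced by features robust to Turkish suffixation (stemmed or character n-gram tokens) before accuracies near the reported ~87% are plausible.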


Subjects
Natural Language Processing , Social Media , Artificial Intelligence , Humans , Machine Learning , Public Opinion
19.
Front Public Health ; 10: 880207, 2022.
Article in English | MEDLINE | ID: mdl-35480589

ABSTRACT

Sentiment Analysis (SA) is a novel branch of Natural Language Processing (NLP) that measures the emotions or attitudes behind a written text. The first applications of SA in healthcare were the detection of disease-related emotional polarities in social media. It is now possible to extract more complex attitudes (ranking attitudes from 1 to 5, assigning appraisal values, applying multiple text classifiers) or feelings through NLP techniques, with clear benefits in cardiology, as emotions have been shown to be genuine risk factors for the development of cardiovascular diseases (CVD). Our narrative review aimed to summarize the current directions of SA in cardiology and to raise cardiologists' awareness of the potential of this novel domain. This paper introduces the reader to the basic concepts surrounding medical SA and the need for SA in cardiovascular healthcare. Our synthesis of the current literature supports SA's clinical potential in CVD. However, many other clinical uses, such as assessing the emotional consequences of illness, the patient-physician relationship, and physician intuition in CVD, have not yet been explored. These constitute future research directions, along with proposing detailed regulations, popularizing health social media among older adults, developing insightful definitions of emotional polarity, and investing in research on powerful SA algorithms.


Subjects
Cardiology , Cardiovascular Diseases , Aged , Emotions , Humans , Natural Language Processing
20.
BMJ Open ; 12(4): e057227, 2022 Apr 22.
Article in English | MEDLINE | ID: mdl-35459671

ABSTRACT

PURPOSE: NeuroBlu is a real-world data (RWD) repository that contains deidentified electronic health record (EHR) data from US mental healthcare providers operating the MindLinc EHR system. NeuroBlu enables users to perform statistical analysis through a secure web-based interface. Structured data are available for sociodemographic characteristics, mental health service contacts, hospital admissions, International Classification of Diseases (ICD-9/ICD-10) diagnoses, prescribed medications, family history of mental disorders, Clinical Global Impression-Severity and Improvement (CGI-S/CGI-I) and Global Assessment of Functioning (GAF). To further enhance the data set, natural language processing (NLP) tools have been applied to obtain mental state examination (MSE) and social/environmental data. This paper describes the development and implementation of NeuroBlu, the procedures to safeguard data integrity and security, and how the data set supports the generation of real-world evidence (RWE) in mental health. PARTICIPANTS: As of 31 July 2021, 562 940 individuals (48.9% men) were present in the data set, with a mean age of 33.4 years (SD: 18.4 years). The most frequently recorded diagnoses were substance use disorders (152 790 patients), major depressive disorder (129 120 patients) and anxiety disorders (103 923 patients). The median duration of follow-up was 7 months (IQR: 1.3 to 24.4 months). FINDINGS TO DATE: The data set has supported epidemiological studies demonstrating increased risk of psychiatric hospitalisation and reduced antidepressant treatment effectiveness among people with comorbid substance use disorders. It has also been used to develop data visualisation tools to support clinical decision-making, evaluate the comparative effectiveness of medications, derive models to predict treatment response and develop NLP applications to obtain clinical information from unstructured EHR data.
FUTURE PLANS: The NeuroBlu data set will be analysed further to better understand factors related to poor clinical outcomes and treatment responsiveness, and to develop predictive analytic tools that may be incorporated into the source EHR system to support real-time clinical decision-making in the delivery of mental healthcare services.


Subjects
Major Depressive Disorder , Mental Health Services , Adult , Electronic Health Records , Female , Humans , Male , Mental Health , Natural Language Processing