Search | Secretaria de Estado da Saúde

1.

NanoBERTa-ASP: predicting nanobody paratope based on a pretrained RoBERTa model.

Li, Shangru; Meng, Xiangpeng; Li, Rui; Huang, Bingding; Wang, Xin.

BMC Bioinformatics ; 25(1): 122, 2024 Mar 21.

Article in English | MEDLINE | ID: mdl-38515052

ABSTRACT

BACKGROUND: Nanobodies, also known as VHH or single-domain antibodies, are unique antibody fragments derived solely from heavy chains. They offer advantages of small molecules and conventional antibodies, making them promising therapeutics. The paratope is the specific region on an antibody that binds to an antigen. Paratope prediction involves the identification and characterization of the antigen-binding site on an antibody. This process is crucial for understanding the specificity and affinity of antibody-antigen interactions. Various computational methods and experimental approaches have been developed to predict and analyze paratopes, contributing to advancements in antibody engineering, drug development, and immunotherapy. However, existing predictive models trained on traditional antibodies may not be suitable for nanobodies. Additionally, the limited availability of nanobody datasets poses challenges in constructing accurate models. METHODS: To address these challenges, we have developed a novel nanobody prediction model, named NanoBERTa-ASP (Antibody Specificity Prediction), which is specifically designed for predicting nanobody-antigen binding sites. The model adopts a training strategy more suitable for nanobodies, based on an advanced natural language processing (NLP) model called BERT (Bidirectional Encoder Representations from Transformers). To be more specific, the model utilizes a masked language modeling approach named RoBERTa (Robustly Optimized BERT Pretraining Approach) to learn the contextual information of the nanobody sequence and predict its binding site. RESULTS: NanoBERTa-ASP achieved exceptional performance in predicting nanobody binding sites, outperforming existing methods, indicating its proficiency in capturing sequence information specific to nanobodies and accurately identifying their binding sites. 
Furthermore, NanoBERTa-ASP provides insights into the interaction mechanisms between nanobodies and antigens, contributing to a better understanding of nanobodies and facilitating the design and development of nanobodies with therapeutic potential. CONCLUSION: NanoBERTa-ASP represents a significant advancement in nanobody paratope prediction. Its superior performance highlights the potential of deep learning approaches in nanobody research. By leveraging the increasing volume of nanobody data, NanoBERTa-ASP can further refine its predictions, enhance its performance, and contribute to the development of novel nanobody-based therapeutics. Github repository: https://github.com/WangLabforComputationalBiology/NanoBERTa-ASP.
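The masked language modeling objective mentioned in the abstract can be illustrated with a small sketch. It assumes the standard BERT/RoBERTa recipe (select about 15% of tokens; replace 80% of those with a mask token, 10% with a random token, and leave 10% unchanged) applied per residue. The sequence fragment, the `mask_sequence` helper, and the residue-level tokenization are illustrative assumptions, not details taken from NanoBERTa-ASP.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"


def mask_sequence(seq, rng, mask_rate=0.15):
    """Return (tokens, labels) for one masked-language-modeling example.

    labels[i] holds the original residue where the model must predict it,
    and None everywhere else. Calling this again with the same sequence
    yields a different mask, which is the idea behind dynamic masking.
    """
    tokens = list(seq)
    labels = [None] * len(tokens)
    for i, residue in enumerate(seq):
        if rng.random() >= mask_rate:
            continue  # position not selected for prediction
        labels[i] = residue
        roll = rng.random()
        if roll < 0.8:
            tokens[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)  # 10%: random residue
        # remaining 10%: keep the original residue
    return tokens, labels


rng = random.Random(0)
seq = "QVQLVESGGGLVQAGGSLRLSCAAS"  # hypothetical nanobody-like fragment
tokens, labels = mask_sequence(seq, rng)
print("".join(t if t != MASK else "_" for t in tokens))
```

In a real pipeline the masked tokens would be fed to the transformer and the loss computed only at positions where `labels` is not `None`.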

Subjects

Single-Domain Antibodies, Antibody Binding Sites, Single-Domain Antibodies/chemistry, Antibodies, Binding Sites, Antibody Specificity

2.

Developing large language models to detect adverse drug events in posts on X.

Deng, Yu; Xing, Yunzhao; Quach, Jason; Chen, Xiaotian; Wu, Xiaoqiang; Zhang, Yafei; Moureaud, Charlotte; Yu, Mengjia; Zhao, Yujie; Wang, Li; Zhong, Sheng.

J Biopharm Stat ; : 1-12, 2024 Sep 20.

Article in English | MEDLINE | ID: mdl-39300965

ABSTRACT

Adverse drug events (ADEs) are one of the major causes of hospital admissions and are associated with increased morbidity and mortality. Post-marketing ADE identification is one of the most important phases of drug safety surveillance. Traditionally, data for post-marketing surveillance have come mainly from spontaneous reporting systems such as the Food and Drug Administration Adverse Event Reporting System (FAERS). Social media data such as posts on X (formerly Twitter) contain rich patient and medication information and could potentially accelerate drug surveillance research. However, ADE information in social media data is usually locked in the text, making it difficult to use with traditional statistical approaches. In recent years, large language models (LLMs) have shown promise in many natural language processing tasks. In this study, we developed several LLMs to perform ADE classification on X data. We fine-tuned various LLMs including BERT-base, Bio_ClinicalBERT, RoBERTa, and RoBERTa-large. We also experimented with ChatGPT few-shot prompting and with ChatGPT fine-tuned on the whole training data. We then evaluated model performance based on sensitivity, specificity, negative predictive value, positive predictive value, accuracy, F1-measure, and area under the ROC curve. Our results showed that RoBERTa-large achieved the best F1-measure (0.8) among all models, followed by the fine-tuned ChatGPT model with an F1-measure of 0.75. Our feature importance analysis, based on 1200 random samples and RoBERTa-large, showed that the most important features were "withdrawals"/"withdrawal", "dry", "dealing", "mouth", and "paralysis". The good model performance and clinically relevant features show the potential of LLMs in augmenting ADE detection for post-marketing drug safety surveillance.
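Most of the evaluation metrics listed in the abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows those definitions; the counts are made up for illustration and are not results from the study, and the function assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes each denominator is non-zero (at least one actual positive,
    one actual negative, one predicted positive, one predicted negative).
    """
    sensitivity = tp / (tp + fn)  # recall / true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value / precision
    npv = tn / (tn + fn)          # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for an ADE / non-ADE classifier.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Area under the ROC curve is not included because it depends on the ranking of predicted scores rather than on a single thresholded confusion matrix.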

3.

Classification of Patients' Judgments of Their Physicians in Web-Based Written Reviews Using Natural Language Processing: Algorithm Development and Validation.

Madanay, Farrah; Tu, Karissa; Campagna, Ada; Davis, J Kelly; Doerstling, Steven S; Chen, Felicia; Ubel, Peter A.

J Med Internet Res ; 26: e50236, 2024 Aug 01.

Article in English | MEDLINE | ID: mdl-39088259

ABSTRACT

BACKGROUND: Patients increasingly rely on web-based physician reviews to choose a physician and share their experiences. However, the unstructured text of these written reviews presents a challenge for researchers seeking to make inferences about patients' judgments. Methods previously used to identify patient judgments within reviews, such as hand-coding and dictionary-based approaches, have posed limitations to sample size and classification accuracy. Advanced natural language processing methods can help overcome these limitations and promote further analysis of physician reviews on these popular platforms. OBJECTIVE: This study aims to train, test, and validate an advanced natural language processing algorithm for classifying the presence and valence of 2 dimensions of patient judgments in web-based physician reviews: interpersonal manner and technical competence. METHODS: We sampled 345,053 reviews for 167,150 physicians across the United States from Healthgrades.com, a commercial web-based physician rating and review website. We hand-coded 2000 written reviews and used those reviews to train and test a transformer classification algorithm called the Robustly Optimized BERT (Bidirectional Encoder Representations from Transformers) Pretraining Approach (RoBERTa). The 2 fine-tuned models coded the reviews for the presence and positive or negative valence of patients' interpersonal manner or technical competence judgments of their physicians. We evaluated the performance of the 2 models against 200 hand-coded reviews and validated the models using the full sample of 345,053 RoBERTa-coded reviews. RESULTS: The interpersonal manner model was 90% accurate with precision of 0.89, recall of 0.90, and weighted F1-score of 0.89. The technical competence model was 90% accurate with precision of 0.91, recall of 0.90, and weighted F1-score of 0.90. 
Positive-valence judgments were associated with higher review star ratings whereas negative-valence judgments were associated with lower star ratings. Analysis of the data by review rating and physician gender corresponded with findings in prior literature. CONCLUSIONS: Our 2 classification models coded interpersonal manner and technical competence judgments with high precision, recall, and accuracy. These models were validated using review star ratings and results from previous research. RoBERTa can accurately classify unstructured, web-based review text at scale. Future work could explore the use of this algorithm with other textual data, such as social media posts and electronic health records.
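The weighted F1-score reported above averages the per-class F1 values, weighting each class by its number of true instances. A minimal sketch of that calculation follows; the three class names and the toy labels are assumptions for illustration, since the abstract does not list the exact label set.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total  # weight by the class's share of true labels
    return score


# Hypothetical valence labels for four reviews.
y_true = ["positive", "positive", "negative", "absent"]
y_pred = ["positive", "negative", "negative", "absent"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support keeps a rare class from dominating the average, at the cost of hiding poor performance on that class.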

Subjects

Algorithms, Internet, Natural Language Processing, Humans, Female, Male, Physicians, Physician-Patient Relations, Judgment, Adult, Middle Aged

4.

Joint extraction of Chinese medical entities and relations based on RoBERTa and single-module global pointer.

Li, Dongmei; Yang, Yu; Cui, Jinman; Meng, Xianghao; Qu, Jintao; Jiang, Zhuobin; Zhao, Yufeng.

BMC Med Inform Decis Mak ; 24(1): 218, 2024 Jul 31.

Article in English | MEDLINE | ID: mdl-39085892

ABSTRACT

BACKGROUND: Most Chinese joint entity and relation extraction tasks in medicine involve numerous nested entities, overlapping relations, and other challenging extraction issues. In response to these problems, some traditional methods decompose the joint extraction task into multiple steps or multiple modules, which in turn introduces local dependency. METHODS: To alleviate this issue, we propose a joint extraction model of Chinese medical entities and relations based on RoBERTa and single-module global pointer, namely RSGP, which formulates joint extraction as a global pointer linking problem. Considering the uniqueness of Chinese language structure, we introduce the RoBERTa-wwm pre-trained language model at the encoding layer to obtain a better embedding representation. Then, we represent the input sentence as a third-order tensor and score each position in the tensor to prepare for the subsequent process of decoding the triples. Finally, we design a novel single-module global pointer decoding approach to reduce the generation of redundant information. Specifically, we handle the decoding of single-character entities separately, improving the time and space performance of RSGP to some extent. RESULTS: To verify the effectiveness of our model in extracting Chinese medical entities and relations, we carry out experiments on the public dataset CMeIE. Experimental results show that RSGP performs significantly better on the joint extraction of Chinese medical entities and relations, and achieves state-of-the-art results compared with baseline models. CONCLUSION: The proposed RSGP can effectively extract entities and relations from Chinese medical texts and help to structure those texts, so as to provide high-quality data support for the construction of Chinese medical knowledge graphs.

Subjects

Natural Language Processing, Humans, China, Data Mining, East Asian People

5.

RB-GAT: A Text Classification Model Based on RoBERTa-BiGRU with Graph ATtention Network.

Lv, Shaoqing; Dong, Jungang; Wang, Chichi; Wang, Xuanhong; Bao, Zhiqiang.

Sensors (Basel) ; 24(11), 2024 May 24.

Article in English | MEDLINE | ID: mdl-38894157

ABSTRACT

With the development of deep learning, several graph neural network (GNN)-based approaches have been utilized for text classification. However, GNNs encounter challenges when capturing contextual text information within a document sequence. To address this, a novel text classification model, RB-GAT, is proposed by combining RoBERTa-BiGRU embedding and a multi-head Graph ATtention Network (GAT). First, the pre-trained RoBERTa model is exploited to learn word and text embeddings in different contexts. Second, the Bidirectional Gated Recurrent Unit (BiGRU) is employed to capture long-term dependencies and bidirectional sentence information from the text context. Next, the multi-head graph attention network is applied to analyze this information, which serves as a node feature for the document. Finally, the classification results are generated through a Softmax layer. Experimental results on five benchmark datasets demonstrate that our method can achieve an accuracy of 71.48%, 98.45%, 80.32%, 90.84%, and 95.67% on Ohsumed, R8, MR, 20NG and R52, respectively, which is superior to the existing nine text classification approaches.

6.

RoBERTa-Assisted Outcome Prediction in Ovarian Cancer Cytoreductive Surgery Using Operative Notes.

Laios, Alexandros; Kalampokis, Evangelos; Mamalis, Marios Evangelos; Tarabanis, Constantine; Nugent, David; Thangavelu, Amudha; Theophilou, Georgios; De Jong, Diederick.

Cancer Control ; 30: 10732748231209892, 2023.

Article in English | MEDLINE | ID: mdl-37915208

ABSTRACT

INTRODUCTION: Contemporary efforts to predict surgical outcomes focus on the associations between traditional discrete surgical risk factors. We aimed to determine whether natural language processing (NLP) of unstructured operative notes improves the prediction of residual disease in women with advanced epithelial ovarian cancer (EOC) following cytoreductive surgery. METHODS: Electronic Health Records were queried to identify women with advanced EOC including their operative notes. The Term Frequency - Inverse Document Frequency (TF-IDF) score was used to quantify the discrimination capacity of sequences of words (n-grams) regarding the existence of residual disease. We employed a state-of-the-art RoBERTa-based classifier to process unstructured surgical notes. Discrimination was measured using standard performance metrics. An XGBoost model was then trained on the same dataset using both discrete and engineered clinical features along with the probabilities output by the RoBERTa classifier. RESULTS: The cohort consisted of 555 cases of EOC cytoreduction performed by eight surgeons between January 2014 and December 2019. Discrete word clouds weighted by n-gram TF-IDF score difference between R0 and non-R0 resection were identified. The words 'adherent' and 'miliary disease' best discriminated between the two groups. The RoBERTa model reached high evaluation metrics (AUROC .86; AUPRC .87; precision, recall, and F1 score of .77; accuracy of .81). It also outperformed models that used discrete clinical and engineered features, as well as other state-of-the-art NLP tools. When the probabilities from the RoBERTa classifier were combined with commonly used predictors in the XGBoost model, a marginal improvement in the overall model's performance was observed (AUROC and AUPRC of .91, with all other metrics the same).
CONCLUSION/IMPLICATIONS: We applied a sui generis approach to extract information from the abundant textual surgical data and demonstrated how it can be effectively used for classification prediction, outperforming models relying on conventional structured data. State-of-the-art NLP applications in biomedical texts can improve modern EOC care.
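The TF-IDF weighting of n-grams described in the methods can be sketched in a few lines. This version uses one common formulation (relative term frequency multiplied by the natural log of inverse document frequency); the abstract does not state which variant or smoothing the authors used, and the two note snippets are invented for illustration.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Split text on whitespace and return its word n-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tfidf(docs, n=1):
    """Return one {term: score} dict per document.

    score = (term count / terms in document) * ln(documents / documents with term)
    """
    tokenized = [ngrams(doc, n) for doc in docs]
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(len(docs) / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical operative-note fragments, not taken from the study.
notes = ["adherent tumour on bowel", "no residual tumour"]
scores = tfidf(notes, n=1)
print(scores[0])
```

A term that appears in every document, such as "tumour" here, scores zero, which is why TF-IDF highlights wording that separates one group of notes from another.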

Subjects

Cytoreduction Surgical Procedures, Ovarian Neoplasms, Humans, Female, Machine Learning, Electronic Health Records, Natural Language Processing, Ovarian Epithelial Carcinoma/surgery, Ovarian Neoplasms/surgery

7.

A Joint Extraction System Based on Conditional Layer Normalization for Health Monitoring.

Shi, Binbin; Fan, Rongli; Zhang, Lijuan; Huang, Jie; Xiong, Neal; Vasilakos, Athanasios; Wan, Jian; Zhang, Lei.

Sensors (Basel) ; 23(10), 2023 May 16.

Article in English | MEDLINE | ID: mdl-37430725

ABSTRACT

Natural language processing (NLP) technology has played a pivotal role in health monitoring as an important artificial intelligence method. As a key technology in NLP, relation triplet extraction is closely related to the performance of health monitoring. In this paper, a novel model is proposed for joint extraction of entities and relations, combining conditional layer normalization with the talking-head attention mechanism to strengthen the interaction between entity recognition and relation extraction. In addition, the proposed model utilizes position information to enhance the extraction accuracy of overlapping triplets. Experiments on the Baidu2019 and CHIP2020 datasets demonstrate that the proposed model can effectively extract overlapping triplets, which leads to significant performance improvements compared with baselines.

Subjects

Artificial Intelligence, Natural Language Processing, Recognition (Psychology), Technology

8.

Detecting defense mechanisms from Adult Attachment Interview (AAI) transcripts using machine learning.

Tasca, Anthony N; Carlucci, Samantha; Wiley, James C; Holden, Matthew; El-Roby, Ahmed; Tasca, Giorgio A.

Psychother Res ; 33(6): 757-767, 2023 Jul.

Article in English | MEDLINE | ID: mdl-36525586

ABSTRACT

OBJECTIVE: Defensive functioning (i.e., unconscious processes used to manage real or perceived threats) may play a role in the development of various psychopathologies. It is typically assessed via observer rating measures; however, human coding of defensive functioning is resource-intensive and time-consuming. The purpose of this study was to develop a machine learning approach to automate coding of defense mechanisms from interview transcripts. METHOD: Participants included a clinical sample of women with binge-eating disorder (n = 92) and a community sample without binge-eating disorder (n = 66). We trained and evaluated five RoBERTa-based models to detect the presence of defenses in 16,785 interviewer-participant talk-turn pairs nested within 192 interviews. One model detected the presence of any defense, while four additional models detected the most common defenses in this sample (repression, intellectualization, reaction formation, undoing). RESULTS: The models were capable of distinguishing defenses (ROC-AUC .82-.90) but were not proficient enough to warrant replacing human coders (PR-AUC .28-.60). Follow-up analysis was performed to assess other practical uses of these models. DISCUSSION: Our machine learning models could be used to assist coders. Future research should conduct a deployment study to determine if human coding of defense mechanisms can be expedited using machine learning models.
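The gap between ROC-AUC and PR-AUC in these results is typical when the positive class is rare: ROC-AUC rewards ranking positives above the many negatives, while precision falls quickly once a few negatives outrank a positive. The sketch below computes ROC-AUC as a pairwise ranking probability and uses average precision as a common estimate of PR-AUC. The labels and scores are invented, and the study's exact PR-AUC computation is not stated in the abstract.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical talk-turns: 2 with a defense present, 8 without.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.75, 0.4, 0.8, 0.7, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1]
print("ROC-AUC:", roc_auc(labels, scores))
print("Average precision:", average_precision(labels, scores))
```

Here two negatives scored near the top are enough to halve precision even though most positive-negative pairs are ordered correctly.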

Subjects

Defense Mechanisms, Machine Learning, Adult, Female, Humans, Binge-Eating Disorder/psychology

9.

Pre-trained ensemble model for identification of emotion during COVID-19 based on emergency response support system dataset.

Nimmi, K; Janet, B; Selvan, A Kalai; Sivakumaran, N.

Appl Soft Comput ; 122: 108842, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35465357

ABSTRACT

The COVID-19 precautions, lockdown, and quarantine implemented throughout the epidemic resulted in a worldwide economic disaster. People are facing unprecedented levels of intense threat, necessitating professional, systematic psychiatric intervention and assistance. New psychological services must be established as quickly as possible to support the mental healthcare needs of people during the pandemic. This study examines the contents of calls received by the emergency response support system (ERSS) during the pandemic. Furthermore, a combined analysis of Twitter patterns connected to emergency services could be valuable in assisting people during this crisis and in understanding and supporting their emotions. The proposed Average Voting Ensemble Deep Learning model (AVEDL Model) is based on the average voting technique. The AVEDL Model is used to classify emotion based on transcribed COVID-19-associated ERSS calls along with tweets. The pre-trained transformer-based models BERT, DistilBERT, and RoBERTa are combined to build the AVEDL Model, which achieves the best results. The AVEDL Model is trained and tested for emotion detection using labeled COVID-19 tweets and the call content of the ERSS. To the best of our knowledge, this is the first deep learning ensemble model for COVID-19 emotion analysis. The AVEDL Model outperforms standard deep learning and machine learning models, attaining an accuracy of 86.46 percent and a macro-average F1-score of 85.20 percent.
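Average voting, as usually defined, means averaging the class probabilities produced by each member model and choosing the class with the highest mean. The sketch below shows that step only. The emotion labels and probability vectors are hypothetical stand-ins for the outputs of three fine-tuned models, and the abstract does not specify whether the authors weight the models equally.

```python
def average_vote(prob_rows):
    """Average per-class probabilities across models and pick the best class.

    prob_rows: one probability vector per model, all of the same length.
    Returns (index of winning class, averaged probability vector).
    """
    n_classes = len(prob_rows[0])
    averaged = [
        sum(row[c] for row in prob_rows) / len(prob_rows)
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


EMOTIONS = ["fear", "anger", "joy"]  # hypothetical label set

# Hypothetical softmax outputs of three models for one message.
model_outputs = [
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.30, 0.45, 0.25],
]
best, averaged = average_vote(model_outputs)
print(EMOTIONS[best], [round(p, 3) for p in averaged])
```

In this toy case the first model alone would predict "fear", but the averaged distribution favors "anger", which shows how the ensemble can overrule a single confident member.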

10.

A sui generis QA approach using RoBERTa for adverse drug event identification.

Jain, Harshit; Raj, Nishant; Mishra, Suyash.

BMC Bioinformatics ; 22(Suppl 11): 330, 2021 Oct 21.

Article in English | MEDLINE | ID: mdl-34674630

ABSTRACT

BACKGROUND: Extraction of adverse drug events from biomedical literature and other textual data is an important component of drug-safety monitoring, and it has attracted the attention of many researchers in healthcare. Existing works are mostly centered on entity-relation extraction using bidirectional long short-term memory networks (Bi-LSTM), which do not attain the best feature representations. RESULTS: In this paper, we introduce a question answering framework that exploits the robustness, masking and dynamic attention capabilities of RoBERTa through domain adaptation, in an attempt to overcome the aforementioned limitations. With the formulation of an end-to-end pipeline, our model outperforms the prior work by 9.53% in F1-score. CONCLUSION: An end-to-end pipeline that leverages a state-of-the-art transformer architecture in conjunction with a QA approach can bolster the performance of entity-relation extraction tasks in the biomedical domain. In particular, we believe our research would be helpful in identifying potential adverse drug reactions in textual data related to both monotherapy and combination therapy.

Subjects

Drug-Related Side Effects and Adverse Reactions, Neural Networks (Computer), Humans

11.

COVID-19 outbreak: An ensemble pre-trained deep learning model for detecting informative tweets.

Malla, SreeJagadeesh; P J A, Alphonse.

Appl Soft Comput ; 107: 107495, 2021 Aug.

Article in English | MEDLINE | ID: mdl-36568257

ABSTRACT

On 11 March 2020, the World Health Organization (WHO) declared COVID-19 (CoronaVirus Disease 2019) a pandemic. Alongside the coronavirus pandemic, a further crisis has emerged in the form of mass fear and panic, driven by lack of information or sometimes outright misinformation. Twitter is one of the prominent and trusted social media platforms in the current outbreak. Over time, countless COVID-19 headlines and vast awareness have been spreading through tweets, updates, videos, and explosive posts. Few studies have been performed on the pandemic to detect and interrelate various disease types, including the current coronavirus. However, it is difficult to discriminate and detect a specific category. This work is motivated by the need to inform society about limiting irrelevant information and avoiding the spread of negative emotions. In this context, the current work focuses on informative tweet detection during the pandemic to provide relevant information to the government, medical organizations, victim services, etc. This paper uses a Majority Voting technique-based Ensemble Deep Learning (MVEDL) model to identify COVID-19-related (INFORMATIVE) tweets. The state-of-the-art deep learning models RoBERTa, BERTweet, and CT-BERT are used for best performance with the MVEDL model. The "COVID-19 English labeled tweets" dataset is used for training and testing the MVEDL model. The MVEDL model achieves 91.75 percent accuracy and a 91.14 percent F1-score, and outperforms traditional machine learning and deep learning models. We also investigate how to use the MVEDL model for sentiment analysis on 226,668 unlabeled COVID-19 tweets and their informative tweets. The application section presents a comprehensive analysis of both actual and informative tweets. To our knowledge, this is the first work on COVID-19 sentiment analysis using a deep learning ensemble model.
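Majority voting can be sketched as follows: each member model casts one hard label per tweet and the most frequent label wins. The INFORMATIVE label comes from the abstract; the UNINFORMATIVE label and the three example votes are assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models.

    With three voters and two labels a tie cannot occur. In other settings
    a tie goes to the label that appears first among the tied ones.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical hard predictions from three models for one tweet.
votes = ["INFORMATIVE", "UNINFORMATIVE", "INFORMATIVE"]
print(majority_vote(votes))
```

Unlike average voting, this rule ignores how confident each model is, so a single very confident model cannot outvote two weakly confident ones.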

12.

Profile of microbial communities on carbonate stones of the medieval church of San Leonardo di Siponto (Italy) by Illumina-based deep sequencing.

Chimienti, Guglielmina; Piredda, Roberta; Pepe, Gabriella; van der Werf, Inez Dorothé; Sabbatini, Luigia; Crecchio, Carmine; Ricciuti, Patrizia; D'Erchia, Anna Maria; Manzari, Caterina; Pesole, Graziano.

Appl Microbiol Biotechnol ; 100(19): 8537-48, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27283019

ABSTRACT

Comprehensive studies of the biodiversity of the microbial epilithic community on monuments may provide critical insights for clarifying factors involved in the colonization processes. We carried out a high-throughput investigation of the communities colonizing the medieval church of San Leonardo di Siponto (Italy) by Illumina-based deep sequencing. The metagenomic analysis of sequences revealed the presence of Archaea, Bacteria, and Eukarya. Bacteria were Actinobacteria, Proteobacteria, Bacteroidetes, Cyanobacteria, Chloroflexi, Firmicutes and Candidatus Saccharibacteria. The predominant phylum was Actinobacteria, with the orders Actinomycetales and Rubrobacterales, represented by the genera Pseudokineococcus, Sporichthya, Blastococcus, Arthrobacter, Geodermatophilus, Friedmanniella, Modestobacter, and Rubrobacter, respectively. Cyanobacteria sequences showing strong similarity with an uncultured bacterium sequence were identified. The presence of the green algae Oocystaceae and Trebouxiaceae was revealed. The microbial diversity was explored at qualitative and quantitative levels, evaluating the richness (the number of operational taxonomic units (OTUs)) and the abundance of reads associated with each OTU. The rarefaction curves approached saturation, suggesting that the majority of OTUs were recovered. The results highlighted a structured community, showing low diversity, made up of extremophile organisms adapted to desiccation and UV radiation. Notably, the microbiome appeared to be composed not only of microorganisms possibly involved in biodeterioration but also of carbonatogenic bacteria, such as those belonging to the genus Arthrobacter, which could be useful in bioconservation. Our investigation demonstrated that molecular tools, and in particular easy-to-run next-generation sequencing, are powerful means of performing a microbiological diagnosis in order to plan restoration and protection strategies.

Subjects

Biota, Carbonates, Environmental Microbiology, High-Throughput Nucleotide Sequencing, Bacteria/classification, Bacteria/genetics, Chlorophyta/classification, Chlorophyta/genetics, Italy, Metagenomics, DNA Sequence Analysis

13.

Role of Natural Language Processing in Automatic Detection of Unexpected Findings in Radiology Reports: A Comparative Study of RoBERTa, CNN, and ChatGPT.

López-Úbeda, Pilar; Martín-Noguerol, Teodoro; Escartín, Jorge; Luna, Antonio.

Acad Radiol ; 2024 Aug 08.

Article in English | MEDLINE | ID: mdl-39122584

ABSTRACT

RATIONALE AND OBJECTIVES: Large Language Models can capture the context of radiological reports, offering high accuracy in detecting unexpected findings. We aim to fine-tune a Robustly Optimized BERT Pretraining Approach (RoBERTa) model for the automatic detection of unexpected findings in radiology reports to assist radiologists in this relevant task. Second, we compared the performance of RoBERTa with that of a classical convolutional neural network (CNN) and of GPT4 for this goal. MATERIALS AND METHODS: For this study, a dataset consisting of 44,631 radiological reports for training and 5293 for the initial test set was used. A smaller subset comprising 100 reports was used as the comparative test set. The complete dataset was obtained from our institution's Radiology Information System, including reports from various dates, examinations, genders, ages, etc. We evaluated two Large Language Models, fine-tuning RoBERTa and developing a prompt for ChatGPT. Furthermore, extending previous studies, we included a CNN in our comparison. RESULTS: The results indicate an accuracy of 86.15% in the initial test set using the RoBERTa model. On the comparative test set, RoBERTa achieves an accuracy of 79%, ChatGPT 64%, and the CNN 49%. Notably, RoBERTa outperforms the CNN and ChatGPT by 30 and 15 percentage points, respectively. CONCLUSION: A fine-tuned RoBERTa model can accurately detect unexpected findings in radiology reports, outperforming the CNN and ChatGPT on this task.

14.

Automatic text classification of prostate cancer malignancy scores in radiology reports using NLP models.

Collado-Montañez, Jaime; López-Úbeda, Pilar; Chizhikova, Mariia; Díaz-Galiano, M Carlos; Ureña-López, L Alfonso; Martín-Noguerol, Teodoro; Luna, Antonio; Martín-Valdivia, M Teresa.

Med Biol Eng Comput ; 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38844661

ABSTRACT

This paper presents the implementation of two automated text classification systems for prostate cancer findings based on the PI-RADS criteria. Specifically, a traditional machine learning model using XGBoost and a language model-based approach using RoBERTa were employed. The study focused on Spanish-language radiological MRI prostate reports, which have not been explored before. The results demonstrate that the RoBERTa model outperforms the XGBoost model, although both achieve promising results. Furthermore, the best-performing system was integrated into the radiological company's information systems as an API, operating in a real-world environment.

15.

Sentiment analysis of video danmakus based on MIBE-RoBERTa-FF-BiLSTM.

Zhao, Jianbo; Liu, Huailiang; Wang, Yakai; Zhang, Weili; Zhang, Xiaojin; Li, Bowei; Sun, Tong; Qi, Yanwei; Zhang, Shanzhuang.

Sci Rep ; 14(1): 5827, 2024 Mar 09.

Article in English | MEDLINE | ID: mdl-38461303

ABSTRACT

Danmakus are user-generated comments that overlay on videos, enabling real-time interactions between viewers and video content. The emotional orientation of danmakus can reflect the attitudes and opinions of viewers on video segments, which can help video platforms optimize video content recommendation and evaluate users' abnormal emotion levels. To address the low transferability of traditional sentiment analysis methods to the danmaku domain, the low accuracy of danmaku text segmentation, the poor consistency of sentiment annotation, and insufficient semantic feature extraction, this paper proposes a video danmaku sentiment analysis method based on MIBE-RoBERTa-FF-BiLSTM. We construct our own "Bilibili Must-Watch List and Top Video Danmaku Sentiment Dataset", covering 10,000 positive and negative sentiment danmaku texts across 18 themes. A new word recognition algorithm based on mutual information (MI) and branch entropy (BE) is used to discover 2610 irregular, popular new internet words, from trigrams to heptagrams, in the dataset, forming a domain lexicon. Maslow's hierarchy of needs theory is applied to guide consistent sentiment annotation. The domain lexicon is integrated into the feature fusion layer of the RoBERTa-FF-BiLSTM model to fully learn the semantic features of word information, character information, and context information of danmaku texts and perform sentiment classification. Comparative experiments on the dataset show that the proposed model has the best overall performance among the mainstream models for video danmaku text sentiment classification, with an F1 value of 94.06%, and its accuracy and robustness are also better than those of other models. The limitations of this paper are that the construction of the domain lexicon still requires manual participation and review, and that the semantic information of danmaku video content and the positive case preference are ignored.
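New-word discovery with mutual information and branch entropy typically rests on two signals: the parts of a true word co-occur more often than chance (high mutual information), and a true word is followed and preceded by many different characters (high branch entropy). The sketch below uses textbook character-level definitions on a toy Latin-script corpus. The probability estimates, the function names, and the corpus are assumptions, since the abstract does not give the authors' exact formulas or thresholds.

```python
import math
from collections import Counter


def occurrences(corpus, piece):
    """Start indices of every (possibly overlapping) match of piece."""
    hits, start = [], corpus.find(piece)
    while start != -1:
        hits.append(start)
        start = corpus.find(piece, start + 1)
    return hits


def pmi(corpus, left, right):
    """Pointwise mutual information (bits) of two adjacent strings.

    Probabilities are estimated as match count / corpus length.
    Assumes left + right occurs at least once in the corpus.
    """
    total = len(corpus)
    p_left = len(occurrences(corpus, left)) / total
    p_right = len(occurrences(corpus, right)) / total
    p_joint = len(occurrences(corpus, left + right)) / total
    return math.log2(p_joint / (p_left * p_right))


def branch_entropy(corpus, candidate, side="right"):
    """Entropy (bits) of the characters adjacent to candidate on one side."""
    neighbours = Counter()
    for start in occurrences(corpus, candidate):
        idx = start + len(candidate) if side == "right" else start - 1
        if 0 <= idx < len(corpus):
            neighbours[corpus[idx]] += 1
    total = sum(neighbours.values())
    return -sum(c / total * math.log2(c / total) for c in neighbours.values())


# Toy corpus with a made-up slang token "xswl".
corpus = "xswl haha xswl lol xswl!"
print(pmi(corpus, "xs", "wl"), branch_entropy(corpus, "xswl", "right"))
```

A candidate would normally be accepted as a new word only when both its internal mutual information and its left and right branch entropies exceed chosen thresholds.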

16.

Identifying and resolving conflict in mobile application features through contradictory feedback analysis.

Gambo, Ishaya; Massenon, Rhodes; Ogundokun, Roseline Oluwaseun; Agarwal, Saurabh; Pak, Wooguil.

Heliyon ; 10(17): e36729, 2024 Sep 15.

Article in English | MEDLINE | ID: mdl-39281433

ABSTRACT

As mobile applications proliferate and user feedback becomes abundant, the task of identifying and resolving conflicts among application features is crucial for delivering satisfactory user experiences. This research, motivated to align application development with user preferences, introduces a novel methodology that leverages advanced Natural Language Processing techniques. The paper showcases the use of sentiment analysis using RoBERTa, topic modeling with Non-negative matrix factorization (NMF), and semantic similarity measures from Sentence-BERT. These techniques enable the identification of contradictory sentiments, the discovery of latent topics representing application features, and the clustering of related feedback instances. The approach detects conflicts by analyzing sentiment distributions within semantically similar clusters, further enhanced by incorporating antonym detection and negation handling. It employs majority voting, weighted ranking based on rating scores, and frequency analysis of feature mentions to resolve conflicts, providing actionable insights for prioritizing requirements. Comprehensive evaluations on large-scale iOS App Store and Google Play Store datasets demonstrate the approach's effectiveness, outperforming baseline methods and existing techniques. The research improves mobile application development and user experiences by aligning features with user preferences and providing interpretable conflict resolution strategies, thereby introducing a novel approach to the field of mobile application development.

17.

Entity recognition of railway signal equipment fault information based on RoBERTa-wwm and deep learning integration.

Lin, Junting; Li, Shan; Qin, Ning; Ding, Shuxin.

Math Biosci Eng ; 21(1): 1228-1248, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38303462

ABSTRACT

The operation and maintenance of railway signal systems generate a large and complex body of text data about faults. To address the problems of fuzzy entity boundaries and low accuracy of entity recognition in the field of railway signal equipment faults, this paper provides a method for entity recognition of railway signal equipment fault information based on RoBERTa-wwm and deep learning integration. First, the model uses the RoBERTa-wwm pretrained language model to obtain word vectors for the text sequences. Second, a parallel network consisting of a BiLSTM and a CNN is constructed to obtain the context feature information and the local attention information, respectively. Third, the feature vectors output by the BiLSTM and the CNN are combined and fed into a multi-head attention (MHA) layer, which focuses on extracting key feature information and mining the connections between different features. Finally, a conditional random field (CRF) layer outputs the label sequences subject to constraint relationships, completing the entity recognition task. The experimental analysis is carried out on fault texts of railway signal equipment from the past ten years, and the results show that the model scores higher than the traditional model on this dataset, with precision, recall and F1 value of 93.25%, 92.45%, and 92.85%, respectively.
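
As a quick consistency check on the reported metrics, and assuming the F1 value is the standard harmonic mean of precision $P$ and recall $R$ (the abstract does not state its definition), the three figures agree:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9325 \times 0.9245}{0.9325 + 0.9245}
    = \frac{1.7241925}{1.8570}
    \approx 0.9285
```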

18.

Improving sentiment classification using a RoBERTa-based hybrid model.

Semary, Noura A; Ahmed, Wesam; Amin, Khalid; Plawiak, Pawel; Hammad, Mohamed.

Front Hum Neurosci ; 17: 1292010, 2023.

Article in English | MEDLINE | ID: mdl-38130432

ABSTRACT

Introduction: Several attempts have been made to enhance the performance of text-based sentiment analysis. Classifiers and word embedding models have been among the most prominent. This work aims to develop a hybrid deep learning approach that combines the advantages of transformer models and sequence models while eliminating the shortcomings of sequence models. Methods: In this paper, we present a hybrid model based on the transformer model and deep learning models to enhance the sentiment classification process. Robustly optimized BERT (RoBERTa) was selected to produce the representative vectors of the input sentences, and the Long Short-Term Memory (LSTM) model in conjunction with the Convolutional Neural Network (CNN) model was used to improve the suggested model's ability to comprehend the semantics and context of each input sentence. We tested the proposed model on two datasets with different topics. The first is a Twitter dataset of US airline reviews and the second is the IMDb movie reviews dataset. We propose using word embeddings in conjunction with the SMOTE technique to overcome the challenge of imbalanced classes in the Twitter dataset. Results: With an accuracy of 96.28% on the IMDb reviews dataset and 94.2% on the Twitter reviews dataset, the suggested hybrid model outperforms the standard methods. Discussion: These results show that the proposed hybrid RoBERTa-(CNN+LSTM) method is an effective model for sentiment classification.
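
The SMOTE step mentioned above generates synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours. The sketch below illustrates only that core idea on plain coordinate tuples; it is not the authors' pipeline, and the function name, the parameter defaults, and the toy points standing in for word embeddings are all assumptions.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating towards near neighbours.

    minority: list of equal-length numeric tuples (at least two points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest other minority points by squared Euclidean distance
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical 2-D stand-ins for minority-class embeddings.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=5)
```

Each synthetic point lies on the line segment between two real minority samples, so the class is enlarged without duplicating existing rows verbatim.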

19.

Twitter-based gender recognition using transformers.

Nia, Zahra Movahedi; Ahmadi, Ali; Mellado, Bruce; Wu, Jianhong; Orbinski, James; Asgary, Ali; Kong, Jude D.

Math Biosci Eng ; 20(9): 15962-15981, 2023 08 03.

Article in English | MEDLINE | ID: mdl-37919997

ABSTRACT

Social media contains useful information about people and society that could help advance research in many areas of health, such as mental health, health surveillance, socio-economic inequality and gender vulnerability (e.g. by applying opinion mining, emotion/sentiment analysis and statistical analysis). User demographics provide rich information that could help study the subject further. However, user demographics such as gender are considered private and are not freely available. In this study, we propose a model based on transformers to predict the user's gender from their images and tweets. The image-based classification model is trained in two different ways: using the profile image of the user and using various image contents posted by the user on Twitter. For the first method, a Twitter gender recognition dataset publicly available on Kaggle is used; for the second, the PAN-18 dataset is used. Several transformer models, i.e. vision transformers (ViT), LeViT and Swin Transformer, are fine-tuned on both image datasets and then compared. Next, different transformer models, namely bidirectional encoder representations from transformers (BERT), RoBERTa and ELECTRA, are fine-tuned to recognize the user's gender from their tweets. This is highly beneficial, because not all users provide an image that indicates their gender. The gender of such users could be detected from their tweets. The significance of the image and text classification models was evaluated using the Mann-Whitney U test. Finally, the combination model improved the accuracy of the image and text classification models by 11.73 and 5.26% for the Kaggle dataset and by 8.55 and 9.8% for the PAN-18 dataset, respectively. This shows that the image and text classification models are capable of complementing each other by providing additional information to one another. Our overall multimodal method has an accuracy of 88.11% for the Kaggle and 89.24% for the PAN-18 dataset and outperforms state-of-the-art models. Our work benefits research that critically requires user demographic information such as gender to further analyze and study social media content for health-related issues.

Subjects

Social Media, Humans, Electric Power Supplies, Research Design

20.

Off-label drug use during the COVID-19 pandemic in Africa: topic modelling and sentiment analysis of ivermectin in South Africa and Nigeria as a case study.

Movahedi Nia, Z; Bragazzi, N L; Ahamadi, A; Asgary, A; Mellado, B; Orbinski, J; Seyyed-Kalantari, L; Woldegerima, W A; Wu, J; Kong, J D.

J R Soc Interface ; 20(206): 20230200, 2023 09.

Article in English | MEDLINE | ID: mdl-37700708

ABSTRACT

Although rejected by the World Health Organization, the human and even the veterinary formulation of ivermectin has been widely used for prevention and treatment of COVID-19. In this work we leverage Twitter to understand ivermectin supporters' reasons for using the drug, their sources of information, their emotions, their gender demographics, and their location information, in Nigeria and South Africa. Topic modelling is performed on a Twitter dataset gathered using the keywords 'ivermectin' and 'ivm'. A RoBERTa model is fine-tuned to find the stance of the tweets. Statistical analysis is performed to compare the stance and emotions. Most ivermectin supporters either redistribute conspiracy theories posted by influencers, or refer to flawed studies confirming ivermectin efficacy in vitro. Three emotions have the highest intensity: optimism, joy and disgust. The number of anti-ivermectin tweets has a significant positive correlation with vaccination rate. All the provinces in South Africa and most of the provinces of Nigeria are pro-ivermectin and have higher disgust polarity. This work seeks to understand public discussions regarding ivermectin during the COVID-19 pandemic to help policy-makers understand the rationale behind its popularity, and to inform more targeted policies to discourage self-administration of ivermectin. Moreover, it offers a lesson for future outbreaks.

Subjects

COVID-19, Off-Label Use, Humans, Nigeria/epidemiology, South Africa/epidemiology, Sentiment Analysis, Pandemics, COVID-19/epidemiology, Ivermectin/therapeutic use
