Results 1 - 7 of 7
1.
PLoS One; 19(6): e0304166, 2024.
Article in English | MEDLINE | ID: mdl-38905214

ABSTRACT

THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. Different types of abusive content, such as offensive language, hate speech, and aggression, have become prevalent in social media, and many efforts have been dedicated to automatically detecting this phenomenon. These efforts have concentrated on resource-rich languages such as English, mainly because of the comparative lack of annotated offensive-language data in low-resource languages, especially those spoken in Asian countries. To reduce the vulnerability of social media users in these regions, it is crucial to address the problem of offensive language in such low-resource languages. Hence, we present a new corpus of Persian offensive language consisting of 6,000 micro-blog posts randomly sampled from 520,000 posts on X (Twitter), to support offensive language detection in Persian as a low-resource language in this area. We describe how the corpus was created and annotated following the annotation practices of recent benchmark datasets in other languages, which yields labels for both the category of offensive language and the target of the offense. We perform extensive experiments with three classifiers at different annotation levels, covering classical machine learning (ML), deep learning (DL), and transformer-based neural networks, including monolingual and multilingual pre-trained language models. Furthermore, we propose an ensemble model that integrates the aforementioned models to boost the performance of the offensive language detection task. Initial results on single models indicate that SVMs trained on character or word n-grams, together with the monolingual transformer-based pre-trained language model ParsBERT, are the best-performing models for identifying offensive vs. non-offensive content, targeted vs. untargeted offense, and offense directed at an individual or a group. In addition, the stacking ensemble model outperforms the single models by a substantial margin, obtaining a 5% macro F1-score improvement at each of the three annotation levels.


Subjects
Language, Humans, Social Media, Machine Learning, Iran (Geographic)
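
As a rough sketch of the single-model and stacking setups described above (not the authors' code), the following combines character and word n-gram SVMs under a logistic-regression meta-learner; the toy data and parameters are assumptions.

```python
# Rough sketch of the n-gram SVM baselines and stacking ensemble
# described above; toy data and parameters are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy examples; the real corpus has 6,000 annotated posts.
texts = ["you are awful", "have a nice day", "get lost", "great work"]
labels = [1, 0, 1, 0]  # 1 = offensive, 0 = non-offensive

char_svm = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)), LinearSVC())
word_svm = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)), LinearSVC())

# Stacking ensemble: a logistic-regression meta-learner over the base SVMs.
# cv=2 only because the toy set is tiny.
ensemble = StackingClassifier(
    estimators=[("char", char_svm), ("word", word_svm)],
    final_estimator=LogisticRegression(), cv=2)
ensemble.fit(texts, labels)
print(ensemble.predict(["get lost"]))
```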
2.
PeerJ Comput Sci; 10: e1934, 2024.
Article in English | MEDLINE | ID: mdl-38660178

ABSTRACT

Offensive content is becoming increasingly prevalent on online communication and social media platforms, which makes its detection difficult, especially in multilingual settings. The term "offensive language" encompasses a wide range of expressions, including various forms of hate speech and aggressive content. Studying multilingual offensive content therefore goes beyond a single-language focus and captures greater linguistic diversity and more cultural factors. By exploring multilingual offensive content, we can broaden our understanding of the problem and more effectively combat the widespread global impact of offensive language. This survey examines the current state of multilingual offensive language detection, providing a comprehensive analysis of previous multilingual approaches and existing datasets, as well as an overview of resources in the field. We also explore the community challenges associated with this task, including technical, cultural, and linguistic ones, together with their limitations. Furthermore, we propose several potential future directions toward more efficient solutions for multilingual offensive language detection, enabling a safer digital communication environment worldwide.

3.
Entropy (Basel); 26(4), 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38667881

ABSTRACT

Detecting the human values underlying arguments is essential across various domains, from the social sciences to recent computational approaches. Identifying these values remains a significant challenge because of their vast number and their implicit use in discourse. This study explores the potential of emotion analysis as a key feature for improving the detection of human values and for extracting information in this field. It aims to gain insight into human behavior through in-depth analyses of human values at different levels. In addition, we conduct experiments that integrate extracted emotion features to improve human value detection. This approach can offer fresh insight into the complex interplay between emotions and values in discussions, yielding a deeper understanding of human behavior and decision making. Uncovering these emotions is crucial for understanding, through data-driven analyses, the characteristics that underlie various values. Our experimental results show improved performance on the human value detection task in many categories.
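
As a loose illustration of the feature-integration idea, the sketch below concatenates emotion scores with TF-IDF text features before classification; the data, emotion categories, and classifier are assumptions, not the paper's setup.

```python
# Sketch: augmenting argument text features with emotion scores before
# classification, illustrating the feature-integration idea (assumed design).
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

arguments = ["we must protect nature", "tradition keeps society stable"]
value_labels = [0, 1]  # hypothetical value categories

# Emotion features would come from an emotion classifier or lexicon;
# here they are hypothetical [anger, joy, fear, sadness] scores.
emotion_feats = np.array([[0.1, 0.6, 0.1, 0.2],
                          [0.2, 0.3, 0.3, 0.2]])

vec = TfidfVectorizer()
X_text = vec.fit_transform(arguments)
X = hstack([X_text, emotion_feats])  # concatenate text + emotion features

clf = LogisticRegression().fit(X, value_labels)
```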

4.
Entropy (Basel); 26(4), 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38667898

ABSTRACT

Social media platforms have transcended cultural and linguistic boundaries, enabling online communication worldwide. However, the expanded use of many different languages has intensified the challenge of detecting hate speech content online. Despite the release of multiple natural language processing (NLP) solutions implementing cutting-edge machine learning techniques, the scarcity of data, especially labeled data, remains a considerable obstacle, motivating the use of semi-supervised approaches alongside generative artificial intelligence (generative AI) techniques. This paper introduces an innovative multilingual semi-supervised model combining generative adversarial networks (GANs) with pre-trained language models (PLMs), specifically mBERT and XLM-RoBERTa. Our approach proves effective in detecting hate speech and offensive language in Indo-European languages (English, German, and Hindi) when using only 20% of the annotated data from the HASOC2019 dataset, delivering strong performance in multilingual, zero-shot cross-lingual, and monolingual training scenarios alike. Our study provides a robust mBERT-based semi-supervised GAN model (SS-GAN-mBERT) that outperformed the XLM-RoBERTa-based model (SS-GAN-XLM), achieving an average F1-score boost of 9.23% and an accuracy increase of 5.75% over the baseline semi-supervised mBERT model.
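
The description above follows the GAN-BERT recipe, in which a generator produces fake sentence representations and a discriminator classifies inputs into the k task classes plus one extra "fake" class. Below is a minimal sketch of that architecture, assuming mBERT [CLS] vectors as the real representations; the dimensions and losses are simplified, and this is not the paper's code.

```python
# Minimal sketch of a GAN-over-PLM semi-supervised setup (GAN-BERT style),
# assuming mBERT [CLS] vectors as real representations and a (k+1)-class
# discriminator; simplified, not the paper's code.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_CLASSES, HIDDEN, NOISE = 2, 768, 100  # 768 = mBERT hidden size

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

# Generator: noise -> a fake sentence representation in the PLM's space.
generator = nn.Sequential(
    nn.Linear(NOISE, HIDDEN), nn.LeakyReLU(), nn.Linear(HIDDEN, HIDDEN))

# Discriminator: representation -> k real classes + 1 "fake" class.
discriminator = nn.Sequential(
    nn.Linear(HIDDEN, HIDDEN), nn.LeakyReLU(),
    nn.Linear(HIDDEN, NUM_CLASSES + 1))

batch = tokenizer(["a labeled tweet", "an unlabeled tweet"],
                  return_tensors="pt", padding=True)
real_repr = encoder(**batch).last_hidden_state[:, 0]  # [CLS] vectors
fake_repr = generator(torch.randn(2, NOISE))

real_logits = discriminator(real_repr)  # supervised + unsupervised losses
fake_logits = discriminator(fake_repr)  # adversarial loss targets "fake"
```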

5.
Sensors (Basel); 22(5), 2022 Mar 02.
Article in English | MEDLINE | ID: mdl-35271115

ABSTRACT

Advances in technology have affected all aspects of human life; the use of technology in medicine, for example, has made significant contributions to society. In this article, we focus on technology assistance for one of the most common and deadly diseases, brain tumors. Many people die of brain tumors every year; based on estimates from the "braintumor" website, about 700,000 people in the U.S. have primary brain tumors, with about 85,000 new cases added each year. To address this problem, artificial intelligence has come to the aid of medicine. Magnetic resonance imaging (MRI) is the most common method for diagnosing brain tumors, and it is widely used in medical imaging and image processing to detect abnormalities in different parts of the body. In this study, we conducted a comprehensive review of existing efforts to apply different types of deep learning methods to MRI data, identified the existing challenges in the domain, and outlined potential future directions. Convolutional neural networks (CNNs) are a branch of deep learning that has been very successful in processing medical images; therefore, this survey reviews various CNN architectures with a focus on medical image processing, especially brain MRI.


Subjects
Brain Neoplasms, Deep Learning, Artificial Intelligence, Brain, Brain Neoplasms/diagnostic imaging, Delivery of Health Care, Humans, Magnetic Resonance Imaging/methods
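
As a toy illustration of the kind of CNN architecture such surveys review, here is a minimal slice-level classifier sketch; the input size, channel counts, and binary tumor/no-tumor labels are assumptions.

```python
# Minimal CNN sketch for slice-level MRI classification (tumor vs. no tumor),
# illustrating the class of architectures the survey reviews; shapes assumed.
import torch
import torch.nn as nn

class TumorCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Linear(32 * 56 * 56, 2)  # for 224x224 input

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TumorCNN()
logits = model(torch.randn(1, 1, 224, 224))  # one grayscale MRI slice
```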
6.
Soc Netw Anal Min; 12(1): 4, 2022.
Article in English | MEDLINE | ID: mdl-34804252

ABSTRACT

Nowadays, a massive number of people participate in various social media platforms, which enables organizations and institutions to reach their audiences across the globe more easily. Some of them use social bots, automated accounts, to gain influence over users through faster content propagation. As a result, malicious social bots are proliferating, fooling humans with their unrealistic behavior and content, and it is necessary to distinguish these fake accounts from real ones. Multiple approaches have been investigated in the literature. Statistical machine learning methods rely on handcrafted features to represent the characteristics of social bots; although they have achieved good results in some cases, they depend on fixed behavioral patterns and fail when bots change their behavior. More advanced deep neural network-based methods aim to overcome this limitation. The generative adversarial network (GAN), a semi-supervised technique from this domain, has been shown to capture the behavioral patterns of the data. In this work, we use a GAN to extract more information from bot samples for a state-of-the-art textual bot detection method (contextual LSTM). Although GANs can augment scarce labeled data, the original textual GAN, the sequence generative adversarial net (SeqGAN), has a known convergence limitation. In this paper, we investigate this limitation and adapt the GAN idea in a new framework called GANBOT, in which the generator and the classifier are connected by an LSTM layer that serves as a shared channel between them. Our experimental results on a benchmark dataset of Twitter social bots show that our proposed framework outperforms the existing contextual LSTM method by increasing bot detection probabilities.
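
One reading of the shared-channel design is a single LSTM feeding both a generator head and a bot/human classifier head. The sketch below illustrates that interpretation only; the vocabulary size, dimensions, and heads are assumptions, not the GANBOT implementation.

```python
# Sketch of the shared-channel idea: a generator head and a bot/human
# classifier head that both read sequences through one shared LSTM
# (an interpretation of the framework description, not the authors' code).
import torch
import torch.nn as nn

VOCAB, EMB, HID = 5000, 64, 128

embedding = nn.Embedding(VOCAB, EMB)
shared_lstm = nn.LSTM(EMB, HID, batch_first=True)  # the shared channel

classifier_head = nn.Linear(HID, 1)      # bot vs. human logit
generator_head = nn.Linear(HID, VOCAB)   # next-token logits

tokens = torch.randint(0, VOCAB, (2, 20))  # a batch of token sequences
out, (h, _) = shared_lstm(embedding(tokens))

bot_logit = classifier_head(h[-1])       # classify from final hidden state
next_token_logits = generator_head(out)  # generate from per-step outputs
```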

7.
PLoS One; 15(8): e0237861, 2020.
Article in English | MEDLINE | ID: mdl-32853205

ABSTRACT

Disparate biases associated with datasets and trained classifiers in hateful and abusive content identification tasks have raised many concerns recently. Although the problem of biased datasets in abusive language detection has been addressed frequently, biases arising from trained classifiers have received little attention. In this paper, we first introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers), and evaluate the proposed model on two publicly available datasets annotated for racist, sexist, hateful, or offensive content on Twitter. Next, we introduce a bias alleviation mechanism to mitigate the effect of training-set bias during the fine-tuning of our pre-trained BERT-based model for hate speech detection. To that end, we use an existing regularization method to re-weight input samples, thereby decreasing the effect of training-set n-grams that are highly correlated with class labels, and then fine-tune our pre-trained BERT-based model on the re-weighted samples. To evaluate the bias alleviation mechanism, we employ a cross-domain approach: we use the classifiers trained on the aforementioned datasets to predict the labels of two new Twitter datasets, an AAE-aligned and a White-aligned group, containing tweets written in African-American English (AAE) and Standard American English (SAE), respectively. The results show systematic racial bias in the trained classifiers: they tend to assign tweets written in AAE from the AAE-aligned group to negative classes such as racism, sexism, hate, and offensive more often than tweets written in SAE from the White-aligned group. However, the racial bias in our classifiers is reduced significantly once our bias alleviation mechanism is incorporated. This work could constitute a first step toward debiasing hate speech and abusive language detection systems.


Subjects
Hate, Models (Theoretical), Racism, Social Media, Speech, Algorithms, Databases as Topic, Deep Learning, Female, Humans, Male, Sexism
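
The paper relies on an existing regularization method for the re-weighting; the sketch below shows one naive way sample weights could be derived from n-gram-label correlations, purely to illustrate the idea. The weighting formula and data are assumptions, not the authors' method.

```python
# Sketch of the re-weighting idea: downweight training samples whose n-grams
# correlate strongly with the label, then scale each sample's loss by its
# weight during BERT fine-tuning. Assumed formula, not the paper's method.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

texts = ["u r trash", "lovely weather today",
         "total trash person", "take out the trash"]
labels = np.array([1, 0, 1, 0])  # 1 = hateful/offensive, 0 = neither

X = (CountVectorizer(ngram_range=(1, 2)).fit_transform(texts) > 0).toarray()
# P(label = 1 | n-gram present) for each n-gram:
p_pos = (X * labels[:, None]).sum(0) / np.maximum(X.sum(0), 1)
bias = np.abs(p_pos - labels.mean())           # deviation from the base rate
sample_weight = 1.0 - (X * bias).max(axis=1)   # downweight biased samples

# sample_weight would then multiply each example's loss in fine-tuning.
print(sample_weight)
```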