Pesquisa | Portal Regional da BVS

The design, construction and evaluation of annotated Arabic cyberbullying corpus.

Shannag, Fatima; Hammo, Bassam H; Faris, Hossam.

Educ Inf Technol (Dordr) ; 27(8): 10977-11023, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35502160

RESUMO

Cyberbullying (CB) is classified as one of the severe misconducts on social media. Many CB detection systems have been developed for many natural languages to face this phenomenon. However, Arabic is one of the under-resourced languages suffering from the lack of quality datasets in many computational research areas. This paper discusses the design, construction, and evaluation of a multi-dialect, annotated Arabic Cyberbullying Corpus (ArCybC), a valuable resource for Arabic CB detection and motivation for future research directions in Arabic Natural Language Processing (NLP). The study describes the phases of ArCybC compilation. By way of illustration, it explores the corpus to discover strategies used in rendering Arabic CB tweets pulled from four Twitter groups, including gaming, sports, news, and celebrities. Based on thorough analysis, we discovered that these groups were the most susceptible to harassment and cyberbullying. The collected tweets were filtered based on a compiled harassment lexicon, which contains a list of multi-dialectical profane words in Arabic compiled from four categories: sexual, racial, physical appearance, and intelligence. To annotate ArCybC, we asked five annotators to classify 4,505 tweets into two classes manually: Offensive/non-Offensive and CB/non-CB. We conducted a rigorous comparison of different machine learning approaches applied on ArCybC to detect Arabic CB using two language models: bag-of-words (BoW) and word embedding. The experiments showed that Support Vector Machine (SVM) with word embedding achieved an accuracy rate of 86.3% and an F1-score rate of 85%. The main challenges encountered during the ArCybC construction were the scarcity of freely available Arabic CB texts and the deficiency of annotating the texts.

Identifying ß-thalassemia carriers using a data mining approach: The case of the Gaza Strip, Palestine.

AlAgha, Alaa S; Faris, Hossam; Hammo, Bassam H; Al-Zoubi, Ala' M.

Artif Intell Med ; 88: 70-83, 2018 06.

Artigo em Inglês | MEDLINE | ID: mdl-29730048

RESUMO

Thalassemia is considered one of the most common genetic blood disorders that has received excessive attention in the medical research fields worldwide. Under this context, one of the greatest challenges for healthcare professionals is to correctly differentiate normal individuals from asymptomatic thalassemia carriers. Usually, thalassemia diagnosis is based on certain measurable characteristic changes to blood cell counts and related indices. These characteristic changes can be derived easily when performing a complete blood count test (CBC) using a special fully automated blood analyzer or counter. However, the reliability of the CBC test alone is questionable with possible candidate characteristics that could be seen in other disorders, leading to misdiagnosis of thalassemia. Therefore, other costly and time-consuming tests should be performed that may cause serious consequences due to the delay in the correct diagnosis. To help overcoming these challenging diagnostic issues, this work presents a new novel dataset collected from Palestine Avenir Foundation for persons tested for thalassemia. We aim to compile a gold standard dataset for thalassemia and make it available for researchers in this field. Moreover, we use this dataset to predict the specific type of thalassemia known as beta thalassemia (ß-thalassemia) based on hybrid data mining model. The proposed model consists of two main steps. First, to overcome the problem of the highly imbalanced class distribution in the dataset, a balancing technique called SMOTE is proposed and applied to handle this problem. In the second step, four classification models, namely k-nearest neighbors (k-NN), naïve Bayesian (NB), decision tree (DT) and the multilayer perceptron (MLP) neural network are used to differentiate between normal persons and those patients carrying ß-thalassemia. Different evaluation metrics are used to assess the performance of the proposed model. The experimental results show that the SMOTE oversampling method can effectively improve the identification ratio of ß-thalassemia carriers in a highly imbalanced class distribution. The results reveal also that the NB classifier achieved the best performance in differentiating between normal and ß-thalassemia carriers at oversampling SMOTE ratio of 400%. This combination shows a specificity of 99.47% and a sensitivity of 98.81%.

Assuntos

Mineração de Dados/métodos , Triagem de Portadores Genéticos/métodos , Heterozigoto , Redes Neurais de Computação , Talassemia beta/diagnóstico , Doenças Assintomáticas , Teorema de Bayes , Biomarcadores/sangue , Bases de Dados Factuais , Árvores de Decisões , Índices de Eritrócitos , Hemoglobinas/análise , Hemoglobinas/genética , Humanos , Oriente Médio , Fenótipo , Valor Preditivo dos Testes , Reprodutibilidade dos Testes , Talassemia beta/sangue , Talassemia beta/classificação , Talassemia beta/genética

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA