Search | VHL Regional Portal

Enhancing machine learning-based sentiment analysis through feature extraction techniques.

A Semary, Noura; Ahmed, Wesam; Amin, Khalid; Plawiak, Pawel; Hammad, Mohamed.

PLoS One ; 19(2): e0294968, 2024.

Article in English | MEDLINE | ID: mdl-38354193

ABSTRACT

Feature extraction is a crucial part of sentiment classification because it extracts the information from text data that determines the model's performance. The goal of this paper is to help in selecting a suitable feature extraction method to enhance the performance of sentiment analysis tasks. To provide directions for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. Several methods are considered, including Bag-of-words (BOW), Word2Vector, N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global vector for word representation (GloVe). To assess each feature extractor, we applied it to the Twitter US airlines and Amazon musical instrument reviews datasets. We then trained a random forest classifier using 70% of the data for training and 30% for testing, enabling us to evaluate and compare the performance using different metrics. Based on our results, the TF-IDF technique demonstrates superior performance, with an accuracy of 99% on the Amazon reviews dataset and 96% on the Twitter US airlines dataset. This study underscores the importance of feature extraction in sentiment analysis, offering practical insights to improve model performance and guide future research.

Subject(s)

Algorithms , Sentiment Analysis , Humans , Machine Learning
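
To make the TF-IDF weighting named in the abstract above concrete, here is a minimal pure-Python sketch. The abstract does not state which TF-IDF variant was used, so the smoothed IDF formula, the whitespace tokenizer, the function name tf_idf, and the three sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: raw term frequency times the smoothed IDF
    ln((1 + N) / (1 + df)) + 1. The paper's exact weighting is not stated.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights


# Hypothetical sample texts, not taken from either dataset.
docs = [
    "flight delayed again",
    "great flight great crew",
    "crew lost my bag",
]
vectors = tf_idf(docs)
```

Terms that occur in fewer documents receive a larger weight, so "delayed" outweighs "flight" in the first sample text. The resulting vectors could then be passed to any classifier, such as the random forest the abstract mentions.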

Improving sentiment classification using a RoBERTa-based hybrid model.

Semary, Noura A; Ahmed, Wesam; Amin, Khalid; Plawiak, Pawel; Hammad, Mohamed.

Front Hum Neurosci ; 17: 1292010, 2023.

Article in English | MEDLINE | ID: mdl-38130432

ABSTRACT

Introduction: Several attempts have been made to enhance the performance of text-based sentiment analysis, with classifiers and word embedding models among the most prominent. This work aims to develop a hybrid deep learning approach that combines the advantages of transformer models and sequence models while eliminating the shortcomings of sequence models. Methods: In this paper, we present a hybrid model based on the transformer model and deep learning models to enhance the sentiment classification process. Robustly optimized BERT (RoBERTa) was selected to produce the representative vectors of the input sentences, and the Long Short-Term Memory (LSTM) model in conjunction with the Convolutional Neural Networks (CNN) model was used to improve the suggested model's ability to comprehend the semantics and context of each input sentence. We tested the proposed model with two datasets on different topics. The first is a dataset of Twitter reviews of US airlines and the second is the IMDb movie reviews dataset. We propose using word embeddings in conjunction with the SMOTE technique to overcome the challenge of imbalanced classes in the Twitter dataset. Results: With an accuracy of 96.28% on the IMDb reviews dataset and 94.2% on the Twitter reviews dataset, the proposed hybrid model outperforms the standard methods. Discussion: It is clear from these results that the proposed hybrid RoBERTa-(CNN+LSTM) method is an effective model for sentiment classification.
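
The SMOTE step mentioned in the abstract above balances classes by creating synthetic minority samples on the line segment between a minority point and one of its nearest minority neighbours. The sketch below illustrates only that core interpolation idea in pure Python. The function name smote_sample, its parameters, and the toy 2-D points are assumptions for illustration, and the authors' actual configuration is not given in the abstract.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical minority-class embeddings reduced to two dimensions.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sample(minority)
```

Every synthetic point lies between two existing minority points, so the new samples stay inside the region the minority class already occupies rather than duplicating existing rows.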

Efficient Convolutional Neural Network-Based Keystroke Dynamics for Boosting User Authentication.

AbdelRaouf, Hussien; Chelloug, Samia Allaoua; Muthanna, Ammar; Semary, Noura; Amin, Khalid; Ibrahim, Mina.

Sensors (Basel) ; 23(10), 2023 May 19.

Article in English | MEDLINE | ID: mdl-37430812

ABSTRACT

The safeguarding of online services and prevention of unauthorized access by hackers rely heavily on user authentication, which is considered a crucial aspect of security. Currently, multi-factor authentication is used by enterprises to enhance security by integrating multiple verification methods rather than relying on a single method of authentication, which is considered less secure. Keystroke dynamics is a behavioral characteristic used to evaluate an individual's typing patterns to verify their legitimacy. This technique is preferred because the acquisition of such data is a simple process that does not require any additional user effort or equipment during the authentication process. This study proposes an optimized convolutional neural network that is designed to extract improved features by utilizing data synthesization and quantile transformation to maximize results. Additionally, an ensemble learning technique is used as the main algorithm for the training and testing phases. A publicly available benchmark dataset from Carnegie Mellon University (CMU) was utilized to evaluate the proposed method, achieving an average accuracy of 99.95%, an average equal error rate (EER) of 0.65%, and an average area under the curve (AUC) of 99.99%, surpassing recent advancements made on the CMU dataset.
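
The equal error rate (EER) reported in the abstract above is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how it can be approximated from two lists of similarity scores. The function name equal_error_rate and the sample scores are hypothetical, and the sketch does not reproduce the paper's evaluation protocol.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Scores are similarity scores: higher means more likely genuine.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False rejection: a genuine attempt scored below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # False acceptance: an impostor attempt scored at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Hypothetical scores, not taken from the CMU benchmark.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
```

With these sample scores the two error rates meet at a threshold of 0.6, where one genuine attempt is rejected and one impostor attempt is accepted. With few samples the two rates may never be exactly equal, which is why the sketch returns their average at the closest point.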

Thermogram breast cancer prediction approach based on Neutrosophic sets and fuzzy c-means algorithm.

Gaber, Tarek; Ismail, Gehad; Anter, Ahmed; Soliman, Mona; Ali, Mona; Semary, Noura; Hassanien, Aboul Ella; Snasel, Vaclav.

Annu Int Conf IEEE Eng Med Biol Soc ; 2015: 4254-7, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26737234

ABSTRACT

Early detection of breast cancer improves the chances of survival for many women. In this paper, a CAD system that classifies breast cancer thermograms as normal or abnormal is proposed. The approach consists of two main phases: automatic segmentation and classification. For the former phase, an improved segmentation approach based on both Neutrosophic sets (NS) and an optimized Fast Fuzzy c-means (F-FCM) algorithm is proposed. A post-segmentation process is also suggested to segment the breast parenchyma (i.e., the ROI) from thermogram images. For the classification, different kernel functions of the Support Vector Machine (SVM) were used to classify breast parenchyma into normal or abnormal cases. The proposed CAD system was evaluated on a benchmark database in terms of precision, recall, and accuracy, and compared with related work. The experimental results showed that the system is a very promising step toward automatic diagnosis of breast cancer using thermograms, as the accuracy reached 100%.

Subject(s)

Breast Neoplasms , Algorithms , Databases, Factual , Female , Fuzzy Logic , Humans , Support Vector Machine
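
The segmentation phase in the abstract above builds on fuzzy c-means clustering, in which every sample belongs to every cluster with a degree of membership between 0 and 1. The sketch below shows the standard algorithm on one-dimensional intensity values. It is not the optimized F-FCM variant or the Neutrosophic-set preprocessing described in the paper, and the function name, parameters, and sample intensities are assumptions for illustration.

```python
def fuzzy_c_means(values, centers, m=2.0, iterations=20):
    """Standard fuzzy c-means on 1-D values (for example, pixel intensities).

    Returns the final centres and the memberships computed in the last pass.
    """
    centers = list(centers)
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point coincides with a centre: assign it fully there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # Membership falls off with relative distance to each centre.
                row = [
                    1.0 / sum((d_i / d_j) ** power for d_j in dists)
                    for d_i in dists
                ]
            memberships.append(row)
        # Move each centre to the membership-weighted mean of the values.
        centers = [
            sum(u[j] ** m * x for u, x in zip(memberships, values))
            / sum(u[j] ** m for u in memberships)
            for j in range(len(centers))
        ]
    return centers, memberships


# Hypothetical intensities forming a dark group and a bright group.
values = [10.0, 12.0, 11.0, 200.0, 205.0, 198.0]
centers, memberships = fuzzy_c_means(values, centers=[0.0, 255.0])
```

Each row of memberships sums to 1, and thresholding the memberships (for example, taking the cluster with the highest value) turns the soft assignment into a segmentation mask.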
