Cross Lingual Sentiment Analysis: A Clustering-Based Bee Colony Instance Selection and Target-Based Feature Weighting Approach.

Mohammed Almansor, Mohammed Abbas; Zhang, Chongfu; Khan, Wasiq; Hussain, Abir; Alhusaini, Naji

Affiliation

Mohammed Almansor MA; School of Information and Communication Engineering, Zhongshan Institute, University of Electronic Science and Technology of China, Chengdu 611731, China.
Zhang C; School of Information and Communication Engineering, Zhongshan Institute, University of Electronic Science and Technology of China, Chengdu 611731, China.
Khan W; School of Electronic Information, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan 528402, China.
Hussain A; Department of Computer Science, Liverpool John Moores University, Liverpool L33AF, UK.
Alhusaini N; Department of Computer Science, Liverpool John Moores University, Liverpool L33AF, UK.

Sensors (Basel); 20(18), 2020 Sep 15.

Article in En | MEDLINE | ID: mdl-32942721

ABSTRACT

The lack of sentiment resources in resource-poor languages poses challenges for machine-learning-based sentiment analysis. Cross-lingual and semi-supervised learning approaches are the most common ways to overcome this issue. However, the performance of existing methods degrades due to the poor quality of translated resources, data sparseness and, more specifically, language divergence. We propose an integrated learning model that uses a semi-supervised model and an ensemble model while utilizing the available sentiment resources to tackle issues related to language divergence. Additionally, to reduce the impact of translation errors and handle the instance selection problem, we propose a clustering-based bee-colony sample selection method for the optimal selection of the most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted on an English-Arabic cross-lingual data set. Simulation results demonstrate that the proposed model outperforms the baseline approaches in terms of classification performance. Furthermore, the statistical outcomes indicate the advantages of the proposed training data sampling and target-based feature selection in reducing the negative effect of translation errors. These results show that the proposed approach achieves performance close to that of in-language supervised models.
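
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of clustering-based instance selection, not the authors' method. It assumes numeric feature vectors, swaps a plain k-means plus nearest-centroid ranking in place of the bee-colony search, and uses hypothetical names (`kmeans`, `select_instances`) and toy data throughout.

```python
from math import dist

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic start (first k points)."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def select_instances(source, target, k=2, keep=3):
    """Keep the `keep` labelled source instances nearest to any target centroid.

    Assumption: closeness to a target-language cluster centre is used as a
    stand-in for how well a translated training instance represents the target data.
    """
    centroids = kmeans(target, k)

    def score(item):
        features, _label = item
        return min(dist(features, c) for c in centroids)

    return sorted(source, key=score)[:keep]

# Toy data (hypothetical): unlabelled target vectors and labelled source vectors.
target = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
source = [
    ((0.1, 0.1), "pos"),
    ((5.1, 5.0), "neg"),
    ((9.0, 9.0), "pos"),   # far from both target clusters
    ((2.5, 2.5), "neg"),   # between the clusters
    ((0.3, 0.2), "pos"),
]
selected = select_instances(source, target, k=2, keep=3)
```

In this sketch the two source instances that lie far from every target cluster are ranked last and dropped. A bee-colony optimiser would instead search over candidate subsets against a fitness function, which the abstract does not specify.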

Subjects

Algorithms; Bees; Language; Machine Learning; Animals; Cluster Analysis; Supervised Machine Learning

Keywords

cross-lingual sentiment analysis; multi-graph semi-supervised learning; sample selection

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Bees / Algorithms / Machine Learning / Language | Limits: Animals | Language: En | Publication year: 2020 | Document type: Article
