Progressive domain adaptation for detecting hate speech on social media with small training set and its application to COVID-19 concerned posts.

Bashar, Md Abul; Nayak, Richi; Luong, Khanh; Balasubramaniam, Thirunavukarasu

Affiliation

Bashar MA; School of Computer Science and Centre for Data Science, Queensland University of Technology, 2 George St, Brisbane City, QLD 4000 Australia.
Nayak R; School of Computer Science and Centre for Data Science, Queensland University of Technology, 2 George St, Brisbane City, QLD 4000 Australia.
Luong K; School of Computer Science and Centre for Data Science, Queensland University of Technology, 2 George St, Brisbane City, QLD 4000 Australia.
Balasubramaniam T; School of Computer Science and Centre for Data Science, Queensland University of Technology, 2 George St, Brisbane City, QLD 4000 Australia.

Soc Netw Anal Min ; 11(1): 69, 2021.

Article in En | MEDLINE | ID: mdl-34341673

ABSTRACT

In the current era of information and experience, microblogging sites are commonly used to express people's feelings, including fear, panic, hate and abuse. Monitoring and controlling abuse on social media, especially during pandemics such as COVID-19, can help keep public sentiment and morale positive. Developing machine-learning methods for fear and hate detection requires labelled data. However, obtaining labelled data under suddenly changed circumstances such as a pandemic is expensive, and acquiring it in a short time is impractical. Related labelled hate data from other domains or previous incidents may be available, but the predictive accuracy of hate detection models trained on such data decreases significantly if the data distribution of the target domain, where the predictions will be applied, is different. To address this problem, we propose a novel concept of unsupervised progressive domain adaptation based on a deep-learning language model generated through multiple text datasets. We showcase the efficacy of the proposed method for hate speech and fear detection on a collection of tweets from the COVID-19 pandemic, for which labelled information is unavailable.

Key words

Domain adaptation; Fear prediction; Hate speech; Small dataset; Text mining

Full text: 1 Collection: 01-internacional Database: MEDLINE Type of study: Prognostic_studies Language: En Journal: Soc Netw Anal Min Year: 2021 Document type: Article Country of publication: Germany
