Long text feature extraction network with data augmentation.

Tang, Changhao; Ma, Kun; Cui, Benkuan; Ji, Ke; Abraham, Ajith

Affiliation

Tang C; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, 250022 China.
Ma K; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, 250022 China.
Cui B; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, 250022 China.
Ji K; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, 250022 China.
Abraham A; Machine Intelligence Research Labs, Scientific Network for Innovation and Research Excellence, Auburn, USA.

Appl Intell (Dordr) ; 52(15): 17652-17667, 2022.

Article in English | MEDLINE | ID: mdl-35400845

ABSTRACT

The spread of COVID-19 has had a serious impact on both the work and the lives of people. With the decrease in physical social contact and rising anxiety about the pandemic, social media has become the primary channel through which people access information related to COVID-19. Social media is rife with rumors and fake news, causing great damage to society. Suffering from scarcity, imbalance, and noise, current Chinese data sets related to the epidemic have done little to support the detection of fake news. In addition, classification accuracy is affected by the easy loss of edge characteristics in long text data. In this paper, a long text feature extraction network with data augmentation (LTFE) is proposed, which improves the learning performance of the classifier by optimizing the data feature structure. In the encoding stage, Twice-Masked Language Modeling for Fine-tuning (TMLM-F) and Data Alignment that Preserves Edge Characteristics (DA-PEC) are proposed to extract the classification features of the Chinese data set. Between the TMLM-F and DA-PEC processes, attention is used to capture the dependencies between words and generate the corresponding vector representations. The experimental results indicate that this method is effective for detecting Chinese fake news related to the pandemic.

Keywords

COVID-19; Data augmentation; Fake news; Long text; Social media

Full text: 1 Database: MEDLINE Study type: Prognostic_studies Language: En Publication year: 2022 Document type: Article
