Automatic summarization model based on clustering algorithm.

Dai, Wenzhuo; He, Qing

Affiliation

Dai W; College of Big Data and Information Engineering, Guizhou University Guiyang, Room 421, Chongli Building, West Campus of Guizhou University, Jiaxiu South Road, Huaxi District, Guiyang City, 550025, Guizhou Province, People's Republic of China.
He Q; College of Big Data and Information Engineering, Guizhou University Guiyang, Room 421, Chongli Building, West Campus of Guizhou University, Jiaxiu South Road, Huaxi District, Guiyang City, 550025, Guizhou Province, People's Republic of China. qhe@gzu.edu.cn.

Sci Rep ; 14(1): 15302, 2024 Jul 03.

Article in En | MEDLINE | ID: mdl-38961244

ABSTRACT

Extractive document summarization is usually treated as a sequence labeling task in which the summary is composed of sentences drawn from the original document. However, the selected sentences are often highly redundant in semantic space, so the resulting summary is also semantically redundant. To alleviate this problem, we propose a model that reduces the semantic redundancy of the summary by introducing a clustering algorithm to select sentences that differ in semantic space, and we improve the base BERT to score sentences. We evaluate our model and perform significance testing using ROUGE on the CNN/DailyMail datasets, comparing it with six baselines: two traditional methods and four state-of-the-art deep learning models. The results validate the effectiveness of our approach, which leverages the K-means algorithm to produce more accurate summaries with fewer semantically repetitive sentences.
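The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how clusters and scores are combined, so the sketch assumes the simplest rule, which is to group sentence embeddings with K-means and keep the highest-scoring sentence from each cluster. The function names `kmeans` and `select_summary` are hypothetical, and the embeddings and scores are taken as given inputs (in the paper, scoring is done by an improved BERT).

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def select_summary(sentences, embeddings, scores, k):
    """Assumed selection rule: keep the top-scored sentence of each cluster."""
    labels = kmeans(embeddings, k)
    best = {}  # cluster label -> index of its best-scoring sentence
    for i, lab in enumerate(labels):
        if lab not in best or scores[i] > scores[best[lab]]:
            best[lab] = i
    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(best.values())]


# Toy data: two near-duplicate sentences and two others on a different topic.
sentences = [
    "The storm hit the coast on Monday.",
    "On Monday the storm reached the coast.",
    "Officials opened three shelters.",
    "Three shelters were opened for residents.",
]
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]  # made-up 2-D vectors
scores = [0.9, 0.8, 0.3, 0.4]  # made-up salience scores

summary = select_summary(sentences, embeddings, scores, k=2)
```

On this toy input, choosing the two highest-scoring sentences would return the two near-duplicates about the storm, whereas cluster-wise selection returns one sentence per topic.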

Key words

Cluster algorithm; EDS; Semantic space

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Language: En | Journal: Sci Rep / Sci. rep. (Nat. Publ. Group) / Scientific reports (Nature Publishing Group) | Year: 2024 | Document type: Article | Country of publication: United Kingdom