A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion.

Li, Nianfeng; Wang, Zhenyan; Huang, Yongyuan; Tian, Jia; Li, Xinyuan; Xiao, Zhiguo

Affiliation

Li N; College of Computer Science and Technology, Changchun University, No. 6543, Satellite Road, Changchun 130022, China.
Wang Z; College of Computer Science and Technology, Changchun University, No. 6543, Satellite Road, Changchun 130022, China.
Huang Y; College of Computer Science and Technology, Changchun University, No. 6543, Satellite Road, Changchun 130022, China.
Tian J; College of Computer Science and Technology, Changchun University, No. 6543, Satellite Road, Changchun 130022, China.
Li X; College of Computer Science and Technology, Changchun University, No. 6543, Satellite Road, Changchun 130022, China.
Xiao Z; College of Computer Science and Technology, Changchun University, No. 6543, Satellite Road, Changchun 130022, China.

Sensors (Basel); 24(12), 2024 Jun 09.

Article in English | MEDLINE | ID: mdl-38931544

ABSTRACT

Scene text detection is an important research area in computer vision and plays a crucial role in many applications. However, existing methods often fail to achieve satisfactory results on text instances of varying sizes and shapes set against complex backgrounds. To address the challenge of detecting diverse text in natural scenes, this paper proposes a multi-scale natural scene text detection method based on attention feature extraction and cascaded feature fusion. The method combines global and local attention through an improved attention feature fusion module (DSAF) to capture text features at different scales, enhancing the network's perception of text regions and improving its feature extraction capability. In addition, an improved cascaded feature fusion module (PFFM) fully integrates the extracted feature maps, expanding the receptive field and enriching the expressive power of the feature maps. Finally, a lightweight subspace attention module (SAM) partitions the concatenated feature maps into several subspace feature maps, facilitating spatial information interaction among features of different scales. Comparative experiments against existing scene text detection methods are conducted on the ICDAR2015, Total-Text, and MSRA-TD500 datasets. The results show that the proposed method achieves good performance in terms of accuracy, recall, and F-score, supporting its effectiveness and practicality.

Keywords

attention mechanism; cascaded feature fusion; deep learning; text detection

Full text: 1 Collections: 01-international Database: MEDLINE Language: En Publication year: 2024 Document type: Article
