Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation.
Song, Kechen; Zhang, Yiming; Bao, Yanqi; Zhao, Ying; Yan, Yunhui.
Affiliation
  • Song K; School of Mechanical Engineering & Automation, Northeastern University, Shenyang 110819, China.
  • Zhang Y; School of Mechanical Engineering & Automation, Northeastern University, Shenyang 110819, China.
  • Bao Y; National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China.
  • Zhao Y; School of Mechanical Engineering & Automation, Northeastern University, Shenyang 110819, China.
  • Yan Y; School of Mechanical Engineering & Automation, Northeastern University, Shenyang 110819, China.
Sensors (Basel) ; 23(14)2023 Jul 22.
Article in En | MEDLINE | ID: mdl-37514905
ABSTRACT
As an important computer vision technique, image segmentation has been widely used in various tasks. However, in some extreme cases, insufficient illumination greatly degrades model performance, so more and more fully supervised methods take multi-modal images as input. Densely annotated large-scale datasets are difficult to obtain, whereas few-shot methods can still achieve satisfactory results with only a few pixel-annotated samples. Therefore, we propose a Visible-Depth-Thermal (three-modal) few-shot semantic segmentation method. It exploits the homogeneous information shared by the three modalities as well as the complementary information between them, which improves the performance of few-shot segmentation. We construct a novel indoor dataset, VDT-2048-5i, for the three-modal few-shot semantic segmentation task. We also propose a Self-Enhanced Mixed Attention Network (SEMANet), which consists of a Self-Enhanced (SE) module and a Mixed Attention (MA) module. The SE module amplifies the differences between different kinds of features and strengthens the weak connections among foreground features. The MA module fuses the three modal features into a stronger joint representation. Compared with the most advanced previous methods, our model improves mIoU by 3.8% and 3.3% in the 1-shot and 5-shot settings, respectively, achieving state-of-the-art performance. In future work, we will address failure cases by learning more discriminative and robust feature representations, and explore achieving high performance with fewer parameters and lower computational cost.
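For readers who want a concrete picture of what fusing three-modal features with attention could look like, the minimal sketch below shows per-modality channel attention followed by a 1x1 convolutional merge. It is an illustrative assumption only, not the authors' SEMANet implementation; the names ChannelAttention and MixedAttentionFusion are hypothetical.

```python
# Illustrative sketch only: three-modal feature fusion with per-modality channel
# attention. Not the SEMANet code; module names and structure are assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel re-weighting for one modality."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights


class MixedAttentionFusion(nn.Module):
    """Fuse visible, depth, and thermal feature maps into one feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.att_rgb = ChannelAttention(channels)
        self.att_depth = ChannelAttention(channels)
        self.att_thermal = ChannelAttention(channels)
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, f_rgb, f_depth, f_thermal):
        fused = torch.cat(
            [self.att_rgb(f_rgb), self.att_depth(f_depth), self.att_thermal(f_thermal)],
            dim=1,
        )
        return self.merge(fused)


if __name__ == "__main__":
    # Toy shapes: batch of 2, 64-channel feature maps at 32x32 resolution.
    f_v, f_d, f_t = (torch.randn(2, 64, 32, 32) for _ in range(3))
    out = MixedAttentionFusion(64)(f_v, f_d, f_t)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```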
Keywords

Full text: 1 Database: MEDLINE Language: En Year of publication: 2023 Document type: Article
