RS-Dseg: semantic segmentation of high-resolution remote sensing images based on a diffusion model component with unsupervised pretraining.

Luo, Zheng; Pan, Jianping; Hu, Yong; Deng, Lin; Li, Yimeng; Qi, Chen; Wang, Xunxun

Affiliation

Luo Z; College of Smart City, Chongqing Jiaotong University, Chongqing, 402247, China.
Pan J; College of Smart City, Chongqing Jiaotong University, Chongqing, 402247, China. panjianping@cqjtu.edu.cn.
Hu Y; Key Laboratory of Monitoring, Assessment and Early Warning of Land Spatial Planning, Ministry of Natural Resources, Chongqing, 401147, China. panjianping@cqjtu.edu.cn.
Deng L; Technology Innovation Center for Spatio-temporal Information and Equipment of Intelligent City, Ministry of Natural Resources, Chongqing, 401120, China. panjianping@cqjtu.edu.cn.
Li Y; Chongqing Institute of Surveying and Monitoring for Planning and Natural Resources, Chongqing, 400121, China.
Qi C; Chongqing Institute of Surveying and Monitoring for Planning and Natural Resources, Chongqing, 400121, China.
Wang X; College of Smart City, Chongqing Jiaotong University, Chongqing, 402247, China.

Sci Rep ; 14(1): 18609, 2024 Aug 10.

Article in En | MEDLINE | ID: mdl-39127805

ABSTRACT

Semantic segmentation plays a crucial role in interpreting remote sensing images, especially in high-resolution scenes, which contain finer object details and complex spatial and texture structures. To better extract semantic information and address class imbalance in multiclass segmentation, we propose using diffusion models for semantic segmentation of remote sensing images, together with a lightweight classification module based on a spatial-channel attention mechanism. Our approach combines unsupervised pretrained components with the classification module to accelerate model convergence. The diffusion model component, built on the UNet architecture, effectively captures multiscale features with rich contextual and edge information. The lightweight classification module uses spatial-channel attention to focus more efficiently on the spatial and channel regions that carry significant feature information. We evaluated our approach on three publicly available datasets: Potsdam, GID, and Five Billion Pixels. Our method achieved the best results among the compared methods on all three datasets. On the GID dataset, it reached an overall accuracy of 96.99%, a mean IoU of 92.17%, and a mean F1 score of 95.83%. During training, our model achieved good performance after only 30 training cycles. Compared with other models, our method uses fewer parameters, trains faster, and shows clear performance advantages.
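The abstract reports overall accuracy, mean IoU, and mean F1 without defining them. As a hedged illustration, the Python sketch below computes these metrics from a pixel-level confusion matrix using the standard per-class definitions, IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), averaged over classes. The function names `confusion_matrix` and `segmentation_metrics` are hypothetical, and the paper's exact evaluation protocol (for example, whether a background or clutter class is excluded, or whether counts are accumulated over the whole test set) is an assumption not stated in this record.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Hypothetical helper: count pixels; rows = ground truth, columns = prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Overall accuracy, mean IoU and mean F1 from a square confusion matrix.

    Assumption: plain per-class averaging; classes absent from both labels
    and predictions are skipped.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(k)]
    # False positives: predicted as class i but labelled otherwise (column sum minus diagonal).
    fp = [sum(cm[r][i] for r in range(k)) - tp[i] for i in range(k)]
    # False negatives: labelled class i but predicted otherwise (row sum minus diagonal).
    fn = [sum(cm[i]) - tp[i] for i in range(k)]

    oa = sum(tp) / total
    ious: List[float] = []
    f1s: List[float] = []
    for i in range(k):
        union = tp[i] + fp[i] + fn[i]
        if union == 0:
            continue  # class never appears: leave it out of the means
        ious.append(tp[i] / union)
        f1s.append(2 * tp[i] / (2 * tp[i] + fp[i] + fn[i]))
    return {"OA": oa, "mIoU": sum(ious) / len(ious), "mF1": sum(f1s) / len(f1s)}


if __name__ == "__main__":
    # Toy example: six flattened pixels, three classes.
    labels = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(labels, preds, num_classes=3)
    print({name: round(value, 4) for name, value in segmentation_metrics(cm).items()})
```

Because mean IoU and mean F1 weight every class equally, they are more sensitive than overall accuracy to errors on rare classes, which is why they are commonly reported alongside it when class imbalance is a concern.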

Key words

Attention mechanism; Diffusion models; Multiscale; Pretraining; Semantic segmentation

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Language: En | Journal: Sci Rep | Year: 2024 | Document type: Article | Affiliation country: China