An efficient point cloud semantic segmentation network with multiscale super-patch transformer.
Miao, Yongwei; Sun, Yuliang; Zhang, Yimin; Wang, Jinrong; Zhang, Xudong.
Affiliations
  • Miao Y; School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, China.
  • Sun Y; School of Information Science and Technology, Zhejiang Shuren University, Hangzhou, 310015, China.
  • Zhang Y; School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, 310018, China.
  • Wang J; School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, China.
  • Zhang X; School of Information Science and Technology, Zhejiang Shuren University, Hangzhou, 310015, China. xdzhang@zjsru.edu.cn.
Sci Rep ; 14(1): 14581, 2024 Jun 25.
Article en En | MEDLINE | ID: mdl-38918404
ABSTRACT
Efficient semantic segmentation of large-scale point cloud scenes is a fundamental and essential task for perceiving and understanding the surrounding 3D environment. However, owing to the vast amount of point cloud data, it is always challenging to train deep neural networks efficiently, and the variety and occlusions of scene objects also make it difficult to establish a unified model that represents different shapes effectively. Taking scene super-patches as the data representation and guided by their contextual information, we propose a novel multiscale super-patch transformer network (MSSPTNet) for point cloud segmentation, which consists of a multiscale super-patch local aggregation (MSSPLA) module and a super-patch transformer (SPT) module. Given large-scale point cloud data as input, a dynamic region-growing algorithm is first adopted to extract scene super-patches from sampling points with consistent geometric features. The MSSPLA module then aggregates the local features of adjacent super-patches and their contextual information at different scales. Owing to its self-attention mechanism, the SPT module exploits the similarity among scene super-patches in the high-level feature space. By combining these two modules, MSSPTNet can effectively learn both local and global features from the input point clouds. Finally, interpolating upsampling and multi-layer perceptrons are exploited to generate semantic labels for the original point cloud data. Experimental results on the public S3DIS dataset demonstrate the efficiency of the proposed network for segmenting large-scale point cloud scenes, especially indoor scenes with many repetitive structures: the network training of MSSPTNet is faster than that of other segmentation networks by a factor of tens to hundreds.
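The core idea behind the SPT module described above is scaled dot-product self-attention applied over per-super-patch feature vectors, so that super-patches that are distant in the scene but similar in feature space can exchange information. The sketch below is only an illustration of that mechanism under stated assumptions, not the authors' implementation: the projection matrices are random stand-ins for learned weights, and the function name and dimensions are hypothetical.

```python
import numpy as np

def superpatch_self_attention(feats, d_k=16, seed=0):
    """Single-head scaled dot-product self-attention over super-patch
    features. feats is an (N, C) array with one row per super-patch;
    the output is a refined (N, C) feature array in which each
    super-patch is a similarity-weighted mixture of all others."""
    rng = np.random.default_rng(seed)
    n, c = feats.shape
    # Hypothetical learned projections (random here for illustration only).
    w_q = rng.standard_normal((c, d_k)) / np.sqrt(c)
    w_k = rng.standard_normal((c, d_k)) / np.sqrt(c)
    w_v = rng.standard_normal((c, c)) / np.sqrt(c)
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    scores = q @ k.T / np.sqrt(d_k)            # (N, N) pairwise similarity
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)    # rows sum to 1
    return attn @ v                            # refined features, (N, C)
```

Because attention is computed over N super-patches rather than the raw points, the quadratic (N, N) score matrix stays small even for large scenes, which is consistent with the training-speed advantage the abstract reports.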

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Sci Rep Year: 2024 Document type: Article Country of affiliation: China
