Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers.

Liu, Yi; Xu, Xintao; Xiang, Bajian; Chen, Gang; Gong, Guoliang; Lu, Huaxiang

Affiliation

Liu Y; Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China.
Xu X; University of Chinese Academy of Sciences, Beijing 100089, China.
Xiang B; Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China.
Chen G; School of Microelectronics, University of Science and Technology of China, Hefei 230026, China.
Gong G; Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China.
Lu H; University of Chinese Academy of Sciences, Beijing 100089, China.

Sensors (Basel); 22(19), 2022 Oct 06.

Article in English | MEDLINE | ID: mdl-36236675

ABSTRACT

Depth estimation algorithms based on convolutional neural networks (CNNs) that compute disparity by constructing a matching cost volume have several limitations. First, because the cost volume covers only a limited disparity range, true disparities beyond that predetermined range cannot be recovered. Second, the matching process imposes no constraints on occlusion or matching uniqueness. Third, as a local feature extractor, a CNN lacks the ability to perceive global context information. To address these problems of cost-volume-based matching, we propose a Transformer-based disparity prediction algorithm comprising a Swin-SPP feature extraction module built on the Swin Transformer, a Transformer disparity matching network based on self-attention and cross-attention mechanisms, and an occlusion prediction sub-network. In addition, we propose a fully connected layer with double skip connections that mitigates gradient vanishing and explosion when training the Transformer model, further improving inference accuracy. On the KITTI 2012 and KITTI 2015 datasets, the proposed model achieves an EPE (end-point error, i.e., mean absolute disparity error) of 0.57 and 0.61 and a 3PE (percentage of pixels with an error greater than 3 px) of 1.74% and 1.56%, respectively, with an inference time of 0.46 s and only 2.6 M parameters, showing clear advantages over the compared algorithms across various evaluation metrics.
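The central idea above is that attention-based matching avoids the fixed disparity range of a cost volume. The following is a minimal NumPy sketch of that general idea under stated assumptions, not the authors' network: each left pixel is compared with every right pixel on the same rectified row by scaled dot-product cross-attention, candidates are restricted only to non-negative disparities (no upper bound), and disparity is read off the attention-weighted matching position. It omits the Swin-SPP feature extractor, self-attention layers, learned weights, the occlusion prediction sub-network and the double skip connection fully connected layer. All function names and the `temperature` parameter are hypothetical. The EPE and 3PE helpers follow the simple definitions given in the abstract; official KITTI evaluation scripts may apply additional criteria.

```python
import numpy as np


def cross_attention_disparity(left_feat, right_feat, temperature=1.0):
    """Hypothetical sketch: soft disparity for one rectified scanline.

    left_feat, right_feat: arrays of shape (W, C), one feature vector per pixel
    of the same image row in the left and right views. Returns shape (W,).
    """
    width, channels = left_feat.shape
    cols = np.arange(width)
    # Scaled dot-product similarity between left pixel i and right pixel j.
    scores = left_feat @ right_feat.T / (np.sqrt(channels) * temperature)
    # Left pixel x_l may only match right pixel x_r <= x_l (disparity >= 0);
    # unlike a cost volume, no maximum disparity is imposed.
    scores = np.where(cols[None, :] > cols[:, None], -np.inf, scores)
    # Row-wise softmax over candidate right pixels, stabilised by the row max.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Expected matching column, then disparity = x_l - x_r.
    return cols - weights @ cols


def end_point_error(pred, gt, valid=None):
    """EPE: mean absolute disparity error (px) over valid pixels."""
    err = np.abs(np.asarray(pred, float) - np.asarray(gt, float))
    if valid is not None:
        err = err[np.asarray(valid, bool)]
    return float(err.mean())


def three_pixel_error(pred, gt, valid=None, threshold=3.0):
    """3PE: percentage of valid pixels whose absolute error exceeds `threshold` px."""
    err = np.abs(np.asarray(pred, float) - np.asarray(gt, float))
    if valid is not None:
        err = err[np.asarray(valid, bool)]
    return float(100.0 * np.mean(err > threshold))


def synthetic_scanline(width=32, channels=48, disparity=5, seed=0):
    """Hypothetical test row: orthonormal features shifted by `disparity` px."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((channels, channels)))
    right = basis[:width]
    left = np.empty_like(right)
    # Left pixels x >= d see the same point as right pixel x - d.
    left[disparity:] = right[: width - disparity]
    # Left pixels x < d have no counterpart in the right view (border
    # occlusion); give them features orthogonal to every right pixel.
    left[:disparity] = basis[width : width + disparity]
    gt = np.full(width, float(disparity))
    valid = np.arange(width) >= disparity
    return left, right, gt, valid


if __name__ == "__main__":
    left, right, gt, valid = synthetic_scanline()
    pred = cross_attention_disparity(left, right, temperature=0.002)
    print("EPE, matched pixels:", end_point_error(pred, gt, valid))
    print("3PE, matched pixels (%):", three_pixel_error(pred, gt, valid))
    print("EPE, all pixels:", end_point_error(pred, gt))
```

On this synthetic row, pixels with a true match are recovered almost exactly, while the five left-border pixels with no counterpart in the right view receive an arbitrary estimate and inflate the unmasked EPE. That failure mode is the kind of case an explicit occlusion prediction is meant to flag.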

Subjects

Neural Networks, Computer; Vision Disparity; Algorithms

Keywords

attention; binocular disparity; transformer

Database: MEDLINE | Main subject: Vision Disparity / Neural Networks, Computer | Study type: Prognostic_studies / Risk_factors_studies | Language: English | Journal: Sensors (Basel) | Year of publication: 2022 | Document type: Article | Country of affiliation: China