Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion.

Xia, Zhongyi; Wu, Tianzhao; Wang, Zhuoyan; Zhou, Man; Wu, Boqi; Chan, C Y; Kong, Ling Bing

Affiliation

Xia Z; College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China.
Wu T; College of Applied Technology, Shenzhen University, Shenzhen, 518000, Guangdong, China.
Wang Z; College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China.
Zhou M; College of Applied Technology, Shenzhen University, Shenzhen, 518000, Guangdong, China.
Wu B; College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China.
Chan CY; College of Applied Technology, Shenzhen University, Shenzhen, 518000, Guangdong, China.
Kong LB; College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, Guangdong, China.

Sci Rep ; 14(1): 7037, 2024 Mar 25.

Article in En | MEDLINE | ID: mdl-38528098

ABSTRACT

Stereoscopic display technology plays a significant role in industries such as film, television, and autonomous driving. The accuracy of depth estimation is crucial for achieving high-quality, realistic stereoscopic display effects. To address the inherent challenges of applying Transformers to depth estimation, the Stereoscopic Pyramid Transformer-Depth (SPT-Depth) is introduced. The method uses stepwise downsampling to acquire both shallow and deep semantic information, which are subsequently fused. Training is divided into fine and coarse convergence stages that employ distinct training strategies and hyperparameters, resulting in a substantial reduction in both training and validation losses. In the training strategy, a shift- and scale-invariant mean square error function is employed to compensate for the lack of translational invariance in Transformers. Additionally, an edge-smoothing function is applied to reduce noise in the depth map, enhancing the model's robustness. SPT-Depth achieves a global receptive field while effectively reducing time complexity. Compared with the baseline method on the New York University Depth V2 (NYU Depth V2) dataset, there is a 10% reduction in Absolute Relative Error (Abs Rel) and a 36% decrease in Root Mean Square Error (RMSE). Compared with state-of-the-art methods, there is a 17% reduction in RMSE.
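The abstract does not give the formula for the shift- and scale-invariant loss or for the reported metrics. As a rough illustration only, the sketch below uses one common formulation of a scale-and-shift-invariant mean square error (a least-squares alignment of the prediction to the target before taking the MSE), together with the standard definitions of Abs Rel and RMSE. The function names (`align_scale_shift`, `ssi_mse`, `abs_rel`, `rmse`) are hypothetical, and the authors' actual loss may differ in its details.

```python
import numpy as np

def align_scale_shift(pred, target):
    """Return the scale s and shift t minimizing ||s * pred + t - target||^2."""
    p = np.asarray(pred, dtype=np.float64).ravel()
    g = np.asarray(target, dtype=np.float64).ravel()
    # Design matrix [pred, 1] for an ordinary least-squares fit.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s, t

def ssi_mse(pred, target):
    """MSE after optimally rescaling and shifting the prediction (assumed form)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    s, t = align_scale_shift(pred, target)
    return float(np.mean((s * pred + t - target) ** 2))

def abs_rel(pred, target):
    """Absolute Relative Error: mean(|pred - target| / target), target > 0."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean(np.abs(pred - target) / target))

def rmse(pred, target):
    """Root Mean Square Error between predicted and ground-truth depth."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

Because the alignment is solved per call, a prediction that differs from the ground truth only by a nonzero global scale and an offset yields a loss of approximately zero under this formulation, which is the invariance property the abstract refers to.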

Keywords

Deep learning; Depth estimation; Loss function; SPT-depth; Stereoscopic display; Transformer

Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: Sci Rep Publication year: 2024 Document type: Article Country of affiliation: China Country of publication: ENGLAND / SCOTLAND / GB / GREAT BRITAIN / UK / UNITED KINGDOM
