Monocular Depth Estimation with Self-Supervised Learning for Vineyard Unmanned Agricultural Vehicle.

Cui, Xue-Zhi; Feng, Quan; Wang, Shu-Zhi; Zhang, Jian-Hua

Affiliation

Cui XZ; School of Mechanical and Electrical Engineering, Gansu Agriculture University, Lanzhou 730070, China.
Feng Q; School of Mechanical and Electrical Engineering, Gansu Agriculture University, Lanzhou 730070, China.
Wang SZ; College of Electrical Engineering, Northwest University for Nationalities, Lanzhou 730030, China.
Zhang JH; Agricultural Information Institute of CAAS, Beijing 100081, China.

Sensors (Basel); 22(3), 2022 Jan 18.

Article in English | MEDLINE | ID: mdl-35161463

ABSTRACT

To find an economical way to infer the depth of the environment surrounding unmanned agricultural vehicles (UAVs), a lightweight depth estimation model based on a convolutional neural network, called MonoDA, is proposed. The model is trained on sequences of consecutive frames from monocular videos. It is composed of two subnetworks: a depth estimation subnetwork and a pose estimation subnetwork. The former is a modified U-Net with fewer bridges, while the latter uses EfficientNet-B0 as its backbone to extract features from the sequential frames and predict the pose transformation between them. A self-supervised strategy is adopted during training, so depth labels for the frames are not needed. Instead, adjacent frames in the image sequence and the pose reprojection relation are used to train the model: the outputs of the subnetworks (the depth map and the pose relation) are used to reconstruct the input frame, and a self-supervised loss between the reconstructed input and the original input is calculated. This loss is then used to update the parameters of both subnetworks through backpropagation. Several experiments were conducted to evaluate the model's performance, and the results show that MonoDA achieves competitive accuracy on both the KITTI raw dataset and our vineyard dataset. In addition, our method is insensitive to color. On the NVIDIA JETSON TX2, the computing platform of our UAV's environment perception system, the model runs at 18.92 FPS. In summary, our approach provides an economical solution for depth estimation with monocular cameras, achieves a good trade-off between accuracy and speed, and can serve as a novel auxiliary depth detection paradigm for UAVs.
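
The training signal summarized above (reconstructing the input frame from the predicted depth map and pose, then comparing the reconstruction with the original frame) can be illustrated with a minimal NumPy sketch. This is not the authors' MonoDA code. It assumes a pinhole camera with intrinsic matrix K, a 4 x 4 rigid transform T that maps target-camera coordinates to source-camera coordinates, nearest-neighbour sampling in place of the differentiable bilinear sampling a trainable model would need, and a plain L1 photometric error (many related self-supervised methods combine SSIM with L1). All function names are hypothetical.

```python
import numpy as np

def backproject(depth, K_inv):
    """Lift every target pixel to a 3-D point in the target camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    rays = K_inv @ pix                      # 3 x (h*w) viewing rays
    return rays * depth.reshape(1, -1)      # scale each ray by its predicted depth

def project(points, K, T):
    """Move 3-D points into the source camera (assumed pose T) and project with K."""
    n = points.shape[1]
    homog = np.vstack([points, np.ones((1, n))])  # 4 x N homogeneous points
    cam = (T @ homog)[:3]                         # points in source camera frame
    pix = K @ cam
    z = pix[2]
    return pix[0] / z, pix[1] / z, z

def reconstruct_target(source, depth, K, T):
    """Warp the source frame into the target view using nearest-neighbour lookup."""
    h, w = depth.shape
    u, v, z = project(backproject(depth, np.linalg.inv(K)), K, T)
    u_i = np.rint(u).astype(int)
    v_i = np.rint(v).astype(int)
    # Keep only pixels in front of the camera that land inside the source image.
    valid = (z > 0) & (u_i >= 0) & (u_i < w) & (v_i >= 0) & (v_i < h)
    recon = np.zeros(h * w, dtype=np.float64)
    recon[valid] = source[v_i[valid], u_i[valid]]
    return recon.reshape(h, w), valid.reshape(h, w)

def photometric_l1(target, recon, valid):
    """Mean absolute intensity difference over validly reprojected pixels."""
    return float(np.abs(target - recon)[valid].mean())
```

With an identity pose the reconstruction reproduces the frame and the loss is zero. With a constant depth of 5 and a sideways translation of 0.5 (focal length 10), every pixel shifts by exactly one column, so a source frame shifted by one column is reconstructed perfectly except for the column that falls outside the image. In actual training, the depth and pose subnetworks would be updated so that this kind of loss decreases on real adjacent frames.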

Subjects

Neural Networks, Computer; Supervised Machine Learning; Farms

Keywords

edge computing device; monocular depth estimation; self-supervised learning; vineyard scene

Database: MEDLINE | Main subject: Neural Networks, Computer / Supervised Machine Learning | Study type: Prognostic_studies | Language: English | Journal: Sensors (Basel) | Year of publication: 2022 | Document type: Article | Country of affiliation: China