Convolution-Based Encoding of Depth Images for Transfer Learning in RGB-D Scene Classification.
Gopalapillai, Radhakrishnan; Gupta, Deepa; Zakariah, Mohammed; Alotaibi, Yousef Ajami.
Affiliation
  • Gopalapillai R; Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India.
  • Gupta D; Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India.
  • Zakariah M; Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11543, Saudi Arabia.
  • Alotaibi YA; Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11543, Saudi Arabia.
Sensors (Basel); 21(23), 2021 Nov 28.
Article in En | MEDLINE | ID: mdl-34883955
ABSTRACT
Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened a new line of research: using depth information in addition to color (RGB) image data for scene understanding. Transfer learning of deep convolutional networks with paired RGB and depth (RGB-D) images must deal with integrating the two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of each pixel's local surface normal (HHA) so that networks trained on the Places365 dataset can be reused. The high computational cost of HHA encoding is a major disadvantage for real-time scene prediction, even though it matters less during training. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network. We show that our encoding approach performs equally well or better in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized, pretrained VGG16 network. We address the class imbalance present in the image dataset with a feature-level method based on the synthetic minority oversampling technique (SMOTE). With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.
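The abstract describes expanding a single-channel depth map into a three-channel image with cheap convolutions so that an RGB-pretrained network such as VGG16/Places365 can be reused, but it does not give the actual kernels or channel definitions. The sketch below is therefore only illustrative: it assumes fixed gradient kernels and a raw-depth channel as a stand-in for the authors' encoding, and is not their published method.

```python
# Minimal sketch of a convolution-based depth encoding.
# NOTE: the kernels and channel layout here are assumptions for illustration;
# the paper's exact encoding is not specified in the abstract.
import numpy as np
from scipy.ndimage import convolve

def encode_depth(depth: np.ndarray) -> np.ndarray:
    """Map an (H, W) depth image to an (H, W, 3) uint8 image.

    Illustrative channels:
      0: normalized raw depth
      1: horizontal-gradient response
      2: vertical-gradient response
    """
    depth = depth.astype(np.float32)

    # Simple fixed kernels; far cheaper than geometric HHA encoding.
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float32)  # horizontal gradient
    ky = kx.T                                       # vertical gradient

    gx = convolve(depth, kx, mode="nearest")
    gy = convolve(depth, ky, mode="nearest")

    def to_u8(x: np.ndarray) -> np.ndarray:
        # Rescale each channel to 0-255 for compatibility with RGB-trained nets.
        x = x - x.min()
        rng = x.max() if x.max() > 0 else 1.0
        return (255.0 * x / rng).astype(np.uint8)

    return np.dstack([to_u8(depth), to_u8(gx), to_u8(gy)])

if __name__ == "__main__":
    fake_depth = np.random.rand(224, 224) * 4.0   # synthetic depth in metres
    encoded = encode_depth(fake_depth)
    print(encoded.shape, encoded.dtype)           # (224, 224, 3) uint8
```

A three-channel output of this kind can be fed to a Places365-pretrained VGG16 alongside the RGB branch; the feature-level SMOTE balancing and image augmentation mentioned in the abstract would then be applied on top of the extracted features during fine-tuning.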

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Neural Networks, Computer / Machine Learning Type of study: Prognostic_studies Language: En Journal: Sensors (Basel) Year: 2021 Document type: Article Affiliation country: India