Exploration of Semantic Label Decomposition and Dataset Size in Semantic Indoor Scenes Synthesis via Optimized Residual Generative Adversarial Networks.
Ibrahem, Hatem; Salem, Ahmed; Kang, Hyun-Soo.
Affiliation
  • Ibrahem H; Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Korea.
  • Salem A; Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Korea.
  • Kang HS; Electrical Engineering Department, Faculty of Engineering, Assiut University, Assiut 71515, Egypt.
Sensors (Basel) ; 22(21)2022 Oct 29.
Article in English | MEDLINE | ID: mdl-36366007
ABSTRACT
In this paper, we revisit paired image-to-image translation with the conditional generative adversarial network Pix2Pix and propose efficient optimizations of the architecture and training method to boost the realism of the generated images. We propose a generative adversarial network-based technique that creates new artificial indoor scenes from a user-defined semantic segmentation map, which specifies the location, shape, and category of each object in the scene, as in Pix2Pix. We train several residual-connection-based generator and discriminator architectures on the NYU Depth-v2 dataset and a selected indoor subset of the ADE20K dataset, showing that the proposed models have fewer parameters and lower computational complexity, yet generate higher-quality images than state-of-the-art methods that follow the same technique for generating realistic indoor images. We also show that using extra specific labels and more training samples improves the quality of the generated images; moreover, compared with Pix2Pix, the proposed residual-connection-based models learn better from small datasets (i.e., NYU Depth-v2) and improve the realism of the generated images when trained on larger datasets (i.e., the ADE20K indoor subset). The proposed method achieves an LPIPS value of 0.505 and an FID value of 81.067, generating better-quality images than Pix2Pix and other recent paired image-to-image translation methods and outperforming them in terms of LPIPS and FID.
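The abstract reports image quality with LPIPS and FID. As a hedged illustration (not code from the paper), the FID used in such evaluations is, in practice, the Fréchet distance between two Gaussians fitted to Inception-v3 feature statistics of real and generated images; the distance itself can be sketched in plain NumPy:

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2))."""
    diff = mu1 - mu2
    # Matrix square root of sigma1 @ sigma2 via eigendecomposition.
    # Valid when the product is diagonalizable; eigenvalues are clipped
    # at zero to absorb small negative numerical noise.
    eigvals, eigvecs = np.linalg.eig(sigma1 @ sigma2)
    sqrt_prod = (eigvecs * np.sqrt(np.clip(eigvals.real, 0, None))) @ np.linalg.inv(eigvecs)
    covmean_trace = np.trace(sqrt_prod).real
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * covmean_trace)

# Two identical Gaussians give distance 0; shifting the mean by (3, 4)
# under unit covariance gives the squared shift, 25.
mu, cov = np.zeros(2), np.eye(2)
print(frechet_distance(mu, cov, mu, cov))
print(frechet_distance(mu, cov, np.array([3.0, 4.0]), cov))
```

In a full FID pipeline, `mu` and `sigma` would be the mean and covariance of Inception-v3 activations over each image set; the function and variable names here are illustrative only.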
Full text: 1 Collections: 01-international Database: MEDLINE Language: English Journal: Sensors (Basel) Year of publication: 2022 Document type: Article
