Results 1 - 4 of 4
1.
Sensors (Basel) ; 24(14)2024 Jul 21.
Article in English | MEDLINE | ID: mdl-39066133

ABSTRACT

Cognitive scientists believe that adaptable intelligent agents like humans perform spatial reasoning tasks through learned causal mental simulation. The problem of learning these simulations is called predictive world modeling. We present the first framework for learning an open-vocabulary predictive world model (OV-PWM) from sensor observations. The model is implemented as a hierarchical variational autoencoder (HVAE) capable of predicting diverse and accurate fully observed environments from accumulated partial observations. We show that the OV-PWM can model high-dimensional embedding maps of latent compositional embeddings representing sets of overlapping semantics inferable by similarity. The OV-PWM simplifies the prior two-stage closed-set PWM approach to a single-stage end-to-end learning method. CARLA simulator experiments show that the OV-PWM can learn compact latent representations and generate diverse and accurate worlds with fine details such as road markings, achieving 69 mIoU over six query semantics on an urban evaluation sequence. We propose the OV-PWM as a versatile continual learning paradigm for providing spatio-semantic memory and learned internal simulation capabilities to future general-purpose mobile robots.
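The mIoU figure above follows the standard semantic-segmentation metric: per-class intersection-over-union averaged over the query classes. A minimal sketch of that computation, with illustrative class names (not the paper's actual six query semantics):

```python
# Mean intersection-over-union over a set of query semantics.
# The class labels and example maps below are illustrative assumptions.

def mean_iou(pred, gt, classes):
    """Average per-class IoU over the given class labels (flattened maps)."""
    ious = []
    for c in classes:
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

pred = ["road", "road", "car", "sidewalk"]
gt   = ["road", "car",  "car", "sidewalk"]
print(mean_iou(pred, gt, ["road", "car", "sidewalk"]))
```

Classes missing from both maps are excluded from the average, the usual convention in segmentation benchmarks.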

2.
Sensors (Basel) ; 24(9)2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38732800

ABSTRACT

Transformer-based models have gained popularity in the field of natural language processing (NLP) and are extensively utilized in computer vision tasks and multi-modal models such as GPT-4. This paper presents a novel method to enhance the explainability of transformer-based image classification models. Our method aims to improve trust in classification results and empower users to gain a deeper understanding of the model for downstream tasks by providing visualizations of class-specific maps. We introduce two modules: the "Relationship Weighted Out" module and the "Cut" module. The "Relationship Weighted Out" module focuses on extracting class-specific information from intermediate layers, enabling us to highlight relevant features. Additionally, the "Cut" module performs fine-grained feature decomposition, taking into account factors such as position, texture, and color. By integrating these modules, we generate dense class-specific visual explainability maps. We validate our method with extensive qualitative and quantitative experiments on the ImageNet dataset. Furthermore, we conduct a large number of experiments on the LRN dataset, which is specifically designed for automatic driving danger alerts, to evaluate the explainability of our method in scenarios with complex backgrounds. The results demonstrate a significant improvement over previous methods. Moreover, we conduct ablation experiments to validate the effectiveness of each module. Through these experiments, we confirm the respective contributions of each module, solidifying the overall effectiveness of our proposed approach.
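The core idea of extracting class-specific information from intermediate layers is commonly realized by weighting per-channel feature maps with a class-relevance score and keeping only positive evidence. A minimal sketch under those assumptions (the abstract does not specify the exact weighting, so the scheme below is illustrative, not the authors' implementation):

```python
# Hedged sketch: aggregate per-channel feature maps into one class-specific
# relevance map by weighting each channel with a class-relevance score and
# keeping only positive contributions. Shapes and weights are illustrative.

def class_specific_map(feature_maps, class_weights):
    """Combine 2D per-channel feature maps into a single class-specific map.

    feature_maps: list of 2D grids (one per channel), all the same shape
    class_weights: per-channel relevance scores for the target class
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    out = [[0.0] * w for _ in range(h)]
    for fmap, wgt in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                # ReLU-style gating: only positively weighted evidence survives
                out[i][j] += max(wgt * fmap[i][j], 0.0)
    return out
```

Channels with negative relevance for the target class are suppressed, so the resulting map highlights only regions supporting that class.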

3.
Sensors (Basel) ; 23(20)2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37896492

ABSTRACT

In the field of intelligent vehicle technology, there is a high dependence on images captured under challenging conditions to develop robust perception algorithms. However, acquiring these images can be both time-consuming and dangerous. To address this issue, unpaired image-to-image translation models offer a solution by synthesizing samples of the desired domain, thus eliminating the reliance on ground truth supervision. However, current methods predominantly focus on single projections rather than multiple solutions and offer little control over the direction of generation, leaving room for improvement. In this study, we propose a generative adversarial network (GAN)-based model that incorporates both a style encoder and a content encoder, specifically designed to extract relevant information from an image. Further, we employ a decoder to reconstruct an image using these encoded features, while ensuring that the generated output remains within a permissible range by applying a self-regression module to constrain the style latent space. By modifying the hyperparameters, we can generate controllable outputs with specific style codes. We evaluate the performance of our model by generating snow scenes on the Cityscapes and EuroCity Persons datasets. The results reveal the effectiveness of our proposed methodology, thereby reinforcing the benefits of our approach in the ongoing evolution of intelligent vehicle technology.
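The style/content decomposition described above can be sketched with a bounded style code modulating content features. This is a toy illustration of the general idea only; the squashing function, dimensions, and decoder are assumptions, not the paper's self-regression module:

```python
# Hedged sketch: keep a style code in a permissible range before decoding,
# in the spirit of constraining the style latent space. The tanh squash and
# multiplicative modulation are illustrative assumptions.
import math

def constrain_style(style_code, limit=1.0):
    """Squash each style dimension into (-limit, limit)."""
    return [limit * math.tanh(s) for s in style_code]

def decode(content, style):
    """Toy decoder: modulate content features with a bounded style code."""
    return [c * (1.0 + s) for c, s in zip(content, style)]

style = constrain_style([5.0, -5.0, 0.0])  # extreme codes stay bounded
image = decode([0.2, 0.4, 0.6], style)
```

Because the style code is bounded, sweeping it produces controllable variations of the same content without driving the decoder outside its valid range.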

4.
Sensors (Basel) ; 23(3)2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36772588

ABSTRACT

Weather variation in the distribution of image data can cause a decline in the performance of existing visual algorithms during evaluation. Two promising solutions are adding samples of the target domain to the training data, or using pre-trained image restoration methods such as de-hazing, de-raining, and de-snowing to improve the quality of input images. In this work, we propose Multiple Weather Translation GAN (MWTG), a CycleGAN-based, dual-purpose framework that simultaneously learns weather generation and weather removal from image data. MWTG consists of four GANs constrained using cycle consistency that carry out domain translation tasks between hazy, rainy, snowy, and clear weather, using an asymmetric approach. To increase network capacity, we employ a spatial feature transform (SFT) layer to fuse the features extracted from the weather layer, which contains high-level domain information from the previous generators. Further, we collect an unpaired, real-world driving dataset recorded under various weather conditions, called Realistic Driving Scenes under Bad Weather (RDSBW). We qualitatively and quantitatively evaluate MWTG using RDSBW and variants of Cityscapes that synthesize weather effects, e.g., Foggy Cityscapes. Our experimental results suggest that MWTG can generate realistic weather effects in clear images and accurately remove weather noise from images. Furthermore, the SOTA pedestrian detector ASCP is shown to achieve an impressive gain in detection precision after image restoration using the proposed MWTG method.
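The cycle-consistency constraint that couples MWTG's generator pairs follows the standard CycleGAN formulation: translating clear to weather and back should reproduce the input, penalized with an L1 term. A minimal sketch with toy stand-in generators (the real generators are convolutional networks, not the offsets used here):

```python
# Hedged sketch of the cycle-consistency loss (as in CycleGAN):
# || G_backward(G_forward(x)) - x ||_1. The generators below are toy
# stand-ins for illustration only.

def l1(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, g_forward, g_backward):
    """Penalize failure to recover x after a forward-backward translation."""
    return l1(g_backward(g_forward(x)), x)

add_rain = lambda img: [v + 0.2 for v in img]      # toy "clear -> rainy"
remove_rain = lambda img: [v - 0.2 for v in img]   # toy "rainy -> clear"
loss = cycle_consistency_loss([0.1, 0.5, 0.9], add_rain, remove_rain)
```

With four generators between hazy, rainy, snowy, and clear domains, one such term per translation pair ties generation and removal together during training.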
