1.
Article in English | MEDLINE | ID: mdl-38837926

ABSTRACT

Most deep learning approaches to comprehensive semantic modeling of 3D indoor spaces require costly dense annotations in the 3D domain. In this work, we explore a central 3D scene modeling task, namely semantic scene reconstruction, without using any 3D annotations. The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images, fusing cross-domain features into volumetric embeddings to predict complete 3D geometry, color, and semantics from 2D labels alone, which can be either manual or machine-generated. Our key technical innovation is to leverage differentiable rendering of color and semantics to bridge 2D observations and the unknown 3D space, using the observed RGB images and 2D semantics, respectively, as supervision. We additionally develop a learning pipeline and corresponding method that enable learning from imperfect predicted 2D labels; these labels can further be synthesized in an augmented set of virtual training views that complements the original real captures, enabling a more efficient self-supervision loop for semantics. As a result, our end-to-end trainable solution jointly addresses geometry completion, colorization, and semantic mapping from limited RGB-D images, without relying on any 3D ground truth. Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, Matterport3D and ScanNet, surpassing in both geometry and semantics prediction even baselines trained with costly 3D annotations. To our knowledge, our method is also the first 2D-driven method to address completion and semantic segmentation of real-world 3D scans simultaneously.
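
As a rough illustration of the 2D-supervised objective this abstract describes, the sketch below alpha-composites per-sample color and semantic predictions into per-ray values and supervises them with observed RGB and 2D semantic labels. It is a minimal sketch under assumed shapes and names (composite_along_rays, loss_2d_supervision are hypothetical), not the authors' implementation.

```python
# Minimal sketch (not the authors' released code) of 2D-only supervision via
# differentiable rendering: no 3D labels appear anywhere in the objective.
import torch
import torch.nn.functional as F

def composite_along_rays(weights, color, sem_logits):
    """Volume-render per-sample values into per-ray values.

    weights:    (R, S)    compositing weights derived from predicted density
    color:      (R, S, 3) per-sample RGB predicted from volumetric features
    sem_logits: (R, S, K) per-sample semantic logits for K classes
    """
    rgb = (weights.unsqueeze(-1) * color).sum(dim=1)       # (R, 3)
    sem = (weights.unsqueeze(-1) * sem_logits).sum(dim=1)  # (R, K)
    return rgb, sem

def loss_2d_supervision(rgb_pred, sem_pred, rgb_obs, sem_labels, w_sem=0.1):
    """Photometric loss against observed RGB plus cross-entropy against 2D
    labels, which may be manual or machine-generated."""
    return F.mse_loss(rgb_pred, rgb_obs) + w_sem * F.cross_entropy(sem_pred, sem_labels)

# Toy usage with random tensors: 1024 rays, 64 samples per ray, 21 classes.
R, S, K = 1024, 64, 21
weights = torch.softmax(torch.randn(R, S), dim=1)
rgb_pred, sem_pred = composite_along_rays(weights, torch.rand(R, S, 3), torch.randn(R, S, K))
loss = loss_2d_supervision(rgb_pred, sem_pred, torch.rand(R, 3), torch.randint(0, K, (R,)))
```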

2.
IEEE Trans Image Process; 33: 2058-2073, 2024.
Article in English | MEDLINE | ID: mdl-38470576

ABSTRACT

Existing Cross-Domain Few-Shot Learning (CDFSL) methods require access to source domain data to train a model in the pre-training phase. However, due to growing concerns about data privacy and the desire to reduce data transmission and training costs, it is necessary to develop CDFSL solutions that do not access source data. This paper therefore explores the Source-Free CDFSL (SF-CDFSL) problem, in which CDFSL is addressed using an existing pretrained model rather than a model trained on source data. The absence of source data raises two key challenges: effectively tackling CDFSL with only limited labeled target samples, and the impossibility of addressing domain disparities by aligning the source and target distributions. To address these challenges, this paper proposes Enhanced Information Maximization with Distance-Aware Contrastive Learning (IM-DCL). First, we introduce a transductive mechanism for learning the query set. Second, information maximization (IM) is explored to map target samples to predictions with both individual certainty and global diversity, helping the source model better fit the target data distribution. However, IM fails to learn the decision boundary of the target task. This motivates a novel approach called Distance-Aware Contrastive Learning (DCL), in which we treat the entire feature set as simultaneously positive and negative, akin to Schrödinger's concept of a dual state. Instead of a rigid separation into positive and negative sets, we compute weighted distances among features to establish a soft positive/negative classification over the entire feature set. We explore three types of negative weights to enhance CDFSL performance. Furthermore, we address the shortcomings of IM by incorporating contrastive constraints between object features and their corresponding positive and negative sets. Evaluations on the four datasets of the BSCD-FSL benchmark show that the proposed IM-DCL, without accessing the source domain, outperforms existing methods, especially on distant-domain tasks. Ablation studies and performance analyses further confirm the ability of IM-DCL to handle SF-CDFSL. The code will be made public at https://github.com/xuhuali-mxj/IM-DCL.
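
The two ingredients named in this abstract can be sketched compactly: an IM loss that rewards confident per-sample predictions and a diverse batch marginal, and a distance-based soft weighting that lets every feature act as both positive and negative. This is a minimal sketch under assumed shapes and temperatures, not the IM-DCL release at the linked repository.

```python
# Hedged sketch of information maximization + distance-aware soft weighting.
import torch
import torch.nn.functional as F

def information_maximization(logits, eps=1e-8):
    """IM loss: low entropy per target sample (individual certainty) while
    keeping the batch-level marginal prediction diverse (global diversity)."""
    p = F.softmax(logits, dim=1)                           # (N, C)
    individual = -(p * (p + eps).log()).sum(dim=1).mean()  # minimized
    marginal = p.mean(dim=0)
    diversity = (marginal * (marginal + eps).log()).sum()  # entropy maximized
    return individual + diversity

def distance_aware_weights(feats, temperature=0.1):
    """Soft positive/negative membership: rather than a hard split, each
    feature weighs all others by distance, so the whole feature set serves
    as both positive and negative set (the abstract's "dual state")."""
    d = torch.cdist(feats, feats)                          # (N, N) distances
    w_pos = F.softmax(-d / temperature, dim=1)             # near -> positive
    w_neg = F.softmax(d / temperature, dim=1)              # far  -> negative
    return w_pos, w_neg

# Toy usage: 32 normalized features, 10 hypothetical target classes.
feats = F.normalize(torch.randn(32, 128), dim=1)
loss_im = information_maximization(feats @ torch.randn(128, 10))
w_pos, w_neg = distance_aware_weights(feats)
```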

3.
Neural Netw; 178: 106429, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38901090

ABSTRACT

Although recent studies on blind single image super-resolution (SISR) have achieved significant success, most typically require supervised training on synthetic low-resolution (LR)/high-resolution (HR) image pairs. This necessitates re-training for each new degradation and restricts application in real-world scenarios with unfavorable inputs. In this paper, we propose an unsupervised blind SISR method for inputs subject to different degradations, named Different Degradations Blind Super-Resolution (DDSR). It formulates a Gaussian model of the blur degradation and employs a meta-learning framework to handle different image degradations. Specifically, a neural-network-based kernel generator is optimized by learning from random kernel samples, an operation we call random kernel learning, which provides an effective initialization for blur-degradation optimization. Meanwhile, the meta-learning framework resolves multiple degradation models through alternating optimization between blur degradation and image restoration. Unlike pre-trained deep-learning methods, the proposed DDSR operates in a plug-and-play manner and can restore an HR image from unfavorable LR inputs with degradations such as partial coverage, added noise, and darkening. Extensive simulations show that DDSR outperforms state-of-the-art methods on public datasets at comparable memory load and time consumption, while offering greater flexibility and convenience and significantly better generalization across multiple degradations. Our code is available at https://github.com/XYLGroup/DDSR.
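
The alternating scheme this abstract describes can be sketched as two interleaved updates against the observed LR image: fit the blur kernel with the restorer frozen, then fit the restorer with the kernel frozen. The forward model, module names (kernel_gen, restorer), and L1 objective below are assumptions for illustration, not the released DDSR code at the linked repository.

```python
# Hedged sketch of alternating blur-kernel / restorer optimization.
import torch
import torch.nn.functional as F

def degrade(hr, kernel, scale=2):
    """Assumed forward model: depthwise blur with the estimated kernel
    (shape (1, 1, k, k), k odd), then downsample by strided sampling."""
    c = hr.shape[1]
    k = kernel.expand(c, 1, *kernel.shape[-2:])
    blurred = F.conv2d(hr, k, padding=kernel.shape[-1] // 2, groups=c)
    return blurred[..., ::scale, ::scale]

def alternating_step(lr, kernel_gen, restorer, opt_k, opt_r):
    """One round of the alternating optimization the abstract describes."""
    # (1) Fit the blur kernel with the restorer's output frozen.
    opt_k.zero_grad()
    F.l1_loss(degrade(restorer(lr).detach(), kernel_gen()), lr).backward()
    opt_k.step()
    # (2) Fit the restorer with the current kernel estimate frozen.
    opt_r.zero_grad()
    hr = restorer(lr)
    F.l1_loss(degrade(hr, kernel_gen().detach()), lr).backward()
    opt_r.step()
    return hr
```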

4.
IEEE Trans Pattern Anal Mach Intell; 45(10): 12562-12580, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37307188

ABSTRACT

Despite the impressive performance of recent unbiased Scene Graph Generation (SGG) methods, the current debiasing literature mainly focuses on the long-tailed distribution problem and overlooks another source of bias: semantic confusion, which makes SGG models prone to false predictions for similar relationships. In this paper, we explore a debiasing procedure for the SGG task that leverages causal inference. Our central insight is that the Sparse Mechanism Shift (SMS) principle in causality allows independent intervention on multiple biases, potentially preserving head-category performance while pursuing the prediction of highly informative tail relationships. However, noisy datasets introduce unobserved confounders for the SGG task, so the constructed causal models are causally insufficient to benefit from SMS. To remedy this, we propose Two-stage Causal Modeling (TsCM) for the SGG task, which takes the long-tailed distribution and semantic confusion as confounders in the Structural Causal Model (SCM) and decouples the causal intervention into two stages. The first stage is causal representation learning, where we use a novel Population Loss (P-Loss) to intervene on the semantic confusion confounder. The second stage introduces Adaptive Logit Adjustment (AL-Adjustment) to eliminate the long-tailed distribution confounder and complete causal calibration learning. Both stages are model-agnostic and can therefore be used in any SGG model that seeks unbiased predictions. Comprehensive experiments on popular SGG backbones and benchmarks show that TsCM achieves state-of-the-art performance in terms of mean recall rate. Furthermore, TsCM maintains a higher recall rate than other debiasing methods, indicating that our method achieves a better trade-off between head and tail relationships.
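
For the long-tail stage, the underlying idea can be illustrated with classic prior-based logit adjustment: subtracting a scaled log-prior so frequent head relationships no longer dominate rare but informative tail relationships. Note this fixed-offset variant is an assumption for illustration; the paper's AL-Adjustment is adaptive rather than a fixed prior correction.

```python
# Hedged sketch: fixed logit adjustment for long-tailed relationship classes.
import torch

def adjust_logits(logits, class_counts, tau=1.0, eps=1e-8):
    """Subtract tau * log(prior) per class, calibrating head vs. tail
    relationships at prediction time."""
    prior = class_counts / class_counts.sum()
    return logits - tau * (prior + eps).log()

# Toy usage: 5 relationship classes with a long-tailed frequency profile.
counts = torch.tensor([1000., 300., 50., 10., 2.])
logits = torch.randn(4, 5)
preds = adjust_logits(logits, counts).argmax(dim=1)
```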
