1.
IEEE Trans Pattern Anal Mach Intell; 44(3): 1219-1231, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32946384

ABSTRACT

In this paper we introduce a method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes. The proposed disentangling transformation isolates the contribution made by different groups of parameters to a given loss, without changing its nature. This brings two advantages: i) it simplifies the training dynamics in the presence of losses with complex interactions of parameters; and ii) it allows us to avoid the issue of balancing independent regression terms. We further apply this disentangling transformation to another novel, signed Intersection-over-Union criterion-driven loss for improving 2D detection results. We also critically review the AP metric used in KITTI3D and resolve a flaw which affected and biased all previously published results on monocular 3D detection. Our improved metric is now used as the official KITTI3D metric. We provide extensive experimental evaluations and ablation studies on the KITTI3D and nuScenes datasets, setting new state-of-the-art results. We provide additional results on all classes of the KITTI3D and nuScenes datasets to further validate the robustness of our method, demonstrating its ability to generalize to different types of objects.
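The disentangling idea can be illustrated with a toy example: the loss is evaluated once per parameter group, with that group predicted and all other groups clamped to ground truth, so each term isolates one group's contribution. The sketch below uses an axis-aligned box with only two groups (dimensions and center); the actual method handles more groups (e.g., rotation and depth), and all names here are hypothetical, not the authors' implementation.

```python
import numpy as np

def corners(dims, center):
    """Corners of a toy axis-aligned 3D box: center +/- dims/2."""
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                     dtype=float)
    return center + 0.5 * signs * dims

def corner_loss(dims, center, gt_corners):
    """Mean L1 distance between predicted and ground-truth corners."""
    return np.abs(corners(dims, center) - gt_corners).mean()

def disentangled_loss(pred_dims, pred_center, gt_dims, gt_center):
    """Each term lets one parameter group vary; the rest are fixed to GT,
    so the groups' gradients never interact through the loss."""
    gt_corners = corners(gt_dims, gt_center)
    term_dims = corner_loss(pred_dims, gt_center, gt_corners)
    term_center = corner_loss(gt_dims, pred_center, gt_corners)
    return term_dims + term_center
```

With a perfect prediction every term vanishes; perturbing only the center leaves the dimensions term at zero, which is exactly the isolation the transformation is after.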


Subjects
Algorithms
2.
IEEE Trans Pattern Anal Mach Intell; 44(12): 10099-10113, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34882548

ABSTRACT

Deep neural networks have enabled major progress in semantic segmentation. However, even the most advanced neural architectures suffer from important limitations. First, they are vulnerable to catastrophic forgetting, i.e., they perform poorly when they are required to incrementally update their model as new classes become available. Second, they rely on large amounts of pixel-level annotations to produce accurate segmentation maps. To tackle these issues, we introduce a novel incremental class learning approach for semantic segmentation taking into account a peculiar aspect of this task: since each training step provides annotation only for a subset of all possible classes, pixels of the background class exhibit a semantic shift. Therefore, we revisit the traditional distillation paradigm by designing novel loss terms which explicitly account for the background shift. Additionally, we introduce a novel strategy to initialize the classifier's parameters at each step in order to prevent biased predictions toward the background class. Finally, we demonstrate that our approach can be extended to point- and scribble-based weakly supervised segmentation, modeling the partial annotations to create priors for unlabeled pixels. We demonstrate the effectiveness of our approach with an extensive evaluation on the Pascal-VOC, ADE20K, and Cityscapes datasets, significantly outperforming state-of-the-art methods.
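The background-shift idea can be sketched as follows: at the current step, a pixel labelled "background" may in fact belong to an old class, so the cross-entropy for such a pixel credits the combined probability of background and all old classes instead of background alone. This is an illustrative simplification of the principle, not the paper's actual loss terms; `background_aware_ce` and its signature are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def background_aware_ce(logits, label, old_classes):
    """Cross-entropy for one pixel, where class 0 is 'background'.
    For a background-labelled pixel, the probability mass of the old
    classes is folded into the background term, since old-class pixels
    are unlabelled at the current step."""
    p = softmax(logits)
    if label == 0:
        prob = p[0] + p[old_classes].sum()  # background may hide old classes
    else:
        prob = p[label]
    return -np.log(prob)
```

A pixel confidently predicted as an old class then incurs almost no penalty when labelled background, whereas a naive cross-entropy would push that mass toward the background logit and cause forgetting.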

3.
IEEE Trans Pattern Anal Mach Intell; 43(12): 4441-4452, 2021 Dec.
Article in English | MEDLINE | ID: mdl-32750781

ABSTRACT

One of the main challenges for developing visual recognition systems working in the wild is to devise computational models immune from the domain shift problem, i.e., accurate when test data are drawn from a (slightly) different data distribution than training samples. In the last decade, several research efforts have been devoted to devising algorithmic solutions for this issue. Recent attempts to mitigate domain shift have resulted in deep learning models for domain adaptation which learn domain-invariant representations by introducing appropriate loss terms, by casting the problem within an adversarial learning framework, or by embedding domain-specific normalization layers into the deep network. This paper describes a novel approach for unsupervised domain adaptation. Similarly to previous works, we propose to align the learned representations by embedding them into appropriate network feature normalization layers. In contrast to previous works, our Domain Alignment Layers are designed not only to match the source and target feature distributions but also to automatically learn the degree of feature alignment required at different levels of the deep network. Unlike most previous deep domain adaptation methods, our approach is able to operate in a multi-source setting. Thorough experiments on four publicly available benchmarks confirm the effectiveness of our approach.
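The idea of learning the degree of alignment can be sketched as a normalization layer that blends each domain's own batch statistics with the shared cross-domain statistics through a coefficient `alpha` that, in a real network, would be learned per layer along with the other parameters. A hedged numpy illustration under those assumptions, not the paper's actual layer:

```python
import numpy as np

def domain_align(x_source, x_target, alpha, eps=1e-5):
    """Normalize each domain's batch with a mixture of statistics.
    alpha=1.0 -> each domain uses its own statistics (full alignment);
    alpha=0.0 -> both domains share pooled statistics (no alignment)."""
    mu_s, var_s = x_source.mean(0), x_source.var(0)
    mu_t, var_t = x_target.mean(0), x_target.var(0)
    pooled = np.concatenate([x_source, x_target])
    mu_all, var_all = pooled.mean(0), pooled.var(0)

    xs = (x_source - (alpha * mu_s + (1 - alpha) * mu_all)) / \
         np.sqrt(alpha * var_s + (1 - alpha) * var_all + eps)
    xt = (x_target - (alpha * mu_t + (1 - alpha) * mu_all)) / \
         np.sqrt(alpha * var_t + (1 - alpha) * var_all + eps)
    return xs, xt
```

At `alpha=1.0` the two domains come out identically distributed (zero mean, unit variance) regardless of their input shift, which is the aligning effect the layers exploit.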

4.
IEEE Trans Pattern Anal Mach Intell; 43(2): 485-498, 2021 Feb.
Article in English | MEDLINE | ID: mdl-31398109

ABSTRACT

Unsupervised Domain Adaptation (UDA) refers to the problem of learning a model in a target domain where labeled data are not available by leveraging information from annotated data in a source domain. Most deep UDA approaches operate in a single-source, single-target scenario, i.e., they assume that the source and the target samples arise from a single distribution. However, in practice most datasets can be regarded as mixtures of multiple domains. In these cases, exploiting traditional single-source, single-target methods for learning classification models may lead to poor results. Furthermore, it is often difficult to provide the domain labels for all data points, i.e., latent domains should be automatically discovered. This paper introduces a novel deep architecture which addresses the problem of UDA by automatically discovering latent domains in visual datasets and exploiting this information to learn robust target classifiers. Specifically, our architecture is based on two main components, i.e., a side branch that automatically computes the assignment of each sample to its latent domain and novel layers that exploit domain membership information to appropriately align the distribution of the CNN internal feature representations to a reference distribution. We evaluate our approach on publicly available benchmarks, showing that it outperforms state-of-the-art domain adaptation methods.
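The two components can be sketched together: a side branch outputs soft latent-domain memberships for each sample, and a normalization layer computes per-domain statistics weighted by those memberships, normalizing each sample with its own mixture of domain statistics. An illustrative numpy sketch; the function name and the exact weighting scheme are assumptions, not the paper's layers.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def latent_domain_normalize(feats, domain_logits, eps=1e-5):
    """feats: (N, D) features; domain_logits: (N, K) side-branch output.
    Soft-assigns each sample to K latent domains and normalizes it with
    membership-weighted per-domain statistics."""
    w = softmax(domain_logits)                       # (N, K) memberships
    wsum = w.sum(0) + eps                            # effective count per domain
    mu = (w[:, :, None] * feats[:, None, :]).sum(0) / wsum[:, None]      # (K, D)
    var = (w[:, :, None] * (feats[:, None, :] - mu) ** 2).sum(0) / wsum[:, None]
    mu_i = w @ mu                                    # (N, D) per-sample mean
    var_i = w @ var                                  # (N, D) per-sample variance
    return (feats - mu_i) / np.sqrt(var_i + eps)
```

With a single latent domain (K=1) this reduces to ordinary feature standardization; with K>1, samples assigned to different latent domains are normalized toward the same reference distribution using different statistics.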

5.
IEEE Trans Pattern Anal Mach Intell; 36(10): 2104-2116, 2014 Oct.
Article in English | MEDLINE | ID: mdl-26352638

ABSTRACT

Ensembles of randomized decision trees, known as Random Forests, have become a valuable machine learning tool for addressing many computer vision problems. Despite their popularity, few works have tried to exploit contextual and structural information in random forests in order to improve their performance. In this paper, we propose a simple and effective way to integrate contextual information in random forests, which is typically reflected in the structured output space of complex problems like semantic image labelling. Our paper has several contributions: We show how random forests can be augmented with structured label information and be used to deliver structured low-level predictions. The learning task is carried out by employing a novel split function evaluation criterion that exploits the joint distribution observed in the structured label space. This allows the forest to learn typical label transitions between object classes and avoid locally implausible label configurations. We provide two approaches for integrating the structured output predictions obtained at a local level from the forest into a concise, global, semantic labelling. We also integrate our new ideas into the Hough-forest framework with a view to exploiting contextual information at the classification level to improve the performance on the task of object detection. Finally, we provide experimental evidence for the effectiveness of our approach on different tasks: semantic image labelling on the challenging MSRCv2 and CamVid databases, reconstruction of occluded handwritten Chinese characters on the Kaist database, and pedestrian detection on the TU Darmstadt databases.
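The structured split criterion can be illustrated as information gain computed over joint (center, neighbour) label pairs rather than single pixel labels, so a split is rewarded for separating implausible label transitions as well as for class purity. A toy sketch under that reading, not the paper's actual criterion; all names are hypothetical.

```python
import numpy as np
from collections import Counter

def joint_entropy(pairs):
    """Entropy of the empirical joint distribution over label pairs,
    e.g. (center_label, neighbour_label) tuples."""
    n = len(pairs)
    if n == 0:
        return 0.0
    counts = Counter(pairs)
    return -sum((c / n) * np.log2(c / n) for c in counts.values())

def structured_split_gain(pairs, feature, threshold):
    """Information gain of a candidate split, measured on the joint
    label-pair distribution instead of single labels."""
    left = [p for f, p in zip(feature, pairs) if f <= threshold]
    right = [p for f, p in zip(feature, pairs) if f > threshold]
    n = len(pairs)
    h_children = (len(left) / n) * joint_entropy(left) \
               + (len(right) / n) * joint_entropy(right)
    return joint_entropy(pairs) - h_children
```

A split that cleanly separates samples with different label transitions drives the children's joint entropy to zero, so its gain equals the parent entropy.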
