1.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 3537-3556, 2024 May.
Article in English | MEDLINE | ID: mdl-38145536

ABSTRACT

3D object detection from images, one of the fundamental and challenging problems in autonomous driving, has received increasing attention from both industry and academia in recent years. Benefiting from the rapid development of deep learning technologies, image-based 3D detection has achieved remarkable progress. Particularly, more than 200 works have studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications. However, to date no recent survey exists to collect and organize this knowledge. In this paper, we fill this gap in the literature and provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection and deeply analyzing each of their components. Additionally, we also propose two new taxonomies to organize the state-of-the-art methods into different categories, with the intent of providing a more systematic review of existing methods and facilitating fair comparisons with future works. In retrospect of what has been achieved so far, we also analyze the current challenges in the field and discuss future directions for image-based 3D detection research.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14234-14247, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37647185

ABSTRACT

Deep-learning models for 3D point cloud semantic segmentation exhibit limited generalization capabilities when trained and tested on data captured with different sensors or in varying environments due to domain shift. Domain adaptation methods can be employed to mitigate this domain shift, for instance, by simulating sensor noise, developing domain-agnostic generators, or training point cloud completion networks. Often, these methods are tailored for range view maps or necessitate multi-modal input. In contrast, domain adaptation in the image domain can be executed through sample mixing, which emphasizes input data manipulation rather than employing distinct adaptation modules. In this study, we introduce compositional semantic mixing for point cloud domain adaptation, representing the first unsupervised domain adaptation technique for point cloud segmentation based on semantic and geometric sample mixing. We present a two-branch symmetric network architecture capable of concurrently processing point clouds from a source domain (e.g. synthetic) and point clouds from a target domain (e.g. real-world). Each branch operates within one domain by integrating selected data fragments from the other domain and utilizing semantic information derived from source labels and target (pseudo) labels. Additionally, our method can leverage a limited number of human point-level annotations (semi-supervised) to further enhance performance. We assess our approach in both synthetic-to-real and real-to-real scenarios using LiDAR datasets and demonstrate that it significantly outperforms state-of-the-art methods in both unsupervised and semi-supervised settings.
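
The sample-mixing idea in the abstract above can be illustrated with a minimal sketch. It only shows the basic step of pasting source points of selected semantic classes into a target cloud that carries pseudo-labels. The function name, the data layout (lists of `(x, y, z)` tuples with string labels) and the class-selection rule are assumptions made for illustration; the paper's two-branch network and any geometric augmentation are not shown.

```python
import random


def mix_point_clouds(src_points, src_labels, tgt_points, tgt_labels, classes, seed=0):
    """Paste source points whose label is in `classes` into the target cloud.

    Hypothetical helper: the selection rule and data layout are assumptions,
    not the authors' implementation.
    """
    rng = random.Random(seed)
    # Keep only the source points that belong to the selected semantic classes.
    selected = [(p, l) for p, l in zip(src_points, src_labels) if l in classes]
    # Concatenate them with the target points (labelled with pseudo-labels).
    mixed = list(zip(tgt_points, tgt_labels)) + selected
    rng.shuffle(mixed)
    points = [p for p, _ in mixed]
    labels = [l for _, l in mixed]
    return points, labels


src_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
src_lbl = ["car", "road", "car"]
tgt_pts = [(5.0, 5.0, 0.0), (6.0, 5.0, 0.0)]
tgt_lbl = ["road", "building"]  # pseudo-labels in the unsupervised setting
mixed_pts, mixed_lbl = mix_point_clouds(src_pts, src_lbl, tgt_pts, tgt_lbl, {"car"})
```

In a symmetric setup, the same helper would be called a second time with the roles of source and target swapped.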

3.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2567-2581, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35358042

ABSTRACT

A fundamental and challenging problem in deep learning is catastrophic forgetting, i.e., the tendency of neural networks to fail to preserve the knowledge acquired from old tasks when learning new tasks. This problem has been widely investigated in the research community and several Incremental Learning (IL) approaches have been proposed in the past years. While earlier works in computer vision have mostly focused on image classification and object detection, more recently some IL approaches for semantic segmentation have been introduced. These previous works showed that, despite its simplicity, knowledge distillation can be effectively employed to alleviate catastrophic forgetting. In this paper, we follow this research direction and, inspired by recent literature on contrastive learning, we propose a novel distillation framework, Uncertainty-aware Contrastive Distillation (UCD). In a nutshell, UCD operates by introducing a novel distillation loss that takes into account all the images in a mini-batch, enforcing similarity between features associated with all the pixels from the same classes, and pulling apart those corresponding to pixels from different classes. In order to mitigate catastrophic forgetting, we contrast features of the new model with features extracted by a frozen model learned at the previous incremental step. Our experimental results demonstrate the advantage of the proposed distillation technique, which can be used in synergy with previous IL approaches, and leads to state-of-the-art performance on three commonly adopted benchmarks for incremental semantic segmentation.
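
The contrast between new-model features and frozen old-model features described in the abstract above can be sketched as a supervised contrastive loss. This is a simplified illustration under stated assumptions: cosine similarity, a fixed temperature and plain Python lists are choices made here, and the uncertainty weighting that gives the method its name is not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_distillation_loss(new_feats, old_feats, labels, temperature=0.1):
    """Pull new-model features towards old-model features of the same class.

    new_feats[i] and old_feats[i] are the features of pixel i under the new
    model and the frozen old model; labels[i] is the class of pixel i.
    Hypothetical sketch, not the authors' loss.
    """
    total, count = 0.0, 0
    for i, anchor in enumerate(new_feats):
        logits = [cosine(anchor, f) / temperature for f in old_feats]
        log_denominator = math.log(sum(math.exp(z) for z in logits))
        # Positives: old-model features of pixels sharing the anchor's class.
        for j, label in enumerate(labels):
            if label == labels[i]:
                total += -(logits[j] - log_denominator)
                count += 1
    return total / count


labels = [0, 0, 1, 1]
old = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
aligned = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
swapped = [(0.0, 1.0), (0.0, 1.0), (1.0, 0.0), (1.0, 0.0)]
loss_aligned = contrastive_distillation_loss(aligned, old, labels)
loss_swapped = contrastive_distillation_loss(swapped, old, labels)
```

The loss is lower when new-model features of a class stay close to the old model's features of that class, which is the behaviour the distillation term is meant to encourage.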

4.
Animals (Basel) ; 12(20), 2022 Oct 19.
Article in English | MEDLINE | ID: mdl-36290219

ABSTRACT

The human-animal relationship is ancient, complex and multifaceted. It may have either positive effects on humans and animals or poor or even negative and detrimental effects on animals or both humans and animals. A large body of literature has investigated the beneficial effects of this relationship in which both humans and animals appear to gain physical and psychological benefits from living together in a reciprocated interaction. However, analyzing the literature from a different perspective, it clearly emerges that human-animal relationships are not rarely characterized by different forms and levels of discomfort and suffering for animals and, in some cases, also for people. The negative physical and psychological consequences for animals' well-being may be very nuanced and concealed, but there are situations in which the negative consequences are clear and striking, as in the case of animal violence, abuse or neglect. Empathy, attachment and anthropomorphism are human psychological mechanisms that are considered relevant for positive and healthy relationships with animals, but when dysfunctional or pathological they cause physical or psychological suffering, or both, in animals, as occurs in animal hoarding. The current work reviews some of the literature on the multifaceted nature of the human-animal relationship; describes the key role of empathy, attachment and anthropomorphism in human-animal relationships; and seeks to depict how these psychological processes are distorted and dysfunctional in animal hoarding, with highly detrimental effects on both animal and human well-being.

5.
Article in English | MEDLINE | ID: mdl-36215371

ABSTRACT

In structure from motion, the viewing graph is a graph where the vertices correspond to cameras (or images) and the edges represent the fundamental matrices. We provide a new formulation and an algorithm for determining whether a viewing graph is solvable, i.e., uniquely determines a set of projective cameras. The known theoretical conditions either do not fully characterize the solvability of all viewing graphs, or are extremely difficult to compute because they involve solving a system of polynomial equations with a large number of unknowns. The main result of this paper is a method to reduce the number of unknowns by exploiting cycle consistency. We advance the understanding of solvability by (i) finishing the classification of all minimal graphs up to 9 nodes, (ii) extending the practical verification of solvability to minimal graphs with up to 90 nodes, (iii) finally answering an open research question by showing that finite solvability is not equivalent to solvability, and (iv) formally drawing the connection with the calibrated case (i.e., parallel rigidity). Finally, we present an experiment on real data that shows that unsolvable graphs may appear in practice.

6.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 10099-10113, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34882548

ABSTRACT

Deep neural networks have enabled major progress in semantic segmentation. However, even the most advanced neural architectures suffer from important limitations. First, they are vulnerable to catastrophic forgetting, i.e., they perform poorly when they are required to incrementally update their model as new classes become available. Second, they rely on large amounts of pixel-level annotations to produce accurate segmentation maps. To tackle these issues, we introduce a novel incremental class learning approach for semantic segmentation taking into account a peculiar aspect of this task: since each training step provides annotation only for a subset of all possible classes, pixels of the background class exhibit a semantic shift. Therefore, we revisit the traditional distillation paradigm by designing novel loss terms which explicitly account for the background shift. Additionally, we introduce a novel strategy to initialize the classifier's parameters at each step in order to prevent biased predictions toward the background class. Finally, we demonstrate that our approach can be extended to point- and scribble-based weakly supervised segmentation, modeling the partial annotations to create priors for unlabeled pixels. We demonstrate the effectiveness of our approach with an extensive evaluation on the Pascal-VOC, ADE20K, and Cityscapes datasets, significantly outperforming state-of-the-art methods.
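
One plausible reading of a distillation term that accounts for the background shift is sketched below: pixels that the old model called background may belong to a newly added class, so the new model's probability mass for new classes is folded into its background probability before the two outputs are compared. The abstract does not give the loss formulas, so the folding rule, the function names and the class ordering are assumptions made here for illustration.

```python
import math


def fold_new_classes_into_background(new_probs, n_old):
    """Map the new model's probabilities onto the old model's label space.

    new_probs is assumed to be ordered as
    [background, old classes..., new classes...].
    """
    background = new_probs[0] + sum(new_probs[n_old + 1:])
    return [background] + list(new_probs[1:n_old + 1])


def distillation_loss(old_probs, new_probs, eps=1e-12):
    """Cross-entropy between old-model output and folded new-model output."""
    n_old = len(old_probs) - 1  # number of old foreground classes
    folded = fold_new_classes_into_background(new_probs, n_old)
    return -sum(p * math.log(q + eps) for p, q in zip(old_probs, folded))


old_probs = [0.7, 0.2, 0.1]        # background + 2 old classes
new_probs = [0.1, 0.2, 0.1, 0.6]   # background + 2 old classes + 1 new class
folded = fold_new_classes_into_background(new_probs, n_old=2)
loss = distillation_loss(old_probs, new_probs)
```

With this folding, a pixel that the new model confidently assigns to a new class is not penalized for disagreeing with the old model's background prediction.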

7.
Article in English | MEDLINE | ID: mdl-35235506

ABSTRACT

Convolutional neural networks have enabled major progress in addressing pixel-level prediction tasks such as semantic segmentation, depth estimation and surface normal prediction, benefiting from their powerful capabilities in visual representation learning. Typically, state-of-the-art models integrate attention mechanisms for improved deep feature representations. Recently, some works have demonstrated the significance of learning and combining both spatial- and channel-wise attention for deep feature refinement. In this paper, we aim at effectively boosting previous approaches and propose a unified deep framework to jointly learn both spatial attention maps and channel attention vectors in a principled manner so as to structure the resulting attention tensors and model interactions between these two types of attention. Specifically, we integrate the estimation and the interaction of the two attention types within a probabilistic representation learning framework, leading to VarIational STructured Attention networks (VISTA-Net). We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters. As demonstrated by our extensive empirical evaluation on six large-scale datasets for dense visual prediction, VISTA-Net outperforms the state-of-the-art in multiple continuous and discrete prediction tasks, thus confirming the benefit of the proposed approach in joint structured spatial-channel attention estimation for deep representation learning. The code is available at https://github.com/ygjwd12345/VISTA-Net.

8.
IEEE Trans Pattern Anal Mach Intell ; 44(5): 2673-2688, 2022 May.
Article in English | MEDLINE | ID: mdl-33301402

ABSTRACT

Multi-scale representations deeply learned via convolutional neural networks have shown tremendous importance for various pixel-level prediction problems. In this paper we present a novel approach that advances the state of the art on pixel-level prediction in a fundamental aspect, i.e. structured multi-scale feature learning and fusion. In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner. In order to further improve the learning capacity of the network structure, we propose to exploit feature-dependent conditional kernels within the deep probabilistic framework. Extensive experiments are conducted on four publicly available datasets (i.e. BSDS500, NYUD-V2, KITTI and Pascal-Context) and on three challenging pixel-wise prediction problems involving both discrete and continuous labels (i.e. monocular depth estimation, object contour prediction and semantic segmentation). Quantitative and qualitative results demonstrate the effectiveness of the proposed latent AG-CRF model and the overall probabilistic graph attention network with feature conditional kernels for structured feature learning and pixel-wise prediction.

9.
IEEE Trans Pattern Anal Mach Intell ; 43(2): 485-498, 2021 02.
Article in English | MEDLINE | ID: mdl-31398109

ABSTRACT

Unsupervised Domain Adaptation (UDA) refers to the problem of learning a model in a target domain where labeled data are not available by leveraging information from annotated data in a source domain. Most deep UDA approaches operate in a single-source, single-target scenario, i.e., they assume that the source and the target samples arise from a single distribution. However, in practice most datasets can be regarded as mixtures of multiple domains. In these cases, exploiting traditional single-source, single-target methods for learning classification models may lead to poor results. Furthermore, it is often difficult to provide the domain labels for all data points, i.e. latent domains should be automatically discovered. This paper introduces a novel deep architecture which addresses the problem of UDA by automatically discovering latent domains in visual datasets and exploiting this information to learn robust target classifiers. Specifically, our architecture is based on two main components, i.e. a side branch that automatically computes the assignment of each sample to its latent domain and novel layers that exploit domain membership information to appropriately align the distribution of the CNN internal feature representations to a reference distribution. We evaluate our approach on publicly available benchmarks, showing that it outperforms state-of-the-art domain adaptation methods.

10.
IEEE Trans Pattern Anal Mach Intell ; 43(12): 4441-4452, 2021 12.
Article in English | MEDLINE | ID: mdl-32750781

ABSTRACT

One of the main challenges for developing visual recognition systems working in the wild is to devise computational models immune from the domain shift problem, i.e., accurate when test data are drawn from a (slightly) different data distribution than training samples. In the last decade, several research efforts have been devoted to devising algorithmic solutions for this issue. Recent attempts to mitigate domain shift have resulted in deep learning models for domain adaptation which learn domain-invariant representations by introducing appropriate loss terms, by casting the problem within an adversarial learning framework or by embedding specific domain normalization layers into the deep network. This paper describes a novel approach for unsupervised domain adaptation. As in previous works, we propose to align the learned representations by embedding them into appropriate network feature normalization layers. Unlike previous works, our Domain Alignment Layers are designed not only to match the source and target feature distributions but also to automatically learn the degree of feature alignment required at different levels of the deep network. Unlike most previous deep domain adaptation methods, our approach is able to operate in a multi-source setting. Thorough experiments on four publicly available benchmarks confirm the effectiveness of our approach.
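
A normalization layer with an adjustable degree of alignment, as described in the abstract above, can be sketched for a single feature channel as follows. The mixing rule is an assumption made here for illustration: each domain is normalized with statistics of a mixture that weights its own samples by `alpha` and the other domain's by `1 - alpha`. In the paper the degree of alignment is learned per layer, whereas this sketch takes `alpha` as a fixed argument.

```python
import math


def mean_var(xs):
    """Population mean and variance of a list of scalars."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


def domain_alignment(source, target, alpha, eps=1e-5):
    """Normalize each domain with alpha-mixed statistics (hypothetical rule).

    alpha = 1.0 uses purely domain-specific statistics (full alignment);
    alpha = 0.5 uses statistics shared by both domains (no alignment).
    """
    ms, vs = mean_var(source)
    mt, vt = mean_var(target)

    def mixed_stats(m_own, v_own, m_other, v_other):
        # Mean and variance of a two-component mixture distribution.
        m = alpha * m_own + (1 - alpha) * m_other
        second = alpha * (v_own + m_own ** 2) + (1 - alpha) * (v_other + m_other ** 2)
        return m, second - m ** 2

    m_s, v_s = mixed_stats(ms, vs, mt, vt)
    m_t, v_t = mixed_stats(mt, vt, ms, vs)
    norm_s = [(x - m_s) / math.sqrt(v_s + eps) for x in source]
    norm_t = [(x - m_t) / math.sqrt(v_t + eps) for x in target]
    return norm_s, norm_t


source = [0.0, 2.0]
target = [10.0, 14.0]
full_s, full_t = domain_alignment(source, target, alpha=1.0)
shared_s, shared_t = domain_alignment(source, target, alpha=0.5)
```

With `alpha=1.0` both domains end up centred at zero, while with `alpha=0.5` the gap between the two domains survives normalization.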

11.
IEEE Trans Pattern Anal Mach Intell ; 42(10): 2380-2395, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31545713

ABSTRACT

Recent deep monocular depth estimation approaches based on supervised regression have achieved remarkable performance. However, they require costly ground truth annotations during training. To cope with this issue, in this paper we present a novel unsupervised deep learning approach for predicting depth maps. We introduce a new network architecture, named Progressive Fusion Network (PFN), that is specifically designed for binocular stereo depth estimation. This network is based on a multi-scale refinement strategy that combines the information provided by both stereo views. In addition, we propose to stack this network twice in order to form a cycle. This cycle approach can be interpreted as a form of data-augmentation since, at training time, the network learns both from the training set images (in the forward half-cycle) and from the synthesized images (in the backward half-cycle). The architecture is jointly trained with adversarial learning. Extensive experiments on the publicly available datasets KITTI, Cityscapes and ApolloScape demonstrate the effectiveness of the proposed model which is competitive with other unsupervised deep learning methods for depth prediction.

12.
IEEE Trans Med Imaging ; 39(8): 2676-2687, 2020 08.
Article in English | MEDLINE | ID: mdl-32406829

ABSTRACT

Deep learning (DL) has proved successful in medical imaging and, in the wake of the recent COVID-19 pandemic, some works have started to investigate DL-based solutions for the assisted diagnosis of lung diseases. While existing works focus on CT scans, this paper studies the application of DL techniques for the analysis of lung ultrasonography (LUS) images. Specifically, we present a novel fully-annotated dataset of LUS images collected from several Italian hospitals, with labels indicating the degree of disease severity at a frame-level, video-level, and pixel-level (segmentation masks). Leveraging these data, we introduce several deep models that address relevant tasks for the automatic analysis of LUS images. In particular, we present a novel deep network, derived from Spatial Transformer Networks, which simultaneously predicts the disease severity score associated with an input frame and provides localization of pathological artefacts in a weakly-supervised way. Furthermore, we introduce a new method based on uninorms for effective frame score aggregation at a video-level. Finally, we benchmark state-of-the-art deep models for estimating pixel-level segmentations of COVID-19 imaging biomarkers. Experiments on the proposed dataset demonstrate satisfactory results on all the considered tasks, paving the way to future research on DL for the assisted diagnosis of COVID-19 from LUS data.

Subjects
Coronavirus Infections/diagnostic imaging ; Deep Learning ; Image Interpretation, Computer-Assisted/methods ; Pneumonia, Viral/diagnostic imaging ; Ultrasonography/methods ; Betacoronavirus ; COVID-19 ; Humans ; Lung/diagnostic imaging ; Pandemics ; Point-of-Care Systems ; SARS-CoV-2
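
The uninorm-based aggregation of frame scores mentioned in the abstract of this entry can be illustrated with a small sketch. The abstract does not state which uninorm is used, so the example below uses the well-known 3-Pi operator, whose neutral element is 0.5, purely as an illustration; representing frame scores as values in [0, 1] is also an assumption.

```python
def uninorm(x, y):
    """3-Pi uninorm on [0, 1] with neutral element 0.5 (illustrative choice)."""
    numerator = x * y
    denominator = x * y + (1 - x) * (1 - y)
    if denominator == 0.0:
        # The operator is undefined for the pairs (0, 1) and (1, 0);
        # returning 0 is the conjunctive convention.
        return 0.0
    return numerator / denominator


def aggregate(frame_scores):
    """Fold a sequence of frame-level scores into one video-level score."""
    result = 0.5  # neutral element: leaves the first score unchanged
    for score in frame_scores:
        result = uninorm(result, score)
    return result


single = aggregate([0.7])
reinforced_high = aggregate([0.8, 0.8])
reinforced_low = aggregate([0.2, 0.2])
```

Scores above the neutral element reinforce each other upwards and scores below it reinforce each other downwards, which is the property that distinguishes a uninorm from a plain average.
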
13.
IEEE Trans Pattern Anal Mach Intell ; 41(6): 1426-1440, 2019 Jun.
Article in English | MEDLINE | ID: mdl-29994300

ABSTRACT

Depth cues have proved very useful in various computer vision and robotic tasks. This paper addresses the problem of monocular depth estimation from a single still image. Inspired by the effectiveness of recent works on multi-scale convolutional neural networks (CNN), we propose a deep model which fuses complementary information derived from multiple CNN side outputs. Different from previous methods using concatenation or weighted average schemes, the integration is obtained by means of continuous Conditional Random Fields (CRFs). In particular, we propose two different variations, one based on a cascade of multiple CRFs, the other on a unified graphical model. By designing a novel CNN implementation of mean-field updates for continuous CRFs, we show that both proposed models can be regarded as sequential deep networks and that training can be performed end-to-end. Through an extensive experimental evaluation, we demonstrate the effectiveness of the proposed approach and establish new state-of-the-art results for the monocular depth estimation task on three publicly available datasets, i.e., NYUD-V2, Make3D and KITTI.

14.
IEEE Trans Image Process ; 27(9): 4410-4421, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870357

ABSTRACT

In this paper, we address the problem of learning robust cross-domain representations for sketch-based image retrieval (SBIR). While most SBIR approaches focus on extracting low- and mid-level descriptors for direct feature matching, recent works have shown the benefit of learning coupled feature representations to describe data from two related sources. However, cross-domain representation learning methods are typically cast into non-convex minimization problems that are difficult to optimize, leading to unsatisfactory performance. Inspired by self-paced learning (SPL), a learning methodology designed to overcome convergence issues related to local optima by exploiting the samples in a meaningful order (i.e., easy to hard), we introduce the cross-paced partial curriculum learning (CPPCL) framework. Compared with existing SPL methods which only consider a single modality and cannot deal with prior knowledge, CPPCL is specifically designed to assess the learning pace by jointly handling data from dual sources and modality-specific prior information provided in the form of partial curricula. In addition, thanks to the learned dictionaries, we demonstrate that the proposed CPPCL embeds robust coupled representations for SBIR. Our approach is extensively evaluated on four publicly available datasets (i.e., CUFS, Flickr15K, QueenMary SBIR, and TU-Berlin Extension datasets), showing superior performance over competing SBIR methods.
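
The easy-to-hard ordering that self-paced learning relies on can be sketched with the classic hard-threshold weighting: a sample takes part in training only while its current loss is below a threshold, and the threshold grows over time. This illustrates plain SPL only; the cross-modal pacing and the partial curricula of CPPCL are not shown, and the function names and the geometric schedule are assumptions made here.

```python
def self_paced_weights(losses, threshold):
    """Select the samples that are currently 'easy' (loss below threshold)."""
    return [1 if loss < threshold else 0 for loss in losses]


def curriculum(losses, start, growth, steps):
    """Sample selections over successive rounds as the threshold grows.

    For simplicity the per-sample losses are kept fixed; in real training
    they would be recomputed after each model update.
    """
    schedule = []
    threshold = start
    for _ in range(steps):
        schedule.append(self_paced_weights(losses, threshold))
        threshold *= growth
    return schedule


losses = [0.1, 0.5, 2.0]
schedule = curriculum(losses, start=0.2, growth=4.0, steps=3)
```

Each round admits harder samples than the previous one, so the model first fits the easy examples before the difficult ones influence the objective.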

15.
IEEE Trans Med Imaging ; 26(10): 1357-65, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17948726

ABSTRACT

In the framework of computer-aided diagnosis of eye diseases, retinal vessel segmentation based on line operators is proposed. A line detector, previously used in mammography, is applied to the green channel of the retinal image. It is based on the evaluation of the average grey level along lines of fixed length passing through the target pixel at different orientations. Two segmentation methods are considered. The first uses the basic line detector whose response is thresholded to obtain unsupervised pixel classification. As a further development, we employ two orthogonal line detectors along with the grey level of the target pixel to construct a feature vector for supervised classification using a support vector machine. The effectiveness of both methods is demonstrated through receiver operating characteristic analysis on two publicly available databases of color fundus images.


Subjects
Algorithms ; Artificial Intelligence ; Image Enhancement/methods ; Image Interpretation, Computer-Assisted/methods ; Pattern Recognition, Automated/methods ; Retinal Vessels/anatomy & histology ; Retinoscopy/methods ; Humans ; Reproducibility of Results ; Sensitivity and Specificity
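
The basic line detector described in the abstract of this entry averages the grey level along fixed-length lines through the target pixel at several orientations. A minimal sketch is given below. It assumes that vessels appear brighter than the background (for example after inverting the green channel) and that the response is the best line average minus the mean of the surrounding square window; the subtraction, the line length and the number of orientations are assumptions, since the abstract does not specify them. The function is only valid for pixels at least `length // 2` away from the image border.

```python
import math


def line_offsets(length, angle_deg):
    """Integer (row, col) offsets of a line centred on the target pixel."""
    half = length // 2
    theta = math.radians(angle_deg)
    return [(round(t * math.sin(theta)), round(t * math.cos(theta)))
            for t in range(-half, half + 1)]


def line_response(image, row, col, length=5, n_angles=12):
    """Best average grey level along a line minus the local window mean."""
    half = length // 2
    window = [image[r][c]
              for r in range(row - half, row + half + 1)
              for c in range(col - half, col + half + 1)]
    window_mean = sum(window) / len(window)
    best = -math.inf
    for k in range(n_angles):
        offsets = line_offsets(length, 180.0 * k / n_angles)
        average = sum(image[row + dr][col + dc] for dr, dc in offsets) / len(offsets)
        best = max(best, average)
    return best - window_mean


# Synthetic 9x9 image with a bright vertical "vessel" in column 4.
image = [[1.0 if c == 4 else 0.0 for c in range(9)] for r in range(9)]
on_vessel = line_response(image, 4, 4)
off_vessel = line_response(image, 4, 2)
```

Thresholding this response gives an unsupervised pixel classification of the kind the abstract describes; the supervised variant would feed such responses into a classifier instead.
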
16.
Med Biol Eng Comput ; 55(6): 909-921, 2017 Jun.
Article in English | MEDLINE | ID: mdl-27638109

ABSTRACT

Stroke patients should be dispatched to the highest level of care available in the shortest possible time. In this context, a transportable system in specialized ambulances, able to evaluate the presence of an acute brain lesion in a short time interval (i.e., a few minutes), could shorten the delay of treatment. UWB radar imaging is an emerging diagnostic branch that has great potential for the implementation of a transportable and low-cost device. Transportability, low cost and short response time pose challenges to the signal processing algorithms of the backscattered signals as they should guarantee good performance with a reasonably low number of antennas and low computational complexity, tightly related to the response time of the device. The paper shows that a PCA-based preprocessing algorithm can: (1) achieve good performance already with a computationally simple beamforming algorithm; (2) outperform state-of-the-art preprocessing algorithms; (3) enable a further improvement in the performance (and/or decrease in the number of antennas) by using a multistatic approach with just a modest increase in computational complexity. This is an important result toward the implementation of such a diagnostic device that could play an important role in emergency scenarios.

Subjects
Diagnostic Imaging/methods ; Stroke/diagnosis ; Algorithms ; Artifacts ; Humans ; Microwaves ; Radar/instrumentation ; Signal Processing, Computer-Assisted
17.
IEEE Trans Neural Netw ; 17(4): 1085-91, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16856671

ABSTRACT

An analog neural network for support vector machine learning is proposed, based on a partially dual formulation of the quadratic programming problem. It results in a simpler circuit implementation with respect to existing neural solutions for the same application. The effectiveness of the proposed network is shown through some computer simulations concerning benchmark problems.


Subjects
Computers, Analog ; Learning ; Neural Networks, Computer
18.
IEEE Trans Neural Netw ; 17(5): 1165-74, 2006 Sep.
Article in English | MEDLINE | ID: mdl-17001978

ABSTRACT

The relation existing between support vector machines (SVMs) and recurrent associative memories is investigated. The design of associative memories based on the generalized brain-state-in-a-box (GBSB) neural model is formulated as a set of independent classification tasks which can be efficiently solved by standard software packages for SVM learning. Some properties of the networks designed in this way are highlighted, such as the fact that, surprisingly, they follow a generalized Hebb's law. The performance of the SVM approach is compared to existing methods with nonsymmetric connections through some design examples.

Subjects
Algorithms ; Artificial Intelligence ; Computing Methodologies ; Information Storage and Retrieval/methods ; Models, Theoretical ; Pattern Recognition, Automated/methods ; Cluster Analysis ; Computer Simulation ; Neural Networks, Computer
19.
IEEE Trans Pattern Anal Mach Intell ; 38(6): 1070-83, 2016 06.
Article in English | MEDLINE | ID: mdl-26372209

ABSTRACT

Recently, head pose estimation (HPE) from low-resolution surveillance data has gained in importance. However, monocular and multi-view HPE approaches still work poorly under target motion, as facial appearance distorts owing to camera perspective and scale changes when a person moves around. To this end, we propose FEGA-MTL, a novel framework based on Multi-Task Learning (MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. Upon partitioning the monitored scene into a dense uniform spatial grid, FEGA-MTL simultaneously clusters grid partitions into regions with similar facial appearance, while learning region-specific head pose classifiers. In the learning phase, guided by two graphs which a-priori model the similarity among (1) grid partitions based on camera geometry and (2) head pose classes, FEGA-MTL derives the optimal scene partitioning and associated pose classifiers. Upon determining the target's position using a person tracker at test time, the corresponding region-specific classifier is invoked for HPE. The FEGA-MTL framework naturally extends to a weakly supervised setting where the target's walking direction is employed as a proxy in lieu of head orientation. Experiments confirm that FEGA-MTL significantly outperforms competing single-task and multi-task learning methods in multi-view settings.


Subjects
Algorithms ; Head ; Motion ; Humans ; Learning ; Orientation
20.
IEEE Trans Pattern Anal Mach Intell ; 38(8): 1707-20, 2016 08.
Article in English | MEDLINE | ID: mdl-26540677

ABSTRACT

Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose due to crowdedness and presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) To alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising a microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head, body orientation and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.

Subjects
Algorithms ; Datasets as Topic ; Group Processes ; Pattern Recognition, Automated ; Social Behavior ; Adult ; Cues ; Female ; Humans ; Interpersonal Relations ; Lighting ; Male ; Video Recording ; Young Adult