Results 1 - 13 of 13
1.
Neural Netw ; 160: 274-296, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36709531

ABSTRACT

Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of (1) Continuous Learning, (2) Transfer and Adaptation, and (3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.
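The abstract above proposes a domain-agnostic metric suite for Lifelong Learning. As a hedged illustration only, two widely used continual-learning quantities can be computed from a task-performance matrix; the matrix values and the exact metric definitions below are illustrative stand-ins, not the paper's own suite.

```python
import numpy as np

# R[i, j]: score on task j after sequentially training on tasks 0..i.
# Hypothetical numbers for a 3-task lifelong run.
R = np.array([
    [0.80, 0.10, 0.05],
    [0.70, 0.85, 0.20],
    [0.65, 0.75, 0.90],
])
n = R.shape[0]

# Forgetting (a stability measure): average drop from a task's best
# score to its score at the end of the run (last task excluded).
forgetting = np.mean([R[:, j].max() - R[-1, j] for j in range(n - 1)])

# Forward transfer (an adaptation measure): average score on a task
# *before* it has been trained on, i.e., what earlier tasks gave for free.
forward_transfer = np.mean([R[j - 1, j] for j in range(1, n)])

print(f"forgetting={forgetting:.3f}  forward_transfer={forward_transfer:.3f}")
```

The Stability-Plasticity trade-off the abstract mentions shows up directly here: training schedules that lower `forgetting` typically also lower `forward_transfer`, and vice versa.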


Subjects
Education, Continuing; Machine Learning
2.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8371-8386, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34543192

ABSTRACT

While 360° cameras offer tremendous new possibilities in vision, graphics, and augmented reality, the spherical images they produce make visual recognition non-trivial. Ideally, 360° imagery could inherit the deep convolutional neural networks (CNNs) already trained with great success on perspective projection images. However, spherical images cannot be projected to a single plane without significant distortion, and existing methods to transfer CNNs from perspective to spherical images introduce significant computational costs and/or degradations in accuracy. We propose to learn a Spherical Convolution Network (SphConv) that translates a planar CNN to the equirectangular projection of 360° images. Given a source CNN for perspective images as input, SphConv learns to reproduce the flat filter outputs on 360° data, sensitive to the varying distortion effects across the viewing sphere. The key benefits are 1) efficient and accurate recognition for 360° images, and 2) the ability to leverage powerful pre-trained networks for perspective images. We further propose two instantiations of SphConv: Spherical Kernel learns location-dependent kernels on the sphere, and Kernel Transformer Network (KTN) learns a functional transformation that generates SphConv kernels from the source CNN. Of the two variants, Kernel Transformer Network has a much lower memory footprint at the cost of higher computational overhead. Validating our approach with multiple source CNNs and datasets, we show that SphConv using KTN successfully preserves the source CNN's accuracy, while offering efficiency, transferability, and scalability to typical image resolutions. We further introduce a spherical Faster R-CNN model based on SphConv and show that we can learn a spherical object detector without any object annotation in 360° images.
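The core idea of SphConv, that the convolution kernel must vary with latitude because equirectangular distortion does, can be sketched minimally. This toy `sphconv_rowwise` is an illustration of the location-dependent principle only, not the paper's implementation: real SphConv also varies the kernel shape (wider near the poles) and is learned from a source CNN.

```python
import numpy as np

def sphconv_rowwise(img, kernels):
    """Toy row-wise spherical convolution on an equirectangular image:
    a separate kernel per latitude row, since projection distortion
    varies with latitude. Fixed 3x3-style kernels keep the sketch short.

    img:     (H, W) single-channel equirectangular image
    kernels: (H, kh, kw) one kernel per row
    """
    H, W = img.shape
    kh, kw = kernels.shape[1:]
    # Wrap-around padding: correct in longitude; a simplification in latitude.
    pad = np.pad(img, ((kh // 2,), (kw // 2,)), mode="wrap")
    out = np.zeros_like(img, dtype=float)
    for r in range(H):
        for c in range(W):
            out[r, c] = np.sum(pad[r:r + kh, c:c + kw] * kernels[r])
    return out
```

The paper's Kernel Transformer Network variant would *generate* the per-row `kernels` array from a source CNN's filters rather than store one kernel per row, which is where its memory saving comes from.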


Subjects
Algorithms; Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods; Neural Networks, Computer
3.
IEEE Trans Pattern Anal Mach Intell ; 43(8): 2697-2709, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32078535

ABSTRACT

Standard video encoders developed for conventional narrow field-of-view video are widely applied to 360° video as well, with reasonable results. However, while this approach commits arbitrarily to a projection of the spherical frames, we observe that some orientations of a 360° video, once projected, are more compressible than others. We introduce an approach to predict the sphere rotation that will yield the maximal compression rate. Given video clips in their original encoding, a convolutional neural network learns the association between a clip's visual content and its compressibility at different rotations of a cubemap projection. Given a novel video, our learning-based approach efficiently infers the most compressible direction in one shot, without repeated rendering and compression of the source video. We validate our idea on thousands of video clips and multiple popular video codecs. The results show that this untapped dimension of 360° compression has substantial potential-"good" rotations are typically 8-18 percent more compressible than bad ones, and our learning approach can predict them reliably 78 percent of the time.
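The exhaustive baseline the abstract's learned predictor replaces can be sketched directly. This is a hedged illustration: `zlib` on raw pixels stands in for a real video codec, and a horizontal roll of the equirectangular frame stands in for the full set of cubemap sphere rotations.

```python
import zlib
import numpy as np

def best_yaw_rotation(frame, n_rotations=8):
    """Oracle baseline: try several yaw rotations (horizontal rolls of
    the equirectangular frame), compress each, and return the index of
    the smallest result. The paper's CNN predicts the winning rotation
    in one shot instead of compressing every candidate like this."""
    W = frame.shape[1]
    sizes = []
    for k in range(n_rotations):
        rolled = np.roll(frame, shift=k * W // n_rotations, axis=1)
        sizes.append(len(zlib.compress(rolled.tobytes())))
    return int(np.argmin(sizes)), sizes
```

The abstract's 8-18 percent gap between "good" and "bad" rotations is the spread one would observe across the `sizes` list for real codecs and real footage.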

4.
Sci Robot ; 4(30)2019 05 15.
Article in English | MEDLINE | ID: mdl-33137723

ABSTRACT

Standard computer vision systems assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is a major challenge in itself. We address the problem of learning to look around: How can an agent learn to acquire informative visual observations? We propose a reinforcement learning solution, where the agent is rewarded for reducing its uncertainty about the unobserved portions of its environment. Specifically, the agent is trained to select a short sequence of glimpses, after which it must infer the appearance of its full environment. To address the challenge of sparse rewards, we further introduce sidekick policy learning, which exploits the asymmetry in observability between training and test time. The proposed methods learned observation policies that not only performed the completion task for which they were trained but also generalized to exhibit useful "look-around" behavior for a range of active perception tasks.
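The reward signal described above, paying the agent for uncertainty it removes about unobserved portions of the environment, reduces to a simple difference of reconstruction errors. A minimal sketch under toy assumptions (single-channel panorama, mean-squared error as the uncertainty proxy):

```python
import numpy as np

def completion_rewards(true_env, predictions):
    """Per-glimpse reward = reduction in mean-squared error of the
    agent's reconstruction of the *full* environment, so the agent is
    rewarded for uncertainty it removes with each glimpse.

    true_env:    (H, W) ground-truth panorama (toy, single channel)
    predictions: list of (H, W) reconstructions, one per glimpse
    """
    errors = [np.mean((p - true_env) ** 2) for p in predictions]
    return [errors[t - 1] - errors[t] for t in range(1, len(errors))]
```

Because each reward depends on ground truth available only at training time, this is exactly the training/test observability asymmetry that the paper's sidekick policy learning exploits.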

5.
IEEE Trans Pattern Anal Mach Intell ; 41(7): 1601-1614, 2019 07.
Article in English | MEDLINE | ID: mdl-29993712

ABSTRACT

Visual recognition systems mounted on autonomous moving agents face the challenge of unconstrained data, but simultaneously have the opportunity to improve their performance by moving to acquire new views at test time. In this work, we first show how a recurrent neural network-based system may be trained to perform end-to-end learning of motion policies suited for this "active recognition" setting. Further, we hypothesize that active vision requires an agent to have the capacity to reason about the effects of its motions on its view of the world. To verify this hypothesis, we attempt to induce this capacity in our active recognition pipeline, by simultaneously learning to forecast the effects of the agent's motions on its internal representation of the environment conditional on all past views. Results across three challenging datasets confirm both that our end-to-end system successfully learns meaningful policies for active category recognition, and that "learning to look ahead" further boosts recognition performance.

6.
IEEE Trans Pattern Anal Mach Intell ; 41(11): 2677-2692, 2019 Nov.
Article in English | MEDLINE | ID: mdl-30130176

ABSTRACT

We propose an end-to-end learning framework for segmenting generic objects in both images and videos. Given a novel image or video, our approach produces a pixel-level mask for all "object-like" regions-even for object categories never seen during training. We formulate the task as a structured prediction problem of assigning an object/background label to each pixel, implemented using a deep fully convolutional network. When applied to a video, our model further incorporates a motion stream, and the network learns to combine both appearance and motion and attempts to extract all prominent objects whether they are moving or not. Beyond the core model, a second contribution of our approach is how it leverages varying strengths of training annotations. Pixel-level annotations are quite difficult to obtain, yet crucial for training a deep network approach for segmentation. Thus we propose ways to exploit weakly labeled data for learning dense foreground segmentation. For images, we show the value in mixing object category examples with image-level labels together with relatively few images with boundary-level annotations. For video, we show how to bootstrap weakly annotated videos together with the network trained for image segmentation. Through experiments on multiple challenging image and video segmentation benchmarks, our method offers consistently strong results and improves the state-of-the-art for fully automatic segmentation of generic (unseen) objects. In addition, we demonstrate how our approach benefits image retrieval and image retargeting, both of which flourish when given our high-quality foreground maps. Code, models, and videos are at: http://vision.cs.utexas.edu/projects/pixelobjectness/.

7.
IEEE Trans Pattern Anal Mach Intell ; 39(5): 908-921, 2017 05.
Article in English | MEDLINE | ID: mdl-28113697

ABSTRACT

We propose an efficient approach for activity detection in video that unifies activity categorization with space-time localization. The main idea is to pose activity detection as a maximum-weight connected subgraph problem. Offline, we learn a binary classifier for an activity category using positive video exemplars that are "trimmed" in time to the activity of interest. Then, given a novel untrimmed video sequence, we decompose it into a 3D array of space-time nodes, which are weighted based on the extent to which their component features support the learned activity model. To perform detection, we then directly localize instances of the activity by solving for the maximum-weight connected subgraph in the test video's space-time graph. We show that this detection strategy permits an efficient branch-and-cut solution for the best-scoring-and possibly non-cubically shaped-portion of the video for a given activity classifier. The upshot is a fast method that can search a broader space of space-time region candidates than was previously practical, which we find often leads to more accurate detection. We demonstrate the proposed algorithm on four datasets, and we show its speed and accuracy advantages over multiple existing search strategies.
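The paper's branch-and-cut solver finds general maximum-weight *connected subgraphs*, which can be non-cubic. As a hedged, much-restricted stand-in, the sketch below searches only axis-aligned space-time boxes in a (T, H, W) score volume, using temporal prefix sums plus Kadane's algorithm; it conveys why per-cell scores must be allowed to go negative (classifier evidence minus a prior), since otherwise the whole volume always wins.

```python
import numpy as np

def kadane(arr):
    """Max-sum contiguous run of a 1D array; returns (sum, lo, hi)."""
    best, cur, start = -np.inf, 0.0, 0
    lo = hi = 0
    for i, v in enumerate(arr):
        if cur <= 0:
            cur, start = v, i
        else:
            cur += v
        if cur > best:
            best, lo, hi = cur, start, i
    return best, lo, hi

def max_weight_cuboid(scores):
    """Best axis-aligned space-time box in a (T, H, W) score volume."""
    T, H, W = scores.shape
    best, best_box = -np.inf, None
    # Temporal prefix sums reduce each (t0, t1) slab to a 2D problem.
    csum = np.concatenate([np.zeros((1, H, W)), np.cumsum(scores, axis=0)])
    for t0 in range(T):
        for t1 in range(t0, T):
            slab = csum[t1 + 1] - csum[t0]      # (H, W), summed over time
            for x0 in range(W):
                col = np.zeros(H)
                for x1 in range(x0, W):
                    col += slab[:, x1]          # accumulate column range
                    s, y0, y1 = kadane(col)
                    if s > best:
                        best, best_box = s, (t0, t1, y0, y1, x0, x1)
    return best, best_box
```

The efficiency argument in the abstract is precisely about escaping this kind of enumeration: branch-and-cut prunes the (much larger) space of non-cubic candidates without scoring each one.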

8.
IEEE Trans Pattern Anal Mach Intell ; 37(5): 931-43, 2015 May.
Article in English | MEDLINE | ID: mdl-26353319

ABSTRACT

We propose a dense local region detector to extract features suitable for image matching and object recognition tasks. Whereas traditional local interest operators rely on repeatable structures that often cross object boundaries (e.g., corners, scale-space blobs), our sampling strategy is driven by segmentation, and thus preserves object boundaries and shape. At the same time, whereas existing region-based representations are sensitive to segmentation parameters and object deformations, our novel approach to robustly sample dense sites and determine their connectivity offers better repeatability. In extensive experiments, we find that the proposed region detector provides significantly better repeatability and localization accuracy for object matching compared to an array of existing feature detectors. In addition, we show our regions lead to excellent results on two benchmark tasks that require good feature matching: weakly supervised foreground discovery and nearest neighbor-based object recognition.

9.
IEEE Trans Pattern Anal Mach Intell ; 36(2): 276-88, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24356349

ABSTRACT

We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the entire database. For this problem, we propose two hashing-based solutions. Our first approach maps the data to 2-bit binary keys that are locality sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sublinear time. Our first method's preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: Taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods' tradeoffs and show that they make it practical to perform active selection with millions of unlabeled points.
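The 2-bit key construction can be sketched concretely. A hedged reading of the first approach: a point hashes both bits from itself, while the hyperplane query hashes its normal with a sign flip on the second bit, so collisions are most likely when the point is nearly perpendicular to the normal, i.e., close to the hyperplane. Variable names and the toy demonstration below are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_point(x, u, v):
    """2-bit key for a database point x (both bits use x)."""
    return (np.dot(u, x) >= 0, np.dot(v, x) >= 0)

def hash_query(w, u, v):
    """2-bit key for a hyperplane query with normal w; note the sign
    flip on the second bit relative to hash_point."""
    return (np.dot(u, w) >= 0, np.dot(-v, w) >= 0)

d = 16
w = np.zeros(d); w[0] = 1.0        # hyperplane normal (the "query")
near = np.zeros(d); near[1] = 1.0  # lies on the hyperplane: near . w = 0
far = w.copy()                     # maximally far from the hyperplane

def collision_rate(x, trials=2000):
    """Empirical probability that x lands in the query's bucket."""
    hits = 0
    for _ in range(trials):
        u, v = rng.normal(size=d), rng.normal(size=d)
        hits += hash_point(x, u, v) == hash_query(w, u, v)
    return hits / trials
```

Here `collision_rate(near)` comes out around 1/4 while `collision_rate(far)` is essentially zero, which is exactly the locality sensitivity needed for the minimal distance-to-hyperplane criterion in active learning.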


Subjects
Algorithms; Artificial Intelligence; Data Compression/methods; Data Mining/methods; Databases, Factual; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted
10.
IEEE Trans Pattern Anal Mach Intell ; 34(6): 1145-58, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22516650

ABSTRACT

Current uses of tagged images typically exploit only the most explicit information: the link between the nouns named and the objects present somewhere in the image. We propose to leverage "unspoken" cues that rest within an ordered list of image tags so as to improve object localization. We define three novel implicit features from an image's tags-the relative prominence of each object as signified by its order of mention, the scale constraints implied by unnamed objects, and the loose spatial links hinted at by the proximity of names on the list. By learning a conditional density over the localization parameters (position and scale) given these cues, we show how to improve both accuracy and efficiency when detecting the tagged objects. Furthermore, we show how the localization density can be learned in a semantic space shared by the visual and tag-based features, which makes the technique applicable for detection in untagged input images. We validate our approach on the PASCAL VOC, LabelMe, and Flickr image data sets, and demonstrate its effectiveness relative to both traditional sliding windows as well as a visual context baseline. Our algorithm improves state-of-the-art methods, successfully translating insights about human viewing behavior (such as attention, perceived importance, or gaze) into enhanced object detection.


Subjects
Algorithms; Pattern Recognition, Visual/physiology; Cues; Humans; Semantics
11.
IEEE Trans Pattern Anal Mach Intell ; 34(2): 346-58, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21670480

ABSTRACT

How can knowing about some categories help us to discover new ones in unlabeled images? Unsupervised visual category discovery is useful to mine for recurring objects without human supervision, but existing methods assume no prior information and thus tend to perform poorly for cluttered scenes with multiple objects. We propose to leverage knowledge about previously learned categories to enable more accurate discovery, and address challenges in estimating their familiarity in unsegmented, unlabeled images. We introduce two variants of a novel object-graph descriptor to encode the 2D and 3D spatial layout of object-level co-occurrence patterns relative to an unfamiliar region and show that by using them to model the interaction between an image's known and unknown objects, we can better detect new visual categories. Rather than mine for all categories from scratch, our method identifies new objects while drawing on useful cues from familiar ones. We evaluate our approach on several benchmark data sets and demonstrate clear improvements in discovery over conventional purely appearance-based baselines.

12.
IEEE Trans Pattern Anal Mach Intell ; 34(6): 1092-104, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22064796

ABSTRACT

Fast retrieval methods are critical for many large-scale and data-driven vision applications. Recent work has explored ways to embed high-dimensional features or complex distance functions into a low-dimensional Hamming space where items can be efficiently searched. However, existing methods do not apply for high-dimensional kernelized data when the underlying feature embedding for the kernel is unknown. We show how to generalize locality-sensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm's sublinear time similarity search guarantees for a wide class of useful similarity functions. Since a number of successful image-based kernels have unknown or incomputable embeddings, this is especially valuable for image retrieval tasks. We validate our technique on several data sets, and show that it enables accurate and fast performance for several vision problems, including example-based object classification, local feature matching, and content-based retrieval.
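The generalization to arbitrary kernels can be sketched under stated assumptions: following the kernelized LSH construction associated with this line of work, each hash bit uses weights w = K^{-1/2} e_S over a small set of anchor points, so the bit behaves like a random hyperplane in the kernel's (unknown) feature space while needing only kernel evaluations. The RBF kernel, anchor counts, and subset size below are illustrative choices, not prescribed values.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(X, Y, gamma=0.5):
    """RBF kernel matrix between row sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def klsh_weights(anchors, n_bits=64, t=5, gamma=0.5):
    """Per-bit weights w = K^{-1/2} e_S for random size-t anchor
    subsets S, where K is the anchors' kernel matrix."""
    p = len(anchors)
    K = rbf(anchors, anchors, gamma)
    vals, vecs = np.linalg.eigh(K)            # K is symmetric PSD
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-10))) @ vecs.T
    W = np.zeros((n_bits, p))
    for b in range(n_bits):
        e_s = np.zeros(p)
        e_s[rng.choice(p, size=t, replace=False)] = 1.0
        W[b] = inv_sqrt @ e_s
    return W

def klsh_codes(X, anchors, W, gamma=0.5):
    """Binary codes: each bit is sign(sum_i w_i k(x, a_i)), so hashing
    a point costs one kernel evaluation per anchor."""
    return (rbf(X, anchors, gamma) @ W.T) > 0
```

Points that are similar under the kernel receive codes with small Hamming distance, which is what permits sublinear-time search even when the kernel's feature embedding is unknown or incomputable.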

13.
IEEE Trans Pattern Anal Mach Intell ; 31(12): 2143-57, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19834137

ABSTRACT

We introduce a method that enables scalable similarity search for learned metrics. Given pairwise similarity and dissimilarity constraints between some examples, we learn a Mahalanobis distance function that captures the examples' underlying relationships well. To allow sublinear time similarity search under the learned metric, we show how to encode the learned metric parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for vector spaces whose high dimensionality makes it infeasible to learn an explicit transformation over the feature dimensions. We demonstrate the approach applied to a variety of image data sets, as well as a systems data set. The learned metrics improve accuracy relative to commonly used metric baselines, while our hashing construction enables efficient indexing with learned distances and very large databases.
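The encoding of a learned metric into hash functions admits a compact sketch. Assuming the learned Mahalanobis metric factors as M = G^T G, an ordinary random-hyperplane bit applied to Gx, i.e., sign(r^T G x), makes collision probability track angles under the learned metric rather than the raw Euclidean one. The toy diagonal G and example points below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def learned_metric_hash(G, n_bits):
    """Random-hyperplane hashing in a learned metric M = G^T G:
    each bit is sign(r^T G x) for a random Gaussian direction r."""
    R = rng.normal(size=(n_bits, G.shape[0]))
    def codes(X):
        return (X @ G.T @ R.T) > 0   # (n, n_bits) binary codes
    return codes

# Toy learned metric that cares almost only about the first feature.
G = np.diag([10.0, 0.1])
h = learned_metric_hash(G, 64)
codes = h(np.array([[1.0, 0.0],     # query point
                    [1.0, 5.0],     # far in Euclidean terms, close under M
                    [-1.0, 0.0]]))  # opposite direction under M
ham_close = int((codes[0] != codes[1]).sum())
ham_far = int((codes[0] != codes[2]).sum())
```

The pair that the learned metric considers close shares almost all bits despite its large Euclidean distance, which is the property that lets the paper index very large databases under learned distances.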


Subjects
Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Algorithms; Databases, Factual; Humans; Pattern Recognition, Automated; Posture