Results 1 - 20 of 21
1.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 10929-10946, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37018107

ABSTRACT

In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes. The proposed GCoNet+ achieves new state-of-the-art performance for co-salient object detection (CoSOD) by mining consensus representations based on the following two essential criteria: 1) intra-group compactness to better formulate the consistency among co-salient objects by capturing their inherent shared attributes using our novel group affinity module (GAM); 2) inter-group separability to effectively suppress the influence of noisy objects on the output by introducing our new group collaborating module (GCM) conditioning on the inconsistent consensus. To further improve the accuracy, we design a series of simple yet effective components as follows: i) a recurrent auxiliary classification module (RACM) promoting model learning at the semantic level; ii) a confidence enhancement module (CEM) assisting the model in improving the quality of the final predictions; and iii) a group-based symmetric triplet (GST) loss guiding the model to learn more discriminative features. Extensive experiments on three challenging benchmarks, i.e., CoCA, CoSOD3k, and CoSal2015, demonstrate that our GCoNet+ outperforms 12 existing cutting-edge models. Code has been released at https://github.com/ZhengPeng7/GCoNet_plus.
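
As a loose illustration of the consensus idea described in the abstract (not the released GCoNet+ code or its actual group affinity module), the following NumPy sketch forms a group consensus feature by weighting each image's feature by how strongly it agrees with the rest of the group:

# Illustrative sketch (not the paper's GAM): build a group consensus feature by
# averaging per-image features weighted by their pairwise affinity.
import numpy as np

def group_consensus(features):
    """features: (N, C) array, one global feature vector per image in the group.
    Returns a (C,) consensus vector emphasizing attributes shared across the group."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    affinity = f @ f.T                      # (N, N) cosine affinities
    np.fill_diagonal(affinity, 0.0)
    weights = affinity.sum(axis=1)          # images agreeing with the group weigh more
    weights = weights / (weights.sum() + 1e-8)
    return (weights[:, None] * features).sum(axis=0)

# Example: 5 images in a group, 64-dimensional features
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 64))
consensus = group_consensus(feats)
print(consensus.shape)  # (64,)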

2.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10197-10211, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37027560

ABSTRACT

Segmenting highly overlapping image objects is challenging, because there is typically no distinction between real object contours and occlusion boundaries in images. Unlike previous instance segmentation methods, we model image formation as a composition of two overlapping layers, and propose the Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees). The explicit modeling of the occlusion relationship with a bilayer structure naturally decouples the boundaries of the occluding and occluded instances, and considers the interaction between them during mask regression. We investigate the efficacy of the bilayer structure using two popular convolutional network designs, namely the Fully Convolutional Network (FCN) and the Graph Convolutional Network (GCN). Further, we formulate bilayer decoupling using the vision transformer (ViT) by representing instances in the image as separate learnable occluder and occludee queries. Large and consistent improvements with one/two-stage and query-based object detectors across various backbones and network layer choices validate the generalization ability of bilayer decoupling, as shown by extensive experiments on image instance segmentation benchmarks (COCO, KINS, COCOA) and video instance segmentation benchmarks (YTVIS, OVIS, BDD100K MOTS), especially for heavy occlusion cases.


Subjects
Algorithms; Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods
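
A minimal PyTorch sketch of the bilayer idea, assuming a shared ROI feature map and illustrative layer sizes; it is not the released BCNet architecture, only two parallel mask branches where the occludee branch is conditioned on the occluder prediction:

# Minimal sketch of a bilayer mask head: two parallel branches predict an
# occluder mask and an occludee mask from shared ROI features. Layer sizes and
# names are illustrative, not the released BCNet architecture.
import torch
import torch.nn as nn

class BilayerMaskHead(nn.Module):
    def __init__(self, in_channels=256):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(in_channels, 1, 1))          # 1-channel mask logits
        self.occluder_branch = branch()   # top layer: occluding objects
        self.occludee_branch = branch()   # bottom layer: partially occluded instance

    def forward(self, roi_feats):
        occluder_logits = self.occluder_branch(roi_feats)
        # condition the occludee branch on the occluder prediction
        conditioned = roi_feats * torch.sigmoid(occluder_logits)
        occludee_logits = self.occludee_branch(conditioned)
        return occluder_logits, occludee_logits

head = BilayerMaskHead()
top, bottom = head(torch.randn(2, 256, 14, 14))
print(top.shape, bottom.shape)  # torch.Size([2, 1, 14, 14]) twice
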
3.
IEEE Trans Image Process ; 28(1): 45-55, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30028702

ABSTRACT

We propose a deep learning approach for directly estimating relative atmospheric visibility from outdoor photos, without relying on weather images or data that require expensive sensing or custom capture. Our data-driven approach capitalizes on a large collection of Internet images to learn rich scene and visibility varieties. The relative CNN-RNN coarse-to-fine model, where CNN stands for convolutional neural network and RNN for recurrent neural network, exploits the joint power of a relative support vector machine, which provides a good ranking representation, and the data-driven deep features derived from our novel CNN-RNN model. The CNN-RNN model makes use of shortcut connections to bridge a CNN module and an RNN coarse-to-fine module. The CNN captures the global view, while the RNN simulates the human attention shift from the whole image (global) to the farthest discerned region (local). The learned relative model can be adapted to predict absolute visibility in limited scenarios. Extensive experiments and comparisons are performed to verify our method. We have built an annotated dataset consisting of about 40,000 images with 0.2 million human annotations. The large-scale annotated visibility dataset will be made available to accompany this paper.
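
The ranking component can be illustrated with a pairwise hinge objective in the spirit of a relative SVM, operating on precomputed feature vectors; the actual CNN-RNN features and training procedure of the paper are not reproduced here:

# Pairwise ranking (relative SVM style) objective: given feature vectors of two
# photos where image A is annotated as having better visibility than image B,
# penalize w unless score(A) exceeds score(B) by a margin.
import numpy as np

def ranking_hinge_loss(w, feats_a, feats_b, margin=1.0, reg=1e-3):
    """feats_a[i] should outrank feats_b[i]. Returns (loss, gradient)."""
    scores_a, scores_b = feats_a @ w, feats_b @ w
    slack = margin - (scores_a - scores_b)
    active = slack > 0
    loss = reg * 0.5 * w @ w + np.maximum(slack, 0).mean()
    grad = reg * w - (feats_a[active] - feats_b[active]).sum(axis=0) / len(feats_a)
    return loss, grad

rng = np.random.default_rng(1)
w = np.zeros(128)
fa, fb = rng.normal(size=(32, 128)), rng.normal(size=(32, 128))
for _ in range(100):                      # plain gradient descent
    loss, grad = ranking_hinge_loss(w, fa, fb)
    w -= 0.1 * grad
print(round(loss, 3))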

4.
IEEE Trans Vis Comput Graph ; 24(6): 2051-2063, 2018 06.
Article in English | MEDLINE | ID: mdl-28489537

ABSTRACT

We present a real-time video stylization system and demonstrate a variety of painterly styles rendered on real video inputs. The key technical contribution lies in the object flow, which is robust to inaccurate optical flow, unknown object transformations, and partial occlusion. Since object flows relate regions of the same object across frames, the shower-door effect can be effectively reduced when painterly strokes and textures are rendered on video objects. Object flows are constructed automatically and in real time after applying metric learning. To reduce temporal flickering, we extend bilateral filtering into motion bilateral filtering. We propose quantitative metrics to measure the temporal coherence of structures and textures in our stylized videos, and perform extensive experiments to compare our stylized results with baseline systems and prior works specializing in watercolor and abstraction.
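
A simplified sketch of the temporal-bilateral idea: each pixel is smoothed across neighboring frames with weights that combine temporal distance and color similarity. The paper's motion bilateral filter follows object flow across frames; this sketch compares pixels at the same location for brevity:

# Simplified temporal bilateral filter: smooth each pixel across neighboring
# frames, weighting by temporal distance and by color similarity so that large
# content changes are preserved.
import numpy as np

def temporal_bilateral(frames, radius=2, sigma_t=1.5, sigma_c=0.1):
    """frames: (T, H, W, 3) float array in [0, 1]. Returns filtered frames."""
    T = frames.shape[0]
    out = np.empty_like(frames)
    for t in range(T):
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        window = frames[lo:hi]                                  # (K, H, W, 3)
        dt = np.arange(lo, hi) - t
        w_time = np.exp(-0.5 * (dt / sigma_t) ** 2)[:, None, None]
        diff = window - frames[t]
        w_color = np.exp(-0.5 * (diff ** 2).sum(axis=-1) / sigma_c ** 2)
        w = w_time * w_color                                    # (K, H, W)
        out[t] = (w[..., None] * window).sum(axis=0) / w.sum(axis=0)[..., None]
    return out

video = np.random.rand(10, 48, 64, 3)
print(temporal_bilateral(video).shape)  # (10, 48, 64, 3)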

5.
IEEE Trans Pattern Anal Mach Intell ; 39(12): 2510-2524, 2017 12.
Article in English | MEDLINE | ID: mdl-28113309

ABSTRACT

Given a single outdoor image, we propose a collaborative learning approach using novel weather features to label the image as either sunny or cloudy. Though limited to two classes, this classification problem is by no means trivial given the great variety of outdoor images captured by different cameras, some of which may have been edited after capture. Our overall weather feature combines the data-driven convolutional neural network (CNN) feature and well-chosen weather-specific features. They work collaboratively within a unified optimization framework that is aware of the presence (or absence) of a given weather cue during learning and classification. In this paper, we propose a new data augmentation scheme to substantially enrich the training data, which is used to train a latent SVM framework so that our solution is insensitive to global intensity transfer. Extensive experiments are performed to verify our method. Compared with our previous work and the sole use of a CNN classifier, this paper improves accuracy by up to 7-8 percent. Our weather image dataset is available together with the executable of our classifier.
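
The augmentation idea can be sketched as random global intensity perturbations applied to training images; the exact scheme and parameter ranges used in the paper may differ:

# Illustrative augmentation for robustness to global intensity transfer:
# perturb each training image with a random global gamma, gain, and bias.
import numpy as np

def augment_intensity(image, rng):
    """image: (H, W, 3) float array in [0, 1]."""
    gamma = rng.uniform(0.7, 1.4)    # global tone curve
    gain = rng.uniform(0.8, 1.2)     # contrast-like scaling
    bias = rng.uniform(-0.1, 0.1)    # brightness shift
    return np.clip(gain * image ** gamma + bias, 0.0, 1.0)

rng = np.random.default_rng(7)
img = np.random.rand(240, 320, 3)
augmented = [augment_intensity(img, rng) for _ in range(5)]
print(len(augmented), augmented[0].shape)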

6.
IEEE Trans Vis Comput Graph ; 22(10): 2275-2288, 2016 10.
Article in English | MEDLINE | ID: mdl-26685251

ABSTRACT

Previous research on impossible figures focuses extensively on single-view modeling and rendering. Existing computer games that employ impossible figures as navigation mazes either use a fixed third-person view with axonometric projection to retain the perception of impossibility, or simply break the figure's impossibility upon view changes. In this paper, we present a new approach to 3D gaming with impossible figures, delivering for the first time navigation in 3D mazes constructed from impossible figures. Such a result cannot be achieved by previous work on modeling impossible figures. To deliver seamless gaming navigation and interaction, we propose i) a set of guiding principles for bringing out subtle perceptions and ii) a novel computational approach to construct 3D structures from impossible-figure images and then dynamically construct the impossible-figure maze subject to the user's view. Finally, we demonstrate and discuss our method on a variety of generic maze types.

7.
IEEE Trans Pattern Anal Mach Intell ; 37(4): 890-7, 2015 Apr.
Article in English | MEDLINE | ID: mdl-26353301

ABSTRACT

Reconstructing transparent objects is a challenging problem. While they produce reasonable results for quite complex objects, existing approaches require custom calibration or somewhat expensive labor to achieve high precision. When an overall shape preserving salient and fine details is sufficient, we show in this paper a significant step toward solving the problem, provided that the object's silhouette is available and simple user interaction is allowed, by using a video of the transparent object shot under varying illumination. Specifically, we estimate the normal map of the exterior surface of a given solid transparent object, from which the surface depth can be integrated. Our technical contribution lies in relating this normal estimation problem to one of graph-cut segmentation. Unlike conventional formulations, however, our graph is dual-layered, since we can see a transparent object's foreground as well as the background behind it. Quantitative and qualitative evaluations are performed to verify the efficacy of this practical solution.
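
The abstract notes that surface depth can be integrated from the estimated normal map. A standard least-squares integration (not the paper's graph-cut step) converts normals to gradients and solves a sparse linear system:

# Least-squares integration of a normal map into a depth map: normals give
# gradients p = -nx/nz, q = -ny/nz, and we solve ||grad(z) - (p, q)||^2 with
# finite differences and a sparse solver. Depth is recovered up to a constant.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def integrate_normals(normals):
    """normals: (H, W, 3) unit normals with nz > 0. Returns (H, W) depth."""
    H, W = normals.shape[:2]
    nz = np.clip(normals[..., 2], 1e-3, None)
    p, q = -normals[..., 0] / nz, -normals[..., 1] / nz
    idx = np.arange(H * W).reshape(H, W)
    rows, cols, vals, rhs = [], [], [], []
    eq = 0
    for y in range(H):
        for x in range(W):
            if x + 1 < W:   # z(y, x+1) - z(y, x) = p
                rows += [eq, eq]; cols += [idx[y, x + 1], idx[y, x]]
                vals += [1.0, -1.0]; rhs.append(p[y, x]); eq += 1
            if y + 1 < H:   # z(y+1, x) - z(y, x) = q
                rows += [eq, eq]; cols += [idx[y + 1, x], idx[y, x]]
                vals += [1.0, -1.0]; rhs.append(q[y, x]); eq += 1
    A = sp.csr_matrix((vals, (rows, cols)), shape=(eq, H * W))
    z = lsqr(A, np.asarray(rhs))[0]
    return z.reshape(H, W) - z.mean()

n = np.dstack([np.zeros((32, 32)), np.zeros((32, 32)), np.ones((32, 32))])
print(integrate_normals(n).shape)  # flat normals -> near-constant depth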

8.
IEEE Trans Pattern Anal Mach Intell ; 35(9): 2175-88, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23868778

ABSTRACT

This paper proposes to apply the nonlocal principle to general alpha matting for the simultaneous extraction of multiple image layers; each layer may have disjoint as well as coherent segments typical of foreground mattes in natural image matting. The estimated alphas also satisfy the summation constraint. As in nonlocal matting, our approach does not assume the local color-line model and does not require sophisticated sampling or learning strategies. On the other hand, our matting method generalizes well to any color or feature space in any dimension, to any number of alphas and layers at a pixel beyond two, and comes with an arguably simpler implementation, which we have made publicly available. Our matting technique, aptly called KNN matting, capitalizes on the nonlocal principle by using K nearest neighbors (KNN) in matching nonlocal neighborhoods, and contributes a simple and fast algorithm that produces competitive results with sparse user markups. KNN matting has a closed-form solution that can leverage the preconditioned conjugate gradient method to produce an efficient implementation. Experimental evaluation on benchmark datasets indicates that our matting results are comparable to or of higher quality than state-of-the-art methods requiring more involved implementation. In this paper, we take the nonlocal principle beyond alpha estimation and extract overlapping image layers using the same Laplacian framework. Given the alpha values, our closed-form solution can be elegantly generalized to solve the multilayer extraction problem. We perform qualitative and quantitative comparisons to demonstrate the accuracy of the extracted image layers.
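
A hedged sketch of the KNN-matting recipe summarized above: nonlocal affinities from K nearest neighbours in a color-plus-position feature space, a graph Laplacian, and a linear system solved with conjugate gradients under scribble constraints; the feature scaling, affinity definition, and K are illustrative rather than the authors' released implementation:

# KNN-style matting sketch: nonlocal affinities from K nearest neighbours in a
# (color, position) feature space, a graph Laplacian, and a linear system
# solved with conjugate gradients under user scribble constraints.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg
from sklearn.neighbors import NearestNeighbors

def knn_matting(image, scribbles, k=10, lam=100.0):
    """image: (H, W, 3) in [0, 1]; scribbles: (H, W) with 1=fg, 0=bg, -1=unknown."""
    H, W = image.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    feats = np.column_stack([image.reshape(-1, 3),
                             ys.ravel() / max(H, W), xs.ravel() / max(H, W)])
    n = H * W
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(feats)
    dist, ind = nbrs.kneighbors(feats)                 # first neighbour is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = ind[:, 1:].ravel()
    w = 1.0 - np.clip(dist[:, 1:].ravel(), 0, 1)       # simple affinity in [0, 1]
    A = sp.coo_matrix((w, (rows, cols)), shape=(n, n))
    A = (A + A.T) / 2                                  # symmetrize
    L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A
    known = (scribbles.ravel() >= 0)
    D = sp.diags(known.astype(float))
    b = lam * np.where(known, scribbles.ravel().clip(0, 1), 0.0)
    alpha, _ = cg(L + lam * D, b, maxiter=500)
    return np.clip(alpha, 0, 1).reshape(H, W)

img = np.random.rand(40, 40, 3)
scrib = -np.ones((40, 40)); scrib[:5] = 1; scrib[-5:] = 0
print(knn_matting(img, scrib).shape)  # (40, 40)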

9.
IEEE Trans Pattern Anal Mach Intell ; 34(8): 1482-95, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22184257

ABSTRACT

We prove a closed-form solution to tensor voting (CFTV): given a point set in any dimension, our closed-form solution provides an exact, continuous, and efficient algorithm for computing a structure-aware tensor that simultaneously achieves salient structure detection and outlier attenuation. Using CFTV, we prove the convergence of tensor voting on a Markov random field (MRF), thus termed MRFTV, where the structure-aware tensor at each input site reaches a stationary state upon convergence in structure propagation. We then embed the structure-aware tensor into expectation maximization (EM) for optimizing a single linear structure to achieve efficient and robust parameter estimation. Specifically, our EMTV algorithm optimizes both the tensor and fitting parameters and does not require the random sampling consensus typically used in existing robust statistical techniques. We performed quantitative evaluation of its accuracy and robustness, showing that EMTV performs better than the original TV and other state-of-the-art techniques in fundamental matrix estimation for multiview stereo matching. Extensions of CFTV and EMTV for extracting multiple and nonlinear structures are underway.
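
For intuition only, the following sketch accumulates a generic second-order structure-aware tensor per point (Gaussian-weighted stick votes) and uses the eigenvalue gap as saliency; it is not the paper's closed-form CFTV expression or its EMTV extension:

# Generic second-order tensor voting accumulation: each point receives
# Gaussian-weighted outer products of unit directions from its neighbours; the
# eigen-gap of the accumulated tensor indicates structure saliency.
import numpy as np

def structure_tensors(points, sigma=1.0):
    """points: (N, D). Returns (N, D, D) accumulated tensors and (N,) saliency."""
    N, D = points.shape
    tensors = np.zeros((N, D, D))
    for i in range(N):
        diff = points - points[i]                      # (N, D)
        dist = np.linalg.norm(diff, axis=1)
        mask = (dist > 0)
        u = diff[mask] / dist[mask, None]              # unit vote directions
        w = np.exp(-(dist[mask] ** 2) / sigma ** 2)
        # votes orthogonal to the direction towards the voter (stick-like vote)
        votes = np.eye(D)[None] - u[:, :, None] * u[:, None, :]
        tensors[i] = (w[:, None, None] * votes).sum(axis=0)
    eigvals = np.linalg.eigvalsh(tensors)              # ascending order
    saliency = eigvals[:, -1] - eigvals[:, -2]         # large gap => salient structure
    return tensors, saliency

line = np.column_stack([np.linspace(0, 5, 50), np.zeros(50)])
noisy = np.vstack([line, np.random.default_rng(3).uniform(-2, 7, size=(10, 2))])
_, sal = structure_tensors(noisy, sigma=1.0)
print(sal[:50].mean() > sal[50:].mean())  # expected: points on the line are more salient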

10.
IEEE Trans Pattern Anal Mach Intell ; 32(11): 2085-99, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20847395

ABSTRACT

Representative surface reconstruction algorithms taking a gradient field as input enforce the integrability constraint in a discrete manner. While enforcing integrability allows the subsequent integration to produce surface heights, existing algorithms have one or more of the following disadvantages: they can only handle dense per-pixel gradient fields, they smooth out sharp features in a partially integrable field, or they produce severe surface distortion in the results. In this paper, we present a method which does not enforce discrete integrability and reconstructs a 3D continuous surface from a gradient or a height field, or a combination of both, which can be dense or sparse. The key to our approach is the use of kernel basis functions, which transfer the continuous surface reconstruction problem into a high-dimensional space, where a closed-form solution exists. By using the Gaussian kernel, we can derive a straightforward implementation which is able to produce results better than traditional techniques. In general, an important advantage of our kernel-based method is that it does not suffer from discretization and finite approximation, both of which lead to surface distortion and are typical of the Fourier or wavelet bases widely adopted by previous representative approaches. We perform comparisons with classical and recent methods on benchmark as well as challenging data sets to demonstrate that our method produces accurate surface reconstructions that preserve salient and sharp features. The source code and executable of the system are available for download.


Subjects
Algorithms; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Artificial Intelligence; Humans; Normal Distribution; Reproducibility of Results; Software
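
A hedged sketch of the kernel idea: represent the height field as a sum of Gaussian kernel basis functions centred at sparse samples and solve the closed-form ridge-regularized system; the paper's handling of gradient constraints is omitted:

# Kernel-based continuous surface fit: weighted sum of Gaussian kernels centred
# at sparse samples, with the weights obtained from a closed-form linear solve.
import numpy as np

def fit_gaussian_kernel_surface(xy, z, sigma=0.5, reg=1e-6):
    """xy: (N, 2) sample positions, z: (N,) heights. Returns a callable surface."""
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    coeffs = np.linalg.solve(K + reg * np.eye(len(xy)), z)   # closed-form fit
    def surface(query_xy):
        q2 = ((query_xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1)
        return np.exp(-q2 / (2 * sigma ** 2)) @ coeffs
    return surface

rng = np.random.default_rng(5)
pts = rng.uniform(-1, 1, size=(200, 2))
heights = np.sin(2 * pts[:, 0]) * np.cos(2 * pts[:, 1])      # sparse height samples
surf = fit_gaussian_kernel_surface(pts, heights)
grid = np.column_stack([g.ravel() for g in np.meshgrid(np.linspace(-1, 1, 32),
                                                       np.linspace(-1, 1, 32))])
print(surf(grid).reshape(32, 32).shape)  # dense reconstructed height field
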
11.
IEEE Trans Pattern Anal Mach Intell ; 32(3): 546-60, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20075477

ABSTRACT

This paper presents a robust and automatic approach to photometric stereo, where the two main components, namely surface normals and visible surfaces, are respectively optimized by Expectation Maximization (EM). A dense set of input images is conveniently captured using a digital video camera while a handheld spotlight is moved around the target object and a small mirror sphere. In our approach, the inherently complex optimization problem is simplified into a two-step optimization, where EM is employed in each step: 1) using the dense input, the weight or importance of each observation is alternately optimized with the normal and albedo at each pixel, and 2) using the optimized normals and employing Markov Random Fields (MRFs), surface integrabilities and discontinuities are alternately optimized in visible surface reconstruction. Our mathematical derivation gives simple updating rules for the EM algorithms, leading to a stable, practical, and parameter-free implementation that is very robust even in the presence of complex geometry, shadows, highlights, and transparency. We present high-quality results on normal and visible surface reconstruction, where fine geometric details are automatically recovered by our method.
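
Step 1 above can be illustrated as an iteratively reweighted Lambertian fit at a single pixel: alternate a weighted least-squares estimate of albedo-times-normal with re-estimation of per-observation weights that down-weight shadows and highlights. The paper's exact EM updating rules differ:

# Iteratively reweighted Lambertian fit for one pixel: alternate a weighted
# least-squares fit of b = albedo * normal with re-estimation of per-observation
# weights from the residuals (shadows/highlights get small weights).
import numpy as np

def em_like_normal(intensities, light_dirs, iters=20, sigma=0.1):
    """intensities: (M,), light_dirs: (M, 3) unit vectors. Returns (normal, albedo)."""
    w = np.ones_like(intensities)
    for _ in range(iters):
        A = light_dirs * w[:, None]
        b = np.linalg.lstsq(A, w * intensities, rcond=None)[0]
        resid = intensities - light_dirs @ b
        w = np.exp(-resid ** 2 / (2 * sigma ** 2))
    albedo = np.linalg.norm(b)
    return b / (albedo + 1e-12), albedo

rng = np.random.default_rng(2)
L = rng.normal(size=(60, 3)); L /= np.linalg.norm(L, axis=1, keepdims=True)
true_n = np.array([0.0, 0.0, 1.0])
obs = np.clip(0.8 * L @ true_n, 0, None)          # Lambertian, shadows clipped to 0
n_est, rho = em_like_normal(obs, L)
print(np.round(n_est, 2), round(rho, 2))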

12.
IEEE Trans Pattern Anal Mach Intell ; 30(4): 617-31, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18276968

ABSTRACT

The aim of this paper is to achieve seamless image stitching without producing visual artifacts caused by severe intensity discrepancy and structure misalignment, given that the input images are roughly aligned or globally registered. Our new approach is based on structure deformation and propagation for achieving overall consistency in image structure and intensity. The new stitching algorithm, which has found applications in image compositing, image blending, and intensity correction, consists of the following main processes. Depending on the compatibility and distinctiveness of the 2-D features detected in the image plane, single or double optimal partitions are computed subject to the constraints of intensity coherence and structure continuity. Afterwards, specific 1-D features are detected along the computed optimal partitions, from which a set of sparse deformation vectors is derived to encode 1-D feature matching between the partitions. These sparse deformation cues are robustly propagated into the input images by solving the associated minimization problem in the gradient domain, thus providing a uniform framework for the simultaneous alignment of image structure and intensity. We present results in general image compositing and blending to show the effectiveness of our method in producing seamless stitching results from complex input images.


Subjects
Algorithms; Artifacts; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Reproducibility of Results; Sensitivity and Specificity
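
The propagation step can be illustrated, in simplified form, by fixing the deformation at the sparse feature matches and filling in a smooth dense field with a discrete Laplace system; this is a stand-in for the paper's gradient-domain minimization:

# Propagate sparse deformation cues: treat one component of the deformation
# field as a scalar map, fix it at the sparse matches, and solve a discrete
# Laplace (smoothness) system for the remaining pixels.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def propagate_sparse(values, mask, shape):
    """values/mask: flat arrays over an HxW grid; mask marks known deformations."""
    H, W = shape
    n = H * W
    lap = sp.diags([np.full(n, 4.0), -np.ones(n - 1), -np.ones(n - 1),
                    -np.ones(n - W), -np.ones(n - W)],
                   [0, 1, -1, W, -W], format="csr")
    known = np.flatnonzero(mask)
    unknown = np.flatnonzero(~mask)
    x = np.array(values, dtype=float)
    rhs = -(lap[unknown][:, known] @ x[known])       # move known values to the RHS
    x[unknown] = spsolve(lap[unknown][:, unknown], rhs)
    return x.reshape(H, W)

H, W = 32, 32
defo = np.zeros(H * W); mask = np.zeros(H * W, dtype=bool)
rng = np.random.default_rng(4)
seeds = rng.choice(H * W, size=20, replace=False)
defo[seeds] = rng.normal(scale=2.0, size=20)
mask[seeds] = True
dense_dx = propagate_sparse(defo, mask, (H, W))      # smooth dense x-displacement
print(dense_dx.shape)
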
13.
IEEE Trans Pattern Anal Mach Intell ; 29(9): 1520-37, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17627041

ABSTRACT

We propose an automatic approach to soft color segmentation, which produces soft color segments with an appropriate amount of overlap and transparency essential to synthesizing natural images for a wide range of image-based applications. While many state-of-the-art and complex techniques are excellent at partitioning an input image to facilitate deriving a semantic description of the scene, to achieve seamless image synthesis we advocate a segmentation approach designed to maintain spatial and color coherence among soft segments while preserving discontinuities, by assigning to each pixel a set of soft labels corresponding to the respective color distributions. We optimize a global objective function which simultaneously exploits the reliability given by global color statistics and the flexibility of local image compositing, leading to an image model where the global color statistics of an image are represented by a Gaussian Mixture Model (GMM), while the color of a pixel is explained by a local color mixture model whose weights are defined by the soft labels to the elements of the converged GMM. Transparency is naturally introduced in our probabilistic framework, which infers an optimal mixture of colors at an image pixel. To adequately consider global and local information in the same framework, an alternating optimization scheme is proposed to iteratively solve for the global and local model parameters. Our method is fully automatic and is shown to converge to a good optimal solution. We perform extensive evaluation and comparison, and demonstrate that our method achieves good image synthesis results for image-based applications such as image matting, color transfer, image deblurring, and image colorization.


Subjects
Algorithms; Artificial Intelligence; Color; Colorimetry/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Reproducibility of Results; Sensitivity and Specificity
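
The global half of the model, i.e. GMM color statistics yielding per-pixel soft labels that sum to one, can be sketched with scikit-learn; the local color-mixture model and the alternating optimization are omitted:

# Fit a GMM to the image's colors and take per-pixel posterior responsibilities
# as soft labels; each pixel's labels sum to one (the summation constraint).
import numpy as np
from sklearn.mixture import GaussianMixture

def soft_color_segments(image, n_segments=4, seed=0):
    """image: (H, W, 3) floats in [0, 1]. Returns (H, W, n_segments) soft labels."""
    H, W = image.shape[:2]
    pixels = image.reshape(-1, 3)
    gmm = GaussianMixture(n_components=n_segments, covariance_type="full",
                          random_state=seed).fit(pixels)
    soft = gmm.predict_proba(pixels)
    return soft.reshape(H, W, n_segments)

img = np.random.rand(64, 64, 3)
labels = soft_color_segments(img)
print(labels.shape, float(labels.sum(axis=-1).mean()))  # (64, 64, 4) ~1.0
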
14.
IEEE Trans Pattern Anal Mach Intell ; 28(11): 1830-46, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17063687

ABSTRACT

We address the problem of robust normal reconstruction by dense photometric stereo in the presence of complex geometry, shadows, highlights, transparencies, variable attenuation in light intensities, and inaccurate estimation of light directions. The input is a dense set of noisy photometric images, conveniently captured using a very simple set-up consisting of a digital video camera, a reflective mirror sphere, and a handheld spotlight. We formulate the dense photometric stereo problem as a Markov network and investigate two important inference algorithms for Markov Random Fields (MRFs), namely graph cuts and belief propagation, to optimize for the most likely setting of each node in the network. In the graph cut algorithm, the MRF formulation is translated into one of energy minimization. A discontinuity-preserving metric is introduced as the compatibility function, which allows alpha-expansion to efficiently perform the maximum a posteriori (MAP) estimation. Using the identical dense input and the same MRF formulation, our tensor belief propagation algorithm recovers faithful normal directions, preserves underlying discontinuities, improves the normal estimation from discrete to continuous, and drastically reduces the storage requirement and running time. Both algorithms produce comparable and very faithful normals for complex scenes. Although the discontinuity-preserving metric in graph cuts permits efficient inference of optimal discrete labels with a theoretical guarantee, our estimation algorithm using tensor belief propagation converges to comparable results but runs faster because very compact messages are passed and combined. We present very encouraging results on normal reconstruction. A simple algorithm is proposed to reconstruct a surface from a normal map recovered by our method. With the reconstructed surface, an inverse process, known as relighting in computer graphics, is proposed to synthesize novel images of the given scene under user-specified light source and direction. The synthesis is made to run in real time by exploiting the state-of-the-art graphics processing unit (GPU). Our method offers many unique advantages over previous relighting methods and can handle a wide range of novel light sources and directions.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Photogrammetry/methods; Photometry/methods; Information Storage and Retrieval/methods; Markov Chains
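
For intuition, the data term of such an MRF can be sketched by scoring a discretized set of candidate normals per pixel under a Lambertian model and keeping the best; the smoothness term and the graph-cut / belief-propagation inference used in the paper are omitted:

# Data-term-only sketch: discretize candidate normals, score each against a
# pixel's observations under a Lambertian model (closed-form albedo per
# candidate), and pick the best candidate per pixel.
import numpy as np

def candidate_normals(n_candidates=200, seed=0):
    v = np.random.default_rng(seed).normal(size=(n_candidates, 3))
    v[:, 2] = np.abs(v[:, 2])                      # keep normals facing the camera
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def best_normals(intensity_stack, light_dirs, candidates):
    """intensity_stack: (M, P) for P pixels; light_dirs: (M, 3)."""
    shading = np.clip(light_dirs @ candidates.T, 0, None)   # (M, C)
    num = shading.T @ intensity_stack                        # (C, P)
    den = (shading ** 2).sum(axis=0)[:, None] + 1e-12
    albedo = num / den                                       # best albedo per candidate
    cost = ((intensity_stack[None] - shading.T[:, :, None] * albedo[:, None, :]) ** 2).sum(axis=1)
    best = cost.argmin(axis=0)                               # (P,)
    return candidates[best], albedo[best, np.arange(intensity_stack.shape[1])]

M, P = 40, 10
L = candidate_normals(M, seed=1)                             # reused here as random light directions
cands = candidate_normals(400, seed=2)
true_n = np.tile([[0.0, 0.0, 1.0]], (P, 1))
I = np.clip(L @ true_n.T, 0, None)                           # (M, P) observations
n_est, rho = best_normals(I, L, cands)
print(np.round(n_est[0], 2), n_est.shape)
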
15.
IEEE Trans Pattern Anal Mach Intell ; 28(5): 832-9, 2006 May.
Article in English | MEDLINE | ID: mdl-16640269

ABSTRACT

This paper presents a complete system capable of synthesizing a large number of pixels that are missing due to occlusion or damage in an uncalibrated input video. These missing pixels may correspond to the static background or to cyclic motions of the captured scene. Our system employs user-assisted video layer segmentation, while the main processing in video repair is fully automatic. The input video is first decomposed into color and illumination videos. The necessary temporal consistency is maintained by tensor voting in the spatio-temporal domain. Missing colors and illumination of the background are synthesized by applying image repairing. Finally, the occluded motions are inferred by spatio-temporal alignment of collected samples at multiple scales. We experimented with our system on difficult examples with variable illumination, where the capturing camera can be stationary or in motion.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Lighting; Pattern Recognition, Automated/methods; Photometry/methods; Video Recording/methods; Information Storage and Retrieval/methods; Motion; Oscillometry/methods; Photography/methods; Subtraction Technique
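
A much-simplified stand-in for repairing the static background: pool, per pixel, the frames in which the pixel is visible and take their median color. The paper instead applies image repairing guided by spatio-temporal tensor voting:

# Fill occluded static-background pixels from frames in which they are visible.
import numpy as np

def fill_static_background(frames, occlusion_masks):
    """frames: (T, H, W, 3); occlusion_masks: (T, H, W) bool, True where occluded."""
    frames = frames.astype(float).copy()
    visible = np.where(occlusion_masks[..., None], np.nan, frames)
    background = np.nanmedian(visible, axis=0)              # per-pixel background estimate
    filled = np.where(occlusion_masks[..., None], background[None], frames)
    return filled, background

T, H, W = 8, 32, 32
clean = np.tile(np.random.rand(1, H, W, 3), (T, 1, 1, 1))   # static background
masks = np.zeros((T, H, W), dtype=bool); masks[1:, 10:20, 10:20] = True
corrupted = clean.copy(); corrupted[masks] = 0.0            # paint the occluder black
filled, _ = fill_static_background(corrupted, masks)
print(np.allclose(filled, clean))  # occluded pixels recovered from frame 0
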
16.
IEEE Trans Pattern Anal Mach Intell ; 27(1): 36-50, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15628267

ABSTRACT

This paper presents a voting method to perform image correction by global and local intensity alignment. The key to our modeless approach is the estimation of global and local replacement functions by reducing the complex estimation problem to robust 2D tensor voting in the corresponding voting spaces. No complicated model for the replacement function (curve) is assumed. Subject only to the monotonic constraint, we vote for an optimal replacement function by propagating the curve smoothness constraint using a dense tensor field. Our method effectively infers missing curve segments and rejects image outliers. Applications using our tensor voting approach are proposed and described. The first application consists of image mosaicking of static scenes, where the voted replacement functions are used in our iterative registration algorithm for computing the best warping matrix. In the presence of occlusion, our replacement function can be employed to construct a visually acceptable mosaic by detecting occlusions, which appear as large regions of piecewise constant color. Furthermore, by simultaneously considering color matches and spatial constraints in the voting space, we perform image intensity compensation and high-contrast image correction using our voting framework when only two defective input images are given.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Cluster Analysis; Color; Computer Graphics; Computer Simulation; Models, Biological; Models, Statistical; Numerical Analysis, Computer-Assisted; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted; User-Computer Interface
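
As a simple stand-in for the monotonic replacement function (the paper votes for the curve with a dense 2D tensor field, which also rejects outliers), one can fit a monotonic intensity mapping between corresponding pixels with isotonic regression:

# Fit a monotonic intensity replacement curve from corresponding pixel
# intensities in the overlap of two images, then apply it as a lookup table.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(6)
src = rng.uniform(0, 1, 2000)                                        # intensities in image A (overlap)
dst = np.clip(src ** 1.8 + rng.normal(0, 0.03, src.size), 0, 1)      # same pixels in image B

replace = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip").fit(src, dst)
lut = replace.predict(np.linspace(0, 1, 256))         # 256-entry replacement curve
corrected = lut[(src * 255).astype(int)]              # apply to image A's intensities
print(lut.shape, float(np.abs(corrected - dst).mean()) < 0.1)
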
17.
IEEE Trans Pattern Anal Mach Intell ; 26(5): 594-611, 2004 May.
Article in English | MEDLINE | ID: mdl-15460281

ABSTRACT

Most computer vision applications require the reliable detection of boundaries. In the presence of outliers, missing data, orientation discontinuities, and occlusion, this problem is particularly challenging. We propose to address it by complementing the tensor voting framework, which was limited to second order properties, with first order representation and voting. First order voting fields and a mechanism to vote for 3D surface and volume boundaries and curve endpoints in 3D are defined. Boundary inference is also useful for a second difficult problem in grouping, namely, automatic scale selection. We propose an algorithm that automatically infers the smallest scale that can preserve the finest details. Our algorithm then proceeds with progressively larger scales to ensure continuity where it has not been achieved. Therefore, the proposed approach does not oversmooth features or delay the handling of boundaries and discontinuities until model misfit occurs. The interaction of smooth features, boundaries, and outliers is accommodated by the unified representation, making possible the perceptual organization of data in curves, surfaces, volumes, and their boundaries simultaneously. We present results on a variety of data sets to show the efficacy of the improved formalism.


Subjects
Algorithms; Artificial Intelligence; Brain/anatomy & histology; Brain/diagnostic imaging; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated; Cluster Analysis; Computer Simulation; Humans; Image Enhancement/methods; Information Storage and Retrieval/methods; Numerical Analysis, Computer-Assisted; Radiography; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted; Subtraction Technique
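
The first-order idea can be sketched as follows: each point sums Gaussian-weighted unit vectors toward its neighbours, and a large resultant flags curve endpoints or boundaries; the paper's first-order voting fields are more elaborate:

# First-order voting for boundary/endpoint detection: interior points receive
# roughly balanced (cancelling) votes; endpoints receive an unbalanced
# resultant whose magnitude flags them as boundaries.
import numpy as np

def endpoint_saliency(points, sigma=0.5):
    """points: (N, D). Returns (N,) magnitude of the accumulated first-order vote."""
    diff = points[None, :, :] - points[:, None, :]       # vectors i -> j
    dist = np.linalg.norm(diff, axis=-1)
    w = np.exp(-(dist ** 2) / sigma ** 2)
    np.fill_diagonal(w, 0.0)
    unit = diff / (dist[..., None] + 1e-12)
    polarity = (w[..., None] * unit).sum(axis=1)          # (N, D) resultant vote
    return np.linalg.norm(polarity, axis=1)

curve = np.column_stack([np.linspace(0, 4, 80), np.sin(np.linspace(0, 4, 80))])
sal = endpoint_saliency(curve)
print(sal.argmax() in (0, len(curve) - 1))  # an endpoint has the largest resultant
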
18.
IEEE Trans Pattern Anal Mach Intell ; 26(1): 45-62, 2004 Jan.
Article in English | MEDLINE | ID: mdl-15382685

ABSTRACT

A new approach to computing a panoramic (360 degrees) depth map is presented in this paper. Our approach uses a large collection of images taken by a camera whose motion has been constrained to planar concentric circles. We resample regular perspective images to produce a set of multiperspective panoramas and then compute depth maps directly from these resampled panoramas. Our panoramas sample uniformly in three dimensions: rotation angle, inverse radial distance, and vertical elevation. The use of multiperspective panoramas eliminates the limited overlap present in the original input images, and thus problems that arise in conventional multibaseline stereo can be avoided. Our approach differs from stereo matching of single-perspective panoramic images taken from different locations, where the epipolar constraints are sine curves. For our multiperspective panoramas, the epipolar geometry, to a first-order approximation, consists of horizontal lines. Therefore, any traditional stereo algorithm can be applied to multiperspective panoramas with little modification. In this paper, we describe two reconstruction algorithms. The first is a cylinder sweep algorithm that uses a small number of resampled multiperspective panoramas to obtain dense 3D reconstruction. The second algorithm, in contrast, uses a large number of multiperspective panoramas and takes advantage of the approximately horizontal epipolar geometry inherent in multiperspective panoramas. It comprises a novel and efficient 1D multibaseline matching technique, followed by tensor voting to extract the depth surface. Experiments show that our algorithms are capable of producing comparable, high-quality depth maps which can be used for applications such as view interpolation.


Subjects
Algorithms; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated; Photogrammetry/methods; Signal Processing, Computer-Assisted; Subtraction Technique; Artificial Intelligence; Computer Graphics; Computer Simulation; Depth Perception; Image Enhancement/methods; Information Storage and Retrieval/methods; Numerical Analysis, Computer-Assisted; Reproducibility of Results; Sensitivity and Specificity
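
A hedged sketch of 1D multibaseline matching along horizontal epipolar lines: for each reference pixel, accumulate SSD costs over several panoramas at shifts proportional to an inverse-depth hypothesis and keep the minimum; windowed matching, occlusion handling, and the tensor-voting step are omitted:

# 1D multibaseline matching on one scanline: per pixel, accumulate SSD over
# several views at horizontal shifts proportional to a disparity hypothesis.
import numpy as np

def multibaseline_1d(scanlines, baselines, n_hyp=32, max_disp=8.0):
    """scanlines: (K, W) same scanline from K panoramas (index 0 = reference);
    baselines: (K,) relative baselines (baselines[0] = 0). Returns per-pixel disparity."""
    K, W = scanlines.shape
    xs = np.arange(W)
    hyps = np.linspace(0.0, max_disp, n_hyp)             # disparity per unit baseline
    cost = np.zeros((n_hyp, W))
    for k in range(1, K):
        for h, d in enumerate(hyps):
            shifted_x = np.clip(xs + baselines[k] * d, 0, W - 1)
            sampled = np.interp(shifted_x, xs, scanlines[k])
            cost[h] += (scanlines[0] - sampled) ** 2
    return hyps[cost.argmin(axis=0)]

W = 200
ref = np.sin(np.linspace(0, 12, W)) + 0.1 * np.cos(np.linspace(0, 40, W))
baselines = np.array([0.0, 1.0, 2.0, 3.0])
true_disp = 3.0                                            # constant-depth scene
lines = np.stack([np.interp(np.clip(np.arange(W) - b * true_disp, 0, W - 1),
                            np.arange(W), ref) for b in baselines])
est = multibaseline_1d(lines, baselines)
print(float(np.median(est)))  # close to 3.0
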
19.
IEEE Trans Vis Comput Graph ; 10(1): 58-71, 2004.
Article in English | MEDLINE | ID: mdl-15382698

ABSTRACT

We propose a novel 2D representation for 3D visibility sorting, the Binary-Space-Partitioned Image (BSPI), to accelerate real-time image-based rendering. BSPI is an efficient 2D realization of a 3D BSP tree, which is commonly used in computer graphics for time-critical visibility sorting. Since the overall structure of a BSP tree is encoded in a BSPI, traversing a BSPI is comparable to traversing the corresponding BSP tree. BSPI performs visibility sorting efficiently and accurately in the 2D image space by warping the reference image triangle-by-triangle instead of pixel-by-pixel. Multiple BSPIs can be combined to solve "disocclusion," when an occluded portion of the scene becomes visible at a novel viewpoint. Our method is highly automatic, including a tensor voting preprocessing step that generates candidate image partition lines for BSPIs, filters the noisy input data by rejecting outliers, and interpolates missing information. Our system has been applied to a variety of real data, including stereo, motion, and range images.


Subjects
Algorithms; Computer Graphics; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Online Systems; Pattern Recognition, Automated; User-Computer Interface; Signal Processing, Computer-Assisted; Vision, Ocular
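
The visibility-sorting structure that a BSPI encodes can be illustrated with a plain 2D BSP tree over line segments and a back-to-front traversal with respect to a viewpoint; splitting of straddling segments and the image-space encoding are omitted, and all names here are illustrative:

# 2D BSP tree over line segments with a back-to-front (painter's order)
# traversal relative to a viewpoint.
import numpy as np

class BSPNode:
    def __init__(self, segment, front=None, back=None):
        self.segment = segment                     # ((x1, y1), (x2, y2))
        self.front, self.back = front, back

def side(segment, point):
    (x1, y1), (x2, y2) = segment
    return np.sign((x2 - x1) * (point[1] - y1) - (y2 - y1) * (point[0] - x1))

def build(segments):
    if not segments:
        return None
    root, rest = segments[0], segments[1:]
    front = [s for s in rest if side(root, np.mean(s, axis=0)) >= 0]
    back = [s for s in rest if side(root, np.mean(s, axis=0)) < 0]
    return BSPNode(root, build(front), build(back))

def back_to_front(node, eye, out):
    """Appends segments so that ones nearer the eye come later."""
    if node is None:
        return out
    if side(node.segment, eye) >= 0:
        back_to_front(node.back, eye, out); out.append(node.segment); back_to_front(node.front, eye, out)
    else:
        back_to_front(node.front, eye, out); out.append(node.segment); back_to_front(node.back, eye, out)
    return out

segs = [((0, 0), (2, 0)), ((0, 1), (2, 1)), ((0, 2), (2, 2))]
tree = build(segs)
print(back_to_front(tree, eye=(1.0, 3.0), out=[]))   # farthest segment first
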
20.
IEEE Trans Pattern Anal Mach Intell ; 26(9): 1167-84, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15742892

ABSTRACT

We address the problem of simultaneous two-view epipolar geometry estimation and motion segmentation from nonstatic scenes. Given a set of noisy image pairs containing matches of n objects, we propose an unconventional, efficient, and robust method, 4D tensor voting, for estimating the unknown n epipolar geometries and segmenting the static and motion matching pairs into n independent motions. By considering the 4D isotropic and orthogonal joint image space, only two tensor voting passes are needed, and a very high noise-to-signal ratio (up to five) can be tolerated. Epipolar geometries corresponding to multiple, rigid motions are extracted in succession. Only two uncalibrated frames are needed, and no simplifying assumption (such as an affine camera model or a homographic model between images) other than the pinhole camera model is made. Our novel approach consists of propagating a local geometric smoothness constraint in the 4D joint image space, followed by global consistency enforcement for extracting the fundamental matrices corresponding to independent motions. We have performed extensive experiments to compare our method with some representative algorithms and show that better performance on nonstatic scenes is achieved. Results on challenging data sets are presented.


Subjects
Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Movement/physiology; Pattern Recognition, Automated/methods; Subtraction Technique; Animals; Cluster Analysis; Humans; Image Enhancement/methods; Information Storage and Retrieval/methods; Numerical Analysis, Computer-Assisted; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted; Video Recording/methods
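
Two ingredients named in the abstract can be sketched directly: stacking matches as points of the 4D joint image space (where the paper's tensor voting separates motions; the voting itself is not reproduced) and recovering a fundamental matrix from a set of inlier matches with the standard normalized 8-point algorithm:

# Joint-image-space points and a normalized 8-point fundamental-matrix estimate.
import numpy as np

def joint_image_points(pts1, pts2):
    """pts1, pts2: (N, 2) matched image coordinates -> (N, 4) joint-space points."""
    return np.hstack([pts1, pts2])

def normalize(pts):
    mean, scale = pts.mean(axis=0), np.sqrt(2) / (pts.std() + 1e-12)
    T = np.array([[scale, 0, -scale * mean[0]],
                  [0, scale, -scale * mean[1]],
                  [0, 0, 1]])
    ph = np.column_stack([pts, np.ones(len(pts))])
    return (T @ ph.T).T, T

def fundamental_8pt(pts1, pts2):
    p1, T1 = normalize(pts1)
    p2, T2 = normalize(pts2)
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt        # enforce rank 2
    F = T2.T @ F @ T1                              # undo the normalization
    return F / np.linalg.norm(F)

rng = np.random.default_rng(8)
X = rng.uniform(-1, 1, (50, 3)) + [0, 0, 4]        # 3D points in front of both cameras
pts1 = X[:, :2] / X[:, 2:]                         # camera 1 at the origin
Xc2 = X - [0.5, 0.0, 0.0]                          # camera 2 translated along x
pts2 = Xc2[:, :2] / Xc2[:, 2:]
F = fundamental_8pt(pts1, pts2)
err = [abs(np.append(q, 1) @ F @ np.append(p, 1)) for p, q in zip(pts1, pts2)]
print(joint_image_points(pts1, pts2).shape, float(np.mean(err)) < 1e-6)
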