Results 1 - 20 of 27
1.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10197-10211, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37027560

ABSTRACT

Segmenting highly overlapping image objects is challenging, because there is typically no distinction between real object contours and occlusion boundaries in images. Unlike previous instance segmentation methods, we model image formation as a composition of two overlapping layers, and propose the Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees). The explicit modeling of the occlusion relationship with a bilayer structure naturally decouples the boundaries of both the occluding and occluded instances, and considers the interaction between them during mask regression. We investigate the efficacy of the bilayer structure using two popular convolutional network designs, namely, the Fully Convolutional Network (FCN) and the Graph Convolutional Network (GCN). Further, we formulate bilayer decoupling using the vision transformer (ViT), by representing instances in the image as separate learnable occluder and occludee queries. Large and consistent improvements using one/two-stage and query-based object detectors with various backbones and network layer choices validate the generalization ability of bilayer decoupling, as shown by extensive experiments on image instance segmentation benchmarks (COCO, KINS, COCOA) and video instance segmentation benchmarks (YTVIS, OVIS, BDD100K MOTS), especially for heavy occlusion cases.


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Image Processing, Computer-Assisted/methods
2.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 10929-10946, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37018107

ABSTRACT

In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes. The proposed GCoNet+ achieves new state-of-the-art performance for co-salient object detection (CoSOD) by mining consensus representations based on two essential criteria: 1) intra-group compactness, which better formulates the consistency among co-salient objects by capturing their inherent shared attributes using our novel group affinity module (GAM); and 2) inter-group separability, which effectively suppresses the influence of noisy objects on the output by introducing our new group collaborating module (GCM) conditioned on the inconsistent consensus. To further improve the accuracy, we design a series of simple yet effective components: i) a recurrent auxiliary classification module (RACM) promoting model learning at the semantic level; ii) a confidence enhancement module (CEM) assisting the model in improving the quality of the final predictions; and iii) a group-based symmetric triplet (GST) loss guiding the model to learn more discriminative features. Extensive experiments on three challenging benchmarks, i.e., CoCA, CoSOD3k, and CoSal2015, demonstrate that our GCoNet+ outperforms 12 existing cutting-edge models. Code has been released at https://github.com/ZhengPeng7/GCoNet_plus.

3.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5002-5015, 2022 09.
Article in English | MEDLINE | ID: mdl-33989152

ABSTRACT

We propose HyFRIS-Net to jointly estimate hybrid reflectance and illumination models, as well as the refined face shape, from a single unconstrained face image in a pre-defined texture space. The proposed hybrid reflectance and illumination representation ensures photometric face appearance modeling in both parametric and non-parametric spaces for efficient learning. By enforcing a reflectance consistency constraint for the same person and a face identity constraint for different persons, our approach recovers an occlusion-free face albedo whose color is disambiguated from the illumination color. Our network is trained in a self-evolving manner to achieve general applicability on real-world data. We conduct comprehensive qualitative and quantitative evaluations against state-of-the-art methods to demonstrate the advantages of HyFRIS-Net in modeling photo-realistic face albedo, illumination, and shape.


Subject(s)
Lighting , Pattern Recognition, Automated , Algorithms , Face/diagnostic imaging , Humans , Pattern Recognition, Automated/methods
4.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9489-9502, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34822324

ABSTRACT

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a new point-set learning framework, PRIN, namely the Point-wise Rotation Invariant Network, focusing on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by Density-Aware Adaptive Sampling to deal with distorted point distributions in spherical space. Spherical Voxel Convolution and Point Re-sampling are proposed to extract rotation-invariant features for each point. In addition, we extend PRIN to a sparse version called SPRIN, which directly operates on sparse point clouds. Both PRIN and SPRIN can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. Results show that, on the dataset with randomly rotated point clouds, SPRIN demonstrates better performance than state-of-the-art methods without any data augmentation. We also provide a thorough theoretical proof and analysis of the point-wise rotation invariance achieved by our methods. The code to reproduce our results will be made publicly available.

5.
IEEE Trans Image Process ; 30: 7856-7866, 2021.
Article in English | MEDLINE | ID: mdl-34524959

ABSTRACT

Human pose transfer has become an emerging research topic in recent years. However, state-of-the-art results are still far from satisfactory. One main reason is that end-to-end methods are often trained blindly, without semantic understanding of their content. In this paper, we propose a novel method for human pose transfer that considers a semantic part-based representation of the human body. In particular, we segment the human body into multiple parts, each representing a semantic region of a human. With the proposed part-based layer generators, a high-quality result is guaranteed for each local semantic region. We design a three-stage hierarchical framework to fuse local representations into the final result in a coarse-to-fine manner, which provides adaptive attention for global consistency and local details, respectively. By exploiting spatial guidance from a 3D human model throughout the framework, our method naturally handles the ambiguity of self-occlusions, which often causes artifacts in previous methods. With semantic-aware and spatial-aware representations, our method outperforms previous approaches quantitatively and qualitatively, better handling self-occlusions, preserving and synthesizing fine details, and producing higher-resolution results.


Subject(s)
Algorithms , Semantics , Humans
6.
IEEE Trans Image Process ; 30: 2888-2897, 2021.
Article in English | MEDLINE | ID: mdl-33539298

ABSTRACT

In this paper, we propose a new method to super-resolve low-resolution human body images by learning efficient multi-scale features and exploiting a useful human body prior. Specifically, we propose a lightweight multi-scale block (LMSB) as the basic module of a coherent framework, which contains an image reconstruction branch and a prior estimation branch. In the image reconstruction branch, the LMSB aggregates features of multiple receptive fields so as to gather rich context information for low-to-high resolution mapping. In the prior estimation branch, we adopt human parsing maps and nonsubsampled shearlet transform (NSST) sub-bands to represent the human body prior, which is expected to enhance the details of reconstructed human body images. When evaluated on the newly collected HumanSR dataset, our method outperforms state-of-the-art image super-resolution methods with ~8× fewer parameters; moreover, our method significantly improves the performance of human image analysis tasks (e.g., human parsing and pose estimation) for low-resolution inputs.


Subject(s)
Deep Learning , Image Processing, Computer-Assisted/methods , Humans , Posture/physiology
7.
IEEE Trans Image Process ; 30: 907-920, 2021.
Article in English | MEDLINE | ID: mdl-33259297

ABSTRACT

Person re-identification aims to identify whether pairs of images belong to the same person. This problem is challenging due to large differences in camera views, lighting, and background. One mainstream approach to learning CNN features is to design loss functions that reinforce both inter-class separation and intra-class compactness. In this paper, we propose a novel Orthogonal Center Learning method with Subspace Masking for person re-identification. We make the following contributions: 1) we develop a center learning module that learns class centers by simultaneously reducing intra-class differences and inter-class correlations through orthogonalization; 2) we introduce a subspace masking mechanism to enhance the generalization of the learned class centers; and 3) we propose to integrate average pooling and max pooling in a regularizing manner that fully exploits their complementary strengths. Extensive experiments show that our proposed method consistently outperforms state-of-the-art methods on large-scale ReID datasets including Market-1501, DukeMTMC-ReID, CUHK03, and MSMT17.

8.
IEEE Trans Pattern Anal Mach Intell ; 43(7): 2449-2462, 2021 Jul.
Article in English | MEDLINE | ID: mdl-31995475

ABSTRACT

We present an algorithm to directly solve numerous image restoration problems (e.g., image deblurring, image dehazing, and image deraining). These problems are ill-posed, and existing methods usually rely on heuristic image priors. In this paper, we show that these problems can be solved by generative models with adversarial learning. However, a straightforward formulation based on a generative adversarial network (GAN) does not perform well on these tasks, and some structures of the estimated images are usually not preserved well. Motivated by the observation that the estimated results should be consistent with the observed inputs under the physics models, we propose an algorithm that guides the estimation process of a specific task within the GAN framework. The proposed model is trained in an end-to-end fashion and can be applied to a variety of image restoration and low-level vision problems. Extensive experiments demonstrate that the proposed method performs favorably against state-of-the-art algorithms.

9.
IEEE Trans Pattern Anal Mach Intell ; 42(1): 232-245, 2020 Jan.
Article in English | MEDLINE | ID: mdl-30281438

ABSTRACT

While conventional calibrated photometric stereo methods assume that light intensities and sensor exposures are known, or unknown but identical across observed images, this assumption easily breaks down in practical settings due to individual light bulbs' characteristics and limited control over sensors. This paper studies the effect of unknown and possibly non-uniform light intensities and sensor exposures among observed images on shape recovery based on photometric stereo. This leads to the development of a "semi-calibrated" photometric stereo method, where the light directions are known but the light intensities (and sensor exposures) are unknown. We show that semi-calibrated photometric stereo becomes a bilinear problem, whose general form is difficult to solve; in the photometric stereo context, however, there exists a unique solution for the surface normals and light intensities (or sensor exposures). We further show that there exists a linear solution method for the problem, and we develop efficient and stable solution methods. Semi-calibrated photometric stereo is advantageous over conventional calibrated photometric stereo for accurate determination of surface normals, because it relaxes the assumption of known light intensity ratios/sensor exposures. The experimental results show the superior accuracy of semi-calibrated photometric stereo in comparison to conventional methods in practical settings.
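The linear solution mentioned in this abstract can be illustrated on synthetic data: writing r_i = 1/e_i for the unknown inverse intensities turns each Lambertian observation into a homogeneous linear equation, so the intensities and albedo-scaled normals fall out of a null-space computation. The NumPy sketch below is not the authors' implementation; all variable names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic Lambertian data: f images with known light directions L (f x 3),
# unknown per-image intensities e, unknown albedo-scaled normals B (3 x p).
f, p = 12, 40
L = rng.normal(size=(f, 3))
e_true = rng.uniform(0.5, 2.0, size=f)
B_true = rng.normal(size=(3, p))
M = e_true[:, None] * (L @ B_true)  # observations m_ip = e_i * (l_i . b_p)

# With r_i = 1/e_i, each observation yields the homogeneous linear equation
#   m_ip * r_i - l_i . b_p = 0   in the unknowns (r, B).
A = np.zeros((f * p, f + 3 * p))
for i in range(f):
    for j in range(p):
        A[i * p + j, i] = M[i, j]                       # coefficient of r_i
        A[i * p + j, f + 3 * j: f + 3 * j + 3] = -L[i]  # coefficients of b_j
x = np.linalg.svd(A)[2][-1]             # null-space vector (unique up to scale)
r = x[:f]
e_est = (1.0 / r) * (r[0] * e_true[0])  # fix the global scale for checking
```

With noiseless generic data the null space is one-dimensional, so `e_est` matches `e_true` up to numerical precision once the global scale ambiguity is fixed.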

10.
IEEE Trans Image Process ; 28(3): 1054-1067, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30281457

ABSTRACT

We propose a deep convolutional neural network (CNN) method for natural image matting. Our method takes multiple initial alpha mattes from previous methods and normalized RGB color images as inputs, and directly learns an end-to-end mapping between the inputs and reconstructed alpha mattes. Among the various existing methods, we focus on using two simple methods to produce the initial alpha mattes: closed-form matting and KNN matting. They complement each other in terms of local and nonlocal principles. A major benefit of our method is that it can "recognize" different local image structures and then combine the results of local (closed-form) and nonlocal (KNN) matting effectively to achieve alpha mattes of higher quality than either input. Furthermore, we verify the extendability of the proposed network to different combinations of initial alpha mattes from more advanced techniques such as KL-divergence matting and information-flow matting. On top of the deep CNN matting, we build an RGB-guided JPEG artifacts removal network to handle JPEG block artifacts in alpha matting. Extensive experiments demonstrate that our proposed deep CNN matting produces visually and quantitatively high-quality alpha mattes. We perform deeper experiments, including studies to evaluate the importance of balancing training data and to measure the effects of initial alpha mattes, and we also consider results from variant versions of the proposed network to analyze our proposed DCNN matting. In addition, our method achieved a high ranking in the public alpha matting evaluation dataset in terms of the sum of absolute differences, mean squared error, and gradient error. Our RGB-guided JPEG artifacts removal network also restores damaged alpha mattes from compressed images in JPEG format.

11.
IEEE Trans Pattern Anal Mach Intell ; 41(2): 297-310, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29994179

ABSTRACT

One of the core applications of light field imaging is depth estimation. To acquire a depth map, existing approaches apply a single photo-consistency measure to an entire light field. However, this is not an optimal choice because of the non-uniform light field degradations produced by limitations in the hardware design. In this paper, we introduce a pipeline that automatically determines the best configuration for photo-consistency measure, which leads to the most reliable depth label from the light field. We analyzed the practical factors affecting degradation in lenslet light field cameras, and designed a learning based framework that can retrieve the best cost measure and optimal depth label. To enhance the reliability of our method, we augmented an existing light field benchmark to simulate realistic source dependent noise, aberrations, and vignetting artifacts. The augmented dataset was used for the training and validation of the proposed approach. Our method was competitive with several state-of-the-art methods for the benchmark and real-world light field datasets.

12.
IEEE Trans Pattern Anal Mach Intell ; 40(7): 1599-1610, 2018 07.
Article in English | MEDLINE | ID: mdl-28796612

ABSTRACT

Recent advances in saliency detection have utilized deep learning to obtain high-level features to detect salient regions in scenes. These advances have yielded results superior to those reported in past work, which involved the use of hand-crafted low-level features for saliency detection. In this paper, we propose ELD-Net, a unified deep learning framework for accurate and efficient saliency detection. We show that hand-crafted features can provide complementary information to enhance saliency detection that uses only high-level features. Our method uses both low-level and high-level features for saliency detection. High-level features are extracted using GoogLeNet, and low-level features evaluate the relative importance of a local region using its differences from other regions in an image. The two feature maps are independently encoded by the convolutional and the ReLU layers. The encoded low-level and high-level features are then combined by concatenation and convolution. Finally, a linear fully connected layer is used to evaluate the saliency of a queried region. A full resolution saliency map is obtained by querying the saliency of each local region of an image. Since the high-level features are encoded at low resolution, and the encoded high-level features can be reused for every query region, our ELD-Net is very fast. Our experiments show that our method outperforms state-of-the-art deep learning-based saliency detection methods.

13.
IEEE Trans Pattern Anal Mach Intell ; 40(2): 376-391, 2018 02.
Article in English | MEDLINE | ID: mdl-28278459

ABSTRACT

Rank minimization can be converted into tractable surrogate problems, such as Nuclear Norm Minimization (NNM) and Weighted NNM (WNNM). Problems related to NNM or WNNM can be solved iteratively by applying a closed-form proximal operator, called Singular Value Thresholding (SVT) or Weighted SVT, but they suffer from the high computational cost of Singular Value Decomposition (SVD) at each iteration. We propose a fast and accurate approximation method for SVT, which we call fast randomized SVT (FRSVT), that avoids direct computation of the SVD. The key idea is to extract an approximate basis for the range of the matrix from its compressed matrix. Given the basis, we compute the partial singular values of the original matrix from the small factored matrix. In addition, by developing a range propagation method, our method further speeds up the extraction of the approximate basis at each iteration. Our theoretical analysis shows the relationship between the approximation bound of the SVD and its effect on NNM via SVT. Along with the analysis, our empirical results quantitatively and qualitatively show that our approximation rarely harms the convergence of the host algorithms. We assess the efficiency and accuracy of the proposed method on various computer vision problems, e.g., subspace clustering, weather artifact removal, and simultaneous multi-image alignment and rectification.
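The core idea of avoiding a full SVD by sketching the range of the matrix can be demonstrated with a generic randomized range finder. This is a simplified stand-in for FRSVT (it omits the paper's range propagation, and `randomized_svt` is our illustrative name), shown against the exact SVT on an exactly low-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_svt(Y, tau, k, oversample=10):
    """Approximate prox of tau*||.||_* (SVT) without a full SVD of Y:
    sketch the range, then take a small SVD of the compressed matrix."""
    Omega = rng.normal(size=(Y.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Y @ Omega)          # approximate basis for range(Y)
    Ub, s, Vt = np.linalg.svd(Q.T @ Y, full_matrices=False)
    return (Q @ Ub) * np.maximum(s - tau, 0.0) @ Vt  # soft-threshold, lift back

# Sanity check: on a rank-5 matrix the sketch captures the range exactly.
r = 5
X = rng.normal(size=(60, r)) @ rng.normal(size=(r, 40))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
exact = U * np.maximum(s - 1.0, 0.0) @ Vt
approx = randomized_svt(X, tau=1.0, k=r)
err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

The expensive SVD of the 60×40 matrix is replaced by a QR of a 60×15 sketch and an SVD of a 15×40 matrix, which is where the speedup comes from at scale.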

14.
IEEE Trans Pattern Anal Mach Intell ; 29(9): 1520-37, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17627041

ABSTRACT

We propose an automatic approach to soft color segmentation, which produces soft color segments with the appropriate amount of overlap and transparency essential to synthesizing natural images for a wide range of image-based applications. While many state-of-the-art and complex techniques are excellent at partitioning an input image to facilitate deriving a semantic description of the scene, to achieve seamless image synthesis we advocate a segmentation approach designed to maintain spatial and color coherence among soft segments while preserving discontinuities, by assigning to each pixel a set of soft labels corresponding to the respective color distributions. We optimize a global objective function that simultaneously exploits the reliability given by global color statistics and the flexibility of local image compositing. This leads to an image model in which the global color statistics of an image are represented by a Gaussian Mixture Model (GMM), while the color of a pixel is explained by a local color mixture model whose weights are defined by the soft labels associated with the elements of the converged GMM. Transparency is naturally introduced in our probabilistic framework, which infers an optimal mixture of colors at an image pixel. To adequately consider global and local information in the same framework, an alternating optimization scheme is proposed to iteratively solve for the global and local model parameters. Our method is fully automatic and is shown to converge to a good optimal solution. We perform extensive evaluation and comparison, and demonstrate that our method achieves good image synthesis results for image-based applications such as image matting, color transfer, image deblurring, and image colorization.
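In probabilistic terms, the per-pixel soft labels described above are posterior responsibilities of the converged GMM's components. A minimal NumPy sketch of that step (the GMM parameters are assumed already converged; the paper's alternating optimization is omitted, and the toy colors are ours):

```python
import numpy as np

def soft_labels(pixels, means, covs, weights):
    """Posterior responsibilities of the GMM components; these play the role
    of the per-pixel soft labels (constant factors cancel in normalization)."""
    log_p = np.empty((pixels.shape[0], len(weights)))
    for k in range(len(weights)):
        d = pixels - means[k]
        _, logdet = np.linalg.slogdet(covs[k])
        maha = np.einsum('ij,jk,ik->i', d, np.linalg.inv(covs[k]), d)
        log_p[:, k] = np.log(weights[k]) - 0.5 * (logdet + maha)
    log_p -= log_p.max(axis=1, keepdims=True)   # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# Two-component example: a "red" cluster and a "blue" cluster.
means = np.array([[0.9, 0.1, 0.1], [0.1, 0.1, 0.9]])
covs = np.stack([np.eye(3) * 0.01, np.eye(3) * 0.01])
weights = np.array([0.5, 0.5])
pix = np.array([[0.85, 0.10, 0.15],   # clearly red
                [0.50, 0.10, 0.50]])  # halfway: gets a genuinely soft label
labels = soft_labels(pix, means, covs, weights)
```

A pixel well inside one color mode gets a nearly hard label, while a pixel between modes receives a genuinely soft (transparent) assignment, which is what enables seamless compositing.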


Subject(s)
Algorithms , Artificial Intelligence , Color , Colorimetry/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Reproducibility of Results , Sensitivity and Specificity
15.
IEEE Trans Pattern Anal Mach Intell ; 39(8): 1591-1604, 2017 08.
Article in English | MEDLINE | ID: mdl-28113654

ABSTRACT

We propose a robust uncalibrated multiview photometric stereo method for high-quality 3D shape reconstruction. In our method, a coarse initial 3D mesh obtained using a multiview stereo method is projected onto a 2D planar domain using a planar mesh parameterization technique. We describe methods for surface normal estimation that work in the parameterized 2D space and jointly incorporate all geometric and photometric cues from multiple viewpoints. Using the estimated surface normal map, a refined 3D mesh is then recovered by computing an optimal displacement map in the same 2D planar domain. Our method avoids the merging of view-dependent surface normal maps that is often required by conventional methods. We conduct evaluations on various real-world objects containing surfaces with specular reflections, multiple albedos, and complex topologies, in both controlled and uncontrolled settings, and demonstrate that our method recovers accurate 3D meshes with fine geometric details.

16.
IEEE Trans Pattern Anal Mach Intell ; 28(5): 832-9, 2006 May.
Article in English | MEDLINE | ID: mdl-16640269

ABSTRACT

This paper presents a complete system capable of synthesizing a large number of pixels that are missing due to occlusion or damage in an uncalibrated input video. These missing pixels may correspond to the static background or to cyclic motions of the captured scene. Our system employs user-assisted video layer segmentation, while the main processing in video repair is fully automatic. The input video is first decomposed into color and illumination videos. The necessary temporal consistency is maintained by tensor voting in the spatio-temporal domain. Missing colors and illumination of the background are synthesized by applying image repairing. Finally, the occluded motions are inferred by spatio-temporal alignment of collected samples at multiple scales. We tested our system on difficult examples with variable illumination, where the capturing camera can be stationary or in motion.


Subject(s)
Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Lighting , Pattern Recognition, Automated/methods , Photometry/methods , Video Recording/methods , Information Storage and Retrieval/methods , Motion , Oscillometry/methods , Photography/methods , Subtraction Technique
17.
IEEE Trans Image Process ; 25(1): 9-23, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26529764

ABSTRACT

In this paper, we introduce a novel approach to automatically detect salient regions in an image. Our approach combines global and local features, which complement each other to compute a saliency map. The first key idea of our work is to create the saliency map of an image using a linear combination of colors in a high-dimensional color space. This is based on the observation that salient regions often have colors distinct from the background in human perception; human perception, however, is complicated and highly nonlinear. By mapping the low-dimensional red, green, and blue colors to feature vectors in a high-dimensional color space, we show that we can compute an accurate saliency map by finding the optimal linear combination of color coefficients in the high-dimensional color space. To further improve the performance of our saliency estimation, our second key idea is to utilize the relative locations and color contrast between superpixels as features and to resolve the saliency estimation from a trimap via a learning-based algorithm. The additional local features and learning-based algorithm complement the global estimation from the high-dimensional color transform-based algorithm. Experimental results on three benchmark datasets show that our approach is effective in comparison with previous state-of-the-art saliency estimation methods.
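The first key idea can be sketched with a toy example: lift RGB to a handful of nonlinear color features and solve for the linear combination that best separates seed foreground from seed background pixels. The feature set and data below are illustrative simplifications of the paper's high-dimensional transform, not its actual definition:

```python
import numpy as np

def color_features(rgb):
    """Lift RGB into a higher-dimensional color space (an illustrative
    simplification): raw channels, squares, pairwise products, and a bias."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack([r, g, b, r * r, g * g, b * b,
                     r * g, g * b, r * b, np.ones_like(r)], axis=1)

rng = np.random.default_rng(4)
# Toy image: salient pixels are reddish, background pixels bluish.
fg = np.clip(rng.normal([0.8, 0.2, 0.2], 0.05, size=(200, 3)), 0, 1)
bg = np.clip(rng.normal([0.2, 0.2, 0.8], 0.05, size=(200, 3)), 0, 1)
X = color_features(np.vstack([fg, bg]))
y = np.concatenate([np.ones(200), np.zeros(200)])   # trimap-style seed labels

# Optimal linear combination of color coefficients in the lifted space.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
saliency = X @ w
```

Because the lifted space is still linear in its coefficients, the "optimal combination" reduces to ordinary least squares on the seed pixels.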

18.
IEEE Trans Image Process ; 25(8): 3639-3654, 2016 08.
Article in English | MEDLINE | ID: mdl-28113552

ABSTRACT

This paper presents an automatic method to extract a multi-view object in a natural environment. We assume that the target object is bounded by the convex volume of interest defined by the overlapping space of the camera viewing frustums. There are two key contributions of our approach. First, we present an automatic method to identify a target object across different images for multi-view binary co-segmentation. The extracted target object shares the same geometric representation in space, with a color and texture model distinct from the background. Second, we present an algorithm to detect color-ambiguous regions along the object boundary for matting refinement. Our matting region detection algorithm is based on information theory: it measures the Kullback-Leibler (KL) divergence of the local color distributions of different pixel-bands. The local pixel-band with the largest entropy is selected for matte refinement, subject to a multi-view consistency constraint. Our results are high-quality alpha mattes consistent across all viewpoints. We demonstrate the effectiveness of the proposed method using various examples.

19.
IEEE Trans Pattern Anal Mach Intell ; 38(4): 744-58, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26353362

ABSTRACT

Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering the underlying low-rank structure of clean data corrupted with sparse noise/outliers. In many low-level vision problems, not only is it known that the underlying structure of clean data is low-rank, but the exact rank of the clean data is also known. Yet, when applying conventional rank minimization to those problems, the objective function is formulated in a way that does not fully utilize the a priori target rank information. This observation motivates us to investigate whether there is a better alternative solution when using rank minimization. In this paper, instead of minimizing the nuclear norm, we propose to minimize the partial sum of singular values, which implicitly encourages the target rank constraint. Our experimental analyses show that, when the number of samples is deficient, our approach leads to a higher success rate than conventional rank minimization, while the solutions obtained by the two approaches are almost identical when the number of samples is more than sufficient. We apply our approach to various low-level vision problems, e.g., high dynamic range imaging, motion edge detection, photometric stereo, and image alignment and recovery, and show that our results outperform those obtained by the conventional nuclear norm rank minimization method.
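Minimizing the partial sum of singular values (the sum beyond a target rank N) admits a simple proximal operator: keep the N largest singular values untouched and soft-threshold only the tail, in contrast to nuclear-norm SVT, which shrinks every singular value. A sketch under that interpretation (the function name is ours):

```python
import numpy as np

def prox_pssv(Y, tau, N):
    """Proximal operator of tau * sum_{i>N} sigma_i(Y): the N largest singular
    values are kept intact; only the tail is soft-thresholded."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_new = s.copy()
    s_new[N:] = np.maximum(s[N:] - tau, 0.0)
    return (U * s_new) @ Vt

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 6))
# With a very large tau the tail vanishes, so the result is exactly the
# best rank-2 approximation of X (Eckart-Young).
Z = prox_pssv(X, tau=1e6, N=2)
```

Leaving the leading singular values unpenalized is precisely how the a priori target rank information is injected into the objective.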

20.
IEEE Trans Pattern Anal Mach Intell ; 37(6): 1219-32, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26357344

ABSTRACT

This paper introduces a new high dynamic range (HDR) imaging algorithm that utilizes rank minimization. Assuming a camera responds linearly to scene radiance, the input low dynamic range (LDR) images captured with different exposure times exhibit a linear dependency and form a rank-1 matrix when the intensities of corresponding pixels are stacked together. In practice, misalignments caused by camera motion, the presence of moving objects, saturation, and image noise break the rank-1 structure of the LDR images. To address these problems, we present a rank minimization algorithm that simultaneously aligns the LDR images and detects outliers for robust HDR generation. We evaluate the performance of our algorithm systematically using synthetic examples and qualitatively compare our results with results from state-of-the-art HDR algorithms using challenging real-world examples.
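The rank-1 observation is easy to verify numerically: with a linear response, each vectorized LDR image is the scene radiance scaled by its exposure time, i.e., the stack is an outer product. A small synthetic sketch (illustrative exposure values; no saturation, noise, or motion):

```python
import numpy as np

rng = np.random.default_rng(3)

# With a linear camera response, intensity = radiance * exposure time, so
# stacking the vectorized LDR images as rows gives a rank-1 matrix.
radiance = rng.uniform(0.1, 1.0, size=1000)             # scene radiance per pixel
exposures = np.array([1 / 60, 1 / 30, 1 / 15, 1 / 8])   # exposure times (s)
D = np.outer(exposures, radiance)                       # 4 x 1000 LDR stack

s = np.linalg.svd(D, compute_uv=False)
numeric_rank = int(np.sum(s > 1e-10 * s[0]))
# Saturation, noise, or misalignment would raise this rank; the paper's
# rank minimization recovers the rank-1 structure while flagging outliers.
```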
