1.
Sensors (Basel) ; 20(23)2020 Nov 25.
Article in English | MEDLINE | ID: mdl-33255622

ABSTRACT

Removing raindrops from a single image is a challenging problem due to the complex changes in shape, scale, and transparency among raindrops. Previous explorations have mainly been limited in two ways. First, publicly available raindrop image datasets have limited capacity in terms of modeling raindrop characteristics (e.g., raindrop collision and fusion) in real-world scenes. Second, recent deraining methods tend to apply shape-invariant filters to cope with diverse rainy images and fail to remove raindrops that are especially varied in shape and scale. In this paper, we address these raindrop removal problems from two perspectives. First, we establish a large-scale dataset named RaindropCityscapes, which includes 11,583 pairs of raindrop and raindrop-free images, covering a wide variety of raindrops and background scenarios. Second, a two-branch Multi-scale Shape Adaptive Network (MSANet) is proposed to detect and remove diverse raindrops, effectively filtering the occluded raindrop regions and keeping the clean background well-preserved. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method achieves significant improvements over the recent state-of-the-art raindrop removal methods. Moreover, the extension of our method towards the rainy image segmentation and detection tasks validates the practicality of the proposed method in outdoor applications.

2.
Article in English | MEDLINE | ID: mdl-37027594

ABSTRACT

Recently, contrastive learning based on augmentation invariance and instance discrimination has made great achievements, owing to its excellent ability to learn beneficial representations without any manual annotations. However, the natural similarity among instances conflicts with instance discrimination, which treats each instance as a unique individual. In order to explore the natural relationship among instances and integrate it into contrastive learning, we propose a novel approach in this paper, Relationship Alignment (RA), which forces different augmented views of the current batch instances to maintain a consistent relationship with the other instances. To perform RA effectively in an existing contrastive learning framework, we design an alternating optimization algorithm in which the relationship exploration step and the alignment step are optimized respectively. In addition, we add an equilibrium constraint to RA to avoid degenerate solutions and introduce an expansion handler to make it approximately satisfied in practice. To better capture the complex relationships among instances, we further propose Multi-Dimensional Relationship Alignment (MDRA), which explores the relationship from multiple dimensions. In practice, we decompose the final high-dimensional feature space into a Cartesian product of several low-dimensional subspaces and perform RA in each subspace respectively. We validate the effectiveness of our approach on multiple self-supervised learning benchmarks and obtain consistent improvements over current popular contrastive learning methods. On the most commonly used ImageNet linear evaluation protocol, our RA achieves significant improvements over other methods, and our MDRA builds on RA to achieve the best performance. The source code of our approach will be released soon.
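The core idea of relationship alignment can be sketched in a few lines: compute each augmented view's similarity profile against the rest of the batch, then penalize disagreement between the two views' profiles. This is an illustrative reduction, not the paper's exact formulation; the function names and the squared-error alignment term are assumptions.

```python
import math

def cosine(u, v):
    # cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def relation_vector(view, batch):
    # the "relationship" of one augmented view to the other instances in the batch
    return [cosine(view, other) for other in batch]

def relationship_alignment_loss(view1, view2, batch):
    # penalize disagreement between the relation vectors of the two views;
    # zero when both views relate to the batch identically
    r1 = relation_vector(view1, batch)
    r2 = relation_vector(view2, batch)
    return sum((a - b) ** 2 for a, b in zip(r1, r2)) / len(r1)
```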

3.
IEEE Trans Image Process ; 30: 9136-9149, 2021.
Article in English | MEDLINE | ID: mdl-34735342

ABSTRACT

Due to the lack of natural scene and haze prior information, it is highly challenging to completely remove the haze from a single image without distorting its visual content. Fortunately, real-world haze usually presents a non-homogeneous distribution, which provides many valuable clues in partially well-preserved regions. In this paper, we propose a Non-Homogeneous Haze Removal Network (NHRN) via artificial scene prior and bidimensional graph reasoning. Firstly, we apply gamma correction iteratively to simulate multiple artificial shots under different exposure conditions, whose differing haze degrees enrich the underlying scene prior. Secondly, beyond utilizing the local neighboring relationship, we build a bidimensional graph reasoning module to conduct non-local filtering in the spatial and channel dimensions of feature maps, which models their long-range dependencies and propagates the natural scene prior between well-preserved nodes and nodes contaminated by haze. To the best of our knowledge, this is the first exploration of removing non-homogeneous haze via a graph-reasoning-based framework. We evaluate our method on different benchmark datasets. The results demonstrate that our method achieves superior performance over many state-of-the-art algorithms for both single-image dehazing and hazy image understanding tasks. The source code of the proposed NHRN is available at https://github.com/whrws/NHRNet.
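The artificial scene prior step is easy to illustrate: applying gamma correction with different exponents to a normalized image emulates shots taken under different exposures. A minimal sketch, in which the specific gamma values are assumptions:

```python
def gamma_correct(pixels, gamma):
    # pixel intensities normalized to [0, 1]; gamma < 1 brightens, gamma > 1 darkens
    return [p ** gamma for p in pixels]

def simulate_exposures(pixels, gammas=(0.5, 1.0, 2.0)):
    # multiple gamma corrections emulate shots under different exposure
    # conditions, producing inputs with different apparent haze degrees
    return [gamma_correct(pixels, g) for g in gammas]
```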

4.
Neural Netw ; 125: 281-289, 2020 May.
Article in English | MEDLINE | ID: mdl-32151915

ABSTRACT

Rectified activation units make an important contribution to the success of deep neural networks in many computer vision tasks. In this paper, we propose a Parametric Deformable Exponential Linear Unit (PDELU) and theoretically verify its effectiveness for improving the convergence speed of the learning procedure. By means of its flexible map shape, the proposed PDELU can push the mean value of activation responses closer to zero, which ensures the steepest descent in training a deep neural network. We verify the effectiveness of the proposed method on the image classification task. Extensive experiments on three classical databases (i.e., CIFAR-10, CIFAR-100, and ImageNet-2015) indicate that the proposed method leads to higher convergence speed and better accuracy when embedded into different CNN architectures (i.e., NIN, ResNet, WRN, and DenseNet). Meanwhile, the proposed PDELU outperforms many existing shape-specific activation functions (i.e., Maxout, ReLU, LeakyReLU, ELU, SELU, SoftPlus, and Swish) and shape-adaptive activation functions (i.e., APL, PReLU, MPELU, and FReLU).
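As a rough illustration of the ELU family this unit belongs to, the sketch below uses a deformation parameter t that bends the negative branch of an ELU-like curve; the exact parameterization in the paper may differ, so treat the formula and defaults as assumptions:

```python
def pdelu(x, alpha=1.0, t=0.9):
    # hypothetical sketch of a deformable exponential linear unit:
    # identity for positive inputs; a deformed-exponential curve for
    # negative inputs (t -> 1 approaches the standard exp-based ELU)
    if x > 0:
        return x
    base = max(1.0 + (1.0 - t) * x, 0.0)
    return alpha * (base ** (1.0 / (1.0 - t)) - 1.0)
```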


Subject(s)
Deep Learning/standards, Databases, Factual, Pattern Recognition, Automated/methods
5.
IEEE Trans Pattern Anal Mach Intell ; 41(8): 1994-2007, 2019 08.
Article in English | MEDLINE | ID: mdl-30369437

ABSTRACT

In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and on-the-fly face model adaptation in unconstrained scenarios with heavy occlusions and arbitrary facial expression variations. Specifically, we introduce a statistical 3D morphable model that flexibly describes the distribution of points on the surface of the face model, with an efficient switchable online adaptation that gradually captures the identity of the tracked subject and rapidly constructs a suitable face model when the subject changes. Moreover, unlike prior art that employed ICP-based facial pose estimation, to improve robustness to occlusions we propose a ray visibility constraint that regularizes the pose based on the face model's visibility with respect to the input point cloud. Ablation studies and experimental results on the Biwi and ICT-3DHP datasets demonstrate that the proposed framework is effective and outperforms competing state-of-the-art depth-based methods.

7.
IEEE Trans Pattern Anal Mach Intell ; 38(9): 1922-8, 2016 09.
Article in English | MEDLINE | ID: mdl-26584487

ABSTRACT

We propose a real-time method to accurately track the human head pose in the 3-dimensional (3D) world. Using an RGB-D camera, a face template is reconstructed by fitting a 3D morphable face model, and the head pose is determined by registering this user-specific face template to the input depth video.

8.
IEEE Trans Image Process ; 25(7): 2943-2955, 2016 07.
Article in English | MEDLINE | ID: mdl-27093622

ABSTRACT

Video quality fluctuation plays a significant role in human visual perception, and hence many rate control approaches have been developed to maintain consistent quality for video communication. This paper presents a novel rate control framework based on the Lagrange multiplier in High Efficiency Video Coding (HEVC). Under the assumption of constant quality control, a new relationship between the distortion and the Lagrange multiplier is established. Based on the proposed distortion model and the buffer status, we obtain a computationally feasible solution to the problem of minimizing the distortion variation across video frames at the coding tree unit level. Extensive simulation results show that our method outperforms the rate control used in the HEVC Test Model (HM) by providing more accurate rate regulation, lower video quality fluctuation, and more stable buffer fullness. The average peak signal-to-noise ratio (PSNR) and PSNR deviation improvements are about 0.37 dB and 57.14%, respectively, in low-delay (P and B) video communication, with a complexity overhead of approximately 4.44%.
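The reported gains are measured in PSNR, which is a standard quantity computed from the per-frame mean squared error:

```python
import math

def psnr(ref, dist, peak=255.0):
    # peak signal-to-noise ratio between a reference and a distorted frame,
    # given as flat lists of pixel intensities
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    if mse == 0:
        return float('inf')  # identical frames
    return 10.0 * math.log10(peak * peak / mse)
```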

9.
IEEE Trans Image Process ; 24(12): 5033-45, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26316130

ABSTRACT

The visual quality evaluation of image object segmentation, as one member of the visual quality evaluation family, has been studied for years. Researchers aim to develop objective measures that evaluate the visual quality of object segmentation results in agreement with human quality judgments. It is also important to construct a platform for evaluating the performance of these objective measures in order to analyze their pros and cons. In this paper, we first present a novel subjective object segmentation visual quality database, in which a total of 255 segmentation results were evaluated by more than thirty human subjects. Then, we propose a novel full-reference objective measure for object segmentation visual quality evaluation, which incorporates four human visual properties. Finally, our measure is compared with several state-of-the-art objective measures on our database. The experiments demonstrate that the proposed measure performs better in matching subjective judgments. Moreover, the database is publicly available for other researchers in the field to evaluate their own measures.
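A common baseline for such objective measures, useful as a point of comparison, is simple region overlap between a segmentation mask and the ground truth; note this is not the paper's four-property measure:

```python
def jaccard(pred, gt):
    # region overlap (intersection over union) between two binary masks,
    # given as flattened 0/1 lists; 1.0 means a perfect match
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0
```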


Subject(s)
Databases, Factual/classification, Image Processing, Computer-Assisted/methods, Algorithms, Databases, Factual/standards, Humans, Image Processing, Computer-Assisted/standards
10.
IEEE Trans Image Process ; 24(7): 2197-211, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25823037

ABSTRACT

In this paper, we propose a new method to enhance the quality of a depth video online, based on the intermediary of a so-called static structure of the captured scene. The static and dynamic regions of the input depth frame are robustly separated by a layer assignment procedure, in which the dynamic part stays in the front while the static part fits and helps to update this structure via a novel online variational generative model with added spatial refinement. The dynamic content is enhanced spatially, while the static region is substituted by the updated static structure so as to favor long-range spatiotemporal enhancement. The proposed method both enforces long-range temporal consistency in the static region and keeps the necessary depth variations in the dynamic content. Thus, it can produce flicker-free and spatially optimized depth videos with reduced motion blur and depth distortion. Our experimental results reveal that the proposed method is effective in both static and dynamic indoor scenes and is compatible with depth videos captured by Kinect and time-of-flight cameras. We also demonstrate that excellent performance can be achieved by the proposed method in comparison with existing spatiotemporal approaches. In addition, our enhanced depth videos and static structures can act as effective cues to improve various applications, including depth-aided background subtraction and novel view synthesis, showing satisfactory results with few visual artifacts.


Subject(s)
Algorithms, Image Enhancement/methods, Image Interpretation, Computer-Assisted/methods, Imaging, Three-Dimensional/methods, Photography/methods, Video Recording/methods, Online Systems, Reproducibility of Results, Sensitivity and Specificity, User-Computer Interface
11.
IEEE Trans Image Process ; 22(7): 2876-88, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23412619

ABSTRACT

Local invariant features have been successfully used in image matching to cope with viewpoint change, partial occlusion, and clutter. However, when these factors become too strong, many mismatches arise due to the limited repeatability and discriminative power of features. In this paper, we present an efficient approach to remove false matches and propagate correct ones for affine invariant features, which represent the state of the art in local invariance. First, a pairwise affine consistency measure is proposed to evaluate the consensus of matches of affine invariant regions. The measure takes into account both the keypoint location and the region's shape, size, and orientation. Based on this measure, a geometric filter is then presented that can efficiently remove outliers from the initial matches and is robust to severe clutter and non-rigid deformation. To increase the number of correct matches, we propose a global match refinement and propagation method that simultaneously finds an optimal group of local affine transforms to relate the features in two images. The global method is capable of producing a quasi-dense set of matches even for weakly textured surfaces that undergo strong rigid transformation or non-rigid deformation. The strong capability of the proposed method in dealing with significant viewpoint change, non-rigid deformation, and low-texture objects is demonstrated in experiments on image matching, object recognition, and image-based rendering.
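The pairwise consistency idea can be sketched as a residual check: a match's local affine transform (A, t) should predict where another matched keypoint lands in the second image, and a small residual indicates geometric consensus. The helper names below are hypothetical:

```python
def mat_vec2(A, p):
    # apply a 2x2 linear map to a 2D point
    return [A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1]]

def affine_residual(A, t, p1, p2):
    # distance between the position predicted by the local affine (A, t)
    # and the actually matched point p2; low residuals across a match
    # pair indicate pairwise affine consistency
    pred = mat_vec2(A, p1)
    dx = pred[0] + t[0] - p2[0]
    dy = pred[1] + t[1] - p2[1]
    return (dx * dx + dy * dy) ** 0.5
```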

12.
IEEE Trans Image Process ; 22(4): 1536-47, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23247855

ABSTRACT

Modeling subjective opinions on visual quality is a challenging problem that closely relates to many factors of human perception. In this paper, the additive log-logistic model (ALM) is proposed to formulate such a multidimensional nonlinear problem. The log-logistic model has flexible monotonic or non-monotonic partial derivatives and thus is suitable for modeling various uni-type impairments. The proposed ALM metric adds the distortions due to each type of impairment in a log-logistic transformed space of subjective opinions. The features can be evaluated and selected by classic statistical inference, and the model parameters can be easily estimated. Cross validations on five Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) subjectively rated databases confirm that: 1) based on the same features, the ALM outperforms support vector regression and the logistic model in quality prediction; and 2) the resultant no-reference quality metric based on impairment-relevant video parameters achieves high correlation with a total of 27,216 subjective opinions on 1,134 video clips, even compared with existing full-reference quality metrics based on pixel differences. The ALM metric won the model competition of ITU-T Study Group 12 (where the validation databases are independent of the training databases) and is thus being put forth into ITU-T Recommendation P.1202.2 for the consent of the ITU-T.
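A hedged sketch of the additive log-logistic form: each impairment level is mapped through a log-logistic curve and the resulting degradations are summed before mapping back to the subjective scale. The exact ALM parameterization in the paper differs; the function signatures and the clipping below are assumptions:

```python
def log_logistic(x, a, b):
    # log-logistic degradation curve: maps an impairment level x >= 0
    # to a degradation in [0, 1); a sets the half-degradation point,
    # b the steepness
    return 1.0 / (1.0 + (a / x) ** b) if x > 0 else 0.0

def alm_quality(impairments, params, q_min=1.0, q_max=5.0):
    # additive form: per-impairment degradations add up in the
    # transformed space before mapping to the subjective quality scale
    total = sum(log_logistic(x, a, b) for x, (a, b) in zip(impairments, params))
    return q_max - (q_max - q_min) * min(total, 1.0)
```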

13.
IEEE Trans Cybern ; 43(2): 725-37, 2013 Apr.
Article in English | MEDLINE | ID: mdl-22997272

ABSTRACT

The design of robust and efficient cosegmentation algorithms is challenging because of the variety and complexity of objects and images. In this paper, we propose a new cosegmentation model that incorporates a color reward strategy and an active contour model. A new energy function corresponding to the curve is first generated with two considerations: the foreground similarity between the image pair and the background consistency within each image of the pair. Furthermore, a new foreground similarity measurement based on the rewarding strategy is proposed. Then, we minimize the energy function via a mutual procedure that uses dynamic priors to mutually evolve the curves. The proposed method is evaluated on many images from commonly used databases. The experimental results demonstrate that the proposed model can efficiently segment the common objects from image pairs with a generally lower error rate than many existing and conventional cosegmentation methods.

14.
IEEE Trans Image Process ; 22(12): 4809-24, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23955762

ABSTRACT

In this paper, we propose a novel feature-adaptive co-segmentation method that can learn adaptive features of different image groups for accurate common object segmentation. We also propose image complexity awareness for adaptive feature learning. In the proposed method, the original images are first ranked according to image complexities measured by a superpixel changing cue and an object detection cue. Then, the unsupervised segments of the simple images are used to learn the adaptive features, which are obtained using an expectation-minimization algorithm combining l1-regularized least squares optimization with consideration of the confidence of the simple image segmentation accuracies and the fitness of the learned model. Experiments on different image groups verify that the error rate of the final co-segmentation is lower than that of existing state-of-the-art co-segmentation methods.
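The l1-regularized least squares subproblem is typically attacked with iterative shrinkage, whose core is the soft-thresholding (proximal) operator sketched below; whether the paper uses exactly this solver is not stated in the abstract:

```python
def soft_threshold(v, lam):
    # proximal operator of the l1 norm: the elementwise shrinkage step
    # used by ISTA-style solvers for l1-regularized least squares
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0
```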

15.
IEEE Trans Image Process ; 20(11): 3308-13, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21659028

ABSTRACT

In this paper, we present a multiview approach to segment foreground objects consisting of a group of people into individual human objects and track them across the video sequence. Depth and occlusion information recovered from multiple views of the scene is integrated into the object detection, segmentation, and tracking processes. An adaptive background penalty with occlusion reasoning is proposed to separate the foreground regions from the background in the initial frame. Multiple cues are employed to segment individual human objects from the group. To propagate the segmentation through the video, each object region is independently tracked by motion compensation and uncertainty refinement, and motion occlusion is handled as layer transition. The experimental results on both our own sequences and others' sequences demonstrate the algorithm's efficiency in terms of subjective performance. An objective comparison with a state-of-the-art algorithm quantitatively validates the superior performance of our method.

16.
IEEE Trans Image Process ; 20(12): 3365-75, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21606024

ABSTRACT

In this paper, we introduce a method to detect co-saliency from an image pair that may contain some common objects. The co-saliency is modeled as a linear combination of the single-image saliency map (SISM) and the multi-image saliency map (MISM). The first term is designed to describe the local attention, which is computed by using three saliency detection techniques available in the literature. To compute the MISM, a co-multilayer graph is constructed by dividing the image pair into a spatial pyramid representation. Each node in the graph is described by two types of visual descriptors, which are extracted from a representation of some aspects of local appearance, e.g., color and texture properties. To evaluate the similarity between two nodes, we employ a normalized single-pair SimRank algorithm to compute the similarity score. Experimental evaluation on a number of image pairs demonstrates the good performance of the proposed method on the co-saliency detection task.
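The linear combination itself is trivial to express; the weight value below is an assumption, as the abstract does not state how the two maps are balanced:

```python
def co_saliency(sism, mism, weight=0.5):
    # linear combination of the single-image (SISM) and multi-image (MISM)
    # saliency maps, here flattened to per-pixel lists
    return [weight * s + (1.0 - weight) * m for s, m in zip(sism, mism)]
```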

17.
IEEE Trans Image Process ; 20(6): 1627-40, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21224174

ABSTRACT

Most existing feature detectors assume no surface discontinuity within the keypoints' support regions and, hence, have little chance to match the keypoints located on or near the surface boundaries. These keypoints, though not many, are salient and representative. In this paper, we show that they can be successfully matched by using the proposed scale- and affine-invariant Fan features. Specifically, the image neighborhood of a keypoint is depicted by multiple fan subregions, namely Fan features, to provide robustness to surface discontinuity and background change. These Fan features are made scale-invariant by using the automatic scale selection method based on the Fan Laplacian of Gaussian (FLOG). Affine invariance is further introduced to the Fan features based on the affine shape diagnosis of the mirror-predicted surface patch. The Fan features are then described by Fan-SIFT, which is an extension of the famous scale-invariant feature transform (SIFT) descriptor. Experimental results of quantitative comparisons show that the proposed Fan feature has good repeatability that is comparable to the state-of-the-art features for general structured scenes. Moreover, by using Fan features, we can successfully match image structures near surface discontinuities despite significant scale, viewpoint, and background changes. These structures are complementary to those found by the traditional methods and are especially useful for describing weakly textured scenes, which is demonstrated in our experiments on image matching and object rendering.


Subject(s)
Algorithms, Image Enhancement/methods, Image Interpretation, Computer-Assisted/methods, Pattern Recognition, Automated/methods, Subtraction Technique, Reproducibility of Results, Sensitivity and Specificity
18.
IEEE Trans Image Process ; 20(8): 2110-21, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21324781

ABSTRACT

Because of its outstanding contribution to improving compression efficiency, block-based quantization has been widely adopted in state-of-the-art image/video coding standards. However, it introduces false contour artifacts, which reduce the fidelity of the decoded image/video, especially in terms of subjective quality. In this paper, a block-based decontouring method is proposed to reduce the false contour artifacts in the decoded image/video by automatically dithering its direct current (DC) value according to a composite model established between gradient smoothness and block-edge smoothness. Feature points on the model are compared with the corresponding criteria for suppressing contour artifacts, showing good consistency between the model and the actual processing effects. A discrete cosine transform (DCT)-based block-level contour artifact detection mechanism ensures that blocks within texture regions are not affected by the DC dithering. Both the implementation method and the algorithm complexity are analyzed to show the feasibility of integrating the proposed method into an existing video decoder on an embedded platform or system-on-chip (SoC). Experimental results demonstrate the effectiveness of the proposed method in terms of both subjective quality and processing complexity in comparison with previous methods.

19.
IEEE Trans Image Process ; 20(11): 3207-18, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21518660

ABSTRACT

Efficient image watermarking calls for full exploitation of the perceptual distortion constraint. Second-order statistics of visual stimuli are regarded as critical features for perception. This paper proposes a second-order statistics (SOS)-based image quality metric, which considers the texture masking effect and the contrast sensitivity in the Karhunen-Loève transform domain. Compared with state-of-the-art metrics, the quality prediction by SOS correlates better with several subjectively rated image databases, in which the images are impaired by typical coding and watermarking artifacts. With the explicit metric definition, spread spectrum watermarking is posed as an optimization problem: we search for a watermark that minimizes the distortion of the watermarked image and maximizes the correlation between the watermark pattern and the spread spectrum carrier. The simple metric guarantees a closed-form solution for the optimal watermark and a fast implementation. The experiments show that the proposed watermarking scheme can take full advantage of the distortion constraint and improve robustness in return.
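Basic additive spread-spectrum embedding and correlation detection, the scheme this work optimizes under a perceptual constraint, can be sketched as follows (the embedding strength alpha is an assumption, not a value from the paper):

```python
def embed(host, carrier, alpha=0.1):
    # additive spread-spectrum embedding: y = x + alpha * w
    return [x + alpha * w for x, w in zip(host, carrier)]

def detect(signal, carrier):
    # linear correlation detector: a markedly positive correlation
    # with the carrier indicates the watermark is present
    return sum(s * w for s, w in zip(signal, carrier)) / len(signal)
```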


Subject(s)
Data Compression/methods, Product Labeling/methods, Visual Perception, Algorithms, Contrast Sensitivity, Databases, Factual, Image Interpretation, Computer-Assisted, Pattern Recognition, Automated/methods, Perceptual Masking, Wavelet Analysis