1.
Article in English | MEDLINE | ID: mdl-38607717

ABSTRACT

Photometric stereo recovers the surface normals of an object from multiple images with varying shading cues, i.e., by modeling the relationship between surface orientation and intensity at each pixel. Photometric stereo excels in per-pixel resolution and fine reconstruction detail. However, it is a complicated problem because of the non-linear relationship caused by non-Lambertian surface reflectance. Recently, various deep learning methods have shown powerful capabilities for photometric stereo on non-Lambertian surfaces. This paper provides a comprehensive review of existing deep learning-based calibrated photometric stereo methods that utilize orthographic cameras and directional light sources. We first analyze these methods from different perspectives, including input processing, supervision, and network architecture. We then summarize the performance of deep learning photometric stereo models on the most widely used benchmark dataset, which demonstrates the advanced performance of deep learning-based photometric stereo methods. Finally, we give suggestions and propose future research trends based on the limitations of existing models.
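For context, the classical calibrated case with a Lambertian surface reduces to per-pixel least squares: with known light directions, the albedo-scaled normal is the solution of a small linear system. A minimal sketch (illustrative only; variable names are ours, not from the paper):

```python
import numpy as np

def lambertian_normals(I, L):
    """Per-pixel Lambertian photometric stereo under known directional lights.
    I: (m, p) observed intensities for m lights and p pixels.
    L: (m, 3) unit lighting directions.
    Returns (3, p) unit surface normals."""
    G, *_ = np.linalg.lstsq(L, I, rcond=None)  # G = albedo * normal per pixel
    rho = np.linalg.norm(G, axis=0)            # recovered per-pixel albedo
    return G / np.maximum(rho, 1e-12)
```

Deep methods replace exactly this linear inversion with a learned mapping that tolerates non-Lambertian deviations.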

2.
Article in English | MEDLINE | ID: mdl-38526900

ABSTRACT

Event cameras show great potential in 3D hand pose estimation, especially for addressing the challenges of fast motion and high dynamic range at low power. However, due to the asynchronous differential imaging mechanism, it is challenging to design an event representation that encodes hand motion information, especially when the hands are not moving (causing motion ambiguity), and it is infeasible to fully annotate the temporally dense event stream. In this paper, we propose EvHandPose, with novel hand flow representations in its Event-to-Pose module, for accurate hand pose estimation that alleviates the motion ambiguity issue. To handle sparse annotation, we design contrast maximization and hand-edge constraints in the Pose-to-IWE (Image with Warped Events) module and formulate EvHandPose in a weakly-supervised framework. We further build EvRealHands, the first large-scale real-world event-based hand pose dataset covering several challenging scenes, to bridge the real-synthetic domain gap. Experiments on EvRealHands demonstrate that EvHandPose outperforms previous event-based methods in all evaluation scenes, achieves accurate and stable hand pose estimation with high temporal resolution in fast-motion and strong-light scenes compared with RGB-based methods, generalizes well to outdoor scenes and another type of event camera, and shows potential for the hand gesture recognition task.
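The contrast maximization constraint mentioned above scores a candidate motion by how sharply the warped events accumulate: the correct motion compensates the event stream and maximizes the variance of the Image of Warped Events (IWE). A simplified single-flow sketch (for intuition; not the authors' implementation):

```python
import numpy as np

def iwe_contrast(events, flow, shape):
    """Contrast (variance) of the Image of Warped Events for a candidate flow.
    events: (N, 3) rows of (x, y, t); flow: (vx, vy) in pixels per second."""
    x = events[:, 0] - flow[0] * events[:, 2]   # warp each event back along the flow
    y = events[:, 1] - flow[1] * events[:, 2]
    xi = np.clip(np.round(x).astype(int), 0, shape[1] - 1)
    yi = np.clip(np.round(y).astype(int), 0, shape[0] - 1)
    iwe = np.zeros(shape)
    np.add.at(iwe, (yi, xi), 1.0)               # accumulate warped events into an image
    return iwe.var()
```

Maximizing this objective over the motion parameters is the core of contrast maximization.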

3.
IEEE Trans Pattern Anal Mach Intell ; 46(4): 2285-2298, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37938939

ABSTRACT

A number of advanced image editing technologies have demonstrated impressive performance in synthesizing visually pleasing results in accordance with user instructions. In this paper, we further extend the practical reach of image editing technology by proposing the conditional image repainting (CIR) task, which requires a model to synthesize realistic visual content based on multiple cross-modality conditions provided by the user. We first define the condition inputs and formulate two-phase CIR models as the baseline. We then design unified CIR models with novel condition fusion modules to improve performance. To allow users to express their intent more freely, our CIR models support both attributes and language for representing the colors of repainted visual content. We demonstrate the effectiveness of CIR models by collecting and processing four datasets. Finally, we present a number of practical application scenarios of CIR models to demonstrate their usability.

4.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 1079-1092, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37903053

ABSTRACT

This paper proposes a novel pipeline to estimate a non-parametric, high dynamic range environment map from a single human face image. Lighting-independent and lighting-dependent intrinsic images of the face are first estimated separately in a cascaded network. The influence of face geometry on the two lighting-dependent intrinsics, diffuse shading and specular reflection, is then eliminated by distributing the intrinsics pixel-wise onto spherical representations, using the surface normals as indices. This yields two representations simulating images of a diffuse sphere and a glossy sphere under the input scene lighting. Taking into account the distinctive nature of light sources and ambient terms, we further introduce a two-stage lighting estimator to predict both accurate and realistic lighting from these two representations. Our model is trained with supervision on a large-scale, high-quality synthetic face image dataset. We demonstrate that our method allows accurate and detailed lighting estimation and intrinsic decomposition, outperforming state-of-the-art methods both qualitatively and quantitatively on real face images.
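The normal-indexed spherical representation described above can be pictured as a scatter ("splat") of per-pixel intensity into bins addressed by the x and y components of the surface normal. A toy version (our own simplification, not the paper's code):

```python
import numpy as np

def splat_to_sphere(intensity, normals, res=64):
    """Distribute per-pixel intensity onto a spherical map indexed by surface normal.
    intensity: (p,) values; normals: (p, 3) unit normals facing the camera (z > 0).
    Returns (res, res) mean intensity per normal bin (front hemisphere)."""
    u = np.clip(((normals[:, 0] + 1) / 2 * (res - 1)).round().astype(int), 0, res - 1)
    v = np.clip(((normals[:, 1] + 1) / 2 * (res - 1)).round().astype(int), 0, res - 1)
    acc = np.zeros((res, res))
    cnt = np.zeros((res, res))
    np.add.at(acc, (v, u), intensity)  # sum intensities per normal bin
    np.add.at(cnt, (v, u), 1.0)        # count contributions per bin
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
```

Averaging within each bin is what removes the dependence on where a given orientation happens to appear on the face.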

5.
Article in English | MEDLINE | ID: mdl-37922172

ABSTRACT

In this paper, we propose a novel method, GR-PSN, which learns surface normals from photometric stereo images and generates photometric images under distant illumination from different lighting directions and with different surface materials. The framework is composed of two subnetworks, GeometryNet and ReconstructNet, which are cascaded to perform shape reconstruction and image rendering in an end-to-end manner. ReconstructNet introduces additional supervision for surface-normal recovery, forming a closed-loop structure with GeometryNet. We also encode lighting and surface reflectance in ReconstructNet to achieve arbitrary rendering. In training, we set up a parallel framework to simultaneously learn two arbitrary materials for an object, providing an additional transform loss. Our method is therefore trained under the supervision of three loss functions: the surface-normal loss, the reconstruction loss, and the transform loss. We alternately feed the predicted surface-normal map and the ground truth into ReconstructNet to achieve stable training. Experiments show that our method can accurately recover the surface normals of an object from an arbitrary number of inputs and can re-render images of the object with arbitrary surface materials. Extensive experimental results show that our proposed method outperforms methods based on a single surface-recovery network and produces realistic rendering results on 100 different materials. Our code can be found at https://github.com/Kelvin-Ju/GR-PSN.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15219-15232, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37578915

ABSTRACT

Neuromorphic cameras are an emerging imaging technology with advantages over conventional imaging sensors in several aspects, including dynamic range, sensing latency, and power consumption. However, their signal-to-noise level and spatial resolution still fall behind those of conventional imaging sensors. In this article, we address the denoising and super-resolution problem for modern neuromorphic cameras, employing a 3D U-Net as the backbone neural architecture. The networks are trained and tested on two types of neuromorphic cameras: a dynamic vision sensor and a spike camera. Their pixels generate signals asynchronously: the former based on perceived light changes and the latter based on accumulated light intensity. To collect datasets for training such networks, we design a display-camera system to record high frame-rate videos at multiple resolutions, providing supervision for denoising and super-resolution. The networks are trained in a noise-to-noise fashion, where both ends of the network are unfiltered noisy data. The output of the networks has been tested on downstream applications, including event-based visual object tracking and image reconstruction. Experimental results demonstrate the effectiveness of our approach in improving the quality of neuromorphic events and spikes, and the corresponding improvement to downstream applications, with state-of-the-art performance.

7.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13991-14004, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37486843

ABSTRACT

This article presents a new approach for surface normal recovery from polarization images under an unknown distant light. Polarization provides rich cues about object geometry and material, but it is also influenced by lighting conditions. Unlike previous Shape-from-Polarization (SfP) methods, which rely on handcrafted or data-driven priors, we analytically investigate the benefits of estimating the distant lighting for resolving the ambiguity in normal estimation from SfP, using the polarimetric Bidirectional Reflectance Distribution Function (pBRDF) based image formation model. We then propose a two-stage learning framework that first exploits polarization and shading cues to estimate the reflectance and lighting information and optimizes an initial normal as the geometric prior. Leveraging this normal prior together with the polarization cues from the input images, our network generates a more detailed surface normal map in the second stage. We also present a data generation pipeline derived from the pBRDF model to enable model training, and create a real dataset for evaluating SfP approaches. Extensive ablation studies show the effectiveness of our designed architecture, and our approach outperforms existing methods in quantitative and qualitative experiments on real data.

8.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13749-13765, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37463081

ABSTRACT

With the rapid development of high-resolution 3D vision applications, traditional ways of manipulating surface detail require considerable memory and computing time. To address these problems, we introduce an efficient surface detail processing framework in the 2D normal domain, which extracts new normal feature representations as carriers of micro geometry structures, illustrated both theoretically and empirically in this article. Compared with the existing state of the art, we verify and demonstrate that the proposed normal-based representation has three important properties: detail separability, detail transferability, and detail idempotence. Finally, three new schemes are designed for geometric surface detail processing applications: geometric texture synthesis, geometry detail transfer, and 3D surface super-resolution. Theoretical analysis and experimental results on the latest benchmark dataset verify the effectiveness and versatility of our normal-based representation, which handles 30 times as many input surface vertices while requiring only 6.5% of the memory and 14.0% of the running time of existing competing algorithms.

9.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12192-12205, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37318980

ABSTRACT

In this article, we investigate the problem of panoramic image reflection removal, which relieves the content ambiguity between the reflection layer and the transmission scene. Although a partial view of the reflection scene is available in the panoramic image and provides additional information for reflection removal, it is not trivial to exploit it directly for removing undesired reflections, because it is misaligned with the reflection-contaminated image. We propose an end-to-end framework to tackle this problem. By resolving the misalignment with adaptive modules, high-fidelity recovery of the reflection layer and the transmission scene is accomplished. We further propose a new data generation approach that considers the physics-based formation model of mixture images and the in-camera dynamic range clipping to diminish the domain gap between synthetic and real data. Experimental results demonstrate the effectiveness of the proposed method and its applicability to mobile devices and industrial applications.

10.
IEEE Trans Image Process ; 32: 1774-1787, 2023.
Article in English | MEDLINE | ID: mdl-37015134

ABSTRACT

Taking photos with digital cameras often produces saturated pixels due to the cameras' limited dynamic range, and restoring them is a severely ill-posed problem. Capturing multiple low dynamic range images with bracketed exposures can make the problem less ill-posed; however, this is prone to ghosting artifacts caused by spatial misalignment among the images. A polarization camera can capture four spatially-aligned and temporally-synchronized polarized images with different polarizer angles in a single shot, which can be used for ghost-free high dynamic range (HDR) reconstruction. Real-world scenarios remain challenging, however, because existing polarization-based HDR reconstruction methods treat all pixels in the same manner and utilize only the spatially-variant exposures of the polarized images, leaving the problem ill-posed: they do not fully exploit the degree of polarization (DoP) and the angle of polarization (AoP) of the light arriving at the sensor, which encode abundant structural and contextual information about the scene. In this paper, we propose a pixel-wise depolarization strategy for the polarization-guided HDR reconstruction problem: we classify pixels by their level of ill-posedness in the HDR reconstruction procedure and apply different solutions to different classes. To apply this strategy with better generalization and higher robustness, we propose a network-physics-hybrid polarization-based HDR reconstruction pipeline, along with a neural network tailored to it, that fully exploits the DoP and AoP. Experimental results show that our approach achieves state-of-the-art performance on both synthetic and real-world images.
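The DoP and AoP mentioned above follow from the linear Stokes parameters, which the four polarizer-angle images of a polarization camera determine directly. A standard computation (textbook formulas, not the authors' pipeline):

```python
import numpy as np

def stokes_from_polarizer(i0, i45, i90, i135):
    """Linear Stokes parameters plus DoP and AoP from four polarizer-angle images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)              # total intensity
    s1 = i0 - i90                                   # horizontal vs vertical preference
    s2 = i45 - i135                                 # diagonal preference
    dop = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-12)   # degree of linear polarization
    aop = 0.5 * np.arctan2(s2, s1)                  # angle of polarization (radians)
    return s0, dop, aop
```

These per-pixel quantities are exactly the extra structural cues the paper argues earlier methods left unused.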

11.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8553-8565, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37022447

ABSTRACT

Reconstruction of high dynamic range image from a single low dynamic range image captured by a conventional RGB camera, which suffers from over- or under-exposure, is an ill-posed problem. In contrast, recent neuromorphic cameras like event camera and spike camera can record high dynamic range scenes in the form of intensity maps, but with much lower spatial resolution and no color information. In this article, we propose a hybrid imaging system (denoted as NeurImg) that captures and fuses the visual information from a neuromorphic camera and ordinary images from an RGB camera to reconstruct high-quality high dynamic range images and videos. The proposed NeurImg-HDR+ network consists of specially designed modules, which bridges the domain gaps on resolution, dynamic range, and color representation between two types of sensors and images to reconstruct high-resolution, high dynamic range images and videos. We capture a test dataset of hybrid signals on various HDR scenes using the hybrid camera, and analyze the advantages of the proposed fusing strategy by comparing it to state-of-the-art inverse tone mapping methods and merging two low dynamic range images approaches. Quantitative and qualitative experiments on both synthetic data and real-world scenarios demonstrate the effectiveness of the proposed hybrid high dynamic range imaging system. Code and dataset can be found at: https://github.com/hjynwa/NeurImg-HDR.

12.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9439-9453, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022832

ABSTRACT

Removing undesired moiré patterns from images of contents displayed on screens is of increasing research interest, as the need to record and share the instant information conveyed by screens grows. Previous demoiréing methods provide limited investigation into the formation process of moiré patterns, and thus exploit few moiré-specific priors to guide the learning of demoiréing models. In this paper, we investigate the moiré pattern formation process from the perspective of signal aliasing and correspondingly propose a coarse-to-fine disentangling demoiréing framework. In this framework, we first disentangle the moiré pattern layer from the clean image with alleviated ill-posedness, based on the derivation of our moiré image formation model. We then refine the demoiréing results by exploiting both frequency-domain features and edge attention, considering the spectrum distribution and edge intensity of moiré patterns revealed by our aliasing-based analysis. Experiments on several datasets show that the proposed method performs favorably against state-of-the-art methods. Moreover, the proposed method is validated to adapt well to different data sources and scales, especially on high-resolution moiré images.
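The signal-aliasing view of moiré formation can be illustrated in one dimension: a pattern above the Nyquist limit of the sampling grid folds back to a spurious low "moiré" frequency. A toy demonstration (ours, for intuition only):

```python
import numpy as np

def dominant_freq(signal, fs):
    """Frequency (Hz) of the strongest non-DC component of a sampled signal."""
    spec = np.abs(np.fft.rfft(signal))
    spec[0] = 0.0                                  # ignore the DC term
    return np.fft.rfftfreq(len(signal), d=1.0 / fs)[np.argmax(spec)]
```

Sampling a 9 Hz pattern at only 10 samples per second makes it indistinguishable from a 1 Hz pattern, analogous to the screen pixel grid beating against the camera sensor grid.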


Subject(s)
Algorithms , Moire Topography
13.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10129-10142, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022867

ABSTRACT

Recently, many advances in inverse rendering have been achieved through high-dimensional lighting representations and differentiable rendering. However, multi-bounce lighting effects can hardly be handled correctly in scene editing with high-dimensional lighting representations, and differentiable rendering methods suffer from light source model deviation and ambiguities. These problems limit the applications of inverse rendering. In this paper, we present a multi-bounce inverse rendering method based on Monte Carlo path tracing that enables correct rendering of complex multi-bounce lighting effects in scene editing. We propose a novel light source model better suited to light source editing in indoor scenes, and design a specific neural network with corresponding disambiguation constraints to alleviate ambiguities during inverse rendering. We evaluate our method on both synthetic and real indoor scenes through virtual object insertion, material editing, relighting, and other tasks. The results demonstrate that our method achieves better photo-realistic quality.
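For background, Monte Carlo path tracing estimates lighting integrals by averaging sampled radiance divided by the sampling pdf. A one-integral toy example (not the paper's renderer): estimating the hemisphere integral of cos θ, whose exact value is π, with uniform hemisphere sampling:

```python
import numpy as np

def mc_cosine_integral(n=200000, seed=0):
    """Monte Carlo estimate of the hemisphere integral of cos(theta) (exact value: pi),
    using uniform hemisphere sampling, whose pdf is 1 / (2*pi)."""
    rng = np.random.default_rng(seed)
    cos_t = rng.random(n)               # under uniform hemisphere sampling, cos(theta) ~ U[0, 1]
    return np.mean(cos_t * 2 * np.pi)   # integrand / pdf, averaged over samples
```

The same estimator pattern, applied recursively over light bounces, is what gives path tracing its correct handling of multi-bounce effects.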


Subject(s)
Algorithms , Lighting , Lighting/methods , Neural Networks, Computer , Monte Carlo Method
14.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2151-2165, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35344487

ABSTRACT

Undesirable reflections in photos taken in front of glass windows or doors often degrade the visual quality of the image. Separating the two layers benefits both human and machine perception. The polarization status of light changes after refraction or reflection, providing additional observations of the scene that can benefit reflection separation. Unlike previous works that take three or more polarization images as input, in this paper we propose to exploit physical constraints from a pair of unpolarized and polarized images to separate the reflection and transmission layers. Due to the simplified capture setup, the system is more under-determined than existing polarization-based works. To solve this problem, we first estimate the semi-reflector orientation to make the physical image formation well-posed, and then learn to reliably separate the two layers using additional networks based on both physical and numerical analysis. In addition, a motion estimation network is introduced to handle misalignment of the paired input. Quantitative and qualitative experimental results show that our approach performs favorably against existing polarization-based and single-image solutions.

15.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 1424-1441, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35439129

ABSTRACT

Reflection removal has been studied for decades. This paper provides an analysis of the different reflection properties and factors that influence image formation, an up-to-date taxonomy of existing methods, a benchmark dataset, and unified benchmarking evaluations of state-of-the-art (especially learning-based) methods. Specifically, this paper presents the SIngle-image Reflection Removal Plus dataset "SIR2+", with new consideration for in-the-wild scenarios and glass with diverse colors and non-planar shapes. We further perform quantitative and visual quality comparisons of state-of-the-art single-image reflection removal algorithms. Open problems for improving reflection removal algorithms are discussed at the end. Our dataset and follow-up updates can be found at https://reflectionremoval.github.io/sir2data/.
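Unified benchmarking of reflection removal commonly reports full-reference metrics such as PSNR between the recovered transmission layer and the ground truth. A minimal implementation (a generic metric sketch, not tied to the SIR2+ evaluation code):

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images with values in [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(peak**2 / mse)
```

Higher is better; quantitative comparisons in such benchmarks typically pair this with a structural metric like SSIM.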

16.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 7809-7823, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34559637

ABSTRACT

This paper presents a photometric stereo method that works under unknown natural illumination, without calibration objects or an initial guess of the target shape. To solve this challenging problem, we propose the use of an equivalent directional lighting model for small surface patches consisting of slowly varying normals, and solve each patch up to an arbitrary orthogonal ambiguity. We then build connections between patches by extracting consistent surface normal pairs via spatial overlaps among patches and intensity profiles. Guided by these connections, the local ambiguities are unified into a single global orthogonal ambiguity through Markov Random Field optimization and rotation averaging. After applying the integrability constraint, our solution contains only a binary ambiguity, which is easily removed. Experiments on both synthetic and real-world datasets show that our method provides results comparable even to those of calibrated methods.

17.
Article in English | MEDLINE | ID: mdl-37015352

ABSTRACT

Rendering delay on AR devices requires head motion to be predicted from sensor data acquired tens or even one hundred milliseconds earlier, to avoid misalignment between the virtual content and the physical world; such misalignment leads to a sense of latency and dizziness for users. To solve this problem, we propose a method for 6DoF motion prediction that compensates for the time latency. Compared with traditional hand-crafted methods, our deep learning-based method is better able to predict complex human motion. In particular, we propose a MOtion UNcerTainty encode-decode network (MOUNT) that estimates the uncertainty of the input data and predicts the uncertainty of the output motion to improve prediction accuracy and smoothness. Experiments on EuRoC and our collected dataset demonstrate that our method significantly outperforms the traditional method and greatly improves AR visual effects.
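The traditional hand-crafted baselines referenced above are typically constant-velocity extrapolators. A minimal translation-only sketch (our illustration of the baseline idea, not the MOUNT model, which also handles rotation and uncertainty):

```python
import numpy as np

def predict_position(p_prev, p_curr, dt_hist, dt_pred):
    """Constant-velocity extrapolation of the head position dt_pred seconds ahead.
    p_prev, p_curr: (3,) positions separated by dt_hist seconds."""
    v = (p_curr - p_prev) / dt_hist   # finite-difference velocity estimate
    return p_curr + v * dt_pred
```

Such extrapolation breaks down for jerky head motion, which is the regime where a learned predictor can help.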

18.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 114-128, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750795

ABSTRACT

This article presents a photometric stereo method based on deep learning. One of the major difficulties in photometric stereo is designing a reflectance model that is both capable of representing real-world reflectances and computationally tractable for deriving surface normals. Unlike previous photometric stereo methods that rely on a simplified parametric image formation model, such as Lambert's model, the proposed method establishes a flexible mapping between complex reflectance observations and surface normals using a deep neural network. In addition, the proposed method predicts the reflectance, which allows us to understand surface materials and to render the scene under arbitrary lighting conditions. Concretely, we propose a deep photometric stereo network (DPSN) that takes reflectance observations under varying light directions and infers the surface normal and reflectance in a per-pixel manner. To make DPSN applicable to real-world scenes, a dataset of measured BRDFs (the MERL BRDF dataset) is used for training the network. Evaluation on simulated and real-world scenes shows the effectiveness of the proposed approach in estimating both surface normals and reflectance.

19.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 129-142, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750798

ABSTRACT

This paper addresses the problem of photometric stereo, in both calibrated and uncalibrated scenarios, for non-Lambertian surfaces based on deep learning. We first introduce a fully convolutional deep network for calibrated photometric stereo, which we call PS-FCN. Unlike traditional approaches that adopt simplified reflectance models to make the problem tractable, our method directly learns the mapping from reflectance observations to surface normal, and is able to handle surfaces with general and unknown isotropic reflectance. At test time, PS-FCN takes an arbitrary number of images and their associated light directions as input and predicts a surface normal map of the scene in a fast feed-forward pass. To deal with the uncalibrated scenario where light directions are unknown, we introduce a new convolutional network, named LCNet, to estimate light directions from input images. The estimated light directions and the input images are then fed to PS-FCN to determine the surface normals. Our method does not require a pre-defined set of light directions and can handle multiple images in an order-agnostic manner. Thorough evaluation of our approach on both synthetic and real datasets shows that it outperforms state-of-the-art methods in both calibrated and uncalibrated scenarios.
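The order-agnostic fusion of a variable number of images described above is commonly realized as an element-wise max-pooling over per-image feature maps, which is invariant to both the count and the ordering of the inputs. A minimal sketch of the idea (our simplification, not the PS-FCN code):

```python
import numpy as np

def order_agnostic_pool(features):
    """Fuse per-image feature maps with an element-wise max over the image axis.
    features: (k, c, h, w) array of features extracted from k input images.
    The result is unchanged under any permutation of the k images."""
    return features.max(axis=0)
```

Because the pooled tensor has a fixed shape regardless of k, the downstream normal-regression layers never need to know how many images were supplied.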

20.
IEEE Trans Pattern Anal Mach Intell ; 44(5): 2657-2672, 2022 05.
Article in English | MEDLINE | ID: mdl-33301400

ABSTRACT

The generator in generative adversarial networks (GANs) is driven by a discriminator to produce high-quality images through an adversarial game; at the same time, this game makes it harder to train a stable generator. This paper focuses on non-adversarial generative networks that are trained in a plain manner, without an adversarial loss. Since the given limited number of real images can be insufficient to fully represent the real data distribution, we investigate a set of distributions in a Wasserstein ball centred on the distribution induced by the training data and propose to optimize the generator over this Wasserstein ball. We theoretically discuss the solvability of the newly defined objective function and develop a tractable reformulation for learning the generator. The connections and differences between the proposed non-adversarial generative networks and GANs are analyzed. Experimental results on real-world datasets demonstrate that the proposed algorithm can effectively learn image generators in a non-adversarial way, and that the generated images are of comparable quality to those from GANs.
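For intuition about the Wasserstein ball above: in one dimension, the 1-Wasserstein distance between two equal-size empirical distributions is simply the mean gap between their sorted samples. A background sketch (illustrating the distance, not the paper's objective):

```python
import numpy as np

def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size empirical 1D distributions:
    the mean absolute difference of the sorted samples (optimal 1D transport plan)."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))
```

The Wasserstein ball of radius r around the training distribution is then the set of distributions within distance r of it; optimizing over this set hedges against the training sample under-representing the true data distribution.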


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Algorithms , Image Processing, Computer-Assisted/methods