RESUMEN
Event streams provide a novel paradigm to describe visual scenes by capturing intensity variations above specific thresholds along with various types of noise. Existing event generation methods usually rely on one-way mappings using hand-crafted parameters and noise rates, which may not adequately suit diverse scenarios and event cameras. To address this limitation, we propose a novel approach to learn a bidirectional mapping between the feature space of event streams and their inherent parameters, enabling the generation of reliable event streams with enhanced generalization capabilities. We first randomly generate a vast number of parameters and synthesize massive event streams using an event simulator. Subsequently, an event-based normalizing flow network is proposed to learn the invertible mapping between the representation of a synthetic event stream and its parameters. The invertible mapping is implemented by incorporating an intensity-guided conditional affine simulation mechanism, facilitating better alignment between event features and parameter spaces. Additionally, we impose constraints on event sparsity, edge distribution, and noise distribution through novel event losses, further emphasizing event priors in the bidirectional mapping. Our framework surpasses state-of-the-art methods in video reconstruction, optical flow estimation, and parameter estimation tasks on synthetic and real-world datasets, exhibiting excellent generalization across diverse scenes and cameras.
RESUMEN
Joint filtering mainly uses an additional guidance image as a prior and transfers its structures to the target image in the filtering process. Different from existing approaches that rely on local linear models or hand-designed objective functions to extract the structural information from the guidance image, we propose a new joint filtering method based on a spatially variant linear representation model (SVLRM), where the target image is linearly represented by the guidance image. However, learning SVLRMs for vision tasks is a highly ill-posed problem. To estimate the spatially variant linear representation coefficients, we develop an effective approach based on a deep convolutional neural network (CNN). As such, the proposed deep CNN (constrained by the SVLRM) is able to model the structural information of both the guidance and input images. We show that the proposed approach can be effectively applied to a variety of applications, including depth/RGB image upsampling and restoration, flash deblurring, natural image denoising, and scale-aware filtering. In addition, we show that the linear representation model can be extended to high-order representation models (e.g., quadratic and cubic polynomial representations). Extensive experimental results demonstrate that the proposed method performs favorably against the state-of-the-art methods that have been specifically designed for each task.
RESUMEN
Deblurring images captured in dynamic scenes is challenging as the motion blurs are spatially varying caused by camera shakes and object movements. In this paper, we propose a spatially varying neural network to deblur dynamic scenes. The proposed model is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN). The RNN is used as a deconvolution operator on feature maps extracted from the input image by one of the CNNs. Another CNN is used to learn the spatially varying weights for the RNN. As a result, the RNN is spatial-aware and can implicitly model the deblurring process with spatially varying kernels. To better exploit properties of the spatially varying RNN, we develop both one-dimensional and two-dimensional RNNs for deblurring. The third component, based on a CNN, reconstructs the final deblurred feature maps into a restored image. In addition, the whole network is end-to-end trainable. Quantitative and qualitative evaluations on benchmark datasets demonstrate that the proposed method performs favorably against the state-of-the-art deblurring algorithms.