Results 1 - 10 of 10
1.
Article in English | MEDLINE | ID: mdl-38358870

ABSTRACT

Multi-modal homography estimation aims to spatially align images from different modalities, which is challenging because both image content and resolution vary across modalities. In this paper, we introduce a novel framework, named CrossHomo, to tackle this problem. Our framework is motivated by two findings that demonstrate the mutual benefits between image super-resolution and homography estimation. Based on these findings, we design a flexible multi-level homography estimation network that aligns multi-modal images in a coarse-to-fine manner. Each level is composed of a multi-modal image super-resolution (MISR) module that shrinks the resolution gap between modalities, followed by a multi-modal homography estimation (MHE) module that predicts the homography matrix. To the best of our knowledge, CrossHomo is the first attempt to address homography estimation under both modality and resolution discrepancy. Extensive experimental results show that CrossHomo achieves high registration accuracy on various multi-modal datasets with different resolution gaps. In addition, the network is highly efficient in terms of both model complexity and running speed. The source code is available at https://github.com/lep990816/CrossHomo.
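As a minimal sketch of the homography machinery this abstract relies on (not CrossHomo itself; function names are ours), the following applies a 3x3 homography to 2D points and computes the mean corner displacement commonly used to report registration accuracy:

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of 2D points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian

def corner_error(H_est, H_gt, w, h):
    """Mean displacement of the four image corners under the estimated
    vs. ground-truth homography, a common registration-accuracy metric."""
    corners = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=float)
    return float(np.mean(np.linalg.norm(
        warp_points(H_est, corners) - warp_points(H_gt, corners), axis=1)))
```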

2.
IEEE Trans Pattern Anal Mach Intell; 45(8): 10114-10128, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37030806

ABSTRACT

Measuring perceptual color differences (CDs) is of great importance in modern smartphone photography. Despite the long history of CD research, most CD measures have been constrained by psychophysical data of homogeneous color patches or a limited number of simplistic natural photographic images. It is thus questionable whether existing CD measures generalize in the age of smartphone photography, characterized by greater content complexity and learning-based image signal processors. In this article, we assemble the largest image dataset to date for perceptual CD assessment, in which the photographic images are 1) captured by six flagship smartphones, 2) altered by Photoshop, 3) post-processed by built-in filters of the smartphones, and 4) reproduced with incorrect color profiles. We then conduct a large-scale psychophysical experiment to gather perceptual CDs of 30,000 image pairs in a carefully controlled laboratory environment. Based on the newly established dataset, we make one of the first attempts to construct an end-to-end learnable CD formula based on a lightweight neural network, as a generalization of several previous metrics. Extensive experiments demonstrate that the optimized formula outperforms 33 existing CD measures by a large margin, offers reasonable local CD maps without dense supervision, generalizes well to homogeneous color patch data, and empirically behaves as a proper metric in the mathematical sense. Our dataset and code are publicly available at https://github.com/hellooks/CDNet.


Subjects
Algorithms, Smartphone, Photography/methods, Neural Networks (Computer), Learning, Color
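The CD formula above is learned, but the classical baseline such measures generalize can be sketched as a per-pixel CIE76 distance in CIELAB space (an illustrative stand-in, not CDNet; it assumes Lab-encoded inputs and the names are ours):

```python
import numpy as np

def delta_e76(lab1, lab2):
    """Per-pixel CIE76 color difference between two CIELAB images of
    shape (H, W, 3); returns an (H, W) local CD map."""
    return np.sqrt(np.sum((lab1 - lab2) ** 2, axis=-1))

def global_cd(lab1, lab2):
    """Pool the local CD map into a single scalar score by averaging."""
    return float(delta_e76(lab1, lab2).mean())
```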
3.
IEEE Trans Image Process; 31: 6267-6281, 2022.
Article in English | MEDLINE | ID: mdl-36166564

ABSTRACT

A 360° image offers a full 360-degree view of a scene; such images are widely used in virtual reality and have drawn increasing attention. In 360° image compression, the spherical image is first transformed into a planar image with a projection such as equirectangular projection (ERP) and then compressed with existing codecs. Because ERP images represent different circles of latitude with the same number of pixels, they suffer from an unbalanced sampling problem, which makes planar compression methods inefficient, especially deep neural network (DNN) based codecs. To tackle this problem, we introduce a latitude adaptive coding scheme for DNNs that allocates varying numbers of codes to different regions according to their latitude on the sphere. Specifically, taking into consideration both the number of codes allocated to each region and their entropy, we introduce a flexible regional adaptive rate loss for region-wise rate control. Latitude adaptive constraints are then introduced to prevent spending too many codes on over-sampled regions. Furthermore, we introduce a viewport-based distortion loss computed as the average distortion over a set of viewports. We optimize and test our model on a large 360° dataset containing 19,790 images collected from the Internet. The experimental results demonstrate the superiority of the proposed latitude adaptive coding scheme. Overall, our model outperforms existing image compression standards, including JPEG, JPEG2000, HEVC Intra Coding, and VVC Intra Coding, and saves around 15% of bits compared to a baseline learned image compression model for planar images.
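The unbalanced ERP sampling described above can be illustrated with the cos(latitude) area weight that underlies latitude-adaptive allocation and sphere-aware distortion measures such as WS-PSNR (a toy sketch, not the paper's model; names are ours):

```python
import numpy as np

def latitude_weights(height):
    """cos(latitude) sampling weight for each row of an ERP image: rows
    near the poles cover far less spherical area than equatorial rows,
    so they should receive proportionally fewer codes."""
    lat = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2
    return np.cos(lat)

def ws_mse(ref, dist):
    """Latitude-weighted MSE (the idea behind WS-PSNR) for (H, W) images."""
    w = latitude_weights(ref.shape[0])[:, None]
    err = (ref.astype(float) - dist.astype(float)) ** 2
    return float((err * w).sum() / (w * np.ones_like(err)).sum())
```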

4.
IEEE Trans Image Process; 31: 4062-4075, 2022.
Article in English | MEDLINE | ID: mdl-35436193

ABSTRACT

In this work, we propose a new patch-based framework, called VPU, for the video-based point cloud upsampling task, which effectively exploits temporal dependency among multiple consecutive point cloud frames, each consisting of a set of unordered, sparse and irregular 3D points. Rather than adopting the sophisticated motion estimation strategies used in video analysis, we propose a new spatio-temporal aggregation (STA) module to effectively extract, align and aggregate rich local geometric clues from consecutive frames at the feature level. By reliably summarizing spatio-temporally consistent and complementary knowledge from multiple frames into the resultant local structural features, our method better infers the local geometry distributions at the current frame. In addition, our STA module can be readily incorporated into various existing single frame-based point upsampling methods (e.g., PU-Net, MPU, PU-GAN and PU-GCN). Comprehensive experiments on multiple point cloud sequence datasets demonstrate that our video-based point cloud upsampling framework achieves substantial performance improvements over its single frame-based counterparts.
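A heavily simplified stand-in for the feature-level temporal aggregation idea (not the paper's STA module; it aligns by nearest neighbour instead of learned alignment, and all names are ours) might look like:

```python
import numpy as np

def aggregate_temporal_features(pts_t, feat_t, pts_prev, feat_prev):
    """For every point in the current frame, find its nearest neighbour
    in the previous frame and average the two per-point feature vectors.
    pts_*: (N, 3) unordered 3D points; feat_*: (N, C) features."""
    d = np.linalg.norm(pts_t[:, None, :] - pts_prev[None, :, :], axis=-1)
    nn = d.argmin(axis=1)                 # index of nearest previous point
    return 0.5 * (feat_t + feat_prev[nn])
```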

5.
IEEE Trans Image Process; 31: 1697-1707, 2022.
Article in English | MEDLINE | ID: mdl-35081025

ABSTRACT

3D volumetric image processing has attracted increasing attention in recent decades; one major research area is the development of efficient lossless volumetric image compression techniques to better store and transmit such images with their massive amounts of information. In this work, we propose the first end-to-end optimized learning framework for losslessly compressing 3D volumetric data. Our approach builds upon a hierarchical compression scheme, additionally introducing intra-slice auxiliary features and estimating the entropy model based on both intra-slice and inter-slice latent priors. Specifically, we first extract hierarchical intra-slice auxiliary features through multi-scale feature extraction modules. Then, an Intra-slice and Inter-slice Conditional Entropy Coding module is proposed to fuse intra-slice and inter-slice information from different scales as context information. Based on this context information, we can predict the distributions of both the intra-slice auxiliary features and the slice images. To further improve lossless compression performance, we also introduce two new gating mechanisms, called Intra-Gate and Inter-Gate, to generate optimal feature representations for better information fusion. Finally, we produce the bitstream for losslessly compressing volumetric images based on the estimated entropy model. Unlike existing lossless volumetric image codecs, our end-to-end optimized framework jointly learns intra-slice auxiliary features at different scales for each slice and inter-slice latent features from previously encoded slices for better entropy estimation. Extensive experimental results indicate that our framework outperforms state-of-the-art hand-crafted lossless volumetric image codecs (e.g., JP3D) and learning-based lossless image compression methods on four volumetric image benchmarks, losslessly compressing both 3D medical images and hyper-spectral images.
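The inter-slice prior can be illustrated with a crude predict-from-previous-slice scheme whose residual entropy approximates coding cost (a toy sketch, not the learned entropy model; names are ours):

```python
import numpy as np

def residual_bits_per_voxel(volume):
    """Estimate a lossless coding cost for an integer 3D volume by
    predicting each slice from the previous one (a crude inter-slice
    prior) and measuring the empirical entropy of the residuals."""
    residuals = [volume[0].ravel()]        # first slice coded as-is
    for k in range(1, volume.shape[0]):
        residuals.append((volume[k].astype(int) -
                          volume[k - 1].astype(int)).ravel())
    r = np.concatenate(residuals)
    _, counts = np.unique(r, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())  # bits per voxel
```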

6.
IEEE Trans Image Process; 30: 3748-3763, 2021.
Article in English | MEDLINE | ID: mdl-33729938

ABSTRACT

Defocus blur detection (DBD), which has been widely applied in various fields, aims to detect out-of-focus or in-focus pixels in a single image. Although deep learning based methods have outperformed hand-crafted feature based methods on DBD, their performance still falls short of requirements. In this paper, a novel network is established for DBD. Unlike existing methods, which only learn the projection from the in-focus part to the ground truth, both in-focus and out-of-focus pixels, which are completely and symmetrically complementary, are taken into account. Specifically, two symmetric branches are designed to jointly estimate the probabilities of focus and defocus pixels, respectively. Due to their complementary constraint, each layer in one branch is modulated by an attention map obtained from the other branch, effectively learning detailed information that may be ignored by a single branch. The feature maps from the two branches are then passed through a fusion block to produce a two-channel output measured by a complementary loss. Additionally, instead of estimating only one binary map from a specific layer, each layer is encouraged to estimate the ground truth to guide the binary map estimation in its linked shallower layer, followed by a top-to-bottom combination strategy that gradually exploits global and local information. Experimental results on released datasets demonstrate that our proposed method remarkably outperforms state-of-the-art algorithms.
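A hedged sketch of the complementary two-branch supervision (our own simplification, not the paper's loss) could be:

```python
import numpy as np

def complementary_loss(p_focus, p_defocus, gt_focus, eps=1e-7):
    """Each branch is supervised by its own binary cross-entropy, plus a
    term pushing the two probability maps to sum to one, since in-focus
    and out-of-focus pixels are complementary."""
    p_focus = np.clip(p_focus, eps, 1 - eps)
    p_defocus = np.clip(p_defocus, eps, 1 - eps)
    bce_f = -(gt_focus * np.log(p_focus) +
              (1 - gt_focus) * np.log(1 - p_focus)).mean()
    bce_d = -((1 - gt_focus) * np.log(p_defocus) +
              gt_focus * np.log(1 - p_defocus)).mean()
    comp = ((p_focus + p_defocus - 1.0) ** 2).mean()  # complementarity term
    return float(bce_f + bce_d + comp)
```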

7.
IEEE Trans Image Process; 30: 3098-3112, 2021.
Article in English | MEDLINE | ID: mdl-33600315

ABSTRACT

Nowadays, people are used to taking photos to record their daily lives; however, these photos are often inconsistent with the real natural scenes. The two main differences are that photos tend to have low dynamic range (LDR) and low resolution (LR), due to the inherent imaging limitations of cameras. Multi-exposure image fusion (MEF) and image super-resolution (SR) are two widely used techniques to address these issues; however, they are usually treated as independent research problems. In this paper, we propose a deep Coupled Feedback Network (CF-Net) to perform MEF and SR simultaneously. Given a pair of extremely over-exposed and under-exposed LDR images with low resolution, our CF-Net is able to generate an image with both high dynamic range (HDR) and high resolution. Specifically, CF-Net is composed of two coupled recursive sub-networks, with the LR over-exposed and under-exposed images as inputs, respectively. Each sub-network consists of one feature extraction block (FEB), one super-resolution block (SRB) and several coupled feedback blocks (CFB). The FEB and SRB extract high-level features from the input LDR image that are helpful for resolution enhancement. The CFB is arranged after the SRB, and its role is to absorb the learned features from the SRBs of the two sub-networks, so that it can produce a high-resolution HDR image. We use a series of CFBs to progressively refine the fused high-resolution HDR image. Extensive experimental results show that our CF-Net drastically outperforms other state-of-the-art methods in terms of both SR accuracy and fusion performance. The software code is available at https://github.com/ytZhang99/CF-Net.
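A classical MEF baseline of the kind CF-Net improves on can be sketched with per-pixel well-exposedness weighting (illustrative only; not CF-Net, and it ignores the SR half of the problem):

```python
import numpy as np

def fuse_exposures(under, over):
    """Toy multi-exposure fusion: weight each pixel by its
    'well-exposedness' (a Gaussian around mid-gray) and blend the two
    LDR inputs. Inputs are float images in [0, 1]."""
    def well_exposed(img, sigma=0.2):
        return np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))
    w_u, w_o = well_exposed(under), well_exposed(over)
    total = w_u + w_o + 1e-12              # avoid division by zero
    return (w_u * under + w_o * over) / total
```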

8.
IEEE Trans Pattern Anal Mach Intell; 43(10): 3446-3461, 2021 Oct.
Article in English | MEDLINE | ID: mdl-32248094

ABSTRACT

Learning-based lossy image compression usually involves the joint optimization of rate-distortion performance, and requires coping with the spatial variation of image content and the contextual dependence among learned codes. Traditional entropy models can spatially adapt the local bit rate based on image content, but are usually limited in exploiting context in code space. On the other hand, most deep context models are computationally very expensive and cannot efficiently decode symbols in parallel. In this paper, we present a content-weighted encoder-decoder model, in which channel-wise multi-valued quantization is deployed for the discretization of the encoder features, and an importance map subnet is introduced to generate importance masks for spatially varying code pruning. Consequently, the summation of the importance masks serves as an upper bound on the length of the bitstream. Furthermore, the quantized representations of the learned code and importance map are still spatially dependent, and can be losslessly compressed using arithmetic coding. To compress the codes effectively and efficiently, we propose an upper-triangular masked convolutional network (triuMCN) for large context modeling. Experiments show that the proposed method produces visually much better results, and performs favorably against deep and traditional lossy image compression approaches.
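The importance-map pruning and its bit-count upper bound can be sketched as follows (our simplification of the idea; the names and the simple quantizer are assumptions, not the paper's subnet):

```python
import numpy as np

def prune_codes(codes, importance, levels=4):
    """Content-weighted pruning sketch: quantize an (H, W) importance map
    into `levels` bins and keep only the first q(h, w) fraction of the C
    channels of the (C, H, W) code tensor at each location, zeroing the
    rest. The number of kept entries upper-bounds the bitstream length."""
    C = codes.shape[0]
    q = np.ceil(importance * levels) / levels           # quantized importance
    keep = (q * C).astype(int)                          # channels to keep
    mask = np.arange(C)[:, None, None] < keep[None]     # (C, H, W) binary mask
    return codes * mask, int(mask.sum())
```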

9.
Article in English | MEDLINE | ID: mdl-31870979

ABSTRACT

The depth images acquired by consumer depth sensors (e.g., Kinect and ToF) are usually of low resolution and insufficient quality. One natural solution is to incorporate a high-resolution RGB camera and exploit the statistical correlation between its data and the depth. In recent years, both optimization-based and learning-based approaches have been proposed to deal with guided depth reconstruction problems. In this paper, we introduce a weighted analysis sparse representation (WASR) model for guided depth image enhancement, which can be considered a generalized formulation of a wide range of previous optimization-based models. We unfold the optimization of the WASR model and conduct guided depth reconstruction with dynamically changing stage-wise operations. Such a guidance strategy enables us to dynamically adjust the stage-wise operations that update the depth image, improving both reconstruction quality and speed. To learn the stage-wise operations in a task-driven manner, we propose two parameterizations and their corresponding methods: dynamic guidance with Gaussian RBF nonlinearity parameterization (DG-RBF) and dynamic guidance with CNN nonlinearity parameterization (DG-CNN). The network structures of the proposed DG-RBF and DG-CNN methods are designed with the objective function of our WASR model in mind, and the optimal network parameters are learned from paired training data. Such optimization-inspired network architectures enable our models to leverage previous expertise as well as benefit from training data. The effectiveness is validated on guided depth image super-resolution and realistic depth image reconstruction tasks using standard benchmarks. Our DG-RBF and DG-CNN methods achieve the best quantitative results (RMSE) and better visual quality than the state-of-the-art approaches at the time of writing. The code is available at https://github.com/ShuhangGu/GuidedDepthSR.
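One unfolded stage of a guided, optimization-inspired depth update might be sketched like this (a loose illustration of the unfolding idea, not DG-RBF/DG-CNN; all parameters are ours, only horizontal smoothing is shown, and the boundary is handled crudely by wrap-around):

```python
import numpy as np

def guided_depth_step(depth, observed, guide, tau=0.2, lam=1.0):
    """One gradient step on a data-fidelity term plus an RGB-guided
    horizontal smoothing term whose weights drop at guide edges, so
    depth discontinuities are allowed where intensity edges occur."""
    grad = depth - observed                                  # data-fidelity gradient
    gx = np.abs(np.diff(guide, axis=1, append=guide[:, -1:]))
    w = np.exp(-10.0 * gx)                                   # small weight at edges
    dx = np.diff(depth, axis=1, append=depth[:, -1:])
    s = w * dx                                               # weighted forward differences
    smooth_grad = np.roll(s, 1, axis=1) - s                  # gradient of 0.5*sum(w*dx^2)
    return depth - tau * (grad + lam * smooth_grad)
```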

10.
Article in English | MEDLINE | ID: mdl-29994747

ABSTRACT

Due to poor lighting conditions and the limited dynamic range of digital imaging devices, recorded images are often under-/over-exposed and have low contrast. Most previous single-image contrast enhancement (SICE) methods adjust the tone curve to correct the contrast of an input image. These methods, however, often fail to reveal image details because of the limited information in a single image. On the other hand, the SICE task can be accomplished better if extra information can be learned from appropriately collected training data. In this work, we propose to use a convolutional neural network (CNN) to train a SICE enhancer. One key issue is how to construct a training dataset of low-contrast and high-contrast image pairs for end-to-end CNN learning. To this end, we build a large-scale multi-exposure image dataset containing 589 elaborately selected high-resolution multi-exposure sequences with 4,413 images. Thirteen representative multi-exposure image fusion and stack-based high dynamic range imaging algorithms are employed to generate contrast-enhanced images for each sequence, and subjective experiments are conducted to select the best-quality one as the reference image of each scene. With the constructed dataset, a CNN can be easily trained as a SICE enhancer to improve the contrast of an under-/over-exposed image. Experimental results demonstrate the advantages of our method over existing SICE methods by a significant margin.
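The tone-curve adjustment that previous SICE methods rely on can be sketched as a simple gamma curve (an illustrative classical baseline, not the CNN enhancer; `auto_gamma` is our own heuristic):

```python
import numpy as np

def tone_curve_enhance(img, gamma=0.6):
    """Classical tone-curve contrast adjustment: a gamma curve applied
    to a [0, 1] image. Gamma below 1 brightens under-exposed images;
    gamma above 1 darkens over-exposed ones."""
    return np.clip(img, 0.0, 1.0) ** gamma

def auto_gamma(img, target_mean=0.5):
    """Heuristic: pick gamma so mean brightness maps roughly to mid-gray."""
    m = float(np.clip(img, 1e-6, 1.0).mean())
    return float(np.log(target_mean) / np.log(m))
```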
