ABSTRACT
In this paper, a synthetic hyperspectral video database is introduced. Since it is impossible to record ground-truth hyperspectral videos, this database makes it possible to evaluate algorithms for diverse applications against a known reference. For all scenes, depth maps are provided as well, so that each pixel carries its position in all spatial dimensions in addition to its reflectance in the spectral dimension. Two novel algorithms for two different applications are proposed to demonstrate the diversity of applications that can be addressed by this novel database. First, a cross-spectral image reconstruction algorithm is extended to exploit the temporal correlation between two consecutive frames. The evaluation using this hyperspectral database shows an increase in peak signal-to-noise ratio (PSNR) of up to 5.6 dB depending on the scene. Second, a hyperspectral video coder is introduced, which extends an existing hyperspectral image coder by exploiting temporal correlation. The evaluation shows rate savings of up to 10% depending on the scene.
ABSTRACT
Light spectra are a very important source of information for diverse classification problems, e.g., for the discrimination of materials. To lower the cost of acquiring this information, multispectral cameras are used. Several techniques exist for estimating light spectra from multispectral images by exploiting properties of the spectrum. Unfortunately, especially when capturing multispectral videos, the images are heavily affected by noise due to the limited exposure times inherent to video capture. Therefore, models that explicitly try to lower the influence of noise on the reconstructed spectrum are highly desirable. Hence, a novel reconstruction algorithm is presented. This novel estimation method is based on the guided filtering technique, which preserves basic structures while using spatial information to reduce the influence of noise. The evaluation based on spectra of natural images reveals that this new technique yields better quantitative and subjective results in noisy scenarios than other state-of-the-art spatial reconstruction methods. Specifically, the proposed algorithm lowers the mean squared error and the spectral angle by up to 46% and 35%, respectively, in noisy scenarios. Furthermore, the proposed reconstruction technique is shown to work out of the box, without any calibration or training, by reconstructing spectra from a real-world multispectral camera with nine channels.
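For readers who want a concrete picture of the underlying filter, a minimal single-channel guided filter in the style of He et al. is sketched below in Python. This is not the paper's reconstruction method itself, only the generic building block it builds on; the window radius, the regularization constant, and the idea of using a less noisy channel as guidance are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, r=4, eps=1e-3):
    """Minimal single-channel guided filter (He et al.) as a sketch.

    guide: guidance image, e.g., a less noisy reference channel (assumption)
    src:   noisy input to be filtered, e.g., one multispectral channel
    r:     box-filter radius; eps: edge-preservation regularizer
    Both inputs are float arrays of identical shape, scaled to [0, 1].
    """
    size = 2 * r + 1
    mean_I = uniform_filter(guide, size)
    mean_p = uniform_filter(src, size)
    corr_I = uniform_filter(guide * guide, size)
    corr_Ip = uniform_filter(guide * src, size)

    var_I = corr_I - mean_I * mean_I
    cov_Ip = corr_Ip - mean_I * mean_p

    # Local linear model q = a * I + b, estimated per window
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I

    # Average the per-window coefficients before applying them
    mean_a = uniform_filter(a, size)
    mean_b = uniform_filter(b, size)
    return mean_a * guide + mean_b
```

In a spectral reconstruction setting, such a filter smooths a noisy channel while transferring structure from the guidance signal; how the paper embeds this building block into the actual spectrum estimation is described in the full text.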
ABSTRACT
In this paper, we provide an in-depth assessment of the Bjøntegaard Delta. We construct a large data set of video compression performance comparisons using a diverse set of metrics including PSNR, VMAF, bitrate, and processing energies. These metrics are evaluated for visual data types such as classic perspective video, 360° video, point clouds, and screen content. As compression technology, we consider multiple hybrid video codecs as well as state-of-the-art neural-network-based compression methods. Using additional supporting points in between the standard points defined by parameters such as the quantization parameter, we assess the interpolation error of the Bjøntegaard-Delta (BD) calculus and its impact on the final BD value. From the analysis, we find that the BD calculus is most accurate in the standard application of rate-distortion comparisons, with mean errors below 0.5 percentage points. For other applications and special cases, e.g., VMAF quality, energy considerations, or inter-codec comparisons, the errors are higher (up to 5 percentage points), but can be halved by using a higher number of supporting points. We finally derive recommendations on how to use the BD calculus such that the validity of the resulting BD values is maximized. The main recommendations are as follows: First, relative curve differences should be plotted and analyzed. Second, the logarithmic domain should be used for saturating metrics such as SSIM and VMAF. Third, BD values below a certain threshold indicated by the subset error should not be used to draw conclusions. Fourth, using two supporting points is sufficient to obtain rough performance estimates.
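For orientation, a minimal BD-rate computation in the classic Bjøntegaard style (third-order polynomial fit of log-rate over quality, integrated over the overlapping quality range) might look as sketched below. The four rate-distortion points are invented purely for illustration; the paper's recommendations, e.g., on the number of supporting points or on using the logarithmic domain for saturating metrics, apply on top of such a basic implementation.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Classic Bjøntegaard-Delta rate in percent (negative = bitrate savings).

    Fits a third-order polynomial of log10(rate) over quality for both codecs
    and integrates the horizontal gap over the overlapping quality range.
    """
    lr_a, lr_t = np.log10(rate_anchor), np.log10(rate_test)

    # Fit log-rate as a cubic function of quality (four supporting points)
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)

    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))

    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)

    avg_diff = (int_t - int_a) / (hi - lo)   # mean difference of log10 rates
    return (10 ** avg_diff - 1) * 100        # relative rate difference in percent

# Made-up rate-distortion points: (rates in kbit/s, PSNR in dB)
anchor = ([1000, 2000, 4000, 8000], [34.0, 36.5, 39.0, 41.5])
test = ([900, 1800, 3700, 7600], [34.2, 36.8, 39.2, 41.6])
print(f"BD-rate: {bd_rate(*anchor, *test):.2f} %")
```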
ABSTRACT
Recently, many new applications have arisen for multispectral and hyperspectral imaging. Besides modern biometric systems for identity verification, agricultural and medical applications have also emerged that measure the health condition of plants and humans. Despite the growing demand, the acquisition of multispectral data is still complicated. Often, expensive, inflexible, or low-resolution acquisition setups are obtainable only for specific professional applications. To overcome these limitations, a novel camera array for multispectral imaging is presented in this article for generating consistent multispectral videos. As the different spectral images are acquired from different viewpoints, a geometrically constrained multi-camera sensor layout is introduced, which enables the formulation of novel registration and reconstruction algorithms built on globally robust models. On average, the novel acquisition approach achieves a gain of 2.5 dB PSNR compared to recently published multispectral filter array imaging systems. At the same time, the proposed acquisition system ensures not only a superior spatial resolution, but also a high spectral and temporal resolution, while filters remain flexibly exchangeable by the user depending on the application. Moreover, depth information is generated, so that 3D imaging applications, e.g., for augmented or virtual reality, become possible. The proposed camera array for multispectral imaging can be set up using off-the-shelf hardware, which allows for a compact design and employment in, e.g., mobile devices or drones, while being cost-effective.
ABSTRACT
Capturing ground truth data to benchmark super-resolution (SR) is challenging. Therefore, current quantitative studies are mainly evaluated on simulated data artificially sampled from ground truth images. We argue that such evaluations overestimate the actual performance of SR methods compared to their behavior on real images. Toward bridging this simulated-to-real gap, we introduce the Super-Resolution Erlangen (SupER) database, the first comprehensive laboratory SR database of all-real acquisitions with pixel-wise ground truth. It consists of more than 80k images of 14 scenes combining different facets: CMOS sensor noise, real sampling at four resolution levels, nine scene motion types, two photometric conditions, and lossy video coding at five levels. As such, the database exceeds existing benchmarks by an order of magnitude in quality and quantity. This paper also benchmarks 19 popular single-image and multi-frame algorithms on our data. The benchmark comprises a quantitative study by exploiting ground truth data and qualitative evaluations in a large-scale observer study. We also rigorously investigate agreements between both evaluations from a statistical perspective. One interesting result is that top-performing methods on simulated data may be surpassed by others on real data. Our insights can spur further algorithm development, and the publicly available dataset can foster future evaluations.
ABSTRACT
Lossless compression of dynamic 2-D+t and 3-D+t medical data is challenging due to the huge amount of data, the characteristics of the inherent noise, and the high bit depth. Beyond that, a scalable representation is often required in telemedicine applications. Motion Compensated Temporal Filtering works well for lossless compression of medical volume data and additionally provides temporal, spatial, and quality scalability features. To achieve a high-quality lowpass subband, which shall be used as a downscaled representative of the original data, graph-based motion compensation was recently introduced to this framework. However, encoding the motion information, which is stored in adjacency matrices, has not been well investigated so far. This work focuses on coding these adjacency matrices to make graph-based motion compensation feasible for data compression. We propose a novel coding scheme based on constructing so-called motion maps. This allows, for the first time, comparing the performance of graph-based motion compensation to traditional block- and mesh-based approaches. For high-quality lowpass subbands, our method outperforms the block- and mesh-based approaches, increasing the visual quality in terms of PSNR by 0.53 dB and 0.28 dB for CT data, as well as by 1.04 dB and 1.90 dB for MR data, respectively, while reducing the bit rate at the same time.
ABSTRACT
The usage of embedded systems is omnipresent in our everyday life, e.g., in smartphones, tablets, or automotive devices. These devices are able to deal with challenging image processing tasks like real-time detection of faces or high dynamic range imaging. However, the size and computational power of an embedded system are limiting factors. To help students understand these challenges, a new lab course "Image and Video Signal Processing on Embedded Systems" has been developed and is presented in this paper. The Raspberry Pi 3 Model B and the open-source programming language Python have been chosen because of the low hardware cost and the free availability of the programming language. In this lab course, the students learn to handle both hardware and software, Python as an alternative to MATLAB, the image signal processing path, and how to develop an embedded image processing system, from the idea to implementation and debugging. At the beginning of the lab course, an introduction to Python and the Raspberry Pi is given. After that, various experiments, such as the implementation of a corner detector and the creation of a panorama image, are carried out in the lab course. Students participating in the lab course develop a profound understanding of embedded image and video processing algorithms, which is verified by comparing questionnaires from the beginning and the end of the lab course. Moreover, compared to a peer group attending an accompanying lecture with exercises, students who participated in this lab course outperform their peer group in the exam for the lecture by 0.5 on a five-point scale.
ABSTRACT
Due to their high resolution, dynamic medical 2D+t and 3D+t volumes from computed tomography (CT) and magnetic resonance tomography (MR) reach a size which makes them very unwieldy for teleradiologic applications. A lossless scalable representation offers the advantage of a downscaled version which can be used for orientation or previewing, while the remaining information for reconstructing the full resolution is transmitted on demand. The wavelet transform offers the desired scalability. A very high quality of the lowpass sub-band is crucial in order to use it as a downscaled representation. We propose an approach based on compensated wavelet lifting for obtaining a scalable representation of dynamic CT and MR volumes with very high quality. Mesh-based compensation is well suited to model the displacement in dynamic volumes, which is mainly given by expansion and contraction of tissue over time. To achieve this, we propose an optimized estimation of the mesh compensation parameters tailored to dynamic volumes. Within the lifting structure, the inversion of the motion compensation is crucial in the update step. By taking this inversion directly into account during the estimation step, we improve the quality of the lowpass sub-band by 0.63 dB and 0.43 dB on average for the tested dynamic CT and MR volumes, at the cost of an average rate increase of 2.4% and 1.2%, respectively.
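The abstract does not spell out the lifting equations; as a rough sketch, one motion-compensated Haar lifting step and its inverse could look as follows, where `warp` and `warp_inv` are mere placeholders for the mesh-based compensation and its inversion discussed above.

```python
import numpy as np

def haar_lift(frame1, frame2, warp, warp_inv):
    """One motion-compensated Haar lifting step (sketch).

    frame1, frame2: two temporally adjacent frames/slices as float arrays
    warp:     compensates frame1 toward frame2 (used in the prediction step)
    warp_inv: inverse compensation (used in the update step)
    """
    # Prediction step: remove the part of frame2 that is predictable from frame1
    highpass = frame2 - warp(frame1)
    # Update step: feed the inversely compensated residual back into the lowpass band
    lowpass = frame1 + 0.5 * warp_inv(highpass)
    return highpass, lowpass

def haar_inverse(highpass, lowpass, warp, warp_inv):
    """Exact reconstruction of the two original frames."""
    frame1 = lowpass - 0.5 * warp_inv(highpass)
    frame2 = highpass + warp(frame1)
    return frame1, frame2

# Toy check with identity compensation (no motion): perfect reconstruction
identity = lambda x: x
f1, f2 = np.random.rand(4, 4), np.random.rand(4, 4)
h, l = haar_lift(f1, f2, identity, identity)
r1, r2 = haar_inverse(h, l, identity, identity)
assert np.allclose(f1, r1) and np.allclose(f2, r2)
```

The contribution summarized above concerns how the compensation parameters are estimated so that the inversion in the update step behaves well; the sketch only fixes the surrounding lifting structure.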
ABSTRACT
This paper considers online robust principal component analysis (RPCA) in time-varying decomposition problems such as video foreground-background separation. We propose a compressive online RPCA algorithm that recursively decomposes a sequence of data vectors (e.g., frames) into sparse and low-rank components. Different from conventional batch RPCA, which processes all the data directly, our approach considers a small set of measurements taken per data vector (frame). Moreover, our algorithm can incorporate multiple pieces of prior information from previously decomposed vectors via a proposed n-ℓ1 minimization method. At each time instance, the algorithm recovers the sparse vector by solving the n-ℓ1 minimization problem, which promotes not only the sparsity of the vector but also its correlation with multiple previously recovered sparse vectors, and subsequently updates the low-rank component using incremental singular value decomposition. We also establish theoretical bounds on the number of measurements required to guarantee successful compressive separation under the assumptions of static or slowly changing low-rank components. We evaluate the proposed algorithm using numerical experiments and online video foreground-background separation experiments. The experimental results show that the proposed method outperforms the existing methods.
ABSTRACT
The implementation of automatic image registration is still difficult in various applications. In this paper, an automatic image registration approach based on line-support region segmentation and geometrical outlier removal is proposed. This new approach is designed to address the problems associated with the registration of images with affine deformations and inconsistent content, such as remote sensing images with different spectral content or noise interference, or map images with inconsistent annotations. To begin with, line-support regions, namely straight regions whose points share roughly the same image gradient angle, are extracted to address the issue of inconsistent content in the images. To alleviate the incompleteness of line segments, an iterative multi-resolution strategy is employed to preserve global structures that are masked at full resolution by image details or noise. Then, geometrical outlier removal is developed to provide reliable feature point matching, based on affine-invariant geometrical classifications of corresponding matches initialized by the scale-invariant feature transform. The candidate outliers are selected by comparing the disparity of accumulated classifications among all matches, unlike conventional methods, which rely only on local geometrical relations. Various image sets have been considered in this paper for the evaluation of the proposed approach, including aerial images with simulated affine deformations, remote sensing optical and synthetic aperture radar images taken in different situations (multispectral, multisensor, and multitemporal), and map images with inconsistent annotations. Experimental results demonstrate the superior performance of the proposed method over the existing approaches for the whole data set.
ABSTRACT
In this paper, we derive a spatiotemporal extrapolation method for 3-D discrete signals. Extending a discrete signal beyond a limited number of known samples is commonly referred to as discrete signal extrapolation. Extrapolation problems arise in many applications in video communications. Transmission errors in video communications may cause data losses, which are concealed by extrapolating the surrounding video signal into the missing area. The same principle is applied for TV logo removal. Prediction in hybrid video coding can also be interpreted as an extrapolation problem. Conventionally, the unknown areas in the video sequence are estimated from either the spatial or the temporal surrounding. Our approach considers the spatiotemporal signal including the missing area as a volume and replaces the unknown samples by extrapolating the surrounding signal from the spatial as well as the temporal direction. By exploiting spatial and temporal correlations at the same time, it is possible to inherently compensate motion. Deviations in luminance occurring from frame to frame can be compensated as well.
ABSTRACT
Pixelwise linear prediction using backward-adaptive least-squares or weighted least-squares estimation of the prediction coefficients is currently among the state-of-the-art methods for lossless image compression. While current research is focused on predicting the mean intensity of the pixel to be transmitted, best compression requires occurrence probability estimates for all possible intensity values. Apart from common heuristic approaches, we show how prediction error variance estimates can be derived from the (weighted) least-squares training region and how a complete probability distribution can be built based on an autoregressive image model. The analysis of image stationarity properties further allows deriving a novel formula for weight computation in weighted least-squares prediction, proving and generalizing ad hoc equations from the literature. For sparse intensity distributions in non-natural images, a modified image model is presented. Evaluations were done in the newly developed C++ framework Volumetric, Artificial, and Natural Image Lossless Coder (Vanilc), which can compress a wide range of images, including 16-bit medical 3D volumes or multichannel data. A comparison with several of the best available lossless image codecs shows that the method achieves very competitive compression ratios. In the interest of reproducible research, the source code of Vanilc has been made public.
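To make the backward-adaptive idea concrete, a minimal pixelwise (weighted) least-squares predictor is sketched below. The causal neighborhood, the training-window size, and the weighting convention are illustrative assumptions and do not reproduce the Vanilc configuration.

```python
import numpy as np

# Causal neighbor offsets used as regressors (illustrative choice)
NEIGHBORS = [(-1, 0), (0, -1), (-1, -1), (-1, 1)]

def wls_predict(img, y, x, train_radius=6, weights=None):
    """Predict img[y, x] via backward-adaptive (weighted) least squares.

    The prediction coefficients are trained on already decoded pixels in a
    causal window around (y, x), so encoder and decoder can derive them
    identically without transmitting side information. Assumes (y, x) lies
    sufficiently far inside the image.
    """
    rows_A, rows_b = [], []
    for dy in range(-train_radius, 1):
        for dx in range(-train_radius, train_radius + 1):
            if (dy, dx) >= (0, 0):          # keep the training window strictly causal
                break
            ty, tx = y + dy, x + dx
            if ty <= 0 or tx <= 0 or tx >= img.shape[1] - 1:
                continue                    # neighbors of this training pixel missing
            rows_A.append([img[ty + oy, tx + ox] for oy, ox in NEIGHBORS])
            rows_b.append(img[ty, tx])
    A, b = np.asarray(rows_A, float), np.asarray(rows_b, float)
    if weights is not None:                 # WLS: one weight per training pixel
        sw = np.sqrt(np.asarray(weights, float))
        A, b = A * sw[:, None], b * sw
    coeff, *_ = np.linalg.lstsq(A, b, rcond=None)
    current = np.array([img[y + oy, x + ox] for oy, ox in NEIGHBORS], float)
    return float(current @ coeff)
```

The residuals over the same training region are also the natural basis for the prediction error variance estimate mentioned above; the exact derivation is given in the paper.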
ABSTRACT
Even though image signals are typically defined on a regular 2D grid, there exist many scenarios where this is not the case and the amplitude of the image signal is only available for a non-regular subset of pixel positions. In such a case, a resampling of the image to a regular grid has to be carried out. This is necessary since almost all algorithms and technologies for processing, transmitting, or displaying image signals rely on the samples being available on a regular grid. Thus, it is of great importance to reconstruct the image on this regular grid so that the reconstruction comes as close as possible to the signal that would have been acquired directly on the regular grid. In this paper, Frequency Selective Reconstruction is introduced for solving this challenging task. This algorithm reconstructs image signals by exploiting the property that small areas of images can be represented sparsely in the Fourier domain. By further considering the basic properties of the optical transfer function of imaging systems, a sparse model of the signal is iteratively generated. In doing so, the proposed algorithm is able to achieve a very high reconstruction quality, in terms of peak signal-to-noise ratio (PSNR) and the structural similarity measure as well as in terms of visual quality. The simulation results show that the proposed algorithm is able to outperform state-of-the-art reconstruction algorithms, and gains of more than 1 dB PSNR are possible.
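A strongly simplified, block-wise sketch of this frequency-selective idea is given below: in every iteration, the DFT basis function that best matches the spatially weighted residual is selected, and a fraction of its estimated expansion coefficient is added to the sparse model. The weighting function, the step size gamma, and the iteration count are illustrative choices, and the optical-transfer-function weighting mentioned above is omitted.

```python
import numpy as np

def fsr_block(block, mask, n_iter=200, gamma=0.5, rho=0.8):
    """Simplified frequency-selective reconstruction of one image block (sketch).

    block: observed block, arbitrary values at unknown positions
    mask:  1 where a sample exists on the non-regular grid, 0 where it is missing
    """
    M, N = block.shape
    yy, xx = np.mgrid[0:M, 0:N]
    # Isotropic spatial weighting decaying from the block center, zero at unknowns
    w = rho ** np.hypot(yy - (M - 1) / 2, xx - (N - 1) / 2) * mask
    w_sum = w.sum()

    coef = np.zeros((M, N), dtype=complex)   # Fourier-domain model coefficients
    residual = block * mask                  # unknown samples carry no signal

    for _ in range(n_iter):
        proj = np.fft.fft2(residual * w)     # weighted projections onto all DFT bases
        k = np.unravel_index(np.argmax(np.abs(proj)), proj.shape)
        coef[k] += gamma * proj[k] / w_sum   # add a fraction of the estimated coefficient
        model = np.real(np.fft.ifft2(coef)) * M * N
        residual = (block - model) * mask    # residual only where samples exist
    return model
```

In practice, such a reconstruction runs block-wise over the image with overlapping support areas; those details, as well as the frequency weighting derived from the optical transfer function, are part of the algorithm described in the paper.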
ABSTRACT
In this paper, two multiple description coding schemes are developed, based on prediction-induced randomly offset quantizers and unequal-deadzone-induced near-uniformly offset quantizers, respectively. In both schemes, each description encodes one source subset with a small quantization step size, and the other subsets are predictively coded with a large quantization step size. In the first method, due to predictive coding, the quantization bins that a coefficient belongs to in the different descriptions are randomly overlapped. The optimal reconstruction is obtained by finding the intersection of all received bins. In the second method, joint dequantization is also used, but near-uniform offsets are created among the different low-rate quantizers by quantizing the predictions and by employing unequal deadzones. By generalizing the recently developed random quantization theory, the closed-form expression of the expected distortion is obtained for the first method, and a lower bound is obtained for the second method. The schemes are then applied to lapped-transform-based multiple description image coding. The closed-form expressions enable the optimization of the lapped transform. An iterative algorithm is also developed to facilitate the optimization. Theoretical analyses and image coding results show that both schemes achieve better performance than other methods in this category.
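As a toy illustration of the joint dequantization principle shared by both schemes (with fixed rather than prediction-induced offsets, and made-up numbers), consider one coefficient seen through two coarse quantizers whose bins are mutually offset; the central decoder intersects the received bins, and since the intersection is never wider than either bin, the central reconstruction improves.

```python
import math

def quantize(x, step, offset):
    """Uniform quantizer with an offset; returns the bin interval [lo, hi)."""
    lo = math.floor((x - offset) / step) * step + offset
    return lo, lo + step

# One coefficient seen by two coarse quantizers with different offsets
x = 3.9
bin_a = quantize(x, step=1.0, offset=0.0)   # bin received from description 1
bin_b = quantize(x, step=1.0, offset=0.4)   # bin received from description 2

# Joint dequantization: intersect the two received bins and take the midpoint
lo, hi = max(bin_a[0], bin_b[0]), min(bin_a[1], bin_b[1])
x_hat = (lo + hi) / 2
print(f"bins {bin_a} and {bin_b} -> intersection [{lo:.1f}, {hi:.1f}), x_hat = {x_hat:.2f}")
```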