ABSTRACT
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or they consider these statistics only during training and so do not facilitate efficient compression of already-trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet, and DenseNet to very low bit-rates (1-2 bits).
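As a rough illustration of the pipeline (decorrelate, allocate bit-depths, quantize), here is a minimal numpy sketch for a single layer. It assumes a plain KLT (PCA) and the classic high-rate, variance-based bit allocation rather than the paper's ELT; all names and parameter values are illustrative.

```python
# Minimal sketch of post-training transform quantization for one layer,
# using a KLT and textbook high-rate bit allocation (not the paper's ELT).
import numpy as np

def transform_quantize(W, avg_bits=2.0):
    """W: (n_filters, fan_in) weights; avg_bits: target mean bit-depth."""
    C = np.cov(W, rowvar=False)                 # second-order weight statistics
    _, U = np.linalg.eigh(C)                    # KLT basis (decorrelating)
    Z = W @ U                                   # transform-domain coefficients
    var = Z.var(axis=0) + 1e-12
    # High-rate allocation: b_i = avg + 0.5*log2(var_i / geomean(var)),
    # clipped at zero, so near-zero-variance dimensions are pruned outright.
    b = avg_bits + 0.5 * np.log2(var / np.exp(np.log(var).mean()))
    b = np.clip(np.round(b), 0, 16).astype(int)
    Zq = np.zeros_like(Z)
    for i, bits in enumerate(b):
        if bits > 0:                            # bits == 0: dimension dropped
            step = (Z[:, i].max() - Z[:, i].min()) / 2 ** bits
            Zq[:, i] = np.round(Z[:, i] / step) * step
    return Zq @ U.T, b                          # dequantized weights, bit-depths

W = np.random.randn(64, 576)                    # e.g. 64 filters of 3x3x64
W_hat, bit_depths = transform_quantize(W)
print("mean bit-depth:", bit_depths.mean(), " MSE:", np.mean((W - W_hat)**2))
```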
ABSTRACT
Most of the real-world virtual reality (VR) content available today is captured and rendered from a fixed vantage point. The visual-vestibular conflict arising from the lack of head-motion parallax degrades the feeling of presence in the virtual environment and has been shown to induce nausea and visual discomfort. We present an end-to-end framework for VR with head-motion parallax for real-world scenes. To capture both horizontally and vertically separated perspectives, we use a camera rig with two vertically stacked rings of outward-facing cameras. The data from the rig are processed offline and stored in a compact intermediate representation, which is used to render novel views for a head-mounted display in accordance with the viewer's head movements. We compare two promising intermediate representations, Stacked OmniStereo and Layered Depth Panoramas, and evaluate them in terms of objective image-quality metrics and the occurrence of disocclusion holes in synthesized novel views.
Subject(s)
Smart Glasses; Virtual Reality; Head Movements; Motion (Physics)
ABSTRACT
Recently, many fast implementations of the bilateral and nonlocal filters have been proposed based on lattice and vector quantization (e.g., clustering) in higher dimensions. However, these approaches can still be inefficient owing to the complexity of the resampling process or of filtering the high-dimensional resampled signal. In contrast, simple scalar resampling of the high-dimensional signal after decorrelation presents the opportunity to filter signals using multi-rate signal processing techniques. This work proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering, appealing to the similarities between separable wavelet transforms and Gaussian pyramids. Accurately implementing the filter is important not only for image processing applications but also for a number of recently proposed bilateral-regularized inverse problems, where the accuracy of the solutions ultimately depends on an accurate filter implementation. We show that our Gaussian lifting approach filters images more accurately and efficiently across many filter scales. Adaptive lifting schemes for bilateral and nonlocal means filtering are also explored.
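For intuition, below is a brute-force reference bilateral filter of the kind that fast schemes such as Gaussian lifting approximate; it is a minimal sketch for grounding the accuracy comparison, not the paper's implementation, and all parameter values are illustrative.

```python
# Reference brute-force bilateral filter: each output pixel is a normalized
# average weighted by both spatial distance and intensity difference.
import numpy as np

def bilateral(img, sigma_s=3.0, sigma_r=0.1, radius=6):
    H, W = img.shape
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))   # domain kernel
    pad = np.pad(img, radius, mode='reflect')
    for y in range(H):
        for x in range(W):
            patch = pad[y:y + 2*radius + 1, x:x + 2*radius + 1]
            rng = np.exp(-(patch - img[y, x])**2 / (2 * sigma_r**2))  # range kernel
            w = spatial * rng
            out[y, x] = (w * patch).sum() / w.sum()
    return out

img = np.random.rand(64, 64)
print(bilateral(img).shape)   # (64, 64)
```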
ABSTRACT
We propose the fast optical flow extractor, a filtering method that recovers artifact-free optical flow fields from HEVC-compressed video. To extract accurate optical flow fields, we form a regularized optimization problem that considers the smoothness of the solution and the pixelwise confidence weights of an artifact-ridden HEVC motion field. Solving such an optimization problem is slow, so we first convert the problem into a confidence-weighted filtering task. By leveraging the already-available HEVC motion parameters, we achieve a 100-fold speed-up in running time compared to similar methods, while producing subpixel-accurate flow estimates. The fast optical flow extractor is useful when video frames are already available in coded formats. Our method is not specific to one coder, and works with motion fields from video coders such as H.264/AVC and HEVC.
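A minimal sketch of the confidence-weighted filtering idea follows, with a Gaussian kernel standing in for the paper's actual filter; the arrays and parameters are illustrative assumptions.

```python
# Confidence-weighted smoothing of a blocky motion field: filter the
# confidence-weighted flow and the confidence separately, then normalize,
# so low-confidence pixels are filled in from reliable neighbors.
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_flow(flow, conf, sigma=4.0):
    """flow: (H, W, 2) motion field; conf: (H, W) confidence in [0, 1]."""
    num = gaussian_filter(flow * conf[..., None], sigma=(sigma, sigma, 0))
    den = gaussian_filter(conf, sigma=sigma) + 1e-8
    return num / den[..., None]

flow = np.random.randn(72, 128, 2)     # e.g. upsampled HEVC motion vectors
conf = np.random.rand(72, 128)         # e.g. low near coding-block boundaries
print(smooth_flow(flow, conf).shape)   # (72, 128, 2)
```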
ABSTRACT
Compact descriptors for visual search (CDVS) is a recently completed standard from the ISO/IEC moving pictures experts group (MPEG). The primary goal of this standard is to provide a standardized bitstream syntax to enable interoperability in the context of image retrieval applications. Over the course of the standardization process, remarkable improvements were achieved in reducing the size of image feature data and in reducing the computation and memory footprint in the feature extraction process. This paper provides an overview of the technical features of the MPEG-CDVS standard and summarizes its evolution.
ABSTRACT
Streaming mobile augmented reality applications require both real-time recognition and tracking of objects of interest in a video sequence. Typically, local features are calculated from the gradients of a canonical patch around a keypoint in individual video frames. In this paper, we propose a temporally coherent keypoint detector and design efficient interframe predictive coding techniques for canonical patches, feature descriptors, and keypoint locations. In the proposed system, we strive to transmit each patch or its equivalent feature descriptor with as few bits as possible by modifying a previously transmitted patch or descriptor. Our solution enables server-based mobile augmented reality, in which a continuous stream of salient information, sufficient for image-based retrieval and object localization, is sent at a bit-rate that is practical for today's wireless links and less than one-tenth of the bit-rate needed to stream the compressed video to the server.
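A toy sketch of the interframe prediction idea for descriptors: transmit only a quantized residual against the previously transmitted descriptor. The step size and symbol layout are illustrative assumptions, not the paper's coding syntax.

```python
# Delta-coding a descriptor stream: small residual symbols dominate when
# descriptors change slowly from frame to frame, so entropy coding them
# costs far fewer bits than re-sending each descriptor.
import numpy as np

def encode_stream(descriptors, step=0.05):
    prev = np.zeros_like(descriptors[0])
    stream = []
    for d in descriptors:
        residual = np.round((d - prev) / step).astype(np.int16)
        stream.append(residual)          # these small symbols get entropy-coded
        prev = prev + residual * step    # track the decoder's reconstruction
    return stream

# Slowly varying descriptors: most residual symbols are zero after frame 1.
frames = [np.full(128, 0.5) + 0.01 * np.random.randn(128) for _ in range(30)]
stream = encode_stream(frames)
print("avg nonzero symbols/frame:",
      np.mean([np.count_nonzero(r) for r in stream]))
```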
Subject(s)
Data Compression/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Photography/methods; User-Computer Interface; Video Recording/methods; Algorithms; Image Enhancement/methods; Online Systems; Sensitivity and Specificity; Signal Processing, Computer-Assisted
ABSTRACT
This paper introduces the new idea of describing people using first names. We show that describing people in terms of similarity to a vector of possible first names is a powerful representation of facial appearance that can be used for a number of important applications, such as naming never-seen faces and building facial attribute classifiers. We build models for 100 common first names used in the US and, for each pair of names, construct a pairwise first-name classifier. These classifiers are built using training images downloaded from the internet, with no additional user interaction. This gives our approach important advantages in building practical systems that do not require additional human intervention for data labeling. The classification scores from each pairwise name classifier can be used as a set of facial attributes to describe facial appearance. We show several surprising results. Our name attributes predict the correct first names of test faces at rates far greater than chance. The name attributes are applied to gender recognition and to age classification, outperforming state-of-the-art methods with all training images automatically gathered from the internet. We also demonstrate the powerful use of our name attributes for associating faces in images with names from captions, and the important application of unconstrained face verification.
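A sketch of how pairwise-classifier scores form an attribute vector; the classifiers below are random linear stand-ins, not the internet-trained models of the paper, and the sizes are scaled down for illustration.

```python
# Stack the scores of all pairwise first-name classifiers into one
# attribute vector describing a face.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_names, dim = 10, 64                     # 10 names for illustration, not 100
pairs = list(combinations(range(n_names), 2))
classifiers = rng.standard_normal((len(pairs), dim))  # one hyperplane per pair

def name_attributes(face_feature):
    """Map a face feature vector to a vector of pairwise name scores."""
    return classifiers @ face_feature     # one score per name pair

face = rng.standard_normal(dim)
print(name_attributes(face).shape)        # (45,) = C(10, 2) attribute scores
```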
Subject(s)
Biometric Identification/methods; Face/anatomy & histology; Facial Recognition; Names; Pattern Recognition, Automated/methods; Terminology as Topic; Humans
ABSTRACT
We present the radial gradient transform (RGT) and a fast approximation, the approximate RGT (ARGT). We analyze the effects of the approximation on gradient quantization and histogramming. The ARGT is incorporated into the rotation-invariant fast feature (RIFF) algorithm. We demonstrate that, using the ARGT, RIFF extracts features 16× faster than SURF while achieving a similar performance for image matching and retrieval.
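A minimal sketch of the exact RGT follows: each pixel's gradient is re-expressed in a local radial/tangential frame about the patch center, which makes the histogrammed gradients rotation-invariant. The ARGT's further approximation of this frame with quantized directions is omitted here.

```python
# Radial gradient transform: project gradients onto the radial and
# tangential directions relative to the patch center.
import numpy as np

def rgt(patch):
    gy, gx = np.gradient(patch.astype(float))
    H, W = patch.shape
    ys, xs = np.mgrid[0:H, 0:W]
    ry, rx = ys - (H - 1) / 2.0, xs - (W - 1) / 2.0   # radial direction
    norm = np.hypot(rx, ry) + 1e-12
    rx, ry = rx / norm, ry / norm
    g_r = gx * rx + gy * ry                # radial gradient component
    g_t = gx * (-ry) + gy * rx             # tangential gradient component
    return g_r, g_t

g_r, g_t = rgt(np.random.rand(32, 32))
print(g_r.shape, g_t.shape)                # (32, 32) (32, 32)
```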
Subject(s)
Algorithms; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
We present a novel approach using distributed source coding for image authentication. The key idea is to provide a Slepian-Wolf encoded quantized image projection as authentication data. This version can be correctly decoded with the help of an authentic image as side information. Distributed source coding provides the desired robustness against legitimate variations while detecting illegitimate modification. The decoder incorporating expectation maximization algorithms can authenticate images which have undergone contrast, brightness, and affine warping adjustments. Our authentication system also offers tampering localization by using the sum-product algorithm.
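A toy sketch of the authentication data itself, a coarsely quantized pseudo-random projection; the Slepian-Wolf (syndrome) encoding of these symbols and the EM-based decoder are omitted, and all parameters are illustrative assumptions.

```python
# Quantized random projections tolerate global (legitimate) adjustments
# but shift markedly under localized tampering.
import numpy as np

def auth_data(img, n_proj=64, step=2.0, seed=42):
    rng = np.random.default_rng(seed)      # projection seed shared with verifier
    P = rng.standard_normal((n_proj, img.size))
    return np.round(P @ img.ravel() / (step * img.size)).astype(int)

original = np.random.rand(32, 32) * 255
adjusted = np.clip(1.02 * original + 2, 0, 255)   # legitimate global adjustment
tampered = original.copy()
tampered[8:16, 8:16] = 0                          # localized illegitimate edit
print(np.mean(auth_data(original) != auth_data(adjusted)))   # small fraction
print(np.mean(auth_data(original) != auth_data(tampered)))   # much larger fraction
```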
Subject(s)
Algorithms; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Security Measures; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
We consider distributed source coding in the presence of hidden variables that parameterize the statistical dependence among sources. We derive the Slepian-Wolf bound and devise coding algorithms for a block-candidate model of this problem. The encoder sends, in addition to syndrome bits, a portion of the source to the decoder uncoded as doping bits. The decoder uses the sum-product algorithm to simultaneously recover the source symbols and the hidden statistical dependence variables. We also develop novel techniques based on density evolution (DE) to analyze the coding algorithms. We experimentally confirm that our DE analysis closely approximates practical performance. This result allows us to efficiently optimize parameters of the algorithms. In particular, we show that the system performs close to the Slepian-Wolf bound when an appropriate doping rate is selected. We then apply our coding and analysis techniques to a reduced-reference video quality monitoring system and show a bit rate saving of about 75% compared with fixed-length coding.
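For reference, the classical Slepian-Wolf region that the system's performance is measured against (the paper derives its block-candidate analogue, which is not reproduced here):

```latex
% Slepian-Wolf: lossless recovery of two correlated sources, with side
% information available only at the decoder, is possible whenever
R_X \ge H(X \mid Y), \qquad R_Y \ge H(Y \mid X), \qquad R_X + R_Y \ge H(X, Y).
```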
Subject(s)
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Photography/methods; Signal Processing, Computer-Assisted; Video Recording/methods; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
The direction-adaptive partitioned block transform (DA-PBT) is proposed to exploit the directional features in color images to improve coding performance. Depending on the directionality in an image block, the transform either selects one of the eight directional modes or falls back to the nondirectional mode equivalent to the conventional 2-D DCT. The selection of a directional mode determines the transform direction that provides directional basis functions, the block partitioning that spatially confines the high-frequency energy, the scanning order that arranges the transform coefficients into a 1-D sequence for efficient entropy coding, and the quantization matrix optimized for visual quality. The DA-PBT can be incorporated into image coding using a rate-distortion optimized framework for direction selection, and can therefore be viewed as a generalization of variable block-size transforms with the inclusion of directional transforms and nonrectangular partitions. As a block transform, it can naturally be combined with block-based intra or inter prediction to exploit the directionality remaining in the residual. Experimental results show that the proposed DA-PBT outperforms the 2-D DCT by more than 2 dB for test images with directional features. It also greatly reduces the ringing and checkerboard artifacts typically observed around directional features in images. The DA-PBT also consistently outperforms a previously proposed directional DCT. When combined with directional prediction, gains are less than additive, as similar signal properties are exploited by the prediction and the transform. For hybrid video coding, significant gains are shown for intra coding, but not for encoding the residual after accurate motion-compensated prediction.
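A sketch of the rate-distortion mode-selection mechanism follows, with toy "directional modes" (plain DCTs of re-oriented blocks) standing in for the DA-PBT's actual transforms, partitions, and scans; the Lagrange multiplier and rate proxy are illustrative.

```python
# Per-block mode selection by minimizing the Lagrangian cost J = D + lambda*R.
import numpy as np
from scipy.fft import dctn

modes = {
    'dct_2d':  lambda b: dctn(b, norm='ortho'),
    'diag_nw': lambda b: dctn(np.fliplr(b), norm='ortho'),  # placeholder mode
    'diag_ne': lambda b: dctn(b.T, norm='ortho'),           # placeholder mode
}

def best_mode(block, lam=10.0, step=4.0):
    best = None
    for name, T in modes.items():
        coeffs = T(block)
        q = np.round(coeffs / step)
        D = np.sum((coeffs - q * step) ** 2)   # quantization distortion
        R = np.count_nonzero(q) + 3            # crude rate proxy + mode signaling
        J = D + lam * R                        # Lagrangian rate-distortion cost
        if best is None or J < best[0]:
            best = (J, name)
    return best

block = np.add.outer(np.arange(8), np.arange(8)).astype(float)  # diagonal ramp
print(best_mode(block))
```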
ABSTRACT
We propose a direction-adaptive DWT (DA-DWT) that locally adapts the filtering directions to image content based on directional lifting. With the adaptive transform, energy compaction is improved for sharp image features. A mathematical analysis based on an anisotropic statistical image model is presented to quantify the theoretical gain achieved by adapting the filtering directions. The analysis indicates that the proposed DA-DWT is more effective than other lifting-based approaches. Experimental results report a gain of up to 2.5 dB in PSNR over the conventional DWT for typical test images. Subjectively, the reconstruction from the DA-DWT better represents the structure in the image and is visually more pleasing.
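A minimal sketch of one directional lifting level follows: a 5/3-style predict/update across rows in which the prediction neighbors are sheared horizontally by d pixels (d = 0 recovers conventional vertical lifting). Column wrap-around via np.roll is a sketch simplification, not the paper's boundary handling.

```python
# One directional lifting level: odd rows are predicted from even rows
# along a shifted (sheared) direction; the update step mirrors the shifts.
import numpy as np

def directional_lift(img, d=1):
    x = img.astype(float)
    even, odd = x[0::2], x[1::2]
    below = np.vstack([even[1:], even[-1:]])   # even neighbor below (edge-replicated)
    pred = 0.5 * (np.roll(even, -d, axis=1) + np.roll(below, d, axis=1))
    high = odd - pred                          # predict step along direction d
    above = np.vstack([high[:1], high[:-1]])   # high-band neighbor above
    low = even + 0.25 * (np.roll(above, -d, axis=1) + np.roll(high, d, axis=1))
    return low, high                           # high band is sparse along edges at d

low, high = directional_lift(np.random.rand(8, 16), d=1)
print(low.shape, high.shape)                   # (4, 16) (4, 16)
```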
Subject(s)
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Signal Processing, Computer-Assisted; Computer Simulation; Data Interpretation, Statistical; Models, Statistical; Numerical Analysis, Computer-Assisted; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
We propose disparity-compensated lifting for wavelet compression of light fields. With this approach, we obtain the benefits of wavelet coding, such as scalability in all dimensions, as well as superior compression performance. Additionally, the proposed approach solves the irreversibility limitations of previous light field wavelet coding approaches, using the lifting structure. Our scheme incorporates disparity compensation into the lifting structure for the transform across the views in the light field data set. Another transform is performed to exploit the coherence among neighboring pixels, followed by a modified SPIHT coder and rate-distortion optimized bitstream assembly. A view-sequencing algorithm is developed to organize the views for encoding. For light fields of an object, we propose to use shape adaptation to improve the compression efficiency and visual quality of the images. The necessary shape information is efficiently coded based on prediction from the existing geometry model. Experimental results show that the proposed scheme exhibits superior compression performance over existing light field compression techniques.
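A toy sketch of the predict step in disparity-compensated lifting: neighbor views are shifted by the disparity before prediction, so only the residual enters the high-pass band. A constant integer disparity stands in for the per-pixel disparity compensation used on real light fields.

```python
# Predict step of disparity-compensated lifting across a row of views:
# odd-indexed views are predicted from disparity-shifted even neighbors.
import numpy as np

def dc_predict(views, disparity=2):
    """views: list of images along one camera axis (odd number of views)."""
    high = []
    for k in range(1, len(views), 2):
        left = np.roll(views[k - 1], disparity, axis=1)    # compensate left view
        right = np.roll(views[k + 1], -disparity, axis=1)  # compensate right view
        high.append(views[k] - 0.5 * (left + right))       # prediction residual
    return high

base = np.random.rand(32, 32)
views = [np.roll(base, 2 * i, axis=1) for i in range(5)]   # 2 px disparity/view
print([float(np.abs(h).mean()) for h in dc_predict(views)])  # ~0: fully compensated
```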
Subject(s)
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Computer Graphics; Numerical Analysis, Computer-Assisted
ABSTRACT
Background: In plastic and reconstructive craniofacial surgery, careful preoperative planning is essential. In complex cases of craniofacial synostosis, rapid prototyping models are used to simulate the surgery and reduce operating time. Recently, 3-D CT model surgery has been introduced for presurgical planning and prediction of the postoperative result. Objective: For the simulation of craniofacial surgery, a computer-based system was developed that allows visualization and manipulation of CT data using computer graphics techniques. Surgical procedures in all areas of the bony skull can be performed interactively. Results: The case of a child with scaphocephalus is presented. Surgery is planned using the craniofacial surgery simulator described above. Conclusion: The computer-based interactive surgery simulation system presented here allows precise visualization of craniofacial surgery. The accurate computer-aided 3-D simulation of bone displacements is also the prerequisite for transferring the simulated surgery to a surgical navigation system. Thus the preoperatively planned procedure could be transferred directly to the operating table. Copyright 2001 European Association for Cranio-Maxillofacial Surgery.