Results 1 - 14 of 14
1.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5700-5714, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34048338

ABSTRACT

In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
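As an illustration, here is a minimal Python sketch in which a KLT (PCA) stands in for the End-to-end Learned Transform derived in the paper, and a classical variance-based bit allocation stands in for the full rate-distortion optimization; the function name, the 8-bit cap, and the uniform quantizer are placeholder assumptions, not the paper's implementation.

import numpy as np

def transform_quantize(W, avg_bits=2.0):
    """Decorrelate the columns of a 2-D weight matrix W (KLT) and quantize
    each transform dimension with a variance-dependent bit depth."""
    C = np.cov(W, rowvar=False)                  # column covariance
    _, U = np.linalg.eigh(C)                     # eigenvectors = transform basis
    Wt = W @ U                                   # transform-domain weights
    var = Wt.var(axis=0) + 1e-12
    # Reverse-water-filling-style allocation around the average bit budget.
    bits = avg_bits + 0.5 * np.log2(var / np.exp(np.log(var).mean()))
    bits = np.clip(np.round(bits), 0, 8).astype(int)
    Wq = np.zeros_like(Wt)
    for i, b in enumerate(bits):
        if b == 0:
            continue                             # dimension dropped entirely
        step = max((Wt[:, i].max() - Wt[:, i].min()) / 2 ** b, 1e-12)
        Wq[:, i] = np.round(Wt[:, i] / step) * step
    return Wq @ U.T, bits                        # dequantized weights, bit depths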

2.
IEEE Comput Graph Appl ; 41(4): 29-39, 2021.
Article in English | MEDLINE | ID: mdl-34010127

ABSTRACT

Most of the real-world virtual reality (VR) content available today is captured and rendered from a fixed vantage point. The visual-vestibular conflict arising from the lack of head-motion parallax degrades the feeling of presence in the virtual environment and has been shown to induce nausea and visual discomfort. We present an end-to-end framework for VR with head-motion parallax for real-world scenes. To capture both horizontally and vertically separated perspectives, we use a camera rig with two vertically stacked rings of outward-facing cameras. The data from the rig are processed offline and stored into a compact intermediate representation, which is used to render novel views for a head-mounted display, in accordance with the viewer's head movements. We compare two promising intermediate representations-Stacked OmniStereo and Layered Depth Panoramas-and evaluate them in terms of objective image quality metrics and the occurrence of disocclusion holes in synthesized novel views.


Subjects
Smart Glasses; Virtual Reality; Head Movements; Motion (Physics)
3.
Article in English | MEDLINE | ID: mdl-32286976

ABSTRACT

Recently, many fast implementations of the bilateral and the nonlocal means filters were proposed based on lattice and vector quantization, e.g., clustering, in higher dimensions. However, these approaches can still be inefficient owing to the complexities of the resampling process or of filtering the high-dimensional resampled signal. In contrast, simple scalar resampling of the high-dimensional signal after decorrelation presents the opportunity to filter signals using multi-rate signal processing techniques. This work proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering, appealing to the similarities between separable wavelet transforms and Gaussian pyramids. Accurately implementing the filter is important not only for image processing applications, but also for a number of recently proposed bilateral-regularized inverse problems, where the accuracy of the solutions depends ultimately on an accurate filter implementation. We show that our Gaussian lifting approach filters images more accurately and efficiently across many filter scales. Adaptive lifting schemes for bilateral and nonlocal means filtering are also explored.
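For context, the following is a brute-force reference bilateral filter, i.e., the operation that the Gaussian lifting framework accelerates; it is not the lifting scheme itself, and the parameter values are illustrative.

import numpy as np

def bilateral_filter(img, sigma_s=3.0, sigma_r=0.1, radius=6):
    """Brute-force bilateral filter on a 2-D grayscale image scaled to [0, 1]."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))   # domain kernel
    pad = np.pad(img, radius, mode='reflect')
    for y in range(H):
        for x in range(W):
            patch = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            rng = np.exp(-(patch - img[y, x]) ** 2 / (2 * sigma_r ** 2))  # range kernel
            w = spatial * rng
            out[y, x] = (w * patch).sum() / w.sum()
    return out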

4.
Article in English | MEDLINE | ID: mdl-32286986

ABSTRACT

We propose the fast optical flow extractor, a filtering method that recovers artifact-free optical flow fields from HEVC-compressed video. To extract accurate optical flow fields, we form a regularized optimization problem that considers the smoothness of the solution and the pixelwise confidence weights of an artifact-ridden HEVC motion field. Solving such an optimization problem is slow, so we first convert the problem into a confidence-weighted filtering task. By leveraging the already-available HEVC motion parameters, we achieve a 100-fold speed-up in running time compared to similar methods, while producing subpixel-accurate flow estimates. The fast optical flow extractor is useful when video frames are already available in coded form. Our method is not specific to a particular coder and works with motion fields from video coders such as H.264/AVC and HEVC.
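A minimal sketch of confidence-weighted smoothing of a decoded motion field via normalized convolution is given below; the paper's actual filter, confidence model, and speed-ups are not reproduced, and the availability of SciPy is assumed.

import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_flow(flow, conf, sigma=4.0):
    """flow: (H, W, 2) motion field; conf: (H, W) per-pixel confidence in [0, 1].
    Each flow channel is blurred with confidence weights and renormalized."""
    out = np.empty_like(flow, dtype=float)
    denom = gaussian_filter(conf.astype(float), sigma) + 1e-8
    for c in range(2):
        out[..., c] = gaussian_filter(conf * flow[..., c], sigma) / denom
    return out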

5.
IEEE Trans Image Process ; 25(1): 179-94, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26571532

ABSTRACT

Compact Descriptors for Visual Search (CDVS) is a recently completed standard from the ISO/IEC Moving Picture Experts Group (MPEG). The primary goal of this standard is to provide a standardized bitstream syntax that enables interoperability in image retrieval applications. Over the course of the standardization process, remarkable improvements were achieved in reducing the size of image feature data and in reducing the computation and memory footprint of the feature extraction process. This paper provides an overview of the technical features of the MPEG-CDVS standard and summarizes its evolution.

6.
IEEE Trans Image Process ; 23(8): 3352-67, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24956363

ABSTRACT

Streaming mobile augmented reality applications require both real-time recognition and tracking of objects of interest in a video sequence. Typically, local features are calculated from the gradients of a canonical patch around a keypoint in individual video frames. In this paper, we propose a temporally coherent keypoint detector and design efficient interframe predictive coding techniques for canonical patches, feature descriptors, and keypoint locations. In the proposed system, we strive to transmit each patch, or its equivalent feature descriptor, with as few bits as possible by modifying a previously transmitted patch or descriptor. Our solution enables server-based mobile augmented reality in which a continuous stream of salient information, sufficient for image-based retrieval and object localization, is sent at a bit-rate that is practical for today's wireless links and less than one-tenth of the bit-rate needed to stream the compressed video to the server.


Subjects
Data Compression/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Photography/methods; User-Computer Interface; Video Recording/methods; Algorithms; Image Enhancement/methods; Online Systems; Sensitivity and Specificity; Signal Processing, Computer-Assisted
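As a companion to item 6, here is a hedged sketch of inter-frame predictive descriptor coding: a coarsely quantized residual against the previously transmitted descriptor is sent instead of the descriptor itself. Keypoint matching, patch coding, and entropy coding are omitted, and the step size and the assumption of unit-normalized descriptors are illustrative.

import numpy as np

def encode_residual(desc, prev_desc, step=0.05):
    """Quantize the difference to the previously transmitted descriptor."""
    return np.round((desc - prev_desc) / step).astype(np.int16)

def decode_residual(residual, prev_desc, step=0.05):
    """Reconstruct the descriptor at the server from the residual."""
    return prev_desc + residual.astype(float) * step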
7.
IEEE Trans Pattern Anal Mach Intell ; 36(9): 1860-73, 2014 Sep.
Article in English | MEDLINE | ID: mdl-26352237

ABSTRACT

This paper introduces the new idea of describing people using first names. We show that describing people in terms of their similarity to a vector of possible first names is a powerful representation of facial appearance that can be used for a number of important applications, such as naming never-seen faces and building facial attribute classifiers. We build models for 100 common first names used in the US and, for each pair of names, construct a pairwise first-name classifier. These classifiers are built using training images downloaded from the internet, with no additional user interaction. This gives our approach important advantages in building practical systems that do not require additional human intervention for data labeling. The classification scores from each pairwise name classifier can be used as a set of facial attributes describing facial appearance. We show several surprising results. Our name attributes predict the correct first names of test faces at rates far greater than chance. The name attributes are applied to gender recognition and to age classification, outperforming state-of-the-art methods with all training images automatically gathered from the internet. We also demonstrate the power of our name attributes for associating faces in images with names from captions, and for the important application of unconstrained face verification.


Subjects
Biometric Identification/methods; Face/anatomy & histology; Facial Recognition; Names; Pattern Recognition, Automated/methods; Terminology as Topic; Humans
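A hedged sketch of the attribute idea in item 7: train pairwise name classifiers and use their signed decision scores as a facial attribute vector. The precomputed face features, the labeled training set, and the use of sklearn.svm.LinearSVC are assumptions for illustration, not the paper's exact pipeline.

from itertools import combinations
import numpy as np
from sklearn.svm import LinearSVC

def train_name_classifiers(X, names):
    """X: (N, D) face features; names: length-N array of first-name labels."""
    names = np.asarray(names)
    classifiers = []
    for a, b in combinations(sorted(set(names)), 2):
        mask = np.isin(names, [a, b])
        classifiers.append(LinearSVC().fit(X[mask], names[mask]))
    return classifiers

def name_attribute_vector(x, classifiers):
    """Signed decision score of every pairwise classifier forms the attribute vector."""
    return np.array([clf.decision_function(x[None, :])[0] for clf in classifiers])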
8.
IEEE Trans Image Process ; 22(8): 2970-82, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23204286

ABSTRACT

We present the radial gradient transform (RGT) and a fast approximation, the approximate RGT (ARGT). We analyze the effects of the approximation on gradient quantization and histogramming. The ARGT is incorporated into the rotation-invariant fast feature (RIFF) algorithm. We demonstrate that, using the ARGT, RIFF extracts features 16× faster than SURF while achieving a similar performance for image matching and retrieval.


Subjects
Algorithms; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Reproducibility of Results; Sensitivity and Specificity
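A minimal sketch of the radial gradient transform in item 8: each pixel's gradient is projected onto the radial and tangential directions relative to the patch center, which makes subsequent gradient histogramming rotation-invariant. The fast approximation (ARGT) and the rest of the RIFF pipeline are not reproduced here.

import numpy as np

def radial_gradient_transform(patch):
    """patch: square 2-D array centered on the keypoint; returns the radial
    and tangential gradient components at every pixel."""
    gy, gx = np.gradient(patch.astype(float))
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    rx, ry = xs - (w - 1) / 2.0, ys - (h - 1) / 2.0   # offsets from the center
    norm = np.hypot(rx, ry) + 1e-8
    rx, ry = rx / norm, ry / norm                     # radial unit vectors
    g_rad = gx * rx + gy * ry                         # radial component
    g_tan = -gx * ry + gy * rx                        # tangential component
    return g_rad, g_tan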
9.
IEEE Trans Image Process ; 21(1): 273-83, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21609882

ABSTRACT

We present a novel approach using distributed source coding for image authentication. The key idea is to provide a Slepian-Wolf encoded quantized image projection as authentication data. This version can be correctly decoded with the help of an authentic image as side information. Distributed source coding provides the desired robustness against legitimate variations while detecting illegitimate modification. The decoder incorporating expectation maximization algorithms can authenticate images which have undergone contrast, brightness, and affine warping adjustments. Our authentication system also offers tampering localization by using the sum-product algorithm.


Subjects
Algorithms; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Security Measures; Reproducibility of Results; Sensitivity and Specificity
10.
IEEE Trans Image Process ; 21(5): 2630-40, 2012 May.
Article in English | MEDLINE | ID: mdl-22106146

ABSTRACT

We consider distributed source coding in the presence of hidden variables that parameterize the statistical dependence among sources. We derive the Slepian-Wolf bound and devise coding algorithms for a block-candidate model of this problem. The encoder sends, in addition to syndrome bits, a portion of the source to the decoder uncoded as doping bits. The decoder uses the sum-product algorithm to simultaneously recover the source symbols and the hidden statistical dependence variables. We also develop novel techniques based on density evolution (DE) to analyze the coding algorithms. We experimentally confirm that our DE analysis closely approximates practical performance. This result allows us to efficiently optimize parameters of the algorithms. In particular, we show that the system performs close to the Slepian-Wolf bound when an appropriate doping rate is selected. We then apply our coding and analysis techniques to a reduced-reference video quality monitoring system and show a bit rate saving of about 75% compared with fixed-length coding.


Subjects
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Photography/methods; Signal Processing, Computer-Assisted; Video Recording/methods; Reproducibility of Results; Sensitivity and Specificity
11.
IEEE Trans Image Process ; 19(7): 1740-55, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20215074

ABSTRACT

The direction-adaptive partitioned block transform (DA-PBT) is proposed to exploit the directional features in color images to improve coding performance. Depending on the directionality in an image block, the transform either selects one of the eight directional modes or falls back to the nondirectional mode equivalent to the conventional 2-D DCT. The selection of a directional mode determines the transform direction that provides directional basis functions, the block partitioning that spatially confines the high-frequency energy, the scanning order that arranges the transform coefficients into a 1-D sequence for efficient entropy coding, and the quantization matrix optimized for visual quality. The DA-PBT can be incorporated into image coding using a rate-distortion optimized framework for direction selection, and can therefore be viewed as a generalization of variable block-size transforms with the inclusion of directional transforms and nonrectangular partitions. As a block transform, it can naturally be combined with block-based intra or inter prediction to exploit the directionality remaining in the residual. Experimental results show that the proposed DA-PBT outperforms the 2-D DCT by more than 2 dB for test images with directional features. It also greatly reduces the ringing and checkerboard artifacts typically observed around directional features in images. The DA-PBT also consistently outperforms a previously proposed directional DCT. When combined with directional prediction, gains are less than additive, as similar signal properties are exploited by the prediction and the transform. For hybrid video coding, significant gains are shown for intra coding, but not for encoding the residual after accurate motion-compensated prediction.
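A hedged sketch of the rate-distortion mode selection loop that such a transform relies on, with the candidate directional transforms, scans, and entropy coder abstracted behind caller-supplied forward/inverse callables; using the nonzero-coefficient count as a rate proxy is an illustrative simplification rather than the paper's coder.

import numpy as np

def select_mode(block, transforms, quant_step=8.0, lam=0.1):
    """Pick the transform mode minimizing D + lambda * R for one image block.
    transforms: dict mapping mode name -> (forward, inverse) callables."""
    best_mode, best_cost = None, np.inf
    for mode, (fwd, inv) in transforms.items():
        q = np.round(fwd(block) / quant_step)        # quantized coefficients
        rec = inv(q * quant_step)                    # reconstruction
        dist = np.sum((block - rec) ** 2)            # distortion D
        rate = np.count_nonzero(q)                   # crude rate proxy R
        cost = dist + lam * rate
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode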

12.
IEEE Trans Image Process ; 16(5): 1289-302, 2007 May.
Article in English | MEDLINE | ID: mdl-17491460

ABSTRACT

We propose a direction-adaptive DWT (DA-DWT) that locally adapts the filtering directions to image content based on directional lifting. With the adaptive transform, energy compaction is improved for sharp image features. A mathematical analysis based on an anisotropic statistical image model is presented to quantify the theoretical gain achieved by adapting the filtering directions. The analysis indicates that the proposed DA-DWT is more effective than other lifting-based approaches. Experimental results report a gain of up to 2.5 dB in PSNR over the conventional DWT for typical test images. Subjectively, the reconstruction from the DA-DWT better represents the structure in the image and is visually more pleasing.


Subjects
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Signal Processing, Computer-Assisted; Computer Simulation; Data Interpretation, Statistical; Models, Statistical; Numerical Analysis, Computer-Assisted; Reproducibility of Results; Sensitivity and Specificity
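For context, here is one level of the conventional LeGall 5/3 wavelet implemented via lifting; the DA-DWT in item 12 keeps this predict/update structure but applies it along a locally selected direction in 2-D. The periodic boundary handling here is a simplification.

import numpy as np

def lifting_53_forward(x):
    """One level of the 5/3 wavelet via lifting; x is a 1-D array of even length."""
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    # Predict: detail = odd sample minus the average of its two even neighbors.
    d = odd - 0.5 * (even + np.roll(even, -1))
    # Update: smooth = even sample plus a quarter of the neighboring details.
    s = even + 0.25 * (d + np.roll(d, 1))
    return s, d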
13.
IEEE Trans Image Process ; 15(4): 793-806, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16579369

ABSTRACT

We propose disparity-compensated lifting for wavelet compression of light fields. With this approach, we obtain the benefits of wavelet coding, such as scalability in all dimensions, as well as superior compression performance. Additionally, the proposed approach solves the irreversibility limitations of previous light field wavelet coding approaches, using the lifting structure. Our scheme incorporates disparity compensation into the lifting structure for the transform across the views in the light field data set. Another transform is performed to exploit the coherence among neighboring pixels, followed by a modified SPIHT coder and rate-distortion optimized bitstream assembly. A view-sequencing algorithm is developed to organize the views for encoding. For light fields of an object, we propose to use shape adaptation to improve the compression efficiency and visual quality of the images. The necessary shape information is efficiently coded based on prediction from the existing geometry model. Experimental results show that the proposed scheme exhibits superior compression performance over existing light field compression techniques.


Subjects
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Computer Graphics; Numerical Analysis, Computer-Assisted
14.
J Maxillofac Surg ; 29(3): 156-158, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11403552

ABSTRACT

Background: In plastic and reconstructive craniofacial surgery, careful preoperative planning is essential. In complex cases of craniofacial synostosis, rapid prototyping models are used to simulate the surgery and reduce operating time. Recently, 3-D CT model surgery has been introduced for presurgical planning and prediction of the postoperative result. Objective: For simulation of craniofacial surgery a computer-based system was developed that allows visualization and manipulation of CT-data using computer graphics techniques. Surgical procedures in all areas of the bony skull can be performed interactively. Results: The case of a child with scaphocephalus is presented. Surgery is planned using the craniofacial surgery simulator described above. Conclusion: The computer-based interactive surgery simulation systems presented here allow precise visualization of craniofacial surgery. The accurate computer-aided 3-D simulation of bone displacements is also the prerequisite for transfer of the simulated surgery using a navigation system for surgery. Thus the preoperatively planned procedure could be transferred directly to the operating table. Copyright 2001 European Association for Cranio-Maxillofacial Surgery.
