Results 1 - 13 of 13
1.
IEEE Trans Image Process ; 31: 974-983, 2022.
Article in English | MEDLINE | ID: mdl-34971532

ABSTRACT

Conventional video compression (VC) methods are based on motion-compensated transform coding, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion, and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on R-D loss averaged over pairs of successive frames. It is well known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results that are reported for learned VC schemes to date in both PSNR and MS-SSIM. Compared to conventional video codecs, our end-to-end optimized codec outperforms both the x265 and SVT-HEVC encoders ("veryslow" preset) in PSNR and MS-SSIM, as well as the HM 16.23 reference software in MS-SSIM. We present ablation studies showing performance gains due to the proposed novel tools such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/LHBDC/.
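
The hierarchical, bi-directional coding idea described in this abstract can be illustrated with a generic dyadic coding order, in which each frame is predicted from one past and one future frame that have already been coded. This is a minimal sketch under assumptions: the function name `hierarchical_order`, the dyadic midpoint split, and the group length of 8 are illustrative and are not taken from the LHBDC repository.

```python
def hierarchical_order(first, last):
    """Return (frame, past_ref, future_ref) triples in coding order.

    Assumes frames `first` and `last` are already coded (for example as
    intra frames). Every frame strictly between them is coded
    bi-directionally from the nearest already-coded frames on each side.
    """
    order = []

    def split(lo, hi):
        # No frame lies strictly between lo and hi: nothing left to code.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append((mid, lo, hi))  # code the midpoint from both ends
        split(lo, mid)               # then refine the left half
        split(mid, hi)               # and the right half

    split(first, last)
    return order


if __name__ == "__main__":
    for frame, past, future in hierarchical_order(0, 8):
        print(f"frame {frame}: references {past} (past), {future} (future)")
```

A sequential codec would instead code frames 1, 2, 3, ... each from its predecessor only; the hierarchy lets every intermediate frame use a reference on both sides.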

2.
Article in English | MEDLINE | ID: mdl-32070955

ABSTRACT

Head-mounted holographic displays (HMHD) are projected to be the first commercial realization of holographic video display systems. HMHDs use liquid crystal on silicon (LCoS) spatial light modulators (SLM), which are best suited to display phase-only holograms (POH). The performance/watt requirement of a monochrome, 60 fps Full HD, 2-eye, POH HMHD system is about 10 TFLOPS/W, which is orders of magnitude higher than what is achievable by commercially available mobile processors. To mitigate this compute power constraint, display-ready POHs should be generated on a nearby server and sent to the HMHD in compressed form over a wireless link. This paper discusses the design of a feasible HMHD-based augmented reality system, focusing on compression requirements and per-pixel rate-distortion trade-off for transmission of display-ready POH from the server to the HMHD. Since the decoder in the HMHD needs to operate on low power, only coding methods that have low-power decoder implementations are considered. Effects of 2D phase unwrapping and flat quantization on compression performance are also reported. We next propose a versatile PCM-POH codec with progressive quantization that can adapt to SLM dynamic range and available bitrate, and features per-pixel rate-distortion control to achieve acceptable POH quality at target rates of 60-200 Mbit/s that can be reliably achieved by current wireless technologies. Our results demonstrate the feasibility of realizing a low-power, quality-ensured, multi-user, interactive HMHD augmented reality system with commercially available components using the proposed adaptive compression of display-ready POH with light-weight decoding.
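
The target rates quoted in this abstract can be put in per-pixel terms with a short calculation. The sketch below assumes Full HD means 1920 x 1080 pixels, that 1 Mbit is 10^6 bits, and, for the uncompressed comparison only, a hypothetical 8-bit phase value per pixel; none of these figures is stated in the abstract.

```python
# Assumed display parameters: Full HD, 60 fps, two eyes, monochrome.
WIDTH, HEIGHT, FPS, EYES = 1920, 1080, 60, 2


def pixels_per_second(width=WIDTH, height=HEIGHT, fps=FPS, eyes=EYES):
    """Number of hologram pixels that must be delivered each second."""
    return width * height * fps * eyes


def bits_per_pixel(bitrate_mbps):
    """Average bit budget per pixel at a given link rate in Mbit/s."""
    return bitrate_mbps * 1e6 / pixels_per_second()


def raw_bitrate_mbps(bit_depth):
    """Uncompressed rate in Mbit/s for a given bit depth per pixel."""
    return pixels_per_second() * bit_depth / 1e6


if __name__ == "__main__":
    print(f"pixels per second: {pixels_per_second():,}")
    for rate in (60, 200):
        print(f"{rate} Mbit/s -> {bits_per_pixel(rate):.3f} bit/pixel")
    print(f"raw rate at 8 bit/pixel: {raw_bitrate_mbps(8):.1f} Mbit/s")
```

Under these assumptions the 60-200 Mbit/s range corresponds to roughly 0.24 to 0.80 bit per pixel, against about 2 Gbit/s uncompressed.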

3.
IEEE Trans Image Process ; 16(5): 1315-26, 2007 May.
Article in English | MEDLINE | ID: mdl-17491462

ABSTRACT

We propose new models and methods for rate-distortion (RD) optimal video delivery over IP, when packets with bit errors are also delivered. In particular, we propose RD optimal methods for slicing and unequal error protection (UEP) of packets over IP allowing transmission of packets with bit errors. The proposed framework can be employed in a classical independent-layer transport model for optimal slicing, as well as in a cross-layer transport model for optimal slicing and UEP, where the forward error correction (FEC) coding is performed at the link layer, but the application controls the FEC code rate with the constraint that a given IP packet is subject to constant channel protection. The proposed method uses a novel dynamic programming approach to determine the optimal slicing and UEP configuration for each video frame in a practical manner that is compliant with the AVC/H.264 standard. We also propose new rate and distortion estimation techniques at the encoder side in order to efficiently evaluate the objective function for a slice configuration. The cross-layer formulation option effectively determines which regions of a frame should be protected better; hence, it can be considered as a spatial UEP scheme. We successfully demonstrate, by means of experimental results, that each component of the proposed system provides significant gains, up to 2.0 dB, compared to competitive methods.


Subject(s)
Algorithms , Artifacts , Computer Communication Networks , Data Compression/methods , Image Enhancement/methods , Signal Processing, Computer-Assisted , Video Recording/methods , Image Interpretation, Computer-Assisted/methods , Numerical Analysis, Computer-Assisted
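
The abstract for this entry refers to a dynamic programming search over slice configurations. The following is only a generic sketch of dynamic programming over contiguous partitions: the function `optimal_slicing` and the toy cost (a fixed per-slice overhead plus a penalty that grows with slice length) are hypothetical stand-ins and do not reproduce the paper's rate and distortion estimates or its UEP model.

```python
def optimal_slicing(num_blocks, slice_cost):
    """Partition blocks 0..num_blocks-1 into contiguous slices of minimum cost.

    slice_cost(start, end) returns the cost of a slice covering blocks
    start..end-1. Returns (total_cost, [(start, end), ...]).
    """
    inf = float("inf")
    best = [0.0] + [inf] * num_blocks  # best[j]: min cost to cover blocks [0, j)
    prev = [0] * (num_blocks + 1)      # prev[j]: start of the last slice in that optimum
    for end in range(1, num_blocks + 1):
        for start in range(end):
            cost = best[start] + slice_cost(start, end)
            if cost < best[end]:
                best[end] = cost
                prev[end] = start
    # Walk the back-pointers to recover the slice boundaries.
    slices = []
    end = num_blocks
    while end > 0:
        slices.append((prev[end], end))
        end = prev[end]
    slices.reverse()
    return best[num_blocks], slices


def toy_cost(start, end):
    """Hypothetical cost: header overhead of 10 plus squared slice length."""
    length = end - start
    return 10 + length * length


if __name__ == "__main__":
    total, slices = optimal_slicing(6, toy_cost)
    print(total, slices)
```

With this toy cost, short slices pay more header overhead and long slices pay a larger length penalty, so the search settles on an intermediate slice size.
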
4.
IEEE Trans Image Process ; 16(3): 684-97, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17357729

ABSTRACT

This paper proposes a complete stochastic framework for RD optimal encoder design for video over error-prone networks, which applies to any motion-compensated predictive video codec. The distortion measure has been taken as the mean square error over an ensemble of channels given an estimate of the instantaneous packet loss probability. We show that 1) the optimal motion compensated prediction, in the MSE sense, requires computation of the expected value of the reference frames, and 2) calculation of the MSE (distortion measure) requires computation of the second moment of the reference frames. We propose a recursive procedure for the computation of both the expected value and second moment of the reference frames, which are together called the stochastic frame buffer. Furthermore, we propose a stochastic RD optimization method for selection of the optimal macroblock mode and motion vectors given the instantaneous packet loss probability. If available, channel feedback can also be incorporated into the proposed stochastic framework. However, the proposed framework does not require a feedback channel to exist, and when it exists, it does not have to be lossless. In the absence of any packet losses, the proposed stochastic framework reduces to the well-known deterministic RD optimization procedures. One possible application of the optimal stochastic framework would be for multicast streaming to an ensemble of receivers. Experimental results indicate that the proposed framework outperforms other available error tracking and control schemes.


Subject(s)
Algorithms , Artifacts , Computer Communication Networks , Data Compression/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Video Recording/methods , Artificial Intelligence , Numerical Analysis, Computer-Assisted , Signal Processing, Computer-Assisted , Stochastic Processes
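
The abstract for this entry states that computing the MSE requires the first and second moments of the decoder's reference frames. The reason is the identity E[(x - X)^2] = x^2 - 2x E[X] + E[X^2] for an original pixel x and a random decoded pixel X. The sketch below shows that identity together with a deliberately simplified per-pixel moment update; the update rule (a lost pixel is concealed by copying the co-located pixel of the previous frame, and a received pixel decodes to a fixed value) is an assumption made for illustration and is not the paper's recursion.

```python
def update_moments(recon, prev_m1, prev_m2, loss_prob):
    """First and second moments of one decoded pixel under packet loss.

    Simplifying assumptions (not from the abstract): with probability
    1 - loss_prob the decoder obtains `recon`; otherwise it copies the
    co-located pixel of the previous frame, whose moments are prev_m1
    and prev_m2.
    """
    m1 = (1.0 - loss_prob) * recon + loss_prob * prev_m1
    m2 = (1.0 - loss_prob) * recon ** 2 + loss_prob * prev_m2
    return m1, m2


def expected_mse(original, m1, m2):
    """E[(original - X)^2] expressed through the moments of X."""
    return original ** 2 - 2.0 * original * m1 + m2


if __name__ == "__main__":
    m1, m2 = update_moments(recon=100.0, prev_m1=90.0, prev_m2=8100.0,
                            loss_prob=0.5)
    print(m1, m2, expected_mse(100.0, m1, m2))
```

With `loss_prob` set to zero the moments collapse to the deterministic reconstruction and `expected_mse` reduces to the ordinary squared error, which matches the abstract's remark about the loss-free case.
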
5.
IEEE Trans Image Process ; 15(4): 1042-9, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16579388

ABSTRACT

We present a novel framework for lossless (invertible) authentication watermarking, which enables zero-distortion reconstruction of the un-watermarked images upon verification. As opposed to earlier lossless authentication methods that required reconstruction of the original image prior to validation, the new framework allows validation of the watermarked images before recovery of the original image. This reduces computational requirements in situations when either the verification step fails or the zero-distortion reconstruction is not needed. For verified images, integrity of the reconstructed image is ensured by the uniqueness of the reconstruction procedure. The framework also enables public(-key) authentication without granting access to the perfect original and allows for efficient tamper localization. Effectiveness of the framework is demonstrated by implementing the framework using hierarchical image authentication along with lossless generalized-least significant bit data embedding.


Subject(s)
Computer Graphics , Computer Security , Data Compression/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Product Labeling/methods , Signal Processing, Computer-Assisted , Algorithms , Patents as Topic , Pattern Recognition, Automated/methods
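
The generalized least-significant-bit embedding named in this entry's abstract can be sketched as replacing the lowest L levels of each pixel with a payload symbol. This is a minimal illustration under assumptions: the function names are hypothetical, the residuals are simply returned to the caller (a lossless scheme would have to convey them as part of the embedded data), and pixel overflow at the top of the range is ignored.

```python
def quantize(pixel, levels):
    """Round a pixel down to the nearest multiple of `levels`."""
    return levels * (pixel // levels)


def embed(pixels, symbols, levels):
    """Replace the lowest `levels` values of each pixel with a payload symbol.

    Returns (marked_pixels, residuals). The residuals are what a lossless
    scheme would need in order to restore the original exactly. Overflow
    beyond the valid pixel range is not handled in this sketch.
    """
    if len(pixels) != len(symbols):
        raise ValueError("one symbol per pixel is expected")
    if any(not 0 <= s < levels for s in symbols):
        raise ValueError("symbols must lie in the range 0..levels-1")
    marked = [quantize(p, levels) + s for p, s in zip(pixels, symbols)]
    residuals = [p - quantize(p, levels) for p in pixels]
    return marked, residuals


def extract(marked, levels):
    """Read the payload symbols back from the marked pixels."""
    return [m % levels for m in marked]


def restore(marked, residuals, levels):
    """Rebuild the original pixels from the marked pixels and residuals."""
    return [quantize(m, levels) + r for m, r in zip(marked, residuals)]


if __name__ == "__main__":
    original = [10, 11, 200, 37]
    marked, residuals = embed(original, [2, 0, 1, 1], levels=3)
    print(marked, extract(marked, 3), restore(marked, residuals, 3))
```
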
6.
IEEE Trans Image Process ; 15(10): 3053-65, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17022269

ABSTRACT

We propose a new adaptive filtering framework for local image registration, which compensates for the effect of local distortions/displacements without explicitly estimating a distortion/displacement field. To this effect, we formulate local image registration as a two-dimensional (2-D) system identification problem with spatially varying system parameters. We utilize a 2-D adaptive filtering framework to identify the locally varying system parameters, where a new block adaptive filtering scheme is introduced. We discuss the conditions under which the adaptive filter coefficients conform to a local displacement vector at each pixel. Experimental results demonstrate that the proposed 2-D adaptive filtering framework is very successful in modeling and compensation of both local distortions, such as Stirmark attacks, and local motion, such as in the presence of a parallax field. In particular, we show that the proposed method can provide image registration to: a) enable reliable detection of watermarks following a Stirmark attack in nonblind detection scenarios, b) compensate for lens distortions, and c) align multiview images with nonparametric local motion.


Subject(s)
Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Information Storage and Retrieval/methods
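
The remark in this entry's abstract that adaptive filter coefficients can conform to a displacement vector can be seen in a one-dimensional analogue: when the observed signal is a shifted copy of the reference, an adaptive filter trained to predict it converges toward a unit impulse at the shift. The sketch below uses a standard normalized LMS (NLMS) update on a synthetic 1-D signal; it is a swapped-in simplification and not the paper's 2-D block adaptive scheme, and all names and parameter values are illustrative.

```python
import random


def nlms_identify(reference, observed, num_taps, step=0.5, eps=1e-8):
    """Estimate FIR weights w so that observed[n] ~ sum_k w[k] * reference[n - k]."""
    weights = [0.0] * num_taps
    for n in range(num_taps - 1, len(reference)):
        # Most recent reference samples, newest first.
        window = [reference[n - k] for k in range(num_taps)]
        estimate = sum(w * x for w, x in zip(weights, window))
        error = observed[n] - estimate
        energy = sum(x * x for x in window) + eps  # eps avoids division by zero
        weights = [w + step * error * x / energy
                   for w, x in zip(weights, window)]
    return weights


# Synthetic test signal: the observation is the reference delayed by 2 samples.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
shift = 2
observed = [0.0] * shift + reference[:-shift]

weights = nlms_identify(reference, observed, num_taps=5)
# The index of the largest coefficient is the estimated displacement.
estimated_shift = max(range(len(weights)), key=lambda k: abs(weights[k]))

if __name__ == "__main__":
    print("estimated shift:", estimated_shift)
```
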
7.
IEEE Trans Image Process ; 15(10): 2879-91, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17022256

ABSTRACT

There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and, 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage, spatial and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and lip motion features prove more valuable in the case of the speech-reading application.


Subject(s)
Biometry/methods , Image Interpretation, Computer-Assisted/methods , Lip/physiology , Lipreading , Movement/physiology , Pattern Recognition, Automated/methods , Speech/physiology , Algorithms , Artificial Intelligence , Discriminant Analysis , Humans , Image Enhancement/methods , Information Storage and Retrieval/methods , Lip/anatomy & histology , Speech Recognition Software
8.
IEEE Trans Image Process ; 13(7): 937-51, 2004 Jul.
Article in English | MEDLINE | ID: mdl-15648860

ABSTRACT

We propose measures to evaluate quantitatively the performance of video object segmentation and tracking methods without ground-truth (GT) segmentation maps. The proposed measures are based on spatial differences of color and motion along the boundary of the estimated video object plane and temporal differences between the color histogram of the current object plane and its predecessors. They can be used to localize (spatially and/or temporally) regions where segmentation results are good or bad; and/or they can be combined to yield a single numerical measure to indicate the goodness of the boundary segmentation and tracking results over a sequence. The validity of the proposed performance measures without GT has been demonstrated by canonical correlation analysis with another set of measures with GT on a set of sequences (where GT information is available). Experimental results are presented to evaluate the segmentation maps obtained from various sequences using different segmentation approaches.


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Movement , Pattern Recognition, Automated/methods , Subtraction Technique , Video Recording/methods , Computer Graphics , Image Enhancement/methods , Information Storage and Retrieval/methods , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted , Software Validation
9.
IEEE Trans Image Process ; 11(5): 497-508, 2002.
Article in English | MEDLINE | ID: mdl-18244650

ABSTRACT

Effective and efficient representation of color features of multiple video frames or pictures is an important yet challenging task for visual information management systems. Key frame-based methods to represent the color features of a group of frames (GoF) are highly dependent on the selection criterion of the representative frame(s), and may lead to unreliable results. We present various histogram-based color descriptors to reliably capture and represent the color properties of multiple images or a GoF. One family of such descriptors, called alpha-trimmed average histograms, combines individual frame or image histograms using a specific filtering operation to generate robust color histograms that can eliminate the adverse effects of brightness/color variations, occlusion, and edit effects on the color representation. We show the efficacy of the alpha-trimmed average histograms for video segment retrieval applications, and illustrate how they consistently outperform key frame-based methods. Another color histogram descriptor that we introduce, called the intersection histogram, reflects the number of pixels of a given color that is common to all the frames in the GoF. We employ the intersection histogram to develop a fast and efficient algorithm for identification of the video segment to which a query frame belongs. The proposed color histogram descriptors have been included in the ISO standard MPEG-7 after extensive evaluation experiments.
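
The two descriptors named in this abstract can be sketched per histogram bin. The version below assumes that the alpha-trimmed average sorts each bin's values across the frames, discards a fixed number of the smallest and largest values, and averages the rest, and that the intersection histogram is the per-bin minimum across frames. The function names and the `trim` parameterization (a count per end rather than a fraction) are illustrative choices.

```python
def alpha_trimmed_average(histograms, trim):
    """Per-bin trimmed mean over a group of frame histograms.

    `trim` values are discarded at each end of the sorted bin values, so
    trim=0 gives the plain average and, for an odd number of frames,
    trim=(n - 1) // 2 gives the per-bin median.
    """
    n = len(histograms)
    if n == 0:
        raise ValueError("at least one histogram is required")
    if trim < 0 or 2 * trim >= n:
        raise ValueError("trim must leave at least one value per bin")
    result = []
    for bin_values in zip(*histograms):
        kept = sorted(bin_values)[trim:n - trim]
        result.append(sum(kept) / len(kept))
    return result


def intersection_histogram(histograms):
    """Per-bin minimum: the pixel count of each color shared by all frames."""
    return [min(bin_values) for bin_values in zip(*histograms)]


if __name__ == "__main__":
    # Three 3-bin frame histograms; the last frame has an outlier in bin 2.
    group = [[10, 0, 5], [12, 1, 5], [11, 0, 90]]
    print(alpha_trimmed_average(group, trim=1))
    print(intersection_histogram(group))
```

In the toy group, trimming one value per end removes the outlier in the last bin, whereas the plain average (trim=0) is pulled strongly toward it.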

10.
IEEE Trans Image Process ; 12(6): 627-38, 2003.
Article in English | MEDLINE | ID: mdl-18237937

ABSTRACT

We propose a system that employs low-level image segmentation followed by color and two-dimensional (2-D) shape matching to automatically group those low-level segments into objects based on their similarity to a set of example object templates presented by the user. A hierarchical content tree data structure is used for each database image to store matching combinations of low-level regions as objects. The system automatically initializes the content tree with only "elementary nodes" representing homogeneous low-level regions. The "learning" phase refers to labeling of combinations of low-level regions that have resulted in successful color and/or 2-D shape matches with the example template(s). These combinations are labeled as "object nodes" in the hierarchical content tree. Once learning is performed, the speed of second-time retrieval of learned objects in the database increases significantly. The learning step can be performed off-line provided that example objects are given in the form of user interest profiles. Experimental results are presented to demonstrate the effectiveness of the proposed system with hierarchical content tree representation and learning by color and 2-D shape matching on collections of car and face images.

11.
IEEE Trans Image Process ; 12(7): 796-807, 2003.
Article in English | MEDLINE | ID: mdl-18237954

ABSTRACT

We propose a fully automatic and computationally efficient framework for analysis and summarization of soccer videos using cinematic and object-based features. The proposed framework includes some novel low-level processing algorithms, such as dominant color region detection, robust shot boundary detection, and shot classification, as well as some higher-level algorithms for goal detection, referee detection, and penalty-box detection. The system can output three types of summaries: i) all slow-motion segments in a game; ii) all goals in a game; iii) slow-motion segments classified according to object-based features. The first two types of summaries are based on cinematic features only for speedy processing, while the summaries of the last type contain higher-level semantics. The proposed framework is efficient, effective, and robust. It is efficient in the sense that there is no need to compute object-based features when cinematic features are sufficient for the detection of certain events, e.g., goals in soccer. It is effective in the sense that the framework can also employ object-based features when needed to increase accuracy (at the expense of more computation). The efficiency, effectiveness, and robustness of the proposed framework are demonstrated over a large data set, consisting of more than 13 hours of soccer video, captured in different countries and under different conditions.

12.
IEEE Trans Image Process ; 11(2): 135-45, 2002.
Article in English | MEDLINE | ID: mdl-18244619

ABSTRACT

This paper describes a hierarchical approach for object-based motion description of video in terms of object motions and object-to-object interactions. We present a temporal hierarchy for object motion description, which consists of low-level elementary motion units (EMU) and high-level action units (AU). Likewise, object-to-object interactions are decomposed into a hierarchy of low-level elementary reaction units (ERU) and high-level interaction units (IU). We then propose an algorithm for temporal segmentation of video objects into EMUs, whose dominant motion can be described by a single representative parametric model. The algorithm also computes a representative (dominant) affine model for each EMU. We also provide algorithms for identification of ERUs and for classification of the type of ERUs. Experimental results demonstrate that segmenting the life-span of video objects into EMUs and ERUs facilitates the generation of high-level visual summaries for fast browsing and navigation. At present, the formation of high-level action and interaction units is done interactively. We also provide a set of query-by-example results for low-level EMU retrieval from a database based on similarity of the representative dominant affine models.
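
For readers unfamiliar with the affine models mentioned in this abstract, a six-parameter affine motion model maps a pixel position (x, y) to a displacement (u, v) with u = a1 + a2*x + a3*y and v = a4 + a5*x + a6*y. The sketch below evaluates such a model and compares two models by their mean displacement difference over sample points. The parameter ordering and the `model_distance` measure are assumptions made for illustration; the abstract does not specify the similarity measure used for retrieval.

```python
import math


def affine_displacement(params, x, y):
    """Displacement (u, v) at position (x, y) for params = (a1, ..., a6)."""
    a1, a2, a3, a4, a5, a6 = params
    return a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y


def model_distance(params_a, params_b, points):
    """Hypothetical similarity measure between two affine models.

    Mean Euclidean distance between the displacements the two models
    assign to the same sample points.
    """
    total = 0.0
    for x, y in points:
        ua, va = affine_displacement(params_a, x, y)
        ub, vb = affine_displacement(params_b, x, y)
        total += math.hypot(ua - ub, va - vb)
    return total / len(points)


if __name__ == "__main__":
    pan_right = (2.0, 0.0, 0.0, -1.0, 0.0, 0.0)  # pure translation
    pan_other = (5.0, 0.0, 0.0, 3.0, 0.0, 0.0)   # a different translation
    grid = [(x, y) for x in range(0, 64, 16) for y in range(0, 64, 16)]
    print(affine_displacement(pan_right, 10, 20))
    print(model_distance(pan_right, pan_other, grid))
```

Comparing displacements over a grid rather than raw parameter vectors keeps the measure in pixel units, which is one reason such a choice might be made; other measures are equally plausible.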

13.
IEEE Trans Image Process ; 12(8): 962-76, 2003.
Article in English | MEDLINE | ID: mdl-18237970

ABSTRACT

This research presents a new model-based approach toward the three-dimensional (3-D) tracking and extraction of gait and human motion. We suggest the use of a hierarchical, structural model of the human body that introduces the concept of soft kinematic constraints. These constraints take the form of a priori stochastic distributions learned from previous configurations of the body exhibited during specific activities; they are used to supplement an existing motion model limited by hard kinematic constraints. We use time-varying parameters of the structural model to measure gait velocity, stance width, stride length, stance times, and other gait variables with multiple degrees of accuracy and robustness. To characterize tracking performance, we also introduce a novel geometric model of expected tracking failures. We demonstrate and quantify the performance of the suggested models using multi-view video sequences of human movement captured in a complex home environment.
