Búsqueda | Portal de Búsqueda de la BVS

Compression for Quadratic Similarity Queries: Finite Blocklength and Practical Schemes.

Steiner, Fabian; Dempfle, Steffen; Ingber, Amir; Weissman, Tsachy.

IEEE Trans Inf Theory ; 62(5): 2737-2747, 2016 May.

Artículo en Inglés | MEDLINE | ID: mdl-29398721

RESUMEN

We study the problem of compression for the purpose of similarity identification, where similarity is measured by the mean square Euclidean distance between vectors. While the asymptotical fundamental limits of the problem - the minimal compression rate and the error exponent - were found in a previous work, in this paper we focus on the nonasymptotic domain and on practical, implementable schemes. We first present a finite blocklength achievability bound based on shape-gain quantization: The gain (amplitude) of the vector is compressed via scalar quantization and the shape (the projection on the unit sphere) is quantized using a spherical code. The results are numerically evaluated and they converge to the asymptotic values as predicted by the error exponent. We then give a nonasymptotic lower bound on the performance of any compression scheme, and compare to the upper (achievability) bound. For a practical implementation of such a scheme, we use wrapped spherical codes, studied by Hamkins and Zeger, and use the Leech lattice as an example for an underlying lattice. As a side result, we obtain a bound on the covering angle of any wrapped spherical code, as a function of the covering radius of the underlying lattice.

Compression for Quadratic Similarity Queries.

Ingber, Amir; Courtade, Thomas; Weissman, Tsachy.

IEEE Trans Inf Theory ; 61(5): 2729-2747, 2015 May.

Artículo en Inglés | MEDLINE | ID: mdl-29375151

RESUMEN

The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure, and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on the compressed data. For a Gaussian source, we show that the queries can be answered reliably if and only if the compression rate exceeds a given threshold-the identification rate- which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source, we prove that, as with classical compression, the Gaussian source requires the largest compression rate among sources with a given variance. Moreover, a robust scheme is described that attains this maximal rate for any source distribution.

Compression for Similarity Identification: Computing the Error Exponent.

Ingber, Amir; Weissman, Tsachy.

Proc Data Compress Conf ; 2015: 413-422, 2015 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-29046895

RESUMEN

We consider the problem of compressing discrete memoryless data sequences for the purpose of similarity identification, first studied by Ahlswede et al. (1997). In this setting, a source sequence is compressed, where the goal is to be able to identify whether the original source sequence is similar to another given sequence (called the query sequence). There is no requirement that the source will be reproducible from the compressed version. In the case where no false negatives are allowed, a compression scheme is said to be reliable if the probability of error (false positive) vanishes as the sequence length grows. The minimal compression rate in this sense, which is the parallel of the classical rate distortion function, is called the identification rate. The rate at which the error probability vanishes is measured by its exponent, called the identification exponent (which is the analog of the classical excess distortion exponent). While an information-theoretic expression for the identification exponent was found in past work, it is uncomputable due to a dependency on an auxiliary random variable with unbounded cardinality. The main result of this paper is a cardinality bound on the auxiliary random variable in the identification exponent, thereby making the quantity computable (solving the problem that was left open by Ahlswede et al.). The new proof technique relies on the fact that the Lagrangian in the optimization problem (in the expression for the exponent) can be decomposed by coordinate (of the auxiliary random variable). Then a standard Carathéodory - style argument completes the proof.

Compressing Tabular Data via Pairwise Dependencies.

Pavlichin, Dmitri S; Ingber, Amir; Weissman, Tsachy.

Proc Data Compress Conf ; 2017: 455, 2017 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-29046897

RESUMEN

RESUMEN

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA