Earlier today,
nunfetishist introduced me to the concept of bijective compression.
Apparently, this means that for any dataset x, compress(decompress(x))=x, i.e. any dataset is a well-formed compression and decompresses to something that compresses back to itself.
That's not true of normal compression algorithms: while decompress(compress(x))=x is a fundamental requirement of any lossless compression algorithm, dd if=/dev/urandom bs=1024 count=1 | zcat, for example, is extraordinarily unlikely to work.
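To see that concretely, here's a quick sketch using Python's zlib as a stand-in for the gzip stream zcat expects (the point holds for any conventional compressor):

    import os
    import zlib

    blob = os.urandom(1024)  # 1KiB of random bytes, like the dd invocation above

    try:
        print(zlib.decompress(blob))
    except zlib.error as exc:
        # In practice you essentially always land here: almost no byte
        # string happens to be a well-formed compressed stream.
        print("not a valid compressed stream:", exc)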
(As an aside, the name feels wrong: a bijection is simply a function that matches up the domain (in this case, arbitrary datasets we want to compress) with the range (in this case, valid compressed datasets) so that every element of the range is the image of exactly one element of the domain. By my understanding, every lossless compression scheme is already a bijection between its inputs and its possible outputs, and the ones under discussion ought to be called "permutative compression algorithms", because their range and domain are the same set.)
Bijective compression schemes are allegedly useful in cryptography because it's not possible to use membership of the compressor's range as a test for valid (compressed) plaintexts: any dataset is a valid compressed plaintext, so an attacker can't reject a candidate key just because the trial decryption fails to decompress.
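To make that concrete, here's a toy sketch of the hole bijectivity closes (entirely my own construction, using a deliberately silly one-byte XOR in place of a real cipher):

    import zlib

    def toy_xor(key: int, data: bytes) -> bytes:
        # A deliberately feeble stand-in for a real cipher; the point is
        # only to show the validity oracle, not to do cryptography.
        return bytes(b ^ key for b in data)

    ciphertext = toy_xor(42, zlib.compress(b"attack at dawn " * 10))

    # With compress-then-encrypt, "does it decompress?" becomes a free test
    # for candidate keys: nearly every wrong key yields something zlib rejects.
    for key in range(256):
        try:
            plaintext = zlib.decompress(toy_xor(key, ciphertext))
            print("candidate key", key, "gives", plaintext[:15], "...")
        except zlib.error:
            pass  # wrong key: not a well-formed zlib stream

    # A bijective compressor would make every candidate key decompress to
    # *something*, so this filter would tell the attacker nothing.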
Then a thought occurred to me. If I were the first person to have had it I'd be astonished, but I can at least claim independent invention.
I came up with the idea of involutive compression: a bijective compression algorithm where the compression and decompression functions are the same; a single function f, with f(f(x)) = x for every x, that returns a small, high-entropy dataset if fed a large low-entropy one and a large low-entropy dataset if fed a small high-entropy one.
Although it might seem implausible, such compression functions are possible. To pick a trivial example, consider a function that run-length encodes up to 128 trailing spaces in a file by stripping them and placing a top-bit-set byte at the start to say how many were removed. Thus the complete behaviour of the function f(x) would be:
- Let n be the number of trailing spaces, up to a maximum of 128
- Remove n bytes (all of which are spaces) from the end of x
- If the top bit of x's first byte is set (assuming x isn't empty by this stage), remove that byte and let m be its seven low-order bits plus one; otherwise, let m be 0
- If n > 0, prepend a byte with value 128+(n-1)
- Postpend m spaces.
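Here's that f transcribed into Python (a minimal sketch; the name f and the byte values follow the description above, and spaces are taken to be ASCII 0x20). One caveat I should flag: taken completely literally, the recipe misfires on a couple of corner cases, for instance an input with more than 128 trailing spaces, or one that starts with two top-bit-set bytes and has no trailing spaces, where f(f(x)) no longer returns x. The sketch reproduces the steps as written rather than trying to patch them.

    def f(x: bytes) -> bytes:
        SPACE = 0x20

        # Step 1: let n be the number of trailing spaces, up to a maximum of 128.
        n = 0
        while n < min(len(x), 128) and x[-1 - n] == SPACE:
            n += 1

        # Step 2: remove those n spaces from the end of x.
        if n:
            x = x[:-n]

        # Step 3: if the top bit of the first byte is set (and x isn't empty
        # by this stage), remove that byte and let m be its seven low-order
        # bits plus one; otherwise let m be 0.
        if x and x[0] & 0x80:
            m = (x[0] & 0x7F) + 1
            x = x[1:]
        else:
            m = 0

        # Step 4: if n > 0, prepend a byte with value 128 + (n - 1).
        if n:
            x = bytes([128 + (n - 1)]) + x

        # Step 5: postpend m spaces.
        return x + b" " * m

    # One function both compresses and decompresses:
    assert f(b"hello   ") == b"\x82hello"    # 3 trailing spaces stripped
    assert f(b"\x82hello") == b"hello   "    # ...and put back
    assert f(f(b"hello   ")) == b"hello   "  # involution on ordinary inputs

    # The corner case flagged above: two leading top-bit-set bytes and no
    # trailing spaces, and the round trip no longer holds.
    assert f(f(b"\x85\x90abc")) != b"\x85\x90abc"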
That's the existence proof, those corner cases aside; presumably there are much better involutive compression functions out there, somewhere.
Do involutive compression algorithms have a use? Apart from as a curio, the only one I can think of right now is saving the user the bother of specifying whether they want to compress or uncompress - pretty tenuous.