File compression: the general case

Quoth DaveW:

Y’know, it occurs to me that this is possible, and I know how to do it. The compression algorithm is “decrement”, and the decompression algorithm is “increment”. Give me any bitstring, and I can eventually reduce it to 100 bits or less with consecutive decrements, and restore it with consecutive increments. Of course, I have to know how many times to run my decompression program, but that’s a minor detail :wink:

I’m quite sure I’ve got erislover’s little scheme figured out, and it’s kid stuff. I might even be a step ahead of him.

The basic premise is that any number you want can be derived from a ratio of two rational numbers. And it’s a rather simple iterative process to find the proper ratio.

Take for example pi, 3.14159…. You can derive the first three digits using the ratio 22/7. If you want more digits, you futz around with the numerator or the denominator until the result has the digits you want. Pi to four digits? Easy - 43.98/14 gives the first four digits. 5 digits? No problem – 43.98/13.9995.

So how do you make a compression scheme with that?[list=1][li]take any file in it’s binary form and using a series of transformations, change it’s linear bitstream into a single base 10 number. For example, a 1k file (1024 bytes) has 8192 bits.[]convert the bits to 4 bit words, each of which is a hexdecimal value, and then converting each successive pair of hexdecimal values to a base 10 value (000-255).[]Remove any separators between these 3-digit base 10 values (000-255)and mash them together side by side, resulting in a 3072 digit number.Now, you just pretend this is the first 3072 digits of some irrational number you[/li]want to approximate with ratios, and go about finding the ratio that fits.[/list=1]Maybe it will be something like a 55 digit numerator with 34 digit denominator. As long as the total number of digits in the numerator and denominator can be expressed in less than 8192 bits (1k), compression has been achieved.

And if you want a few free lunches out of all this, take this numerator and denominator and arrange them into a single 89 digit number (a decimal point might be a handy separator here, to distinguish the numerator from the denominator) and apply step four a few times.

For decompressing, I suppose it would be valuable to know how many times you’ve got to run your compressed number through the loop backwards. So upon the first go round of compression, you’d do something clever like restrict the values used in the initial ratio to whole numbers instead of decimal values. Then, when decompressing you would just keep passing backwards through the loop until neither the numerator or denominator had decimal values.

Like erislover asserted, this scheme should work well with any file type since it starts with the fundamental 0s and 1s instead of the compiled source code.

Although I came up with this independantly in about five minutes, I’m sure I’m not the only guy in the world to have thought of this so I won’t claim to be a genius. And I’ve never done any studying of data compression either. Wouldn’t it be funny if I just figured out the core algorithm for ZIP?

If I get a nasty-gram from their attorneys, I’ll let you know.