This is a new problem, but I think it’ll take a lot of the same thought as the last one did. And, since we had so much fun with that one, it’s about time for some more!
The Lempel-Zev-Welch algorithm is used to compress a data string. Since it turns out to be so efficient for raster data, it’s the algorithm employed by the popular GIF image format. Without going into the details, it converts a sequence of numbers into another sequence of numbers. If the original numbers are all of the range 0-3, for instance, then the first number of the output sequence will be of the range 0-5, the second will be of the range 0-6, the next of the range 0-7, then 0-8, and so on, for as long as is needed.
GIF specifications take the compression one step further with something called variable-length codes. Basically, it only allocates for each number the minimum number of bits required. For the first one, which is in the range 0-5 (000 to 101) three bits are needed, so we might refer to it as AAA. Similarly this is true for the second and third numbers as well, which we’ll call BBB and CCC. For the fourth number, however, since its range is 0-8 (0000 to 1000) four bits are allocated, so it’s referred to as DDDD. This is true up to the eleventh number, KKKK, which has a range 0-15. The next number, LLLLL, has five binary digits, since its range is 0-16. I hope that’s clear, but I’m not sure about it. Ask me for a better explanation. Anyway, these are sequenced one after the other, with the most significant bit first, into the following sequence of 8-bit bytes:
CCBBBAAA
EEEDDDDC
GGGFFFFE
IIIHHHHG
KKKJJJJI
MMLLLLLK
NNNNNMMM
and so on. Now the question. Is there a better way? Something (Tim) said in the other Pure Math thread about wasted bits got me thinking, there’s no way that AAA = 110, for instance. So, while no individual bit is wasted, the fact that there are bit sequences that will never appear make me think that there’s a more efficient way to turn the variable-length numbers into the final sequence of bits. Any thoughts?