Does anyone remember a big flap being made a few years ago (2000?) about some fellow who had come up with an algorithm for ultra-compression of data? At the time they claimed it was going to revolutionize computing - fit a whole movie on a standard data CD, get T1 speeds on a dial-up line, that sort of thing.
I was quite dubious about this, in that 1) it was too big of a leap in technology, 2) the fellow was not an expert in the field, and 3) he was keeping it “secret” until he could commercialize it. He had let some expert have a peek at it, and the expert had said it could do what he said it would.
That’s about all I remember. Does anyone know what this was, and what ultimately happened to it?
As far as I remember, it had all the markings of a scam, like those free energy machine things that pop up now and again, e.g. the claims were along the lines of:
-This will turn the world upside down and overturn the known laws of physics
-It uses methodologies that are entirely novel
-No, I can’t go into detail about that
-I have a working prototype that has been demonstrated and found convincing by this list of obscure nobodies
-I need funding to continue the work; invest now and make meeelllions of dollars!
It appears to have faded into obscurity, which is also typical of scams; whether they actually separated anyone from their money, I don’t know. I do recall that they were making outlandish claims about the capabilities of the software, and I seem to remember that there was some reason to suspect that their test data was tailor-made and would be highly compressible by conventional methods.
It’s important to note that there is no such thing as general compression that shrinks everything. If you identify certain patterns in your data, like repeating parts, you might be able to take advantage of those. However, no lossless compression algorithm can compress all data, not even on average over many samples: there are 2^n possible inputs of n bits but only 2^n − 1 strings that are strictly shorter, so some inputs have to come out at least as long as they went in. Any meaningful definition of “compressing random data” would require exactly that. Compressing the output of a given pseudo-random number generator is possible (in the extreme, just store the seed and the length), but absolutely pointless.
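If you have Python handy you can watch this happen with the standard zlib module; here’s a rough sketch (the test data is something I made up, it has nothing to do with anything ZeoSync ever showed anyone):

import os
import zlib

# Highly patterned data: the same line repeated a thousand times.
patterned = b"the quick brown fox jumps over the lazy dog\n" * 1000
# "Random" data of the same length, straight from the OS entropy source.
random_ish = os.urandom(len(patterned))

for label, data in [("patterned", patterned), ("random", random_ish)]:
    compressed = zlib.compress(data, 9)
    print(label, len(data), "->", len(compressed))
    # Lossless sanity check: decompression must give back the original exactly.
    assert zlib.decompress(compressed) == data

The patterned input shrinks to a small fraction of its size; the random input actually comes out a few bytes longer than it went in, because zlib still has to spend bytes on its own header and bookkeeping.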
Contrary to popular belief, “only compressing parts that are smaller after compression” doesn’t help either.
I was dumbfounded during the whole ZeoSync flap - why were they getting so much press? Why would a news source report their claims as possibly factual?
It’s equivalent to my claiming that I have a shipping container, into which you can put 100 cubic meters of stuff, but the full container will then have a volume of only 1 cubic meter. This would be great for cargo! Would anyone believe it?
It’s mathematically impossible for a compression algorithm to shrink every possible input and still be lossless, that is, to produce a decompressed version identical to the original input data for all possible inputs. If ZeoSync was claiming to have achieved this, then they certainly deserved a good horse laugh and a quick dive into obscurity.
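As noted above, it comes down to counting, and you can brute-force the count for small sizes yourself; a quick Python sketch, not tied to any real codec:

# For n-bit inputs there are 2**n distinct messages, but only 2**n - 1
# strings that are strictly shorter (all lengths from 0 to n-1 bits).
# Any scheme that shortens every n-bit input must map two different
# inputs to the same output, so decompression can't always recover the original.
for n in range(1, 9):
    inputs = 2 ** n
    shorter_outputs = sum(2 ** k for k in range(n))  # equals 2**n - 1
    assert shorter_outputs < inputs
    print(n, "bits:", inputs, "inputs,", shorter_outputs, "strictly shorter outputs")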
However, there are plenty of lossy compression algorithms that will compress ordinary audio and video quite well, as we see in JPEG, MPEG, MP3, and so on. I’ll grant you that an input stream of white noise would defeat such algorithms, just like any other — but then, who wants to listen to white noise?
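Here’s a toy illustration of the lossy idea; it’s nothing like the real perceptual models in MP3 or JPEG, and the “audio” is a fake signal I cooked up, but it shows the principle: throw away the part of the signal nobody will miss, and what’s left compresses easily.

import math
import random
import zlib

random.seed(0)
# Stand-in for audio: a smooth wave plus a little noise, stored as 8-bit samples.
samples = bytes(int(128 + 100 * math.sin(i / 50) + random.randint(-3, 3)) for i in range(20000))
# Toy lossy step: keep only the top 4 bits of each sample. The result is no longer
# identical to the original, but the noisy low bits that fight the compressor are gone.
quantized = bytes(b & 0xF0 for b in samples)

print("zlib on the raw samples:      ", len(zlib.compress(samples, 9)))
print("zlib after the lossy quantize:", len(zlib.compress(quantized, 9)))

On a typical run the quantized version compresses to a small fraction of what the raw samples do; with pure white noise there would be no smooth part worth keeping, which is why it defeats these schemes.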
If we take lossy compression of random (!) data into consideration, I have an idea for an algorithm with constant zero bit output size, but I won’t talk about it unless my lawyer is present.
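Oh, all right, here is the complete reference implementation (my lawyer would like me to stress that this is a joke, and that “lossy” is doing a great deal of work):

def compress(data: bytes) -> bytes:
    return b""            # constant zero-bit output, exactly as advertised

def decompress(blob: bytes) -> bytes:
    return b""            # reconstruction quality may vary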
Because they were smart enough to have anticipated the objections from those qualified enough to see through it, and had pre-spun a web of plausible-sounding and relevant-sounding bullshit. It wasn’t any kind of sensible rebuttal, but it was enough to establish the impression that they had answered the critics.
Can you explain this? I thought the point of compression algorithms was to only “compress” parts of data that would end up smaller than the original. Why would a compression algorithm compress something to a larger size?
Perhaps because compression involves putting the data into a ‘container’, which carries overhead of its own; otherwise how could the decompression algorithm know that it was dealing with a file that the compression algorithm couldn’t compress and had decided to leave alone?
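A bare-bones sketch of that idea in Python (this container format is something I just made up for illustration, it’s not how zip or gzip actually lay things out): prepend a single flag byte saying whether the payload was deflated or stored untouched. Even the “leave it alone” path now costs one extra byte, which is exactly the overhead being talked about.

import os
import zlib

RAW, DEFLATED = b"\x00", b"\x01"

def pack(data: bytes) -> bytes:
    # Compress only when it actually helps; always spend one flag byte.
    compressed = zlib.compress(data, 9)
    if len(compressed) < len(data):
        return DEFLATED + compressed
    return RAW + data              # stored as-is, but now 1 byte bigger

def unpack(blob: bytes) -> bytes:
    flag, payload = blob[:1], blob[1:]
    return zlib.decompress(payload) if flag == DEFLATED else payload

for data in (b"spam " * 1000, os.urandom(5000)):
    packed = pack(data)
    assert unpack(packed) == data
    print(len(data), "->", len(packed))

The repetitive text shrinks nicely; the random bytes go through the “stored” path and end up one byte longer than they started.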
Apparently someone thinks we do. I and about 5000 other concertgoers arrived at Red Rocks Amphitheatre near Denver when they opened the gates so that we could get good spots for the Grateful Dead. We got seated and started to relax and dropped some nice acid and then…
They roared white noise at us thru the speakers for three hours, continuously. So loud that it was difficult to hold a conversation.
It’s not immediately intuitive, but if you can understand the answer to this question, you’ll know more about data compression than 99.9% of the general population.
Compression is TOTALLY dependent on what the data is like: some stuff can barely compress at all, other things can be compressed to a billionth of the size.
I mean, if you have three billion-gigabyte files and need to send them compressed, you can just have a compression algorithm that compresses them to 1, 2 and 3… and it will be a great algorithm… for that set of data, and suck for any other… plus the decoder needs to have the files already.
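To make that concrete, here’s a deliberately silly sketch of such a codec (the file contents below are placeholders): the “compressed” output is nothing but an index into a list of files both sides already have, so it chokes on anything outside that list.

# A "compressor" that only works for three specific files the decoder already owns.
# The ratio is absurd for those three files and useless for everything else
# (compress() simply raises ValueError for data it has never seen).
KNOWN_FILES = [b"<contents of file 1>", b"<contents of file 2>", b"<contents of file 3>"]

def compress(data: bytes) -> bytes:
    return bytes([KNOWN_FILES.index(data) + 1])   # a single byte: 1, 2 or 3

def decompress(code: bytes) -> bytes:
    return KNOWN_FILES[code[0] - 1]

for original in KNOWN_FILES:
    assert decompress(compress(original)) == original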
Well, since this is GQ, please educate me. I can understand that if you attempt to compress an incompressible file it will end up a bit larger by a few bytes. If that is all it is, it seems like a trivial case. Any reason that a file that can be more than trivially compressed would ever create compressed strings that are longer than the original data?
The original statement didn’t mention “overhead” costs, just that “only compressing parts that are smaller after compression doesn’t help either.”
I am by no means an expert on compression algorithms, but this is either a pedantic point or I am more clueless than I thought.