I had posted a lengthy response in this thread early on, mostly in reply to Bytegeist’s “interesting” first post. But it went the way of the hamsters.
I dreaded typing it all back in, etc. But it appears the signal-to-noise ratio has finally recovered (ignoring the ternary hijack).
At this point I want to add just a couple of things:
Here’s the comp.compression FAQ. It explains a lot of the basics, makes fun of earlier “we can compress everything!” companies, etc. Some of the posters to this thread really need to read it.
While some compression methods tack on headers, not all do; many forms of RLE don't, for example, and you can do LZW without a header if you choose.
Even without headers, some files will get larger: a lossless compressor that makes some inputs smaller must make others bigger, by simple counting (there are more n-bit files than there are shorter outputs to map them to).
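To make both points concrete, here's a minimal headerless RLE in Python. It's a sketch, not anyone's production codec: the output is a bare stream of (count, byte) pairs with no header at all, so repetitive input shrinks dramatically while input with no runs doubles in size.

```python
# Minimal headerless run-length codec (a sketch, not a production tool).
# Output is a bare stream of (count, byte) pairs -- no header anywhere.
def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    for count, value in zip(data[::2], data[1::2]):
        out += bytes([value]) * count
    return bytes(out)

repetitive = b"A" * 1000
no_runs = bytes(range(256))             # no repeated bytes at all

assert rle_decode(rle_encode(repetitive)) == repetitive
print(len(rle_encode(repetitive)))      # 8   -- tiny, and no header needed
print(len(rle_encode(no_runs)))         # 512 -- double the original 256 bytes
```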
Note that you can't iterate a lossy compression system. If you feed a JPEG into a JPEG compressor (dressing the raw bytes up as an image first), the losses incurred will make recovery of the once-compressed image impossible. Every bit in the compressed file has become vital.
For things like JPEG compression, there is a quality setting (mistakenly taken as a "percentage" value by too many people). Set it to 75 and you get a nice approximation of the original. Set it to 20 and it's probably going to look crappy. Set it to 5 and you get a tiny file that is not going to look like the original. (Setting it to 100 means you don't have a clue.) For audio files, you can tweak the sample rates and ranges. Etc. But it's one pass.
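To see the quality knob and the one-pass limitation in action, here's a quick sketch using the Pillow imaging library (my choice of tool, not anything prescribed; any JPEG encoder behaves the same way, and the input file name is hypothetical):

```python
# A quick look at the JPEG quality knob and at generation loss, using the
# Pillow library (pip install Pillow). The input file name is hypothetical.
import io
from PIL import Image

img = Image.open("original.png").convert("RGB")

# The quality setting trades file size against fidelity -- it is not a
# percentage of anything.
for q in (100, 75, 20, 5):
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=q)
    print(f"quality={q:3d}: {buf.tell():8d} bytes")

# Lossy compression is one pass: every decode/re-encode cycle can only add
# error, never recover it. 'gen' below is an approximation of an
# approximation of an approximation...
gen = img
for _ in range(10):
    buf = io.BytesIO()
    gen.save(buf, format="JPEG", quality=75)
    buf.seek(0)
    gen = Image.open(buf).convert("RGB")
```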
These kinds of companies have been popping up for a long time. A friend of mine in graduate school was brought in to consult for such a company. He knew it was a joke inside of 5 minutes. They were very unhappy with his analysis and adamant that they really could do it. That was the early '70s, folks.
Why does the press give free advertising for these clowns? There is no journalism in American Big Media anymore. No reporter actually questions anything they are told.
A real compression tool probably wouldn't use characters; it might treat the data as a continuous binary stream, or something else. The control 'character' in my example had to be the first one in the file (i.e., it is part of the header, not the data), which is why you need the Y or N. Also, I assumed (for the purposes of the example) that the entire data would be either compressed or uncompressed; if you want mixed segments in a single file, you'd have to do something like:
-Use segments of a fixed size (which means that the break points probably won't fall where you'd find them most convenient in terms of compressibility) - in this case, you won't achieve the best possible compression, simply because your compressible segments will contain uncompressible sequences and vice versa.
-Use a control 'character' that does not otherwise occur in your compressed data (this won't happen naturally, so you have to make it happen by encoding all of the compressed and uncompressed data in such a way that the control character can't occur accidentally) - in this case, you're essentially 'wasting' a little coding capacity at every step, since one symbol value is permanently reserved (see the byte-stuffing sketch below).
-Give the file a header that explicitly maps the size, position and compression methodology of each segment - probably the best solution all round, but the map takes up space even if the entire content of the file is one uncompressible sequence.
The advantage of the map method is that you can analyse each segment and use a different compression methodology for each one, as appropriate - this is essentially what the zip format does per archive entry: each entry records its own compression method, falling back to "stored" (no compression) when compressing doesn't help.
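Here's a toy version of that map approach in Python: split the input into fixed-size segments, try zlib's deflate on each, and keep whichever form is smaller. The field sizes and method codes are invented for illustration; real zip is far more elaborate, but it makes the same per-entry stored-vs-deflated choice.

```python
# Toy "header map" container: the header records one (method, stored-size)
# entry per fixed-size segment; bodies follow in order. Format is invented.
import os
import struct
import zlib

SEG = 4096  # fixed segment size; a real tool might tune or vary this

def pack(data: bytes) -> bytes:
    entries, bodies = [], []
    for i in range(0, len(data), SEG):
        seg = data[i:i + SEG]
        comp = zlib.compress(seg)
        if len(comp) < len(seg):
            entries.append((1, len(comp)))   # method 1 = deflated
            bodies.append(comp)
        else:
            entries.append((0, len(seg)))    # method 0 = stored as-is
            bodies.append(seg)
    header = struct.pack("<I", len(entries))
    for method, size in entries:
        header += struct.pack("<BI", method, size)
    return header + b"".join(bodies)

def unpack(blob: bytes) -> bytes:
    count = struct.unpack_from("<I", blob, 0)[0]
    pos = 4
    entries = []
    for _ in range(count):
        method, size = struct.unpack_from("<BI", blob, pos)
        entries.append((method, size))
        pos += 5
    out = bytearray()
    for method, size in entries:
        body = blob[pos:pos + size]
        out += zlib.decompress(body) if method else body
        pos += size
    return bytes(out)

# Compressible text followed by incompressible random bytes.
mixed = b"all work and no play " * 500 + os.urandom(4096)
assert unpack(pack(mixed)) == mixed
print(len(mixed), "->", len(pack(mixed)))
```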
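And for completeness, the reserved-control-character option from the list above: the classic trick is byte stuffing - double every literal occurrence of the escape byte so a lone one can never appear in the payload. A sketch, with an arbitrary choice of escape value:

```python
# Byte stuffing sketch: reserve one byte value as an escape. Doubling every
# literal occurrence guarantees a lone ESC never appears in the payload, so
# a single unescaped ESC can safely mark a mode switch. The value 0xFF is an
# arbitrary choice for illustration.
ESC = 0xFF

def stuff(payload: bytes) -> bytes:
    return payload.replace(bytes([ESC]), bytes([ESC, ESC]))

def unstuff(encoded: bytes) -> bytes:
    return encoded.replace(bytes([ESC, ESC]), bytes([ESC]))

data = bytes([0x41, 0xFF, 0x42])
assert unstuff(stuff(data)) == data
# The cost: every literal 0xFF now takes two bytes. That's the "wasted"
# coding capacity of reserving one symbol value out of 256 at every step.
```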