Straight Dope Message Board > Main > Compression Algorithms--I'm clearly missing something

#1
08-04-2012, 09:38 PM
 Fenris Guest Join Date: Jan 2000
Compression Algorithms--I'm clearly missing something

Ok--lemme explain how I think compression algorithms work, then tell my question...and then someone can tell me why I'm wrong. I know my idea/question is flawed...I just don't know why.

Let's imagine you have a bmp file. It's a bunch of 1s and 0s. When you zip or rar a file, you're saying (and this is the baby talk version) "Take patterns of ones and zeros and convert them into (say) hexadecimal." So 111111 would be "AAA", 111110 would be "AAB" and so on. Then when you unzip, it does the reverse: AAA gets turned back into 111111 (and so on). So by turning every six digit group into a three digit group, you're reducing the file size by half.

So...my question: Assuming this is at least correct in a baby-talk version, why can't you take it further? Take every group where AAA is followed by AAB (AAAAAB) and convert it to "AAA" and so on.

It's obvious that it won't work, because eventually you could compress the Oxford English Dictionary into a single six (or however many) character string. So there's clearly a flaw in my logic. Also, if you try to zip a Zip file, it rarely gets even marginally smaller and sometimes gets bigger. So it doesn't work in the real world.

But....why not?

#2
08-04-2012, 09:44 PM
 ultrafilter Guest Join Date: May 2001
That's not at all like how it works. Take a look at Huffman coding to get a feel for a pretty simple algorithm that works well on text.

ETA: LZW is another famous algorithm, but a bit more difficult to understand.

Last edited by ultrafilter; 08-04-2012 at 09:45 PM.
#3
08-04-2012, 09:47 PM
 astro Member Join Date: Jul 1999 Location: Taint of creation Posts: 28,348
See

http://www.howstuffworks.com/file-compression.htm

It's more complex than your description.

Last edited by astro; 08-04-2012 at 09:49 PM.
#4
08-04-2012, 09:47 PM
 RaftPeople Guest Join Date: Jan 2003
The more you compress something, the more the resulting data is lacking patterns that can be exploited by compression (it's more random), so you can only go so far.
#5
08-04-2012, 09:51 PM
 friedo Charter Member Join Date: May 2000 Location: Brooklyn Posts: 19,248
No, that's not how compression algorithms work. It doesn't make sense to convert strings of binary digits into letters, because those letters must themselves be stored as binary digits. Hexadecimal notation is just a human-friendly format for looking at binary data. You can't store things in hexadecimal; everything is ones and zeroes.

One of the simplest compression ideas works like this. Suppose you are compressing a text file that contains a lengthy English novel. It probably has the word "the" a lot of times. In binary (assuming we are using ASCII encoding) the word "the" amounts to the following 24 bits: 01110100 01101000 01100101. What you can do is put a little marker at the beginning of the file that assigns a special code, say 10000001, to stand in for the word "the." As long as the space that it takes to store the information about this code is less than the space saved by converting every 24-bit "the" to the 8-bit code, thus saving 16 bits per occurrence, you have compression.

Huffman coding works by assigning the shortest codes to the things that appear most frequently. There are diminishing returns, because in addition to the original text, you have to store a "dictionary" mapping each special code to what it stands for. This is how you recover the original text. So you can only take it so far. It will work for natural-language texts, which have a highly skewed distribution of word frequency. It will not work for random data, since every symbol will appear with roughly equal frequency. Indeed, one of the ways to test a data stream for randomness is to try to compress it.

This is an example of lossless compression, because the exact original data is recovered after decompression.

There are also lossy forms of compression, which take advantage of the limits of human perception. JPEG image files, for example, are compressed from the original images by sacrificing some image detail. The algorithm is designed to sacrifice detail in such a way that the important bits (to the human eye) will be preserved well enough so the image looks pretty close to the original. You can never recover the original data from a lossy-compressed file, because information is lost in the process. Lossy compression for audio works the same way. With lossy compression, you can generally tune input parameters to specify how far you want to take it. If you turn it up too high, the result is eventually of such poor quality that it is useless.
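To make the frequency-to-code-length idea concrete, here is a minimal Huffman coder sketched in Python. This is a toy illustration, not any particular utility's implementation, and the sample sentence is made up:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build Huffman codes: the most frequent symbols get the shortest bit patterns."""
    freq = Counter(text)
    # Heap entries are (frequency, tiebreaker, tree); a tree is a symbol or a pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)   # merge the two rarest subtrees
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (a, b)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # lone-symbol edge case
    walk(heap[0][2], "")
    return codes

text = "the theme of the thesis"
codes = huffman_codes(text)
encoded = "".join(codes[c] for c in text)
# The encoded text takes well under the 8 bits per character of plain ASCII,
# and a frequent letter like 'e' never gets a longer code than a rare one like 'f'.
```

The dictionary `codes` is exactly the per-file table the post describes having to store alongside the compressed bits.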
#6
08-04-2012, 09:52 PM
 davidm Charter Member Join Date: Mar 2002 Location: Near Philadelphia PA, USA Posts: 5,445

To answer your question simply, compression works by eliminating redundancy in the data. Once you've eliminated all, or nearly all, of the redundancy, there's no way to compress it any further.

That's the simple answer. It's actually a little more complicated than that. Depending on the technique, you can reach a point of diminishing returns while there's still some remaining redundancy.
#7
08-04-2012, 11:46 PM
 septimus Guest Join Date: Dec 2009
Quote:
 Originally Posted by Fenris I know my idea/question is flawed...I just don't know why. ... "Take patterns of ones and zeros and convert them into (say) hexadecimal." So 111111 would be "AAA", 111110 would be "AAB" and so on.
If you think hexadecimal is fundamentally different from binary, that's your confusion right there.

Compression is all about probabilities. For example, to represent the 256 different 8-bit bytes you need 256 code words; these are represented most simply with 8 bits ... No Compression! But if one of the 8-bit bytes (say '7D') occurs 50% of the time, you represent it with a single bit (say 0), represent the other 255 possibilities with 9 bits each (say 1, followed by the byte) for an average cost of 5 bits ... 8:5 compression ratio!

In that example the input file was highly non-random (half its bytes were '7D') but the output will appear much more random. Random == Incompressible.

The above discussion applies to essentially all compression schemes though the idea is often hidden behind high-level reversible transformations which serve to prepare the data for such simple probability-based coding. You can read about a simple yet impressive example of such a transformation sometimes used on text files by Googling for Burrows-Wheeler Transform.
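septimus's 50%-byte scheme is simple enough to sketch directly. A toy Python version, where the choice of 0x7D and the sample data are just illustrative:

```python
def encode(data: bytes) -> str:
    """Toy probability code: '0' for the common byte 0x7D, '1' + 8 bits otherwise."""
    out = []
    for b in data:
        if b == 0x7D:
            out.append("0")
        else:
            out.append("1" + format(b, "08b"))
    return "".join(out)

def decode(bits: str) -> bytes:
    out = bytearray()
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            out.append(0x7D)
            i += 1
        else:
            out.append(int(bits[i + 1:i + 9], 2))
            i += 9
    return bytes(out)

data = bytes([0x7D, 0x41] * 500)   # half the bytes are 0x7D
bits = encode(data)
# 500 * 1 bit + 500 * 9 bits = 5000 bits vs. 8000 bits raw: the 8:5 ratio.
```

Decoding works because the first bit of each code word tells you unambiguously which case you are in.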
#8
08-05-2012, 12:09 AM
 TriPolar Member Join Date: Oct 2007 Location: rhode island Posts: 19,728
I was starting off on an explanation, but septimus has covered it pretty well. Compression consists of identifying patterns that can be encoded using fewer bits than the pattern uses. The encoded results tend to contain well distributed patterns that can't easily be encoded using fewer bits. Common compression utilities often check for recompression based on the internal format of a compressed file or directory information, and don't allow any actual recompression. But if you can find a compression program that allows recompression, or modify compressed files so they aren't recognized as such, you'll see that very little, if any, additional compression can be done after the first compression. Because the output contains additional internal format information, additional recompression may actually increase the size of the file.
#9
08-05-2012, 01:45 AM
 Derleth Guest Join Date: Apr 2000
Quote:
 Originally Posted by RaftPeople The more you compress something, the more the resulting data is lacking patterns that can be exploited by compression (it's more random), so you can only go so far.
True. This ties into the math behind compression, mostly in terms of the 'pigeonhole principle'.

The pigeonhole principle states that if you have five boxes to put things into, one thing per box, you can't stick six things into boxes.

In terms of compression, a fixed number of bits can only be put into so many specific arrangements (specifically, n bits can be arranged a total of 2^n ways, so 256 ways for 8 bits), so if someone claims to have a compression algorithm that can always compress any file, they're a conman or an idiot because, for example, there are 4 possible 2-bit files (2^2 = 4) but only 2 possible 1-bit files (2^1 = 2). So, once you assign bit 0 to the file containing 00, and bit 1 to the file containing 01, what's the file containing 10 going to be assigned to?

We can actually prove the pigeonhole principle mathematically, but, really, it is that simple. Yet there still are people who can't quite grasp it.

So, the result is that there is no perfect compression algorithm: For any given algorithm, there are some files it can't compress at all and will leave the same size, and, likely, some files it will actually make larger. (That said, essentially any piece of prose written in a human language can be compressed. Human languages are extremely redundant and can be compressed quite a bit with even primitive algorithms.)
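The counting behind this can be checked mechanically; a quick sketch:

```python
# There are 2**n distinct n-bit files, but all shorter nonempty files combined
# number only 2**1 + ... + 2**(n-1) = 2**n - 2: there are never enough
# short files for every long file to shrink into a different one.
for n in range(1, 16):
    shorter = sum(2 ** k for k in range(1, n))  # count of all files of length 1 .. n-1
    assert shorter == 2 ** n - 2
    assert shorter < 2 ** n
```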
__________________
"Ridicule is the only weapon that can be used against unintelligible propositions. Ideas must be distinct before reason can act upon them."
If you don't stop to analyze the snot spray, you are missing that which is best in life. - Miller
I'm not sure why this is, but I actually find this idea grosser than cannibalism. - Excalibre, after reading one of my surefire million-seller business plans.
#10
08-05-2012, 01:55 AM
 Indistinguishable Guest Join Date: Apr 2007
You know how, in Morse code, common letters like 'E' and 'T' have short codes, while uncommon letters like 'X' and 'Z' have long codes?

That's a good scheme. It's an efficient scheme. The encoded length of different messages is well-tailored to their frequencies.

If it was common for people to use some other, less-efficient scheme for encoding their messages as dots and dashes, you could improve upon it by translating into Morse code.

That's compression.

Last edited by Indistinguishable; 08-05-2012 at 01:59 AM.
#11
08-05-2012, 03:42 AM
 Jragon Member Join Date: Mar 2007 Location: Miskatonic University Posts: 7,057
Quote:
 Originally Posted by Derleth True. This ties into the math behind compression, mostly in terms of the 'pigeonhole principle'. The pigeonhole principle states that if you have five boxes to put things into, one thing per box, you can't stick six things into boxes.
Actually, the pigeonhole principle states merely that if you have six items and five boxes, one of the boxes must have multiple items. If that's unacceptable for your application, then you need more boxes or fewer items. But the principle itself doesn't make the claim that you "can't" put six items in five boxes. Just that each item can't have a unique box.
#12
08-05-2012, 07:09 AM
 Jragon Member Join Date: Mar 2007 Location: Miskatonic University Posts: 7,057
Isn't there a popular compression algorithm that uses something like a tree? I remember learning about something a couple years back where you'd have a tree like

Code:
         []
       /    \
      0      1
     /        \
   [A]        []
             /  \
            0    1
           /      \
         [C]      [B]
Where the most common bit-strings (or in this case, letters) were on higher leaf nodes. So then the string "bac" (011000100110000101100011) would be represented as 11010.

Or is that just a simple implementation of Huffman coding? On review they seem pretty much the same.

Last edited by Jragon; 08-05-2012 at 07:12 AM.
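For what it's worth, decoding with such a tree is just a walk from the root. A sketch with the codes read off a tree like the one above (a = 0, c = 10, b = 11):

```python
# Codes read off the tree: left edge = 0, right edge = 1.
tree = ("a", ("c", "b"))  # 'a' at depth 1; 'c' and 'b' under the right branch

def decode(bits, tree):
    """Walk the tree from the root; hitting a leaf emits a symbol and resets to the root."""
    out, node = [], tree
    for bit in bits:
        node = node[int(bit)]
        if isinstance(node, str):      # reached a leaf
            out.append(node)
            node = tree
    return "".join(out)

# 'b' = 11, 'a' = 0, 'c' = 10, so the five bits 11010 decode back to "bac".
```

Because only leaves carry symbols, no code word is a prefix of another, which is why the walk never needs any separators between code words.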
#13
08-05-2012, 08:37 AM
 ftg Guest Join Date: Feb 2001
Quote:
 Originally Posted by Derleth So, the result is that there is no perfect compression algorithm: For any given algorithm, there are some files it can't compress at all and will leave the same size, and, likely, some files it will actually make larger.
This is without question the most important thing anyone who thinks you can just keep recompressing stuff needs to understand. If an algorithm can compress some files, then that intrinsically means that others have to get larger.

In general, non-random files (lots of repeating patterns, for example) compress well. Random files don't compress. The output of compression sort of looks random, so trying to compress it again is not going to work well.

In fact, one of the many ways to define how non-random a string is, is by how well it can be compressed. (Formalized by Kolmogorov, if you want to Google for more.)
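That compressibility test is easy to run yourself, e.g. with Python's zlib (exact sizes will vary a little from run to run):

```python
import os
import zlib

patterned = b"the quick brown fox " * 500   # 10,000 bytes of repeating text
random_ish = os.urandom(10_000)             # 10,000 bytes from the OS RNG

small = len(zlib.compress(patterned, 9))
big = len(zlib.compress(random_ish, 9))
# The patterned data collapses to a tiny fraction of its size;
# the random data stays at (or slightly above) its original size.
```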
#14
08-05-2012, 09:59 AM
 pulykamell Charter Member Join Date: May 2000 Location: SW Side, Chicago Posts: 25,361
Quote:
 Originally Posted by ftg If an algorithm can compress some files, then that intrinsically means that others have to get larger.
Do they have to? Couldn't worst-case scenario for an algorithm be that it stays the same size, or are you counting the compression/decompression utility as part of the file size?
#15
08-05-2012, 10:08 AM
 leahcim Member Join Date: Dec 2010 Location: New York Posts: 1,196
Quote:
 Originally Posted by pulykamell Do they have to? Couldn't worst-case scenario for an algorithm be that it stays the same size, or are you counting the compression/decompression utility as part of the file size?
Yes, they have to (at least for lossless compression). In order to be lossless, the compression must be reversible, which means that every bit string must be mapped to a unique compressed bit string. I.e., you can't have distinct bit strings X and Y that compress to the same bit string Z, since if you then try to decompress Z, how would you know whether to give back X or Y?

But there are fewer possible short bit strings than there are possible long bit strings. Since each long bit string "consumes" the bit string it compresses into, making it unavailable for any other bit string to compress into, only so many long bit strings can compress into short bit strings before you run out. The rest of the long bit strings have to "compress" to even longer bit strings.
#16
08-05-2012, 10:18 AM
 chrisk Charter Member Join Date: Nov 2003 Location: Southern ontario Posts: 5,657
Quote:
 Originally Posted by Fenris So...my question: Assuming this is at least correct in a baby-talk version, why can't you take it further? Take every group where AAA is followed by AAB (AAAAAB) and convert it to "AAA" and so on.
Other people have given lots of good information, but though your premise is partially flawed, this is a good question that hasn't been answered. The answer is that when decompressing the file, you need to know what to translate 'AAA' into - there can't be two different sequences of data that would both translate to 'AAA' in the same place. (Note, this rule does not apply to 'lossy' compression, where the output is not the same as the input data but considered 'close enough' in human terms - such as MP3 music or JPG photos.)

I definitely recommend going through and building a Huffman tree based on text input, if you want to understand lossless compression algorithms better. If the OP or anyone else is interested, I can summarize some instructions and make some example problems right here.
#17
08-05-2012, 10:20 AM
 TriPolar Member Join Date: Oct 2007 Location: rhode island Posts: 19,728
Quote:
 Originally Posted by pulykamell Do they have to? Couldn't worst-case scenario for an algorithm be that it stays the same size, or are you counting the compression/decompression utility as part of the file size?
Not necessarily the additional size of the utility, but as a practical matter some data must be added to identify that the contents are in a compressed format, even if that format is unchanged from the original.

It can be done by convention, for example changing the extension of a file. But it is advantageous to have compressed data recognizable from its content. Without meta-data, any set of randomly selected data might end up being identical to the compressed result of another set of data, so even a signature format is insufficient in all cases to determine whether or not a set of data is compressed.

To have data that can be unambiguously identified as compressed, additional data must be added to the compressed results. If you don't mind the ambiguity, worst case could be no change in size.
#18
08-05-2012, 10:27 AM
 TriPolar Member Join Date: Oct 2007 Location: rhode island Posts: 19,728
Quote:
 Originally Posted by chrisk Other people have given lots of good information, but though your premise is partially flawed, this is a good question that hasn't been answered. The answer is that when decompressing the file, you need to know what to translate 'AAA' into - there can't be two different sequences of data that would both translate to 'AAA' in the same place. (Note, this rule does not apply to 'lossy' compression, where the output is not the same as the input data but considered 'close enough' in human terms - such as MP3 music or JPG photos.) I definitely recommend going through and building a Huffman tree based on text input, if you want to understand lossless compression algorithms better. If the OP or anyone else is interested, I can summarize some instructions and make some example problems right here.
That just depends on whether the compression algorithm is content-specific or not. To use the Morse code example, the frequency of letters is pre-established, and the most frequently used letters are encoded in fewer bits. The compressed data doesn't have to include any other information about how to decompress it, since the same rules are used for any message.
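A pre-agreed code needs no dictionary in the message at all. A toy Morse-style sketch in Python, where the four-symbol alphabet and the code table are invented for illustration:

```python
# A pre-agreed, Morse-like variable-length code: both sides hardwire the table,
# so the encoded message carries no dictionary. Prefix-free by construction.
CODE = {"e": "0", "t": "10", "a": "110", "n": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(text):
    return "".join(CODE[c] for c in text)

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:          # prefix-free: the first match is the only match
            out.append(DECODE[buf])
            buf = ""
    return "".join(out)
```

As with Morse, the table itself is the "shared rules"; only the bits travel.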
#19
08-05-2012, 10:38 AM
 Fenris Guest Join Date: Jan 2000
Hey--I wanted to thank everyone for the info--the specific question I had was answered in posts 6 and 8, but the other posts on where my basic understanding of how compression works went wrong were great as well. I'm still reading and appreciate all the help!

Thanks!
#20
08-06-2012, 08:06 PM
 iamthewalrus(:3= Guest Join Date: Jul 2000
Quote:
 Originally Posted by Fenris So...my question: Assuming this is at least correct in a baby-talk version, why can't you take it further? Take every group where AAA is followed by AAB (AAAAAB) and convert it to "AAA" and so on.
This example involves taking a 6-hex-character string and encoding it as a 3-hex character string. There are a lot more 6-character strings than 3-character strings.

So, either

1. You're only using a small subset of the possible 6-character strings (small enough that there are few enough possibilities that each one can be encoded using just 3 characters). In this case, there's a lot of duplicated data in your set and you probably can compress it well.
-or-
2. You're going to "overflow" your short string supply, and you'll have to start using longer strings. Given the overhead required for compression, you eventually end up getting a larger file than you started with.

Point #2 is pretty easy to demonstrate with common compression algorithms. Take a file and zip it. Then zip that into another zip. Continue. At some point (probably around the 2nd or 3rd iteration), the files start getting larger.

If the above isn't clear, let's go back to binary. How would you encode all 2-character binary strings (00, 01, 10, 11) as 1-character strings (0, 1)? This is analogous to your question above. It's pretty clear that you can't do that, because there are 4 2-character strings and only 2 1-character strings.

Last edited by iamthewalrus(:3=; 08-06-2012 at 08:07 PM.
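The zip-a-zip experiment above can be reproduced with Python's zlib instead of an actual zip utility (same principle; exact sizes will vary):

```python
import zlib

data = b"some ordinary, fairly repetitive text for zlib to chew on. " * 200
sizes = [len(data)]
for _ in range(5):
    data = zlib.compress(data, 9)   # compress the previous round's output
    sizes.append(len(data))
# The first pass shrinks the data enormously; each later pass just adds
# header overhead to already-random-looking bytes, so sizes creep back up.
```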
#21
08-06-2012, 08:21 PM
 Jragon Member Join Date: Mar 2007 Location: Miskatonic University Posts: 7,057
Quote:
 Originally Posted by iamthewalrus(:3= This example involves taking a 6-hex-character string and encoding it as a 3-hex character string. There are a lot more 6-character strings than 3-character strings. So, either 1. You're only using a small subset of the possible 6-character strings (small enough that there are few enough possibilities that each one can be encoded using just 3 characters). In this case, there's a lot of duplicated data in your set and you probably can compress it well. -or- 2. You're going to "overflow" your short string supply, and you'll have to start using longer strings. Given the overhead required for compression, you eventually end up getting a larger file than you started with. Point #2 is pretty easy to demonstrate with common compression algorithms. Take a file and zip it. Then zip that into another zip. Continue. At some point (probably around the 2nd or 3rd iteration), the files start getting larger. If the above isn't clear, let's go back to binary. How would you encode all 2-character binary strings (00, 01, 10, 11) as 1-character strings (0, 1)? This is analogous to your question above. It's pretty clear that you can't do that, because there are 4 2-character strings and only 2 1-character strings.
However, given the right input, and a compression algorithm with enough special cases, you could probably compress a file this way if it doesn't contain, say, too many different 6-character strings.

Of course, this wouldn't be very good because it would be very difficult to find a file it works with, but in theory you could make a compression algorithm that works very, very well, but only on a very, very small number of inputs.
#22
08-06-2012, 08:50 PM
 Indistinguishable Guest Join Date: Apr 2007
Well, it's easy to make a compression algorithm that works very, very well on a very, very small number of inputs. E.g., I can pick four 10GB files of my choosing and compress them down to 2 bits each. And in counterbalance, other files will go up in size. On average, nothing will be saved.

Lossless compression is only possible to the extent that it compresses files on weighted average, weighted by how common those files are. And you can only compress files on weighted average to the extent that their frequencies are not in tune with their coding; that is, to the extent that they were originally stored under some memory-inefficient coding such that some shorter files are less common than some longer files.

Last edited by Indistinguishable; 08-06-2012 at 08:54 PM.
#23
08-06-2012, 09:12 PM
 Jragon Member Join Date: Mar 2007 Location: Miskatonic University Posts: 7,057
Quote:
 Originally Posted by Indistinguishable Well, it's easy to make a compression algorithm that works very, very well on a very, very small number of inputs. E.g., I can pick four 10GB files of my choosing and compress them down to 2 bits each. And in counterbalance, other files will go up in size. On average, nothing will be saved. Lossless compression is only possible to the extent that it compresses files on weighted average, weighted by how common those files are. And you can only compress files on weighted average to the extent that their frequencies are not in tune with their coding; that is, to the extent that they were originally stored under some memory-inefficient coding such that some shorter files are less common than some longer files.
I never said it was useful, just that what he was saying was technically doable so long as you keep control of the space of valid inputs.
#24
08-06-2012, 09:33 PM
 TriPolar Member Join Date: Oct 2007 Location: rhode island Posts: 19,728
Quote:
 Originally Posted by Indistinguishable I can pick four 10GB files of my choosing and compress them down to 2 bits each.
You'll likely need a 40GB decompression utility in order to use them again. The combined size of the decompression utility and the 8 bits of compressed data will end up being larger than the original files.
#25
08-06-2012, 09:35 PM
 For You Guest Join Date: Aug 2012
Quote:
 Originally Posted by Jragon Isn't there a popular compression algorithm that uses something like a tree? I remember learning about something a couple years back where you'd have a tree like Code: [] / \ 0 1 / \ [A] [] /\ 0 1 / \ [C] [b] Where the most common bit-strings (or in this case, letters) were on higher leaf nodes. So then the string "bac" (011000100110000101100011) would be represented as 11100. Or is that just a simple implementation of Huffman coding? On review they seem pretty much the same.
I think you are confusing search trees (either binary trees or balanced "B-Trees") with data compression. Search trees tend to compress time rather than data, by requiring less effort to locate a data record.

Last edited by For You; 08-06-2012 at 09:35 PM.
#26
08-06-2012, 09:37 PM
 pulykamell Charter Member Join Date: May 2000 Location: SW Side, Chicago Posts: 25,361
Quote:
 Originally Posted by For You I think you are confusing search trees (either binary trees or balanced "B-Trees") with data compression. Search trees tend to compress time rather than data, by requiring less effort to locate a data record.
But Huffman encoding uses a binary tree, too. I thought that's what he was remembering.
#27
08-06-2012, 09:38 PM
 TonySinclair Guest Join Date: Feb 2012
I'm looking for investors. I have perfected a compression algorithm that reduces any file, of any size, to ten bytes.

Still working on the decompression part.
#28
08-06-2012, 09:41 PM
 Jragon Member Join Date: Mar 2007 Location: Miskatonic University Posts: 7,057
Quote:
 Originally Posted by pulykamell But Huffman encoding uses a binary tree, too. I thought that's what he was remembering.
Yeah, on further reading that's what I remember. The main difference between a Huffman tree and a BST seems to be that a Huffman tree only stores values at its leaf nodes -- for ease of parsing.

Quote:
 Originally Posted by TonySinclair I'm looking for investors. I have perfected a compression algorithm that reduces any file, of any size, to ten bytes. Still working on the decompression part.
What a coincidence, I have a decompression utility that will decompress any 10-byte file into a picture of a kitten in a teacup.

Last edited by Jragon; 08-06-2012 at 09:43 PM.
#29
08-06-2012, 10:29 PM
 Indistinguishable Guest Join Date: Apr 2007
Quote:
 Originally Posted by Jragon I never said it was useful, just that what he was saying was technically doable so long as you keep control of the space of valid inputs.
Ah, alright.
Quote:
 Originally Posted by TriPolar You'll likely need a 40GB decompression utility in order to use them again. The combined size of the decompression utility and the 8 bits of compressed data will end up being larger than the original files.
Sure. Though if you let me design the machine architecture, I can get the decompression utility to be a built-in machine code of minimal length as well...

Indeed, every machine architecture defines its own notion of ultimate compression, "The smallest (or any) program that outputs this file". But this measure will be quite architecture-dependent.

Last edited by Indistinguishable; 08-06-2012 at 10:33 PM.
#30
08-06-2012, 10:33 PM
 Jragon Member Join Date: Mar 2007 Location: Miskatonic University Posts: 7,057
There's actually malware which is written almost entirely in meta-code: a small code snippet that does some often-crazy stuff to write the actual malicious part on the fly. This is an example, albeit an incredibly rare and difficult-to-create one, of a file "compression" that produces exactly one bit-string as output.

Last edited by Jragon; 08-06-2012 at 10:34 PM.
#31
08-07-2012, 02:21 AM
 Mangetout Charter Member Join Date: May 2001 Location: Kingdom of Butter Posts: 47,502
Quote:
 Originally Posted by iamthewalrus(:3= If the above isn't clear, let's go back to binary. How would you encode all 2-character binary strings (00, 01, 10, 11) as 1-character strings (0, 1)? This is analogous to your question above. It's pretty clear that you can't do that, because there are 4 2-character strings and only 2 1-character strings.
This is the absolute nub of compression theory - and it's the bit people often either overlook or can't grasp because it's wrapped up in more complex terminology. There just aren't enough possible combinations in a small space to be able to describe every possible combination of a larger space.
#32
08-07-2012, 06:32 AM
 Fenris Guest Join Date: Jan 2000
Quote:
 Originally Posted by iamthewalrus(:3= If the above isn't clear, let's go back to binary. How would you encode all 2-character binary strings (00, 01, 10, 11) as 1-character strings (0, 1)? This is analogous to your question above. It's pretty clear that you can't do that, because there are 4 2-character strings and only 2 1-character strings.
Ah-HA!!

Thank you for that! I get it now in a way I didn't before. Really great answer.
#33
08-07-2012, 09:49 AM
 Chronos Charter Member Join Date: Jan 2000 Location: The Land of Cleves Posts: 47,920
That's nothing, Indistinguishable. I have a lossless compression algorithm that can take any file at all as input, and is guaranteed to never increase the file size, and which can be iterated such that it'll eventually get any given input file down to an arbitrarily small size.
__________________
Time travels in divers paces with divers persons.
--As You Like It, III:ii:328
#34
08-07-2012, 11:38 AM
 Indistinguishable Guest Join Date: Apr 2007
Kid, I like your idea of storing the file info in the number of times it's been compressed. It shows spunk. Dammit, it's just crazy enough to --

But, wait, what do you do if the input is already as short as possible?

If compression is reversible without increasing file size, it must be permuting the shortest possible files. Which means none are left available to send any other files to. Which means you'll never be able to get any other files down to that size, no matter how many times you compress.

Alas, even this slight claim is still too crazy to work.

Last edited by Indistinguishable; 08-07-2012 at 11:43 AM.
#35
08-07-2012, 02:56 PM
 Chronos Charter Member Join Date: Jan 2000 Location: The Land of Cleves Posts: 47,920
Well, then, obviously, you compress a 1-bit file into a 0-bit file.

EDIT: I knew I'd mentioned that scheme on the SDMB before, but I'd forgotten that the previous time was also in a discussion with you. I might have guessed, though.

Last edited by Chronos; 08-07-2012 at 02:57 PM.
#36
08-07-2012, 02:59 PM
 Indistinguishable Guest Join Date: Apr 2007
Quote:
 Originally Posted by Chronos Well, then, obviously, you compress a 1-bit file into a 0-bit file.
But what do you compress the 0-bit file into? You can't send it to itself if someone else (e.g., that 1-bit file) is being sent to it. But then its file size must increase...

Last edited by Indistinguishable; 08-07-2012 at 03:01 PM.
#37
08-07-2012, 03:06 PM
 Indistinguishable Guest Join Date: Apr 2007
That is, there's a fundamental constraint on injective (i.e., lossless) maps: if no files go up in size, then no files go down in size either. The only way to move some files lower is by displacing some lower files higher.

Last edited by Indistinguishable; 08-07-2012 at 03:08 PM. Reason: Which, alas, kills the joke
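That constraint can even be brute-force checked on a tiny case, e.g. all binary strings of length at most 2:

```python
from itertools import permutations

# All binary strings of length <= 2, empty string included: 1 + 2 + 4 = 7 of them.
strings = ["", "0", "1", "00", "01", "10", "11"]

# A lossless (injective) compressor on this set that never increases length
# maps the 7 strings into themselves, i.e. it is a permutation. Check every
# such permutation: if no string grows, no string shrinks either.
violations = 0
for perm in permutations(strings):
    f = dict(zip(strings, perm))
    if all(len(v) <= len(k) for k, v in f.items()):
        if any(len(v) < len(k) for k, v in f.items()):
            violations += 1
# violations stays 0: every non-growing injection is length-preserving.
```

The counting argument is the same as the code: a permutation preserves the total length, so if every output is no longer than its input, every output must be exactly as long as its input.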
