OK, lemme explain how I think compression algorithms work, then ask my question…and then someone can tell me why I’m wrong. I know my idea/question is flawed…I just don’t know why.
Let’s imagine you have a BMP file. It’s a bunch of 1s and 0s. When you zip or rar a file, you’re saying (and this is the baby-talk version) “Take patterns of ones and zeros and convert them into (say) hexadecimal.” So 111111 would be “AAA”, 111110 would be “AAB”, and so on. Then when you unzip, it does the reverse: AAA gets turned back into 111111 (and so on). So by turning every six-digit group into a three-digit group, you’re cutting the file size in half.
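Here’s a rough Python sketch of the kind of lookup table I’m picturing (the table entries and the “AAA”-style codes are just made up to show the idea, not how any real compressor works):

```python
# Toy version of the substitution idea: swap fixed six-character
# bit patterns for shorter three-character codes, and back again.
ENCODE = {"111111": "AAA", "111110": "AAB", "111101": "AAC"}  # ...and so on
DECODE = {code: bits for bits, code in ENCODE.items()}

def compress(bits: str) -> str:
    # Walk the bit string in six-character chunks and replace each
    # chunk with its three-character code.
    chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    return "".join(ENCODE[chunk] for chunk in chunks)

def decompress(codes: str) -> str:
    # Reverse the substitution: each three-character code maps back
    # to exactly one six-character chunk.
    chunks = [codes[i:i + 3] for i in range(0, len(codes), 3)]
    return "".join(DECODE[chunk] for chunk in chunks)

original = "111111111110111101"
packed = compress(original)          # "AAAAABAAC" -- half the characters
assert decompress(packed) == original
```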
So…my question: assuming this is at least correct as a baby-talk version, why can’t you take it further? Take every group where AAA is followed by AAB (AAAAAB) and convert that into some new three-character code, and so on.
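In other words, just stack another table on top of the first one, something like this (again, codes invented on the spot purely to show what I mean):

```python
# Second round of the same trick: replace pairs of round-one codes
# with new, shorter codes. (Invented codes, for illustration only.)
ROUND_TWO = {"AAAAAB": "BAA", "AABAAC": "BAB"}  # ...and so on

def compress_again(codes: str) -> str:
    # Same move as before, one level up: six characters in, three out.
    chunks = [codes[i:i + 6] for i in range(0, len(codes), 6)]
    return "".join(ROUND_TWO[chunk] for chunk in chunks)

packed_once = "AAAAABAABAAC"                # output of a first-round pass
packed_twice = compress_again(packed_once)  # "BAABAB" -- half again
```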
It’s obvious that it can’t work, because otherwise you could eventually compress the Oxford English Dictionary down to a single six-character (or however long) string. So there’s clearly a flaw in my logic. Also, if you try to zip a Zip file, it rarely gets even marginally smaller, and sometimes it gets bigger. So it doesn’t work in the real world.
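That part I can actually see for myself, e.g. with Python’s zlib (not literally the zip format, but the same general kind of compressor):

```python
import zlib

# Quick check of the "zip a Zip file" observation.
data = b"The quick brown fox jumps over the lazy dog. " * 2000

once = zlib.compress(data)    # first pass shrinks it dramatically
twice = zlib.compress(once)   # second pass barely changes it, or grows it

print(len(data), len(once), len(twice))
```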
But…why not?