Random patterns

There are 2[sup]n[/sup] binary strings of length n, but only 2[sup]n[/sup] - 1 strings of length less than n, so no lossless encoding can give every length-n string a shorter codeword. Therefore, at least one string of length n is incompressible.
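
As a quick sanity check of that counting argument, here is a minimal Python sketch (n = 8 is an arbitrary choice):

[code]
n = 8

# Strings of length exactly n, versus strings of every shorter length
# (including the empty string): 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1.
length_n = 2 ** n
shorter = sum(2 ** k for k in range(n))

print(length_n, shorter)  # 256 255
# A lossless compressor must send distinct strings to distinct codewords,
# so by the pigeonhole principle at least one length-n string gets no
# codeword shorter than itself.
[/code]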

Good one – so that’s an existence proof for incompressibility.

Two further questions (feel free to tell me to go read a book).

  1. Accepting the correspondence you indicated earlier (random strings are incompressible), does it follow that incompressible strings are random?

I ask because we could assign the 2[sup]n[/sup]-1 “codewords” shorter than n to the “lowest” 2[sup]n[/sup]-1 strings of length n, leaving just 1111…1111 as “incompressible”, but certainly not random (a small sketch of this construction follows the list).
  2. These incompressible binary strings are orderable, right? Consider the lowest incompressible binary string of length n, call that U[sub]1[/sub] – er, isn’t that a kind of compression? (I’m sure I’m appealing to the same fallacy used in the proof of “All numbers are interesting”.)
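
Here is a minimal Python sketch of the construction in point 1 (n = 4 and the numerical ordering are arbitrary choices, just to make the idea concrete):

[code]
n = 4

# The 2^n - 1 codewords shorter than n, starting from the empty string.
codewords = [""] + [format(i, f"0{k}b") for k in range(1, n) for i in range(2 ** k)]
# The 2^n - 1 numerically "lowest" strings of length n, i.e. everything but 1111.
lowest = [format(i, f"0{n}b") for i in range(2 ** n - 1)]

scheme = dict(zip(lowest, codewords))
print(len(scheme))        # 15 strings of length 4 get a shorter codeword
print("1111" in scheme)   # False: only the all-ones string is left without one
[/code]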

Yes, since that’s the definition I quoted at you earlier: a string is random iff it’s incompressible.

The encoding scheme doesn’t matter; 111…111 is compressible no matter what.
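
One way to see that, taking “program plus input” as the description (a minimal sketch, with the length chosen arbitrarily):

[code]
n = 1_000_000

# A description of the all-ones string of length n: a fixed-size program plus
# roughly log10(n) characters to spell out n -- vastly shorter than the string.
program = f'print("1" * {n})'
print(len(program), "characters of description for", n, "characters of output")
[/code]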

**

There is an order, but you can’t use it to generate them, because randomness is an undecidable property: how would you know when you’d hit the first one?

Forgive me, ultrafilter; I understand that we could define randomness to mean incompressibility, but really I’m asking whether we’d want to do so. It seems more than definitional to state that random strings are incompressible; I just don’t “see” why it is necessary that incompressible strings are random.
On another matter, of course the encoding method has got something to do with it; my example demonstrates that much, surely? Here’s what I shall call S5: the set of binary 5-tuples and their encoding according to the following scheme:

00000->0
00001->1
00010->00
00011->01
00100->10
00101->11
00110->000
00111->001
01000->010
01001->011
01010->100
01011->101
01100->110
01101->111
01110->0000
01111->0001
10000->0010
10001->0011
10010->0100
10011->0101
10100->0110
10101->0111
10110->1000
10111->1001
11000->1010
11001->1011
11010->1100
11011->1101
11100->1110
11101->1111
11110->NULL (I believe we probably should have excluded that one, oops)
and hence
11111->UNCOMPRESSIBLE

It might be a lousy encoding scheme (it’s certainly a lousy compression scheme – its codewords lack terminal markers for one), but it’s an encoding scheme.
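
For concreteness, here’s the same table built programmatically (a rough Python sketch of the S5 mapping above), which also shows the terminal-marker problem in action:

[code]
# Codewords in the order used above: 0, 1, 00, 01, ..., 1111 (30 in all).
codewords = [format(i, f"0{k}b") for k in range(1, 5) for i in range(2 ** k)]
# Assign them to the 5-bit strings in numerical order; 11110 and 11111 get nothing.
s5 = {format(i, "05b"): cw for i, cw in enumerate(codewords)}

print(s5["00000"], s5["00001"], s5["00011"])      # 0 1 01
# Without terminal markers, concatenated codewords are ambiguous:
# "01" could be 00000 followed by 00001, or just 00011 on its own.
print(s5["00000"] + s5["00001"] == s5["00011"])   # True
[/code]
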
Whatever; if you wish to argue this, then consider this string from S1000000000 (binary strings a billion bits long):

10011011001011010001011010001011011010… …111000101100100101100101000010101110110

How do we compress that? Sure, we might have some whizzo, nth-order, super-dictionary-optimised, fractometric, babbatronic compression-engine (batteries not included), but if I tell you that those bits just happen to be the second billion binary digits of my e-mangled-pi, this algorithm is looking pretty clunky.
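
To make that concrete with plain pi rather than my mangled version (a rough Python sketch; the spigot below is the standard Gibbons streaming algorithm, and the offset is whatever you like):

[code]
from itertools import islice

def pi_digits():
    """Unbounded spigot for the decimal digits of pi (Gibbons' streaming algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

print("".join(str(d) for d in islice(pi_digits(), 20)))  # 31415926535897932384
# The program above is a few hundred characters; add "skip the first N digits"
# and it describes an arbitrarily long, random-looking string.
[/code]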

And let me try again to ask the question that I am struggling to form: what’s the difference between random data and data that we cannot compress? (I don’t mean data that is incompressible; I mean data that we don’t know enough about to discern any “pattern”.)

Because that’s what we define random to be. If the information content of a string is not less than the length of the string itself, it’s random. What other definition would you use?

**

The thing you’re missing here is that the length of your decoder counts as part of the length of the shortest representation. In this case, no string of length 5 is at all compressible.

I may not have been clear on that point, so let me state it unequivocally: the information content of a string is the length of the shortest program/input combination that will print out that string.
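
To put rough numbers on that for S5 (a toy sketch only; the formal definition fixes a universal machine, but the accounting is the same in spirit):

[code]
import inspect

def s5_decode(codeword: str) -> str:
    """Look up one S5 codeword and return its 5-bit block."""
    codewords = [format(i, f"0{k}b") for k in range(1, 5) for i in range(2 ** k)]
    table = {cw: format(i, "05b") for i, cw in enumerate(codewords)}
    return table[codeword]

# The decoder's own source is charged against the description.
decoder_bits = 8 * len(inspect.getsource(s5_decode))
codeword_bits = len("11")   # the S5 codeword for 00101
print(decoder_bits + codeword_bits, "bits of description for a 5-bit string")
[/code]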

**

Random data is data that cannot be compressed by any compression method, in the sense above. That’s the definition.

Data that we can’t find a compression algorithm for is just that: data that we can’t find a compression algorithm for.

The concept here is Kolmogorov complexity. Search on that and you’ll find enough to keep you busy for a long time.