Odds concerning monkeys and the lottery

It depends on the “randomness” of whatever is generating your output. With monkeys typing, you could be unlucky: they could produce the same wrong output many times before they ever got around to the right one. In cases where you’re systematically exhausting the possibilities, you’re guaranteed to hit the right output before you reach the time limit, because you never repeat anything.

This is a common metric for people who brute-force encryption keys. If you’re systematically testing each encryption key, then you will eventually find it. You can’t get unlucky and have to wait longer than your max limit, because you design your algorithm not to repeat trials. In these cases, the rule of thumb is that you will probably find the right key when you’ve run through half the key space. You might have to test them all, but you might hit it on the first try. This measure of “likely” is only really relevant if you’re trying to brute-force a lot of keys and want an average requirement. Since you’re going to hit some early and some late, your average requirement is testing half the possibilities.
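If it helps to see the difference, here’s a toy simulation (the 16-bit “key space” is just a stand-in so it finishes in seconds; nothing about real ciphers is implied):

```python
import random

KEY_SPACE = 2**16   # toy key space; real ones are astronomically larger

def exhaustive_trials(secret):
    """Try keys 0, 1, 2, ... in order: no repeats, so it always finishes,
    and on average it hits about halfway through the space."""
    for attempt, key in enumerate(range(KEY_SPACE), start=1):
        if key == secret:
            return attempt

def monkey_trials(secret):
    """Guess uniformly at random, repeats allowed: no upper bound on the wait."""
    attempt = 1
    while random.randrange(KEY_SPACE) != secret:
        attempt += 1
    return attempt

runs = 200
print("exhaustive search, average trials:",
      sum(exhaustive_trials(random.randrange(KEY_SPACE)) for _ in range(runs)) / runs)
print("random guessing, average trials:",
      sum(monkey_trials(random.randrange(KEY_SPACE)) for _ in range(runs)) / runs)
```

The first average comes out near half the key space; the second comes out near the whole key space, and any individual run of it can take arbitrarily long.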

On the other hand, with monkeys typing there is no guarantee. Their quality control is notoriously lax.

You’re correct. No particular sequence must come up in a finite length of time. Even with an infinite amount of time, it isn’t guaranteed (according to traditional probability theory). This was the point ultrafilter stated more formally.

True. I was changing the rules a bit. But then, you’d need a copy of Hamlet anyway to determine if one of the monkeys had actually recreated the play.

The initial argument seemed to be that enough randomness will eventually create any desired result. My addition of “guided randomness” would simply help the process arrive at that result much more quickly.

Besides, you wouldn’t really need a copy of Hamlet to help guide the monkeys along. Even some simple rules would speed things up considerably (such as tossing out a “v” that followed a “g” or a third “p” in a row).
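To make the “simple rules” idea concrete, here’s a toy sketch; the banned digrams are made-up examples rather than the result of any real analysis of English:

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "   # 27 "keys": letters plus space

# Made-up examples of simple rules: digrams to throw out, plus a ban on
# typing the same character three times in a row.
BANNED_DIGRAMS = {"gv", "qx", "jq", "vq", "zx"}

def guided_monkey(length):
    """Random typing, but any keystroke that breaks a rule is discarded and retried."""
    text = ""
    while len(text) < length:
        ch = random.choice(ALPHABET)
        if text and text[-1] + ch in BANNED_DIGRAMS:
            continue            # rule 1: toss out banned digrams
        if text[-2:] == ch * 2:
            continue            # rule 2: no character three times in a row
        text += ch
    return text

print(guided_monkey(60))
```

It still won’t type Hamlet, but it wastes less time on strings no English text could contain.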

Couple of thoughts (I’m not a statistics guru):

  1. How are the odds changed if you give the monkeys a keyboard with only 27 keys on it (letters plus a space bar)?

  2. They’d be just as likely to type Hamlet on their first try as on any other try, correct?

  3. In the “Yoshitsune” example (gotta remember that name), the time given is the max, right? Is there any way to calculate the expected time? I would think that there is a 50% chance your name would be found in half the time. Is this correct?

If you want to actually calculate the odds, you have to define what character set you’re using. If you allow the monkeys to type a lot of characters which aren’t part of the desired output, the odds against you go up. If you limit them to the characters that appear in the output you desire, the odds are better, simply because you’re excluding characters which could only contribute to a failure. Of course, if you’re going to be picky about capitalization and punctuation, you’re dealing with more than 27 characters, more like 56+ depending on the actual punctuation marks which appear in Hamlet.
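To put rough numbers on questions 1 and 2: if every keystroke is an independent, equally likely choice from k characters, the chance that one N-character attempt is a perfect match is (1/k)^N. A quick back-of-the-envelope calculation, using the ~200,000-character figure for Hamlet that comes up later in the thread:

```python
import math

HAMLET_LENGTH = 200_000   # rough character count of Hamlet

for k in (27, 56):
    # (1/k)**N underflows ordinary floats, so report its size as a power of ten:
    # the chance per attempt is roughly 1 in 10**digits.
    digits = HAMLET_LENGTH * math.log10(k)
    print(f"{k}-character keyboard: about 1 in 10^{digits:,.0f} per attempt")
```

So the 27-key keyboard improves each attempt’s odds over a 56-key one by a factor of roughly 10^63,000, every attempt is exactly as (un)likely as every other, and none of them is anywhere near likely.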

Yes.

See my earlier post. The time is the max if you’re preventing the monkeys from ever duplicating their work. This requires highly trained monkeys with good memories. If you’re restricting your trials so they’re unique (no repeats), then there’s a 50% chance you’ll hit it in the first half of the trials, since the right answer is equally likely to turn up at any point in your search order.

You’re absolutely right, the 4 million years was just a rough guess of about how long it would probably take, within an order of magnitude or so. It could take much longer, and there’s no guarantee that it will happen in any finite amount of time.

Regarding ultrafilter’s post: while it’s true that, given an infinite number of possible outcomes, there is a chance that a particular outcome will never happen even given an infinite amount of time, I don’t think it applies here. If you just look at Hamlet-sized blocks of text (about 200,000 characters), then the number of possible samples is finite, though extremely large. Given an infinite amount of time, therefore, a perfect copy of Hamlet would have to come up at some point.
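In symbols, and treating successive Hamlet-sized blocks as independent (ignoring a copy that straddles two blocks), the argument is roughly:

```latex
% p: chance that a single 200,000-character block is a letter-perfect Hamlet
p = 27^{-200\,000} > 0
% chance that none of the first n independent blocks is a copy
P(\text{no copy among the first } n \text{ blocks}) = (1 - p)^n \to 0
  \quad \text{as } n \to \infty
% so
P(\text{a copy appears in some block}) = 1
```

(Whether “probability 1” is the same as “would have to” is exactly what the next few posts chew on.)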

(My favorite short story is Library of Babel. Does it show?)

That’s an interesting point. However, I don’t think the argument works here, since it’s ‘possible’ that the monkey hits ‘a’ every time. Then he never gets Hamlet. IIRC, the monkey typing ‘aaa…’ is an example of what ultrafilter said: it has probability 0, yet it can happen. So is any other particular sequence of letters.

Well, I’m having a hard time getting my mind around this. That seems to violate the nature of infinity that I’ve come to know and love. The idea that something with a probability of 1 may not ever occur in an infinite number of tries is curious enough, but the idea that something with a probability of 0 can somehow occur is just… NO WAY!

The only thing that keeps me from snorting in indignation at the very notion is that I’ve been fooled by the nature of infinity before.

Yeah, it’s bad to intuit about the nature of infinitary processes. They’re weird.

Would it help if I told you that every infinite sequence of characters has probability 0?
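If it helps, here’s the calculation behind that claim, assuming each keystroke is an independent, uniform pick from a k-character keyboard:

```latex
% Fix any particular infinite sequence s.  Producing s requires matching its
% first n characters, and those n keystrokes are independent, so for every n:
P(\text{output} = s) \le P(\text{first } n \text{ characters match}) = k^{-n}
% Since k^{-n} \to 0 as n \to \infty, the only nonnegative value that works is
P(\text{output} = s) = 0
```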

Your chances of winning the lottery are only marginally improved by actually purchasing a ticket.

To the OP, may I suggest reading How to Lie With Statistics? Additionally, The Cartoon Guide to Statistics is very good, and Calculated Risks is super (IMO).
But don’t let that stop you from posting questions!

Let me recount the reasoning in a pseudo-Socratic dialogue because I’m bored :)

Shade: Let’s see. The probability of ‘aaaa…’ must be zero because it’s so unlikely that all the letters are ‘a’.
Shade: Does everything have probability zero?
Shade: No, something must happen. Umm… ummm… yeah, ‘abadfijzasetpq…’ might be better.
Shade: But you’re actually thinking of all sequences like that. If you pick THAT sequence (assuming we know how an infinite sequence ends), it must be probability 0; it’s not more likely that the sequence starts ‘abad’ than ‘aaaa’ [ed: if you don’t get this bit, read one of the many dialogues on the subject :)] so why should it be different for long ones?
Shade: OK, OK, so any sequence is prob 0.
Shade: It must be!
Shade: But… SOMETHING must happen!
Shade: Well, yes. Something with infinitesimal probability.
Shade: Umm…
Shade: Colloquially.
Shade: OK. And all these zeros add to one?
Shade: I don’t like it, but what else could happen?
Shade: AHA! It’s like a line: the length of any point is zero, but they can be added up to get a length!
Shade: Yeah! Right! I think…
Shade: Cool.
Shade: But I bet the maths works out wrong. You can’t deal with infinities that easily.
[Current state of mathematical research in the world ex machina]: Actually, I did the maths earlier, and it wasn’t easy, but eventually I got that answer. Your intuition was, surprisingly, right.
Shade: But… my intuition also says other stuff. Shouldn’t we be dealing with infinitesimal (‘just’ over zero) numbers?
[maths]: Well, maybe. But trust me, the first way works out more useful in the end. Go with it.
Shade: OK.
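For what it’s worth, here’s roughly how the standard setup handles Shade’s “zeros adding to one” worry; this is a sketch that skips all the measure-theoretic fine print:

```latex
% Probabilities are only required to add up across countably many disjoint events:
P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)
% The set of infinite letter-sequences is uncountable, so "add up the zero
% probability of each individual sequence" is not a sum this rule applies to,
% and there is no conflict with the whole space having probability 1.
% It is the same picture as the line: each point has length 0, yet an interval
% has positive length, because an interval is an uncountable collection of points.
```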

But it’s not 0, it’s damn close to 0 but not 0. Or to put it in terms of the little calculus that I remember, it approaches zero as the sequence’s length approaches infinity. Add up an infinite number of things damn close to 0 and you get 1, I guess.

Firstly, sorry - my internal dialogue was just supposed to show the thought process, not be perfectly rigorous. Perhaps I should have made clear exactly what is normally accepted.

But I assure you that in by far the most standard and almost universal way of defining probabilities, the probability of an event is a real number.

If so, the probabilities in this case can be consistently defined, and the prob. of ‘aaa…’ would be 0, but it ‘can still occur’. I’m sure you could define probabilities using infinitesimals (numbers ‘damn close to zero’, i.e. not zero, but smaller than every positive real number) somehow, but I suspect that doing so is a can of worms.

Also, remember that the ‘it approaches’ type of argument is risky. For instance, P(first letter is ‘a’), P(first two letters are ‘a’), P(first three letters are ‘a’), etc. approach 0, and P(all letters are ‘a’) is indeed 0. But P(the first three letters contain runs of ‘a’s of every finite length) is 0, and it stays 0 if we replace ‘three’ with any other finite number, yet P(the whole infinite sequence contains runs of ‘a’s of every finite length) is 1. Sorry I don’t have a better example, but basically, I’ve learnt the hard way to be very careful with infinity.
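An attempt to restate that example in symbols (a sketch; A_n and A are just shorthand for the events described above):

```latex
% A_n: "the first n letters contain a run of a's of every finite length".
% A run longer than n can't fit in n letters, so
P(A_n) = 0 \ \text{for every } n, \qquad \lim_{n \to \infty} P(A_n) = 0
% A: "the whole infinite sequence contains a run of a's of every finite length".
% For a fixed run length m, split the sequence into disjoint length-m blocks;
% each block is all a's with probability 27^{-m}, independently, so
P(\text{no all-}a\text{ block among the first } N) = (1 - 27^{-m})^N \to 0
% i.e. runs of every fixed length m appear with probability 1, and intersecting
% over the countably many choices of m still gives
P(A) = 1
```

So the limit of the finite-prefix probabilities is 0, while the probability of the corresponding statement about the whole infinite sequence is 1; that’s the sense in which the ‘it approaches’ argument can mislead.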

I’m afraid I don’t know what you’re saying there, but I do agree that one should be very careful with infinity.