Expected frequency distribution of randomly selected numbers.

I know nothing of statistics, so this may be a dumb one.

I was thinking about the big lottery tonight and the fact that even with the large number of tickets sold it may still be possible that there would be no winner. But tomorrow’s headline will likely announce that there was more than one winning ticket sold.

So I started to wonder about the “expected” distribution of randomly selected numbers. If I pick at random ten numbers from 0 to 9, I wouldn’t expect to get one of each number but rather two or three incidences of some numbers and no incidences of other numbers (or would I?) In fact, if I selected 20 numbers from 0 to 9, I still wouldn’t expect exactly 2 of each number. But if I selected 1MM numbers from 0 to 9, I’d expect that the distribution would be fairly flat, at least in relation to the number of numbers selected.

Does there exist a mathematical formula that describes the expected distribution of random numbers for a number of selected numbers (n?) within a range (0…X)? And yes, I realize that there is no way to use this information to up one’s chances in the lottery because such a formula would not predict which numbers would be more selected.

Yes, there are formulae for this, which I’ve forgotten. If you know the total number in the sample, and you know that your random numbers are truly random and independent, you can figure out what the chances are for any possible result to come up a certain number of times. Obviously the “expected” number of times doesn’t help you too much, since if the numbers are truly random each result would have the same expected frequency.

Let’s take a possible range of one to two. A random integer within this range will be generated. Your sample size is 100. Your expected number of ones will be 50 and twos will be 50. You can calculate the odds against all of them being ones by taking 2 to the 100th power: the chance is 1 in 2^100. But as I’ve said, I’ve forgotten all the details. Your final answer should give you a standard deviation. [Slowly realizing I have written a totally useless essay…]

Flip a coin 10 times and the most likely ratio of heads to tails will be 5:5. The probability of this is just under 25%. This is slightly more likely than getting 4 heads and 6 tails (or 6 heads and 4 tails), which in turn are slightly more likely than . . . [you get the idea]

Flip a fair coin n times. The probability of getting exactly x heads and (n-x) tails is n! / [ x! * (n-x)! * 2^n ]

n! is defined as 1*2*3*…*(n-2)*(n-1)*n

The formula is different for events that have other than 50% probability.
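
For anyone who wants to check the coin-flip figures, here is a minimal sketch in Python. The function name prob_heads is made up for illustration; math.comb(n, x) computes n! / [ x! * (n-x)! ].

```python
from math import comb

def prob_heads(n: int, x: int) -> float:
    """Probability of exactly x heads in n flips of a fair coin."""
    # comb(n, x) counts the head/tail orderings with x heads;
    # each particular ordering has probability 1 / 2^n.
    return comb(n, x) / 2**n

# Print the whole distribution for 10 flips.
for x in range(11):
    print(x, round(prob_heads(10, x), 4))
```

The 5:5 split comes out as 252/1024, which is the "just under 25%" mentioned above.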

Chapter six of The Feynman Lectures on Physics has a very lucid and readable introduction to probability. No particular knowledge of higher mathematics (or physics) is required.

Given a large number (“N”) of choices, and an equal number (“N”) of random samples, I believe that N/e (e=2.718) choices (about 37%) will not be chosen. If 2*N tickets are purchased, then the number of un-chosen will be N/(e^2) or about 14%.

But people aren’t random, so my “hunch” is that N ticket purchase will only consume about 55% of the combinations. People love to pick birthdays.
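
The N/e estimate for purely random picks can be checked with a few lines of Python. This is only a sketch: the function name is invented, and N = 1,000,000 is an arbitrary stand-in for the number of combinations, not a real lottery figure. It says nothing about the 55% hunch, which depends on how people actually choose.

```python
from math import exp

def fraction_unchosen(n_choices: int, n_tickets: int) -> float:
    """Chance that one given combination is missed by every random ticket."""
    # Each ticket misses the combination with probability 1 - 1/n_choices,
    # and the tickets are assumed to be independent.
    return (1 - 1 / n_choices) ** n_tickets

N = 1_000_000  # assumed number of combinations, for illustration only
for k in (1, 2):
    # Compare the direct calculation with e^(-k).
    print(k, fraction_unchosen(N, k * N), exp(-k))
```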

You can calculate this with a little tool called the binomial distribution. Despite the complicated-sounding name, it’s really a fairly simple and useful thing.

If you perform N experiments, each having an independent probability of success of p, then the chance of getting x successes is:

C(N,x) * p^x * (1-p)^(N-x)

where C(N,x) is the number of combinations of N things taken x at a time, or N! / [ x! * (N-x)! ].

So if you pick letters (letters are less confusing to explain than numbers) at random from the set ‘a’ to ‘j’, and do it ten times, then the chances of getting exactly two of the letter ‘f’ (or any letter) is:

C(10,2) * 0.1^2 * 0.9^8
45 * 0.01 * 0.430
which is 0.194, or almost 20%. The probability for each number of times is:

0: 0.349
1: 0.387
2: 0.194
3: 0.057
4: 0.0111
5: 0.00149
6: 0.000138
7: 0.00000875
8: 0.000000365
9: 0.000000009
10: 0.00000000010000

As you would expect with ten chances picking from ten letters, the chances of getting exactly one is the greatest. As the number of trials gets larger, this distribution quickly approaches the shape of the normal Gaussian (bell) curve.
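
The table above can be reproduced with a short Python sketch of the binomial formula. The function name binomial_pmf is my own; math.comb supplies C(N,x).

```python
from math import comb

def binomial_pmf(n: int, x: int, p: float) -> float:
    """C(n, x) * p^x * (1-p)^(n-x): chance of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Ten picks from ten letters: how often does one particular letter come up?
for x in range(11):
    print(f"{x:2d}: {binomial_pmf(10, x, 0.1):.3g}")
```

Changing the first argument to a larger number of picks (with p adjusted to match) is an easy way to watch the shape spread out toward the bell curve.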

CurtC is right – the binomial distribution will give you the answer, although for large numbers this tends to get difficult to calculate. Since you’re dealing with millions of tickets, you’d end up trying to figure (several million)! which would require some special work on almost any computer. Luckily, as he also pointed out, this form of looking at it approaches the normal distribution as your sample size gets bigger.

A note on looking at the lottery problem :

This is equivalent to putting 1 winning ticket in a box, along with the millions of losing number combinations (in this case, around 76 million). You pick a number from the box one at a time, put it back in, and count the number of wins. Each trial is equivalent to one ticket bought, with the numbers chosen completely randomly (not usually the case, but maybe close). Each trial has probability p = 1/76 million of being a win; note that with this method there might not be a winner even if hundreds of millions play.

The normal approximation with n trials , each with p prob. of success and q = 1-p not success :

mean (expectation) mu = np
sigma (std. deviation) sig = sqrt(n*p*q)

P(k successes in n trials) ~= 1/sig * phi(z)

where z is the “standard units” , z = (k-mu)/sig
and phi(z) is the function describing the normal curve :

exp(-1/2*z^2) / sqrt(2*pi)

also, phi(-z) = 1-(phi(z)) since z is centered on 0

Okay, maybe more math than you needed. but you can figure some odds :

Say only 10 million tickets are sold. mu = .13, sig = .36 and the probability of 1 win is .06, 2 wins is 2e-6, and 3 wins is 29e-15.

If we increase it to, say 100 million, mu = 1.31, sig = 1.15, and P(1) = .33, P(2) = .29, P(3) = .12; quite a change.

The news reports say that there were 2 winners, although they don’t report how many tickets were sold. If about 170 million tickets were sold, the likelihood that there were 2 winners is around .40.
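
A rough Python sketch of the normal approximation described in this post, printed next to the exact binomial value for comparison. The function names are invented for illustration, p = 1/76 million is the figure used above, and the ticket counts are the same hypothetical ones.

```python
from math import comb, exp, pi, sqrt

def phi(z: float) -> float:
    """Normal density: exp(-z^2 / 2) / sqrt(2*pi)."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def normal_approx(n: int, k: int, p: float) -> float:
    """Approximate P(k successes in n trials) as phi(z) / sig."""
    mu = n * p
    sig = sqrt(n * p * (1 - p))
    z = (k - mu) / sig
    return phi(z) / sig

def exact_binomial(n: int, k: int, p: float) -> float:
    """Exact binomial probability; cheap here because k is tiny."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 1 / 76_000_000  # chance that a single random ticket wins
for n in (10_000_000, 100_000_000):
    for k in (1, 2, 3):
        print(n, k, normal_approx(n, k, p), exact_binomial(n, k, p))
```

With a mean this small the two columns can differ noticeably, since the bell curve is only a rough fit when n*p is near 1.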

panama jack


Hopefully, I didn’t screw up in this post.

Okay, so I did screw up in my last post. Hope you all don’t think I’m stupid or something ( I am, I just don’t want you thinking that.)

Well, the mistake I made was sort of a minor one, but it did make for some weird numbers in special cases. What I did was confuse phi(z), the normal density function, with PHI(z) (that’s capital Greek phi when handwritten), which is the cumulative distribution function for the normal, a fancy way of saying it’s the integral up to that point. It’s used for figuring out probabilities that span a certain range of values.

The formula I gave for the normal approximation was correct, but it shouldn’t have that bit about -z. That only applies to PHI, which can’t be figured easily since it’s an integral of an exponential. If you look at the formula I gave (which is phi(z)), you see that z gets squared, so that phi(-z) = phi(z).

This doesn’t change the approximate numbers I gave, except for what I said at the very end about 2 winners. The value of n that gives an expectation of 2 is 150 million; the probability that you get 2 winners in this case is about 28%.

One final note for those who don’t want to work through lots of formulas: the old “gambler’s rule”, which works out pretty well, says that if something has fairly low odds of winning (a single number in roulette is good) which you label 1/N, you need to play about (2/3)N games before you have a 50% chance of winning at least once (the actual multiplier is ln 2 = .69). That means for a lottery with odds of 1 in 75 million, about 50 million tickets need to be sold before there’s a 50% chance that there’s a winner. Again, this assumes that all numbers are picked randomly; not really the case here.
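
The gambler’s rule is easy to check numerically. The sketch below is only an illustration: the function name is made up, and the 38-slot wheel is my assumption of an American roulette layout. It solves 1 - (1 - 1/N)^n >= 0.5 for the smallest whole n.

```python
from math import ceil, log, log1p

def plays_for_even_chance(N: int) -> int:
    """Smallest n for which the chance of at least one win reaches 50%."""
    # (1 - 1/N)^n <= 0.5  is the same as  n >= ln(0.5) / ln(1 - 1/N).
    # log1p(-1/N) keeps precision when 1/N is very small.
    return ceil(log(0.5) / log1p(-1 / N))

print(plays_for_even_chance(38))          # assumed 38-slot roulette wheel
n = plays_for_even_chance(75_000_000)     # lottery odds used in the post
print(n, n / 75_000_000, log(2))          # the ratio should sit close to ln 2
```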

panama jack


Hopefully not too many people saw that.