Correct my statistics...

Let’s say you have a lottery of the 25 / 5 format: i.e., possible numbers are integers 1 to 25 (inclusive) and each draw is 5 numbers.

So the total number of combinations possible is 25 nCr 5 = 53130.

If you perform 50,000,000 random trials, you would expect

50,000,000 / 53130 = 941.1 occurrences of matching all 5 numbers.
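Both figures can be reproduced with Python's standard library. This is a minimal sketch that assumes Python 3.8 or later for `math.comb`. The constant names are only illustrative.

```python
from math import comb

N_BALLS = 25         # possible numbers are 1..25
PICKS = 5            # numbers per draw
TRIALS = 50_000_000

total = comb(N_BALLS, PICKS)        # number of distinct 5-number draws
expected_jackpots = TRIALS / total  # each trial matches all 5 with probability 1/total

print(total)                        # 53130
print(round(expected_jackpots, 1))  # 941.1
```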

Here’s where I need correction: In how many of the 50,000,000 trials would I expect to match exactly 4 of the 5 numbers?

The number of 4 number combinations is 25 nCr 4 = 12650. But dividing 50,000,000 by 12650 doesn’t give me the right result.

How do I determine the number of expected 4 number matches in 50,000,000 trials?

Thanks for helping,
J.

There are 5 ways you can get exactly 4 numbers right, and for each one the wrong number can be any one of the remaining 20. So expect 100 times as many matches.

I don’t follow your logic in the second part. I don’t see the relevance of the number of 4-number combinations, because each trial produces a 5-number combination, not a 4-number one.

Your mistake is that you replaced the wrong number when going from the 5-match to the 4-match case. 25c5=53130 tells you how many possible outcomes there are. That doesn’t change, so you should leave the “5” alone here.

If you want to match all five numbers, there is exactly one outcome that works. Thus, 1-in-53130 is the chance of hitting all five. Multiplying by the number of attempts gets you your first (correct) answer:

50,000,000 * (1/53130) = 941.1

If you want to match four numbers, you need to count up the number of possible 4-match outcomes. If your numbers are 1-2-3-4-5, then the complete list of 4-match outcomes is (where [6,25] stands for any one of the 20 numbers from 6 to 25):

[6,25]-2-3-4-5 (20 possibilities here…)
1-[6,25]-3-4-5 (another 20…)
1-2-[6,25]-4-5
1-2-3-[6,25]-5
1-2-3-4-[6,25]

for a total of 100 4-match outcomes. So, the chance of matching four numbers is 100-in-53130, for an expected total of:

50,000,000 * (100/53130) = 94109
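The count of 100 can also be confirmed by brute force. This short Python sketch assumes the example ticket 1-2-3-4-5. It walks every possible 5-number draw from 1 to 25 and keeps the draws that share exactly four numbers with the ticket.

```python
from itertools import combinations
from math import comb

N_BALLS = 25
PICKS = 5
TRIALS = 50_000_000
my_ticket = {1, 2, 3, 4, 5}  # example ticket from the post

# Enumerate every possible draw and keep those with exactly 4 numbers in common.
four_match = [
    draw
    for draw in combinations(range(1, N_BALLS + 1), PICKS)
    if len(my_ticket.intersection(draw)) == 4
]

total = comb(N_BALLS, PICKS)
print(len(four_match))                          # 100
print(round(TRIALS * len(four_match) / total))  # 94109
```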

In short, the number that changes is the number of interesting outcomes (not the number of possible outcomes).

Ok, I think I see. I’d like to put this in a formula so that I can calculate not only the frequency of 4 number matches in the 5, but also the number of 3, 2, and 1 matches in the 5.

And I THINK this is the formula I need to calculate the number of “interesting” outcomes:

r * ((n - r) C (r - t))

where:
r = the number of numbers drawn in the lottery (in our case, 5)
n = the number of “balls in the urn” (in our case 25)
t = the number of exact matches within the “r” numbers drawn.

For the case of 4 exact matches, r = 5, n = 25, t = 4 which gives

5 * ((25 - 5) C (5 - 4)) =
5 * (20 C 1) =
5 * 20 = 100

For the case of 3 exact matches, r = 5, n = 25, t = 3 which gives
5 * ((25 - 5) C (5 - 3)) =
5 * (20 C 2) =
5 * 190 = 950

2 exact matches:
5 * ((25 - 5) C (5 - 2)) =
5 * (20 C 3) =
5 * 1140 = 5700

1 exact match:
5 * ((25 - 5) C (5 - 1)) =
5 * (20 C 4) =
5 * 4845 = 24225

Using these figures, the expected number of “interesting” combinations is

4 Matches:
50,000,000 * (100 / 53130) = 94108.8

3 Matches:
50,000,000 * (950 / 53130) = 894033.5

2 Matches:
50,000,000 * (5700 / 53130) = 5364201.0

1 Match:
50,000,000 * (24225 / 53130) = 22797854.3

BUT! these numbers don’t agree with my experimental results:

               Expected      Actual
               ----------    ----------
5 matches      941           933
4 matches      94109         94115
3 matches      894034        1788064
2 matches      5364201       10726961
1 match        22797854      22798614

The 5, 4, and 1 match numbers are close, but 2 and 3 are completely off.

So maybe my formula is incorrect? Anyone?

Thanks a lot,
J.
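The post doesn't describe how the trials were run, so the following is only a guess at a comparable setup. It is a minimal Monte Carlo sketch in Python that assumes a fixed ticket of 1-2-3-4-5 and draws of 5 distinct numbers from 1 to 25. The `simulate` function and its parameters are hypothetical. The trial count is far below 50,000,000 to keep pure Python fast.

```python
import random
from collections import Counter


def simulate(trials, n_balls=25, picks=5, ticket=(1, 2, 3, 4, 5), seed=None):
    """Tally how many random draws share exactly t numbers with a fixed ticket."""
    rng = random.Random(seed)
    ticket_set = set(ticket)
    balls = range(1, n_balls + 1)
    counts = Counter()
    for _ in range(trials):
        draw = rng.sample(balls, picks)  # one draw of distinct numbers
        counts[len(ticket_set.intersection(draw))] += 1
    return counts


if __name__ == "__main__":
    trials = 100_000  # much smaller than 50,000,000 to keep the run short
    result = simulate(trials, seed=1)
    for t in range(5, -1, -1):
        print(f"{t} matches: {result[t]}")
```

By symmetry, the choice of ticket does not affect the proportions. Multiplying each tally by 50,000,000 / trials gives figures you can compare with the Actual column.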

You’re on the right track. It looks like you tried the r out front because that’s what worked for the 4-match case. If you plug in the 5-match numbers, though, you’ll find it doesn’t work for that one (or, as you found, the 2-match or 3-match).

You’re spot on with the second factor: You have to pick (r-t) values from the list of (n-r) un-guessed numbers. For the numbers you got correct, you need the same logic: You have to pick (t) values from the list of (r) guessed numbers. For example, if you chose 1-2-3-4-5 as your numbers and you are looking to guess two right, you might match 1-2, 1-3, 1-4, etc. The number of possible pairs is (5 C 2) since you are picking two items from a list of five.

The final result is thus the product of the two combinations:

(r C t) * ( (n-r) C (r-t) )
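The corrected formula can be tabulated for every t. This Python sketch uses a hypothetical helper, `ways_to_match`, and also checks that the counts for t = 0 through 5 add up to all 25 C 5 = 53130 draws.

```python
from math import comb


def ways_to_match(t, n=25, r=5):
    """Draws sharing exactly t numbers with a ticket: (r C t) * ((n-r) C (r-t))."""
    return comb(r, t) * comb(n - r, r - t)


TRIALS = 50_000_000
total = comb(25, 5)

for t in range(5, -1, -1):
    ways = ways_to_match(t)
    print(f"{t} matches: {ways:>6} outcomes, expected {TRIALS * ways / total:,.0f}")

# Every draw matches the ticket in some number of places, so the counts sum to 25 C 5.
assert sum(ways_to_match(t) for t in range(6)) == total
```

Run as written, the 3-match and 2-match rows come out at about 1,788,067 and 10,728,402. These figures line up with the experimental 1788064 and 10726961 reported above.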

The reason your formula worked for the 4-match and 1-match cases is that

(5 C 4) = (5 C 1) = 5
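You can see this by evaluating both formulas side by side for every t. In this Python sketch, `first_guess` and `corrected` are hypothetical names for r * ((n-r) C (r-t)) and (r C t) * ((n-r) C (r-t)).

```python
from math import comb

n, r = 25, 5


def first_guess(t):
    return r * comb(n - r, r - t)            # r out front, as first tried in the thread


def corrected(t):
    return comb(r, t) * comb(n - r, r - t)   # (r C t) out front


# The two agree only where (r C t) happens to equal r.
agreeing = [t for t in range(r + 1) if first_guess(t) == corrected(t)]
print(agreeing)  # [1, 4]
```

For t = 0 through 5, (5 C t) is 1, 5, 10, 10, 5, 1. The first formula is therefore low by a factor of 2 for t = 2 and 3, which is roughly the gap seen in the experimental results.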

Cool! Thanks a lot for all of the help!

J.