# Simple combinatorics question: Odds change if proportionate size increases?

OK, my combinatorics is very rusty, so. . .

Let’s say I have a set of 100 numbers, and I randomly choose 10, allowing for numbers to be reused. There’s a way to determine what the chances are that one of the random 10 is a duplicate.

Now, I increase the numbers tenfold. Now I have 1000 numbers, and I randomly choose 100. Do my chances of snagging a duplicate change, or will they be the same?

nm

Yes, the odds do change. The formula for the odds of getting no duplicates when you choose m objects from a set of n is

P = n! / n[sup]m[/sup] (n-m)!

There’s not going to be some cancellation in there between the factorials that leads to the exact same number in both cases. For the two cases you specifically mentioned, the odds are [fires up Mathematica][ul][li] n = 100, m = 10: P = 0.628157n = 1000, m = 100: P = 0.00595893[/ul][/li]
It actually surprises me that the probability goes down as n goes up (for a fixed value of m/n), but probability is weird sometimes. Still, if someone else could check my work here I’d be grateful.

Let me lay out the rules, and if they match what I think:

You have balls numbered 1-100. You randomly draw a single ball - say 42. The ball gets replaced and you select another ball from 1-100. You do so until you’ve drawn 10 times.

If this is the case, the two situations definitely have different probabilities.

For the 10 out of 100 case, there are P(100,10) = 6.28*10^19 ways to select 10 unique numbers out of 100 without repeats. In this case, we use permutations instead of combinations. There are 100^10 = 10^20 ways to select any 10 balls. So, the probability of getting ANY repeat (including all 10 the same) is:

``````         1 -  P(100,10) / 10^20
``````

This is roughly 37.2%. So, you have about a 1 in 3 shot at seeing a repeat.

For the 100 out of 1000 case, there are P(1000,100) ways to draw the 100 balls without repeats, and 1000^100 ways to draw any 100 balls.

`````` P(1000,100) = 5.96*10^297
1000^100     = 10^300

1 - P(1000,100)/10^300 = 99.4%
``````

You are almost assured of getting a repeat with 1000 balls and selecting 100.

You forgot to subtract from 1. Your probability is for NO repeats and matches what I get. It makes sense with more numbers, you would expect to see more repeats or, equivalently, fewer cases of unique draws with more numbers.

No, I was actually surprised that the probability of all outcomes being distinct decreased with n. My way of thinking was that since there were more objects in the pool, the odds of any particular object being chosen twice goes down — but now that I think about it, that’s a distinct question from whether there are any duplicates at all.

This distinction is crucial to the Birthday Paradox (which is not the same thing as what the OP was asking about, but there is perhaps some relation).

Substitute 365 days instead of 100 numbers and “n” people instead of drawing 10 times, and it is exactly the same underlying math problem.

I should have noticed that earlier. Oh well, can’t get 'em all.

Thanks, gang!

Here is a fairly crude approximation to the question. If you choose m items out of n with replacement, the odds of no repeats is roughly e^{-m^2/2n}. This is quadratic in m, but only linear in n and that is why it cannot scale linearly. Roughly speaking m^2 is the number of pairs of choices and every pair must consist of two distinct elements. As I said, it is fairly crude, but you can do it on your computer’s scientific calculator without Mathematica.

Suppose you stop picking numbers once you get a duplicate.

After 0 picks, the probability of advancing further is 100/100 (the first number you pick is definitely not a duplicate).

After 1 pick, the probability of advancing further is 99/100.

After 2 picks, the probability of advancing further is 98/100.

After 3 picks, the probability of advancing further is 97/100.

Etc.

Overall, the probability of making it all the way with no duplicates is 100/100 * 99/100 * 98/100 * 97/100 * … * 1/100.

After 0 picks, the probability of advancing further is 1000/1000 = 100/100.

After 10 picks, the probability of advancing further is 990/1000 = 99/100.

After 20 picks, the probability of advancing further is 980/1000 = 98/100.

After 30 picks, the probability of advancing further is 970/1000 = 97/100.

Etc.

Only including these factors, the calculation would be exactly the same as before. But on top of that, we also have to include the factors for possibly hitting the first duplicate after a number of picks which isn’t a multiple of 10. Thus, what actually happens is that the probability of hitting a duplicate goes up.

Another way of thinking about it is that if you have a pile of n different objects and you chose from them with replacement, you will have to take approximately (i.e., within an order of magnitude) m objects from the pile before you can expect the chances to be around 50% that you will have a least one duplication among the objects that you have chosen. So if there are 100 objects in the pile, you will have to take about 10, if there are 1,000,000 objects, you will have to take about 1,000; if there are 1,000,000,000,000 objects, you will have to take about 1,000,000, etc.

I messed up my post. This is what I meant to say:

Another way of thinking about it is that if you have a pile of n different objects and you chose from them with replacement, you will have to take approximately (i.e., within an order of magnitude) the square root of n objects from the pile before you can expect the chances to be around 50% that you will have a least one duplication among the objects that you have chosen. So if there are 100 objects in the pile, you will have to take about 10, if there are 1,000,000 objects, you will have to take about 1,000; if there are 1,000,000,000,000 objects, you will have to take about 1,000,000, etc.