I agree with Valguard’s approach, though not his numbers. The point is that the question is not what the average number of matches is, but what the most likely number of matches is. As MikeS suggests, we need to find the mode rather than the mean.
Where I disagree with Valguard is his calculation of p, which I think is wrong. If it were right, the probabilities over all possible numbers of matches would add up to 1, but if you sum them, the total is much less. The way I look at it, you first “mark” 6,100 people out of the 60,000, and then pick 3,500 people from the 60,000 at random. You then count how many of the people you’ve just picked were marked by the initial selection. In that model, p would be 6100/60000 = 0.10166666…
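A quick way to sanity-check that model is to simulate it. This is only a rough Python sketch, not part of the original calculation: the names, the seed and the 200-trial count are my own choices, and it assumes the 3,500 people are picked without replacement (which is what `random.sample` does), whereas the formula further down treats each pick as an independent trial with probability p.

```python
import random

POPULATION = 60_000   # total people
MARKED = 6_100        # people "marked" by the first selection
PICKED = 3_500        # people picked at random afterwards

def count_matches(rng):
    """Pick PICKED distinct people and count how many of them are marked."""
    # Assumption: people 0 .. MARKED-1 are the marked ones. Which ids are
    # marked does not matter, because the pick is uniformly random.
    picked = rng.sample(range(POPULATION), PICKED)
    return sum(1 for person in picked if person < MARKED)

rng = random.Random(12345)  # arbitrary fixed seed, only for repeatability
trials = [count_matches(rng) for _ in range(200)]

print("p =", MARKED / POPULATION)
print("average matches over", len(trials), "trials:", sum(trials) / len(trials))
```

With only 200 trials this says something about the average, not about which single count is the most common one.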
In bc, the Unix calculator, I defined the three functions:
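/* f(a): factorial of a, computed recursively */
define f(a) {
    if (a > 1)
        return (a * f(a-1));
    return (1);
}
/* c(a,b): "a choose b" */
define c(a,b) {
    return (f(a)/f(b)/f(a-b));
}
/* m(p,n,t): chance of exactly n successes in t trials */
define m(p,n,t) {
    return (c(t,n) * p^n * (1-p)^(t-n));
}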
Here f is the factorial function, c is the choose function, and m is the chance of getting exactly n successes out of t trials when the probability of a single success is p. I ran this with 450 digits of precision, with p = 0.101666666666667 and t = 3500. The function m() hit its maximum at n = 355 successes, which, interestingly enough, is quite close to the mean calculated by Tyrrell McAllister at the beginning of this thread.
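For anyone without bc handy, here is a rough Python equivalent of m(). It is a sketch rather than a port: it works in logarithms via `math.lgamma` instead of 450-digit arithmetic, because p^n is too small for ordinary floating point, so expect agreement to around ten significant digits at best. It also prints floor((t + 1) * p), the standard closed-form shortcut for the mode of a binomial distribution, which is not something the bc run used.

```python
from math import exp, floor, lgamma, log, log1p

def m(p, n, t):
    """Chance of exactly n successes in t trials, each with probability p."""
    # log of "t choose n", using lgamma(k + 1) == log(k!)
    log_choose = lgamma(t + 1) - lgamma(n + 1) - lgamma(t - n + 1)
    return exp(log_choose + n * log(p) + (t - n) * log1p(-p))

p = 6100 / 60000
t = 3500

# Brute-force search for the most likely number of successes.
mode = max(range(t + 1), key=lambda n: m(p, n, t))

print("mode found by search:", mode)
print("shortcut floor((t + 1) * p):", floor((t + 1) * p))
print("m at 354, 355, 356:", m(p, 354, t), m(p, 355, t), m(p, 356, t))
```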
A sample of the probabilities I got:
n = 0: 1.07473603544798e-161%
n = 300: 0.01436011140540525%
n = 350: 2.13020672765995334%
n = 354: 2.22419553669614445%
n = 355: 2.23071660438062294%
n = 356: 2.23026321614830110%
n = 400: 0.11137275995696081%
n = 450: 4.97431757661914e-6%
Summing the probabilities for n = 0 through n = 450 gives 99.999984284%, so the total is converging to 100%, which is what we expect.
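The same sanity check can be sketched in Python. Again this is the log-space approximation, not the 450-digit bc run, so the sums will only match to floating-point accuracy; the variable names are mine.

```python
from math import exp, lgamma, log, log1p

def m(p, n, t):
    """Chance of exactly n successes in t trials, each with probability p."""
    log_choose = lgamma(t + 1) - lgamma(n + 1) - lgamma(t - n + 1)
    return exp(log_choose + n * log(p) + (t - n) * log1p(-p))

p = 6100 / 60000
t = 3500

partial = sum(m(p, n, t) for n in range(451))    # 0 <= n <= 450
total = sum(m(p, n, t) for n in range(t + 1))    # every possible n

print("P(0 <= n <= 450) =", partial)
print("sum over all n   =", total)
```

If p were wrong in the way described above, the second sum is where it would show up: it would land well away from 1.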
So, to sum up: if you ran a simulation many times, the average number of successes would probably be close to 356, but the most common single result would probably be 355 successes. How many is many? Well, the probability of 356 successes is so close to that of 355 that you would need to run it billions of times. If I recall the equation correctly, you’d need to run it at least 1/(0.0223071660438062294 - 0.0223026321614830110)^2 = 48,647,386,373 times to get a statistically significant difference.
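That last bit of arithmetic is easy to redo. The sketch below only reproduces the calculation from the two probabilities quoted above, using Python's `decimal` module so that the small difference is not lost to rounding. It does not check whether 1/gap^2 is the right rule of thumb in the first place.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # far more digits than the inputs carry

p355 = Decimal("0.0223071660438062294")  # chance of exactly 355 successes
p356 = Decimal("0.0223026321614830110")  # chance of exactly 356 successes

gap = p355 - p356       # how much more likely 355 is than 356
runs = 1 / gap ** 2     # the rule of thumb recalled in the post

print("gap  =", gap)
print("runs =", int(runs))
```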