I need to know the exact odds that a fake would be detected when a sample of 120 is tested from a population of 30,000 that contains 100 “fake” articles (assuming testing is always correct and that a fake is just as likely to be pulled as a real one).
I’m sorry, I should know this stuff, but it’s been a long time and I am beginning to doubt my math.
First calculate the probability that no fake will be detected:
probability of seeing no fake = P(first draw is real) * P(second draw is real given that first draw is real ) * … * P(120th draw is real given that all preceding draws are real)
probability of seeing no fake = (29900/30000) * (29899/29999) * … * (29781/29881)
probability of seeing no fake = 0.669
So, the probability of seeing at least one fake is one minus this:
probability of seeing at least one fake = 1 - 0.669 = 0.331
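If you want to check the arithmetic yourself, here is a minimal Python sketch of the product above. The function name prob_no_fake and its parameter names are made up for this example, and it assumes draws are made without replacement, as in the calculation above.

```python
from fractions import Fraction

def prob_no_fake(population, fakes, sample):
    """Probability that a sample drawn without replacement contains no fakes."""
    p = Fraction(1)
    for i in range(sample):
        # After i real items have been removed:
        # (real items left) / (total items left)
        p *= Fraction(population - fakes - i, population - i)
    return p

p_none = prob_no_fake(30000, 100, 120)
p_at_least_one = 1 - p_none

print(f"P(no fake)           = {float(p_none):.3f}")          # 0.669
print(f"P(at least one fake) = {float(p_at_least_one):.3f}")  # 0.331
```

Exact fractions are used so that rounding only happens once, when the result is printed.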
Excellent. Thank you. I was doing it wrong, forgetting to take out the influence of the real ones I removed searching for the fakes. It makes a small difference.
You didn’t specify whether you’re sampling with or without replacement, and Pasta’s answer assumes that you’re sampling without. In that case, the number of fake articles you pull is distributed hypergeometrically with N = 30000, m = 100 and n = 120.
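To see how much the replacement assumption matters, here is a rough sketch that computes the chance of at least one fake both ways, using only the standard library. The helper name hypergeom_pmf is my own; N, m and n are as defined above, and the with-replacement case is assumed to be a plain binomial with success probability m/N.

```python
from math import comb

def hypergeom_pmf(k, N, m, n):
    """P(exactly k fakes) when drawing n items without replacement
    from N items, m of which are fake."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

N, m, n = 30000, 100, 120

# Without replacement: hypergeometric, P(at least one) = 1 - P(zero fakes)
p_without = 1 - hypergeom_pmf(0, N, m, n)

# With replacement (assumption: each draw is independent, so binomial)
p_with = 1 - (1 - m / N) ** n

print(f"without replacement: {p_without:.4f}")
print(f"with replacement:    {p_with:.4f}")
```

For these numbers the two results agree to about three decimal places, since the sample is a tiny fraction of the population.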