Statistics help for an old codger

jharvey963 · December 15, 2008, 6:30am

It’s been more decades than I care to admit since I’ve had statistics, and I’d like to ask for some assistance with some counting and odds problems. (For my own edification. These aren’t homework problems.)

I remember the formula for “N choose M, without replacement”: N! / (M! (N - M)!)

But now, let’s assume that you choose another set without replacement, say P, from the original N. I have some questions about the relationships between M and P.

It always helps me to give actual numbers, so let’s say …
N is 20 (the original set contains 20 items),
M is 2 (2 elements are chosen, without replacement), and
P is 5 (5 new elements are chosen, without replacement, from the original 20).

What are the formulas for these:

1. How many elements from M would you “expect” to see in P?

2a. What are the odds of finding exactly 0 elements from M in P?
2b. What are the odds of finding exactly 1 element from M in P?
2c. What are the odds of finding exactly 2 elements from M in P?
…
2x. What are the odds of finding all elements from M in P?

3a. How big would P have to be for you to expect that P contains exactly 1 element from M?
3b. How big would P have to be for you to expect that P contains exactly 2 elements from M?
…
3x. How big would P have to be for you to expect that P contains all elements from M?
Thanks,
J.

muttrox · December 15, 2008, 2:56pm

Silly clarification – I assume you can do the replacing after picking all of M? Otherwise there is no overlap at all.

Some basics:
M=20 choose 2 = 190 combinations
P=20 choose 5 = 15,504 combinations
M and P are independent.

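Question 2c/2x: both elements of M have to be among the 5, and the other 3 come from the remaining 18, so it is (18 choose 3)/(20 choose 5) = 816/15,504 = 5.3%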

That’s a start - my brain isn’t up to speed on Monday morning yet for the rest.

jharvey963 · December 15, 2008, 5:00pm

Yes, M elements are chosen first (without replacement). Then those elements are replaced back into N. Then P elements are chosen (without replacement) from the original 20 elements in N.

J.

ultrafilter · December 15, 2008, 5:12pm

The number of elements from the first sample in the second sample is distributed hypergeometrically with suitable parameters.

jharvey963 · December 15, 2008, 7:02pm

uhhhhhhh, thanks…

The link, while undoubtedly the correct distribution, is pretty dense.

Would someone mind giving a more “layman’s” explanation of this, and how it would apply to my problem?

Thanks a lot,
J.

ultrafilter · December 15, 2008, 7:51pm

I will write “n choose k” as C(n, k). In the simplest version of the hypergeometric distribution, you have a population of N items that can be split into two non-overlapping groups of size M and N - M. You want to sample without replacement from the overall population, and you want to compute the odds that a certain number of the sampled items come from the first group. If you sample P items, the probability that exactly k of them come from the first group is C(M, k)C(N - M, P - k)/C(N, P). In your problem, the first group is the M elements you picked the first time around.
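To make the formula concrete, here is a minimal Python sketch that evaluates C(M, k)C(N - M, P - k)/C(N, P) for the numbers in the original question (N = 20, M = 2, P = 5). The function name overlap_probability is made up for illustration, and exact fractions are used only so the results are easy to compare by hand.

```python
from fractions import Fraction
from math import comb


def overlap_probability(N, M, P, k):
    """Probability that exactly k of the M marked items appear in a sample of P from N."""
    # Guard against negative arguments, which math.comb rejects.
    if k < 0 or P - k < 0:
        return Fraction(0)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps come out as probability 0.
    return Fraction(comb(M, k) * comb(N - M, P - k), comb(N, P))


N, M, P = 20, 2, 5
for k in range(M + 1):
    p = overlap_probability(N, M, P, k)
    print(f"k={k}: {p} (about {float(p):.4f})")
```

With these numbers the three probabilities come out to 21/38, 15/38 and 1/19, which add up to 1 as they should.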

The number of elements from the first group you expect to see in your sample is the mean, which is equal to PM/N. For your example, that’s 5*2/20, or 1/2. You can turn that formula around to find how large a sample you need in order to expect a certain number of elements from the first group. It’s instructive to compute how large a sample you have to take to expect to see the entire first group, so I’ll let you work that one out.
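The mean formula and its rearrangement can be sketched in a few lines of Python. The function names expected_overlap and sample_size_for_expected are hypothetical, chosen just for this example; the second one simply solves PM/N = target for P.

```python
from fractions import Fraction


def expected_overlap(N, M, P):
    """Mean number of the M marked items in a sample of P drawn from N: P*M/N."""
    return Fraction(P * M, N)


def sample_size_for_expected(N, M, target):
    """Sample size P at which the mean overlap P*M/N equals target."""
    return Fraction(target * N, M)


N, M = 20, 2
print(expected_overlap(N, M, 5))            # mean overlap for P = 5
for target in range(1, M + 1):
    print(target, sample_size_for_expected(N, M, target))
```

Note that this only says where the average lands. It does not mean a sample of that size is guaranteed to contain that many elements.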

nivlac · December 15, 2008, 8:06pm

It is indeed the hypergeometric distribution. In the first draw you are essentially selecting m “defective” units out of the N available. Then in your second draw of n units you are looking for these same m units. So N and m are obviously 20 and 2, respectively, in your example. So what should n and k be in the wiki article? I get 1/19 as the answer.
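A brute-force count is one way to check the 1/19 figure without trusting any formula. The sketch below assumes the first draw picked items 0 and 1 (by symmetry any two items would give the same counts), then lists every possible second draw of 5 from 20 and tallies how many of the first two it contains.

```python
from collections import Counter
from itertools import combinations

N, M, P = 20, 2, 5
first_sample = set(range(M))  # assumed first draw; any fixed M items would do

# Tally the overlap size for every possible second sample of P items.
counts = Counter(
    len(first_sample.intersection(second))
    for second in combinations(range(N), P)
)
total = sum(counts.values())
for k in sorted(counts):
    print(f"overlap {k}: {counts[k]} of {total}")
```

Of the 15,504 possible second draws, 816 contain both items, and 816/15,504 reduces to 1/19.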
