Simple statistics question

Except ultrafilter has been correcting for small finite populations this entire time. Everything he’s said has been along the lines of “the proportion of the sample size to the population…” In fact, I think you both agree about Mary’s uncertainty with 50 samples. It’s whatever you’re doing around 30 samples where somehow John has a better estimate that’s confusing.

I’m not willing to take your word for this. Please provide a cite.

I’m really not sure what you want a cite for.

Mary wants to know the population mean which is defined as (x[sub]1[/sub]+ … + x[sub]50[/sub])/50.

She knows x[sub]1[/sub] + … + x[sub]30[/sub] exactly after her sample of 30.

A standard assumption about each of the unknown x[sub]31[/sub], … , x[sub]50[/sub] is that they have mean
m = (x[sub]1[/sub]+ … + x[sub]30[/sub])/30 and variance v = (x[sub]1[/sub][sup]2[/sup] + … + x[sub]30[/sub][sup]2[/sup] - 30m[sup]2[/sup])/29 (or divide by 30 if you wish the MLE).

Assuming the remaining x’s are independent, var[x[sub]31[/sub] + … + x[sub]50[/sub]] = 20v.

Var[k*z] = k[sup]2[/sup] var[z] for any constant k

so var[(x[sub]1[/sub] + … + x[sub]50[/sub])/50] = var[(x[sub]1[/sub]+ … + x[sub]30[/sub])/50] + var[(x[sub]31[/sub] + … + x[sub]50[/sub])/50] = 0 + var[(x[sub]31[/sub] + … + x[sub]50[/sub])/50] = 20v/50[sup]2[/sup]
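To make the arithmetic in these steps concrete, here is a minimal Python sketch. The sample values (the integers 1 to 30) and the function name are made up purely for illustration, since nothing here gives Mary's actual data. It assumes, as the steps above do, that each unseen value is independent with mean m and variance v.

```python
def variance_of_population_mean(sample, population_size):
    """Follow the steps above: treat the observed values as fixed and
    each unseen value as independent with mean m and variance v."""
    n = len(sample)
    m = sum(sample) / n
    # Sample variance with the n - 1 divisor (divide by n for the MLE).
    v = (sum(x * x for x in sample) - n * m * m) / (n - 1)
    unseen = population_size - n
    # var[(sum of unseen values) / N] = unseen * v / N^2
    return m, v, unseen * v / population_size ** 2


# Hypothetical data: 30 observed values out of a population of 50.
sample = list(range(1, 31))
m, v, var_mean = variance_of_population_mean(sample, 50)
print(m, v, var_mean)
```

With 30 of 50 values observed, the last number returned is 20v/50[sup]2[/sup] for whatever v the sample produces.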

Which step(s) do you want a cite for?

This is not the variance of Mary’s estimator. The variance of Mary’s estimator depends only on the observations that she’s seen, and assuming independence, has absolutely no relationship to the unobserved part of the population.

Mary’s estimator is (x[sub]1[/sub] + x[sub]2[/sub] + … + x[sub]30[/sub])/30, and its variance, assuming that she’s sampling from an infinite population, is Var(x[sub]1[/sub]) / 30. Since she’s not sampling from an infinite population, we have to multiply by the finite population correction factor that I described above.

This is really basic stuff that even a non-mathematical introductory textbook will cover. The book I linked earlier by Friedman et al. is a really fantastic discussion of the concepts of statistics and I recommend it to anyone who’s willing to put in the work to understand them.

I apologize. I couldn’t see the error because my math was correct. My assumption was wrong. In case anyone still cares:

What we want when sampling n out of N is not the variance of (x[sub]1[/sub] + … + x[sub]N[/sub])/N but the variance of the estimator’s error, which is (x[sub]1[/sub] + … + x[sub]N[/sub])/N - (x[sub]1[/sub] + … + x[sub]n[/sub])/n. This is the variance of (x[sub]n+1[/sub] + … + x[sub]N[/sub])/N + (n-N)(x[sub]1[/sub] + … + x[sub]n[/sub])/(nN). Assuming independence, as usual, a little algebra gives the variance to be

[(N-n)/N]v/n

This is the finite population correction mentioned above, except that if you don’t know the population variance v, you use v/(n-1) rather than v/n.
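For anyone who wants the algebra spelled out: the two pieces of the error are independent, so their variances add, giving (N-n)v/N[sup]2[/sup] + (N-n)[sup]2[/sup]v/(nN[sup]2[/sup]) = [(N-n)/N]v/n. Below is a rough Monte Carlo check in Python, not a proof. It assumes the x's are independent normal draws with variance v; the function names, the trial count, and the seed are all illustrative choices of mine.

```python
import math
import random


def error_variance_formula(n, N, v):
    """Closed form from the post: [(N - n) / N] * v / n."""
    return (N - n) / N * v / n


def simulate_error_variance(n, N, v, trials, seed=0):
    """Estimate var[(x_1+...+x_N)/N - (x_1+...+x_n)/n] by simulation,
    assuming independent normal x's with mean 0 and variance v."""
    rng = random.Random(seed)
    sd = math.sqrt(v)
    errors = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, sd) for _ in range(N)]
        errors.append(sum(xs) / N - sum(xs[:n]) / n)
    mean_err = sum(errors) / trials
    return sum((e - mean_err) ** 2 for e in errors) / trials


theory = error_variance_formula(30, 50, 1.0)
empirical = simulate_error_variance(30, 50, 1.0, trials=20000)
print(theory, empirical)
```

The two printed numbers should agree to within simulation noise; more trials tighten the match.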