Probability Calculation: Is There An Easy way To Do This?

ralph124c · June 5, 2004, 4:39pm

This is a question for a statistician: is there an easy way to calculate the probability that the mean and SD (of a sample) will differ significantly (by chance) from those of the parent population?
I know that the accuracy of a sample increases as the sample size goes up…how do you calculate the chances of the sample being different from the population?
Is there a quick way to estimate this?

ultrafilter · June 5, 2004, 4:40pm

I pretty seriously doubt that there is.

Shalmanese · June 5, 2004, 4:57pm

uh, isn’t this just what a confidence interval is? you work out how likely your sample mean (or p hat) is to be within some distance of the real mean (or p).

I forget the calculations but I thought this was 1st year stat stuff.

Squink · June 5, 2004, 5:10pm

I don’t think so, as it’ll depend on the actual skewness and kurtosis of the population being sampled.

Pasta · June 5, 2004, 7:46pm

Let’s say you’re measuring x and let’s say that the parent distribution has a true (although unknown) mean of [symbol]m[/symbol] and a true (although unknown) variance of [symbol]s[/symbol][sup]2[/sup]. The estimator you are using for the mean [symbol]m[/symbol] of the underlying distribution is the arithmetic mean of the sample:

[symbol]m[/symbol]’ = 1/N * sum(x[sub]i[/sub],i=1,N).

(I’ll use a tick mark (’) to indicate estimators of quantities.) This estimator [symbol]m[/symbol]’ has a variance of [symbol]s[/symbol][sup]2[/sup]/N. Equivalently: the standard deviation of this estimate is [symbol]s[/symbol]/sqrt(N). Unfortunately, you don’t know the true [symbol]s[/symbol], so you must also estimate that. An estimator of the variance [symbol]s[/symbol][sup]2[/sup] is

[symbol]s[/symbol]’[sup]2[/sup] = 1/(N-1) * sum((x[sub]i[/sub]-[symbol]m[/symbol]’)[sup]2[/sup],i=1,N),

and [symbol]s[/symbol]’ is its square root.

For a moment, let’s ignore any error in this estimate of [symbol]s[/symbol] (i.e., take [symbol]s[/symbol]’=[symbol]s[/symbol].) Then for a Gaussian (normal) distribution, one can say:

The mean [symbol]m[/symbol] is [symbol]m[/symbol]’ +/- [symbol]s[/symbol]’/sqrt(N).
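As a rough illustration of the recipe so far, here is a minimal Python sketch. It assumes the standard-deviation estimate is the square root of the 1/(N-1) sum of squared deviations. The function name and the sample values are made up for the example.

```python
import math

def mean_with_standard_error(xs):
    """Return (mean estimate, sd estimate, error on the mean) for a sample."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(xs) / n
    # 1/(N-1) sum of squared deviations estimates the variance
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    sd = math.sqrt(variance)
    # error on the mean: sd / sqrt(N), treating the sd estimate as exact
    return mean, sd, sd / math.sqrt(n)

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mean, sd, sem = mean_with_standard_error(sample)
print(f"mean = {mean:.3f} +/- {sem:.3f} (sd estimate {sd:.3f})")
```

For Gaussian data the +/- range quoted this way covers the true mean roughly 68% of the time.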

For non-Gaussian distributions, [symbol]s[/symbol]’ is still a valid estimator for the standard deviation, but the standard deviation alone isn’t enough to answer a question like yours (“How likely is a certain deviation from the true mean?”) One must know what the underlying distribution is for that. However, a Gaussian approximation gets you quite far in many (but not all) applications. (Much of science would screech to a halt if it weren’t for this.)

Getting back to the fact that [symbol]s[/symbol]’ doesn’t equal [symbol]s[/symbol]: The standard deviation of the estimate [symbol]s[/symbol]’ involves the fourth moment of the distribution (see below). For the large N Gaussian case, though, it simplifies to [symbol]s[/symbol]/sqrt(2N). This approximation again gets you quite far, especially in cases where you are only interested in the mean: there, one typically uses this quantity only to verify that the approximation [symbol]s[/symbol]’=[symbol]s[/symbol] is good enough for determining the error on the mean shown above.
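To get a feel for the s/sqrt(2N) rule, here is a small sketch that compares it with a quick simulation. It assumes unit-variance Gaussian data. The function names, the seed and the number of trials are arbitrary choices for the illustration.

```python
import math
import random
import statistics

def sd_uncertainty_gaussian(sd, n):
    """Large-N Gaussian approximation for the spread of the sd estimate."""
    return sd / math.sqrt(2 * n)

def simulated_sd_spread(n, trials=4000, seed=1):
    """Draw many Gaussian samples of size n and measure how much
    their sd estimates scatter from sample to sample."""
    rng = random.Random(seed)
    estimates = [
        statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

predicted = sd_uncertainty_gaussian(1.0, 50)
observed = simulated_sd_spread(50)
print(f"predicted spread {predicted:.4f}, simulated spread {observed:.4f}")
```

With N=50 the rule says the sd estimate is good to about 10%, which is usually plenty if all you want is the error on the mean.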

Since you did ask about deviations of [symbol]s[/symbol]’ from [symbol]s[/symbol], here’s the variance of the variance estimate [symbol]s[/symbol]’[sup]2[/sup] (the 1/(N-1) sum of squared deviations above):

(variance of [symbol]s[/symbol]’[sup]2[/sup]) = 1/N * (m[sub]4[/sub] - [symbol]s[/symbol][sup]4[/sup]*(N-3)/(N-1)),

where m[sub]4[/sub] is the fourth central moment of the distribution (the mean of (x-[symbol]m[/symbol])[sup]4[/sup]), which can be estimated by

m[sub]4[/sub]’ = 1/(N-1) * sum((x[sub]i[/sub]-[symbol]m[/symbol]’)[sup]4[/sup],i=1,N).

Like before, the error of this estimate will scale like 1/sqrt(N) (with a coefficient of order unity which I don’t know off the top of my head), and one can use this fact to determine if the approximation m[sub]4[/sub]’=m[sub]4[/sub] is okay when calculating the variance of [symbol]s[/symbol]’.
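Here is a sketch of that last calculation. It assumes the textbook result that the 1/(N-1) variance estimate has variance 1/N * (m4 - s^4*(N-3)/(N-1)), and it plugs in the sample estimates for m4 and s^2 as described above. The function names and data are invented for the example.

```python
def variance_of_variance_estimate(m4, s2, n):
    """Variance of the 1/(N-1) sample-variance estimate, given the fourth
    central moment m4 and the variance s2 of the parent distribution."""
    return (m4 - s2 ** 2 * (n - 3) / (n - 1)) / n

def plug_in_variance_of_variance(xs):
    """Same formula, with m4 and s2 replaced by their sample estimates."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    m4 = sum((x - mean) ** 4 for x in xs) / (n - 1)
    return variance_of_variance_estimate(m4, s2, n)

# Gaussian check: m4 = 3*s^4, so the formula reduces to 2*s^4/(N-1)
gaussian_case = variance_of_variance_estimate(3.0, 1.0, 11)

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
plug_in = plug_in_variance_of_variance(sample)
print(f"Gaussian case: {gaussian_case:.3f}, plug-in for sample: {plug_in:.3f}")
```

For a sample this small the plug-in number is itself very uncertain, so treat it as an order-of-magnitude guide only.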
