Calculating large standard deviations

Shalmanese · December 22, 2009, 6:52pm

I’ve always wondered how to find out what proportion of a population is, say, 15 standard deviations above the norm, but this information seems hard to find. Even Wolfram Alpha tops out at 8 standard deviations.

I know the number is ridiculously small but it’s not 0. I was wondering if there’s any site/formula that can calculate this for me.

pulykamell · December 22, 2009, 7:14pm

The formula for what fraction of values are within n standard deviations of the mean in a normal distribution is:

erf(n/2^.5)

(That’s really n over the square root of two, but I’m not sure how to code it here.)

erf is the error function. If you use Wolfram Alpha’s math module, you can just type in erf (1/2^.5), for instance, to get 0.682689492137, which is the fraction (about 68.27%) of values within one standard deviation of the mean. Plugging in 15, we get:

0.9999999999999999999999999999999999999999999999999926580676…

So that’s the fraction of values within 15 standard deviations of the mean. Subtract that from one (or use the erfc function for the complement), and you get:

7.341932398625501771572179310669486972832503256080314… x 10^-51
outside the confidence interval.
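
For anyone who would rather check this locally than in Wolfram Alpha, here is a minimal sketch using Python's standard math module (my choice of tool, not something used in the post above; the function names are just illustrative). It also shows why erfc is the safer route: in ordinary double precision erf(15/2^.5) rounds to exactly 1, so subtracting it from one gives 0.

```python
import math

def fraction_within(n):
    """Fraction of a normal population within n standard deviations of the mean."""
    return math.erf(n / math.sqrt(2))

def fraction_outside(n):
    """Two-sided tail: fraction more than n standard deviations from the mean."""
    # erfc evaluates 1 - erf directly, so the tiny tail is not lost to rounding.
    return math.erfc(n / math.sqrt(2))

print(fraction_within(1))        # about 0.6827
print(1 - fraction_within(15))   # 0.0, because erf has rounded to 1.0
print(fraction_outside(15))      # about 7.34e-51
```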

Out of curiosity, are you calculating that you have a value 15 standard deviations from the norm?

Omphaloskeptic · December 22, 2009, 7:23pm

Assuming your distribution is normal, you want Q(x), which is related to erf.

Note that Q(15)=3.7E-51. This is ridiculously small (even Q(8) is probably “ridiculously” small), in the sense that in any physical application the assumption of normality has almost certainly failed to this level of accuracy. Normal distributions are often good approximations near the mean, but real systems usually have tails that diverge from the normal after a few standard deviations.

(pulykamell gave the two-sided error, which is 2Q(15).)
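
A small sketch of the one-sided tail, assuming the usual definition Q(x) = P(Z > x) = erfc(x/2^.5)/2 for a standard normal Z. The name q_function is mine, and this uses only Python's standard library.

```python
import math

def q_function(x):
    """Upper tail of the standard normal distribution, P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# One-sided tail next to the two-sided tail 2*Q(x)
for x in (1, 6, 8, 15):
    print(x, q_function(x), 2 * q_function(x))
```

For x = 15 this prints roughly 3.67e-51 for the one-sided value and 7.34e-51 for the two-sided one, matching the figures quoted above.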

pulykamell · December 22, 2009, 7:27pm

Yes. I missed the part that said specifically “above the norm,” making it a one-sided problem.

Napier · December 22, 2009, 9:50pm

I think the Handbook of Mathematical Functions by Irene Stegun and (?) Abramowitz has that. If there’s a particular number you care about especially I could look it up for you.

Like Omphaloskeptic says, real distributions of physical properties are unlikely to be close enough to Gaussian for such a number to be meaningful.

I object (though nobody goes along with me) that “normal distribution” is a misleading term and we should say “Gaussian distribution” instead. The problem is that this distribution is typical or “normal” for some situations but not all. The number of fission events a lump of plutonium undergoes in any millisecond period will be a Poisson distributed variable. The locations of bullet holes along a straight wall, caused by a randomly firing rifle nearby mounted on a turntable, will be a Cauchy distributed variable. And so forth.

ultrafilter · December 23, 2009, 3:34am

It’s zero for all intents and purposes. If you see a value that’s more than about six standard deviations away from the mean, you can be assured that you’re not looking at a normal distribution.

statsman1982 · December 23, 2009, 5:01am

If you don’t assume a normal distribution, you can use Chebyshev’s Inequality. It applies to ANY distribution with a finite mean and variance, so the bounds will be somewhat loose. Basically, the probability that an observation is at least k standard deviations away from the mean is less than or equal to 1/k^2.
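
To get a feel for how loose the bound is, here is a rough sketch comparing it with the exact two-sided tail of a normal distribution. The function names are illustrative, and the normal comparison is my addition rather than part of the inequality itself.

```python
import math

def chebyshev_bound(k):
    """Chebyshev upper bound on P(|X - mean| >= k standard deviations)."""
    # A probability can never exceed 1, so cap the bound for k < 1.
    return min(1.0, 1.0 / k**2)

def normal_two_sided_tail(k):
    """Exact P(|Z| > k) for a standard normal Z, for comparison."""
    return math.erfc(k / math.sqrt(2))

for k in (2, 6, 15):
    print(k, chebyshev_bound(k), normal_two_sided_tail(k))
```

At k = 15 the bound is 1/225, about 0.0044, while the normal tail is about 7.3e-51.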

Napier · December 23, 2009, 10:15pm

For 500 standard deviations away from the mean, the base ten log of Q(x) is -54289.90830. That’s as far as Abramowitz and Stegun took it.
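
A number that small underflows to zero as an ordinary floating-point value, so it has to be worked out in log space. The sketch below is not how the handbook table was built (I don't know that); it assumes the standard large-x asymptotic series Q(x) ~ phi(x)/x * (1 - 1/x^2 + 3/x^4), where phi is the standard normal density, and log10_q is a name I made up.

```python
import math

def log10_q(x):
    """Approximate log10 of Q(x) = P(Z > x) for large x, using an asymptotic series."""
    # ln(phi(x) / x), where phi is the standard normal density
    log_leading = -x * x / 2 - math.log(x) - 0.5 * math.log(2 * math.pi)
    # First correction terms of the series: 1 - 1/x^2 + 3/x^4
    log_correction = math.log(1 - 1 / x**2 + 3 / x**4)
    return (log_leading + log_correction) / math.log(10)

print(math.erfc(500 / math.sqrt(2)))   # 0.0, the direct route underflows
print(log10_q(500))                    # about -54289.9083
```

The truncated series is only trustworthy for large x; at x = 15 it still agrees with log10 of erfc(15/2^.5)/2 to about six decimal places.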

ultrafilter · December 24, 2009, 4:35am

For 15 standard deviations, Chebyshev’s inequality gives an upper bound of 1/225, which is not particularly tight. There’s probably something you can do with Chernoff bounds, but it’s not immediately obvious to me how you can get something that applies here.

not_alice · December 24, 2009, 5:56am

How you doin’?

Napier · December 24, 2009, 1:59pm

What, is this a come-on? You should know I’m not used to being popular and am all thumbs at this sort of thing.

Or did I completely misread the situation again?

Nava · December 24, 2009, 3:50pm

WAAAAAAAAAAAAAAAH! Napier called me “nobody”! That’s a different poster, you know :p.

A pet peeve of mine is that, since the Gaussian is the only continuous statistical distribution many people have been told about and they’ve been told time and again that it’s “a normal distribution,” they refuse to use anything else even when (as it happens >50% of the time in the Chemistry, Pharma or Food industries) their sample number is too small (I’ve seen people slamming a Gaussian on n=3) or they have a “hard limit” on their specs (you can’t have a negative amount of impurities) or their target value equals one of the extremes of the spec range, usually that hard limit (impurities again; also many values linked to the mechanical properties of a chemical mixture).
Napier, some people just love it when you talk nerdy to us; I guess not_alice is another one.

Napier · December 24, 2009, 5:46pm

Uhhh… how you doin’?

ultrafilter · December 26, 2009, 5:52pm

Apparently the word “normal” in the normal distribution refers to orthogonality, not commonality. There are some details here, and I’m trying to track down more information, but there’s not a lot on the web and I don’t have access to a library for another week or so.

Napier · December 26, 2009, 8:30pm

From the wikipedia article for “normal distribution”:

“The name “normal distribution” was coined independently by Peirce, Galton and Lexis around 1875; the term was derived from the fact that this distribution was seen as typical, common, normal. This name “normal distribution” was popularized in statistical community by Karl Pearson around the turn of the 20th century.[4]”

ultrafilter · December 26, 2009, 11:25pm

I’m having a bit of trouble finding out how the author of that part of the article got that out of the cited source. The entry that I linked to cites Jaynes’s book, which is available online. By my reading of the closing comments of chapter 7, Prof. Jaynes seems to think that the orthogonality origin is plausible, and I’m inclined to believe him.

Zakalwe · December 26, 2009, 11:35pm

[narrator voice]
As you can see, the mating dance of the rare Math Nerd is a complex and slow affair. Some speculate that this is due to the infrequency of two Math Nerds meeting in a congenial setting leading to an unfamiliarity with the ritual.
[/narrator voice]

Napier · December 27, 2009, 12:07am

Don’t get ahead of the game, Zak. Fun’s fun, but ultrafilter and I barely know one another.

ultrafilter · January 3, 2010, 4:59am

Wikipedia didn’t have the particular Chernoff bound that I was looking for, but Ross’s Stochastic Processes does. If X is any random variable with moment generating function M(t) and a is any positive constant, P(X > a) ≤ e^(-at) M(t) for all t > 0. Therefore, if you pick t to minimize the right hand side, you have the tightest possible (simple) upper bound for the probability that a random variable whose moment generating function exists is greater than some value of interest.

If Z is standard normal, M(t) = e^(t^2/2), and the right hand side is minimized at t = a, which gives you that P(Z > 15) < e^(-225/2). The upper bound given by Chebyshev’s inequality is roughly 46.5 orders of magnitude higher, which means (IMO) that it’s not significantly different from one.

ETA: You might be able to get an extra factor of two in that ratio because the Chernoff bound deals with the probability that X is larger than some value, whereas the naive application of Chebyshev’s inequality deals with the probability that the absolute deviation of X is larger than that value, but really, what’s a factor of two here?
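
Here is a quick numerical check of the Chernoff calculation, as a sketch only. It assumes the standard normal moment generating function M(t) = e^(t^2/2) and uses a crude grid search over t instead of calculus; the helper name and the grid spacing are arbitrary choices of mine.

```python
import math

def log_chernoff_bound(a, t):
    """Natural log of e^(-a*t) * M(t) for a standard normal, where M(t) = e^(t^2/2)."""
    return -a * t + t * t / 2

a = 15
# Grid search over t in (0, 30] in steps of 0.01 for the smallest bound
best_t = min((i / 100 for i in range(1, 3001)),
             key=lambda t: log_chernoff_bound(a, t))

log10_chernoff = log_chernoff_bound(a, best_t) / math.log(10)
log10_chebyshev = math.log10(1 / a**2)

print(best_t)                             # 15.0, i.e. t = a
print(log10_chernoff)                     # about -48.86
print(log10_chebyshev - log10_chernoff)   # about 46.5 orders of magnitude
```

Working with the log of the bound keeps the search simple and avoids any worry about very small intermediate values.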
