The normal (or Gaussian) distribution is a terrific approximation in a wide variety of settings. And it’s backed by the central limit theorem, which I admittedly do not fully understand.
But is it exactly observed in nature, for samples with an arbitrarily large number of observations? By exact, I mean that large samples consistently fail to reject the hypothesis of normality.
In finance, rates of return tend to be leptokurtic: they have fat tails. There’s even some (negative?) skew. That doesn’t stop analysts from using the Gaussian distribution as a rough-and-ready model (sometimes to the chagrin of their investors, but that’s another matter).
What about other settings? Biology? Demography? Chemistry? Atmospheric science? Engineering?
I’m guessing that a reliably perfect Gaussian process is empirically rare, since there’s typically some extreme event that kicks in periodically. A large and important part of reality may consist of sums of very small errors, but I suspect that another aspect involves sporadic big honking errors. For every few thousand leaky faucets, we get a burst water main.
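To make the leaky-faucet-versus-burst-main idea concrete, here’s a quick simulation sketch (mine, in Python with NumPy/SciPy; the 0.1% contamination rate and the 10x shock scale are made-up illustration numbers): mix mostly small Gaussian errors with rare big ones, and a normality test on a large sample flags it immediately.

```python
# Sketch: 99.9% small "leaky faucet" errors plus 0.1% "burst water main"
# shocks. The mixture is symmetric but fat-tailed, and a large-sample
# normality test rejects it decisively.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000
burst = rng.random(n) < 0.001                   # rare big-shock indicator
x = np.where(burst,
             rng.normal(0, 10, n),              # bursts: 10x the spread
             rng.normal(0, 1, n))               # ordinary noise

print("excess kurtosis:", stats.kurtosis(x))    # well above 0 (fat tails)
print("Jarque-Bera p-value:", stats.jarque_bera(x).pvalue)  # ~0: reject
```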
While I’m at it, why do the Shapiro-Wilk and Shapiro-Francia tests for normality cap out at sample sizes of 2000 and 5000, respectively?
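My guess (and it is a guess) is that the caps reflect how far the published p-value approximations were fit; if I remember right, Stata’s swilk and sfrancia simply refuse beyond their limits, while scipy’s Shapiro-Wilk runs past n = 5000 but warns you:

```python
# Sketch: scipy's stats.shapiro accepts n > 5000 but emits a warning
# that the p-value approximation may be inaccurate out there.
import warnings
import numpy as np
from scipy import stats

x = np.random.default_rng(1).normal(size=6000)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = stats.shapiro(x)

print("p-value:", result.pvalue)
print([str(w.message) for w in caught])  # something like "p-value may
                                         # not be accurate for N > 5000."
```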
I posed this question at the now-defunct website teemings.org in 2008. The short answer (from the estimable ultrafilter) was, “no, nothing’s going to be exactly normally distributed. On the other hand, particularly in cases where the central limit theorem applies, the difference between what you see and what you’d expect to see is negligible.”
A tighter version of my question follows:
- Has anybody stumbled upon an unsimulated, naturally occurring dataset with, say, more than 100 observations that looks exactly like a textbook normal curve?
- Does any natural process consistently spin off Gaussian distributions, with p-values consistent with normality virtually all the time? (Presumably this would be produced by something other than the central limit theorem (CLT) alone.) Ultrafilter says, “No,” if I understand him correctly.
- Does any natural process consistently spin off large datasets (thousands of observations each) where normality is not rejected at least 95% of the time (testing at the 5% significance level)? If the CLT were the only thing in play, there should be natural processes like this. But I suspect that black swans are pretty much ubiquitous. Then again, it should be possible to isolate a process that hews to a conventional Gaussian; the simulation sketch after this list shows what that would look like.
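To pin down that 95% criterion, here’s a quick sketch (mine, in Python with NumPy/SciPy rather than Stata; the sample counts are illustrative): if the samples really are Gaussian, a test run at the 5% level should reject normality only about 5% of the time.

```python
# Sketch: draw many genuinely Gaussian samples and count how often a
# 5%-level Shapiro-Wilk test rejects normality. Expect roughly 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_samples, n_obs = 1000, 2000   # illustrative sizes, not from the post
rejections = sum(
    stats.shapiro(rng.normal(size=n_obs)).pvalue < 0.05
    for _ in range(n_samples)
)
print(f"rejected {rejections}/{n_samples} "
      f"(~{100 * rejections / n_samples:.1f}%)")
```

A natural process that “consistently spins off” Gaussian data would have to track that 5% rejection rate; anything with occasional big shocks will reject far more often as n grows.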
Bonus question: do any of the tests for normality evaluate moments higher than the fourth?
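For what it’s worth, the omnibus tests I’m aware of stop at the fourth moment. The D’Agostino-Pearson K-squared test, for instance (scipy’s normaltest), just combines a skewness test (third standardized moment) with a kurtosis test (fourth):

```python
# Sketch: D'Agostino-Pearson combines the 3rd- and 4th-moment tests
# into one chi-squared statistic; nothing above the 4th moment is used.
import numpy as np
from scipy import stats

x = np.random.default_rng(3).normal(size=5000)
print("skewness test p:", stats.skewtest(x).pvalue)      # 3rd moment
print("kurtosis test p:", stats.kurtosistest(x).pvalue)  # 4th moment
print("omnibus K^2 p:", stats.normaltest(x).pvalue)      # the two combined
```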
As always, Wikipedia is helpful (Normal distribution - Wikipedia), but I’m not sure whether I should trust its claims of exactness.
Finally, if anybody has an empirical dataset whose generating process is plausibly Gaussian, whose sample is huge, and which is in a reasonably accessible computer format, feel free to link to it if you’re curious. I’ll run some tests at some point using the statistical package Stata. Datasets are admittedly “all around the internet,” but extracting lots of fairly large (2000+) samples typically requires some work.
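In the meantime, here’s the sort of batch check I have in mind, sketched in Python rather than Stata (the file and column names below are placeholders; substitute whatever dataset you link):

```python
# Hypothetical workflow: load one numeric column from a linked dataset
# (placeholder names) and run a few standard normality tests on it.
import pandas as pd
from scipy import stats

df = pd.read_csv("your_dataset.csv")      # placeholder file name
x = df["value"].dropna().to_numpy()       # placeholder column name

print("n =", len(x))
print("Shapiro-Wilk p:", stats.shapiro(x).pvalue)  # scipy warns if n > 5000
print("Jarque-Bera p:", stats.jarque_bera(x).pvalue)
print("D'Agostino K^2 p:", stats.normaltest(x).pvalue)
```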