Can I use kurtosis to construct confidence intervals?

Saffer · June 22, 2010, 11:03am

I actually asked this question a while ago but I didn’t get a satisfactory answer.
Probably because I wasn’t specific enough.

If I have a data set with a mean of 0 and a standard deviation of 1, and I assume that it is normally distributed, then I would expect about 95% of the data to lie between -2 and 2. Similarly, I can construct any confidence interval, for any data set with a known mean and standard deviation, if I assume a normal distribution.
If my data set has kurtosis of 400 then I know that the confidence intervals are wrong as the tails are rather fat and the distribution is peaked around the mean.
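
For reference, the normal-theory interval described above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name normal_interval is made up for this example, and the interval is only as good as the normality assumption behind it.

```python
from statistics import NormalDist

def normal_interval(mean, sd, coverage=0.95):
    """Central interval holding `coverage` of a normal(mean, sd) population."""
    tail = (1.0 - coverage) / 2.0
    # Standard normal quantile; about 1.96 for 95% coverage,
    # which is where the rule-of-thumb "-2 to 2" comes from.
    z = NormalDist().inv_cdf(1.0 - tail)
    return mean - z * sd, mean + z * sd

low, high = normal_interval(0.0, 1.0, 0.95)
print(low, high)
```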

My question is, has anybody figured out how to incorporate skewness and/or kurtosis to calculate reasonably accurate confidence intervals?

ultrafilter · June 22, 2010, 3:54pm

As long as you have iid draws, you can compute the 2.5% and 97.5% quantiles of your sample and use those as (consistent) estimates of the population quantiles. This will work regardless of the distribution that you’re drawing from. That’s not really a confidence interval in the sense that the phrase is usually used, but I think it matches up well with what you’re trying to do.
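
A minimal sketch of that quantile approach, in Python with only the standard library. The helper name empirical_interval and the fat-tailed test sample (a normal with occasional draws scaled by 10) are assumptions made up for illustration, not anything from the question.

```python
import random
import statistics

def empirical_interval(sample, coverage=0.95):
    """Return the sample quantiles bracketing the central `coverage` mass."""
    # quantiles(n=1000) returns the 999 cut points at 0.1%, 0.2%, ..., 99.9%
    cuts = statistics.quantiles(sample, n=1000, method="inclusive")
    tail = (1.0 - coverage) / 2.0
    lo_idx = round(tail * 1000) - 1          # 2.5%  -> index 24
    hi_idx = round((1.0 - tail) * 1000) - 1  # 97.5% -> index 974
    return cuts[lo_idx], cuts[hi_idx]

# Hypothetical fat-tailed data: mostly N(0, 1), with 5% of draws scaled by 10.
rng = random.Random(0)
sample = [rng.gauss(0, 1) * (10 if rng.random() < 0.05 else 1)
          for _ in range(20000)]

low, high = empirical_interval(sample)
inside = sum(low <= x <= high for x in sample) / len(sample)
print(low, high, inside)
```

No distribution is assumed anywhere, which is the point, but the estimate is only as stable as the number of points you have out in the tails.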

glowacks · June 22, 2010, 7:25pm

Without having an assumption on the distribution of the population, you cannot meaningfully construct confidence intervals. At the very least, it has never been discussed in any of the statistics classes I’ve taken. I don’t have a ton of advanced statistics knowledge, but I would think that there are any number of distributions with the first few moments the same. There might be a standard distribution you’re supposed to assume if you calculate the skewness and kurtosis to be significantly different from a normal distribution, but I’m not familiar with it.

The best way to construct confidence intervals without assuming an underlying distribution is to do them empirically as ultrafilter mentioned.

Chronos · June 22, 2010, 8:14pm

If I’m understanding you correctly: When you have a mean and a standard deviation and no other information, one would typically assume or approximate the distribution as Gaussian. And you’re asking, if one has a mean, standard deviation, and kurtosis and/or skew, if there’s some other distribution (or more precisely, family of distributions with those three or four parameters) which one should assume as an approximation?

Buck_Godot · June 22, 2010, 10:40pm

What everyone stated above is correct. We generally assume a normal distribution when constructing confidence intervals because (due to the central limit theorem) normal distributions crop up all the time, particularly when dealing with conglomerations of small perturbations. If you are observing data with a kurtosis of 400 then you are clearly not observing a normal distribution. The question is what distribution you are observing.

If you have a large number of data points, you can use the empirical confidence intervals as described by ultrafilter. But if you have only 50 data points, then your confidence intervals are going to be dependent on just a few outlier points.

If you have an understanding of the process that went into making the data, then that might also suggest some distribution. If all this fails, and if your data is sort of bell shaped, then you might want to look at generalized normal distributions, which have an additional “shape parameter” alongside the mean and variance that adds kurtosis or skewness to the normal distribution. You can plug in your observed skewness and kurtosis and solve for this shape parameter using the formulas in the link above. This is called method of moments estimation.
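
As a rough sketch of that method-of-moments step, here is the symmetric version of the generalized normal (also called the exponential power distribution), whose kurtosis is Γ(5/β)Γ(1/β)/Γ(3/β)² for shape β and whose variance is α²Γ(3/β)/Γ(1/β) for scale α. The assumptions here: "kurtosis" means the non-excess kind (normal = 3), skewness is ignored, and the function names fit_shape and fit_scale are invented for this example.

```python
import math

def log_kurtosis(beta):
    """Log of the (non-excess) kurtosis of a symmetric generalized normal."""
    return (math.lgamma(5.0 / beta) + math.lgamma(1.0 / beta)
            - 2.0 * math.lgamma(3.0 / beta))

def fit_shape(kurtosis, lo=0.05, hi=50.0, iters=200):
    """Solve kurtosis(beta) = target by bisection; kurtosis falls as beta grows."""
    target = math.log(kurtosis)
    if not (log_kurtosis(hi) < target < log_kurtosis(lo)):
        raise ValueError("kurtosis outside the range this sketch can match")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if log_kurtosis(mid) > target:
            lo = mid   # kurtosis still too high, so beta must be larger
        else:
            hi = mid
    return math.sqrt(lo * hi)

def fit_scale(sd, beta):
    """Scale alpha that gives the distribution a standard deviation of sd."""
    return sd * math.exp(0.5 * (math.lgamma(1.0 / beta)
                                - math.lgamma(3.0 / beta)))

beta = fit_shape(400.0)       # shape matching a kurtosis of 400
alpha = fit_scale(1.0, beta)  # scale matching a standard deviation of 1
print(beta, alpha)
```

A shape of 2 recovers the normal and a shape of 1 the Laplace, so a kurtosis of 400 lands well below 1. Keep in mind that a sample kurtosis that large is itself a very noisy estimate, so the fitted shape should be treated with suspicion.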

At the end of the day it’s probably also good to check that your data fits the distribution you assumed. This can be done using the Kolmogorov–Smirnov test.
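
A bare-bones sketch of that check, written out by hand in Python so it needs nothing beyond the standard library. The p-value uses the large-sample Kolmogorov series, so it is only approximate for small samples, and it is too optimistic if the parameters of the assumed distribution were fitted to the same data. The samples are made up for illustration.

```python
import math
import random
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and a hypothesised CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value; a rough approximation for small n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the series is unreliable here and the p-value is ~1
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))

rng = random.Random(1)
n = 5000
normal_sample = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical fat-tailed sample: Laplace with unit variance (scale 1/sqrt(2)).
b = 1.0 / math.sqrt(2.0)
fat_sample = [rng.choice((-1, 1)) * rng.expovariate(1.0 / b) for _ in range(n)]

std_normal = NormalDist().cdf
d_normal = ks_statistic(normal_sample, std_normal)
d_fat = ks_statistic(fat_sample, std_normal)
p_fat = ks_pvalue(d_fat, n)
print(d_normal, ks_pvalue(d_normal, n))
print(d_fat, p_fat)
```

One caveat: the test is most sensitive near the middle of the distribution, so it can be slow to notice a problem that lives only in the far tails.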

Chronos · June 22, 2010, 11:42pm

On thinking about it some more, you might also get some use out of Chebyshev’s inequality. No matter what the distribution, nor how non-Gaussian it is, no more than 1/k² of the distribution can lie beyond k standard deviations from the mean (so, for instance, for a 99% confidence interval, 10 standard deviations will always suffice). Note that this is only an upper bound: It might still be possible for the confidence intervals to be an arbitrarily small fraction of the standard deviation, depending on the distribution, and it’s even possible for a perfectly well-defined distribution to have infinite standard deviation.
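
The arithmetic is simple enough to sketch: setting 1/k² equal to the fraction you are willing to leave outside gives k = 1/sqrt(1 - coverage). The function names below are made up for this example, and the bound only makes sense when the standard deviation is finite.

```python
import math

def chebyshev_k(coverage):
    """Number of standard deviations k such that 1/k**2 equals 1 - coverage."""
    return 1.0 / math.sqrt(1.0 - coverage)

def chebyshev_interval(mean, sd, coverage):
    """Distribution-free interval guaranteed to hold at least `coverage`."""
    k = chebyshev_k(coverage)
    return mean - k * sd, mean + k * sd

k99 = chebyshev_k(0.99)  # about 10 standard deviations
k95 = chebyshev_k(0.95)  # about 4.47 standard deviations
low, high = chebyshev_interval(0.0, 1.0, 0.95)
print(k99, k95, low, high)
```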

Buck_Godot · June 23, 2010, 6:00pm

Chebyshev’s inequality is pretty harsh. For a 95% confidence interval it basically assumes you have a 3-point distribution with 95% of the data at 1 point in the middle and the remaining 5% split between the two ends. This is great for proving theorems where you want to consider every possible case, but except in the most extreme cases its confidence limits will be way overestimated. To look for fractions of the SD to get confidence intervals, we get back to assuming or fitting an alternate parametric distribution.
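
To make that worst case concrete, here is a small Python sketch that builds the 3-point distribution and checks that it has mean 0 and standard deviation 1, then compares its 95% limit with the normal one. The function name extremal_three_point is invented for illustration.

```python
import math
from statistics import NormalDist

def extremal_three_point(coverage):
    """Three-point distribution (mean 0, sd 1) that attains Chebyshev's bound."""
    tail = 1.0 - coverage
    k = 1.0 / math.sqrt(tail)
    points = [-k, 0.0, k]
    probs = [tail / 2.0, coverage, tail / 2.0]  # tail mass split evenly
    return points, probs

points, probs = extremal_three_point(0.95)
mean = sum(p * x for x, p in zip(points, probs))
var = sum(p * (x - mean) ** 2 for x, p in zip(points, probs))

# Width of the Chebyshev limit relative to the normal-theory limit.
normal_z = NormalDist().inv_cdf(0.975)
ratio = points[2] / normal_z
print(points, probs, mean, var, ratio)
```

For 95% coverage the Chebyshev limit comes out more than twice as wide as the normal one, which is the over-estimation being complained about.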
