Statistics question: polling margin of error

Exactly.

Like driving on the right instead of the left, it doesn't really matter which choice ends up being "the one we went with" (as the UK dopers will quickly testify if asked); what does kind of matter is being able to say "yes, we have a convention that we've decided to use, and you should use it too, please."

See prior comment on comparing one study to another.

EDIT: it always pagewraps when I didn’t quote!

This is about going with 95% confidence and whether doing so is arbitrary. Ultrafilter is the one who said "this".

If you know what you're doing, you can construct statistics for any confidence interval you'd like. I might say, for instance, that Smith has 49 ± 1 at a confidence level of 90%; he has 49 ± 3 at a confidence level of 95%; and he has 49 ± 5 at a confidence level of 99%. This would mean that his true level of support has a 90% chance of being between 48 and 50, a 95% chance of being between 46 and 52, and a 99% chance of being between 44 and 54. I could pick any of those numbers to publicize, or all of them, and which one I choose is just a matter of convention. I could even, if I wanted, calculate the margin of error for a 73.2% confidence interval, or something silly like that, and have perfectly valid statistics, but nobody else uses 73.2% confidence intervals, so it would probably just confuse folks.
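
To make the "pick any confidence level you like" point concrete, here's a small Python sketch (the 49% support figure and the n = 1,000 sample size are made-up numbers for illustration, not from the post):

    from math import sqrt
    from statistics import NormalDist

    # Hypothetical poll: 49% observed support from 1,000 respondents.
    # (Both numbers are invented for illustration.)
    p, n = 0.49, 1000

    for conf in (0.90, 0.95, 0.99, 0.732):
        # Two-sided interval: split the leftover probability between the two tails.
        z = NormalDist().inv_cdf(0.5 + conf / 2)
        moe = z * sqrt(p * (1 - p) / n)
        print(f"{conf:.1%} confidence: {p:.0%} ± {moe:.1%}")

Same poll, same data; only the multiplier changes, so the higher the confidence level you quote, the wider the margin you have to report.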

On the question of why standard deviation is used instead of some other measure of the width of a distribution, there are a good number of folks who do prefer other measures. For one thing, not all distributions have a standard deviation. Heck, there are even some perfectly valid (though contrived) distributions which don't even have a mean. However, all distributions will always have a median, and all distributions will always have a 50th-percentile width (an interquartile range), so some folks prefer to use those as their measures of the typical value and width of a distribution.
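
As a quick illustration of that last point (my own sketch, not from the thread): the standard Cauchy distribution has no mean and no standard deviation, yet its median and middle-50% width behave perfectly well:

    import math
    import random
    import statistics

    random.seed(1)
    # Standard Cauchy draws via the inverse CDF. This distribution has no mean
    # and no standard deviation, but its median and quartiles exist just fine.
    sample = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(100_000)]

    q1, q2, q3 = statistics.quantiles(sample, n=4)
    print("sample mean: ", statistics.mean(sample))   # unstable from run to run
    print("sample stdev:", statistics.stdev(sample))  # huge; no true value exists
    print("median:", q2)                              # stays near 0
    print("middle-50% width (IQR):", q3 - q1)         # stays near 2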

I was under the impression that it’s related to the fact that in a normal distribution, 95% of all data lie within two standard deviations of the mean.

But wouldn’t this still be arbitrary? Why two standard deviations?

The number of standard deviations (or sigma) required is determined by the type of testing being done and level of confidence needed to convince others. Two sigma is realistic for population sampling. It’s inadequate for testing physics.

Basically, the error bound goes down as the inverse of the square root of the number of sample points. So if sampling 1000 people gets you about 95% accuracy (roughly a 5-point margin of error), then to get 99% accuracy (a 1-point margin) you'd need to sample 25,000 people, and 100,000 people to get 99.5% accuracy (a half-point margin). Historically, folks thought that 95% wasn't unreasonable for their needs, and that sampling 1000 people wasn't too onerous a burden.
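
A tiny sketch of that square-root scaling, using the post's round numbers as the baseline (the 5-point margin at n = 1,000 is assumed, not computed):

    # Margin of error shrinks like 1/sqrt(n), so cutting the margin by a factor
    # of k takes k**2 times as many respondents. The 5-point baseline at
    # n = 1,000 follows the post's round numbers.
    base_n, base_moe = 1000, 0.05

    for target_moe in (0.05, 0.01, 0.005):
        k = base_moe / target_moe
        print(f"margin {target_moe:.1%} -> about {base_n * k**2:,.0f} respondents")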

I use these to teach basic CI methodology for proportions.

The margin of error is basically the half-width of the confidence interval estimate, which is on the order of

Z * F * sqrt(p(1-p)/n)

where Z is some sort of multiplier for the approximate confidence level, usually the 97.5% point of the Gaussian distribution or of an appropriate t-distribution, and F is a factor that accounts for variation in the sampling design. Thing is, calculus shows that the sqrt(p(1-p)/n) bit is maximized at p = .50, so one can be conservative by using p = .50 for planning purposes. So a gross formula for the MoE is something like 2*sqrt(1/(4n)), which works out to 1/sqrt(n).
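
Here's a minimal Python sketch of that formula (the function name and the n = 1,000 example are my own; F defaults to 1, i.e., simple random sampling):

    from math import sqrt
    from statistics import NormalDist

    def margin_of_error(n, p=0.5, conf=0.95, design_factor=1.0):
        # Half-width of the CI for a proportion: Z * F * sqrt(p*(1-p)/n).
        # design_factor (the F above) is 1.0 for simple random sampling; a real
        # survey would plug in its own design effect here.
        z = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 at 95% (the 97.5% point)
        return z * design_factor * sqrt(p * (1 - p) / n)

    n = 1000
    print(margin_of_error(n))          # conservative planning value at p = 0.50
    print(margin_of_error(n, p=0.30))  # narrower once p moves away from 0.50
    print(2 * sqrt(1 / (4 * n)))       # the gross 2*sqrt(1/(4n)) shortcut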

The thing not to do is apply the 95% to any single interval. The 95% confidence applies to the process that generates the intervals, not to the one interval you happen to have computed.