Okay, I know how to come up with a confidence interval when you are only after one proportion (or two), such as the proportion of people who prefer Bush vs. the proportion who don’t. How is it done when there are three or more? For example, in a three-way Bush-Kerry-Nader poll, the margin of error for the proportion favoring each would be different if you calculate it as you would for a poll with just two choices. Am I really asking what the standard deviation is for such a poll? I have a feeling that’s what I’m after, but I’m not sure.
My Dad, a Ph.D. statistician, didn’t seem to recall how to do it off-hand. Maybe this is more at the front of some Dopers’ minds than his…
Here is an easy-to-follow explanation of the margin of error and the factors that go into it. Basically, it is a confidence interval with other things factored in.
Sorry, didn’t read through much of Shagnasty’s link. But here’s the lowdown, at least as far as it applies to Year 13 statistics as taught in New Zealand.
You can view the results of a multi-choice opinion poll as a number of binomial distributions. That is,
[ul]
[li]will vote for Bush / will not vote for Bush[/li][li]will vote for Kerry / will not vote for Kerry[/li][li]will vote for Nader / will not vote for Nader[/li][/ul]
Of course, these three binomial distributions are not independent, but that doesn’t matter as long as you look at each proportion on its own. Each has its own value of p (the probability of the event occurring) and they all share the same n (the sample size).
You could simply calculate the confidence interval for each one the usual way. Except, as you rightly pointed out, you would have three (or more) different margins of error.
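To make that concrete, here is a rough sketch of the per-candidate calculation using the usual normal-approximation formula, z * sqrt(p(1-p)/n). The poll shares and the sample size are made up for illustration, and the function name is my own.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def margin_of_error(p_hat, n, z=Z_95):
    """Half-width of the normal-approximation interval for one proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical poll: these shares and this sample size are invented.
n = 1000
shares = {"Bush": 0.47, "Kerry": 0.48, "Nader": 0.05}

margins = {name: margin_of_error(p, n) for name, p in shares.items()}
for name, p in shares.items():
    print(f"{name}: {p:.1%} +/- {margins[name]:.1%}")
```

With numbers like these, the two front-runners each get a margin of roughly three points, while the low-polling candidate gets a noticeably smaller one.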
The worst-case scenario is when p is close to 0.5. That is when the margin of error is greatest. What is done in practice is to calculate the margin of error for p=0.5 and use that value when you report the results of the poll. Typically in NZ a poll of 1000 is used with a confidence level of 95%. The results are reported with a “margin of error of plus or minus three percent”.
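The quoted three percent figure can be checked with a couple of lines. This is only a sketch of the worst-case calculation (p=0.5, 95% confidence, so z=1.96); the function name is mine.

```python
import math


def worst_case_margin(n, z=1.96):
    """Margin of error at p = 0.5, where p * (1 - p) is at its maximum of 0.25."""
    return z * math.sqrt(0.25 / n)


margin = worst_case_margin(1000)
print(f"worst-case margin for n=1000: {margin:.2%}")
```

For n=1000 this comes out at a little over three percentage points, which is what gets rounded to "plus or minus three percent".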
[/doing the math]
There are some problems with this approach. It is confusing when you have low-polling candidates whose polling may actually be below the margin of error. The other and more subtle difficulty is that the Binomial --> Normal approximation (which rests on the central limit theorem, and on which these confidence intervals are based) is less valid for low values of p. This is particularly true when n is also low.
Statisticians don’t really expect journalists to understand the subtlety of all of this. Nor do they think the general public really understands. And politicians are told by their media spin doctors not to pay too much attention to the polls (unless of course they are ahead). The confusion introduced by the somewhat ad-hoc approach is small compared to the confusion that already exists concerning statistics and the additional confusion that would be added by adopting a more rigorous approach.
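To see the low-p problem in action, here is a small sketch comparing the plain normal-approximation interval with the Wilson score interval. The Wilson interval is one common alternative that I am bringing in for comparison, not something from the method above. The poll numbers (2% support in a sample of 100) are invented.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Plain normal-approximation interval: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval, which stays inside [0, 1] even for small p_hat."""
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical low-polling candidate in a small poll.
wald = wald_interval(0.02, 100)
wilson = wilson_interval(0.02, 100)
print("normal approximation:", wald)
print("Wilson score:        ", wilson)
```

In this example the normal-approximation interval dips below zero, which is impossible for a proportion, while the Wilson interval does not.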
To put it bluntly,
If polls are reported as having a margin of error of plus or minus 3% (at a 95% confidence level)
then,
about 5% of such polls are completely wrong. That is, the true value is more than three percentage points from the reported value.
And this assumes that the sampling process is completely unbiased, which is something I would doubt in most cases.
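If you want to convince yourself of the roughly-5% figure, a quick simulation does it. This sketch assumes the ideal case: a true proportion of exactly 0.5, perfectly unbiased sampling, and 1000 respondents per poll. The number of simulated polls and the seed are arbitrary choices of mine.

```python
import math
import random


def miss_rate(true_p, n, margin, trials, seed=0):
    """Fraction of simulated polls whose estimate misses true_p by more than margin."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # One poll: n independent respondents, each a supporter with probability true_p.
        supporters = sum(rng.random() < true_p for _ in range(n))
        if abs(supporters / n - true_p) > margin:
            misses += 1
    return misses / trials


n = 1000
margin = 1.96 * math.sqrt(0.25 / n)  # worst-case 95% margin, a bit over 3 points
rate = miss_rate(true_p=0.5, n=n, margin=margin, trials=2000)
print(f"share of polls off by more than the margin: {rate:.1%}")
```

The share should land somewhere around one poll in twenty, and that is before any bias in the sampling is taken into account.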