Polls ... what does the % error mean?

When a poll says that 1,000 people were surveyed and candidate “A” got 52%, candidate “B” got 48%, and the error is x%, what does the error mean and how do they calculate it?

It means there is a 95% chance that A will actually get 52% plus or minus x percentage points, and candidate B will actually get 48% plus or minus x percentage points. Falling within the margin of error does not mean that the candidates are tied, as many media outlets seem to imply. Candidate A in the above example is still more likely than B to win. There just isn’t enough evidence to say with greater than 95% probability that A will win.
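To sketch where that x typically comes from (my own illustration, not anything stated in the poll itself): for a simple random sample, the 95% margin of error for a proportion is roughly 1.96 times the standard error of the sample proportion.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# The poll in the question: 1,000 people surveyed, candidate A at 52%
moe = margin_of_error(0.52, 1000)
print(f"Margin of error: +/- {moe * 100:.1f} percentage points")  # about +/- 3.1
```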

I admit, I never did well in statistics, and have forgotten a lot of what I know, but I do know that that’s sampling error. Whenever you get the opinions of a small group of people, they’re not necessarily going to be the same as the opinions of the population as a whole. Sampling error is a way to deal with that…you usually set the significance level at .05 or .01 (I think…like I’ve said, it’s been a while). I think polls usually use .05, which corresponds to 95% confidence. So, if the poll says Gore’s popularity is at 45% with a 3 percent margin of error, that means that there’s about a 95% chance that the support for Gore is between 42% and 48%.

If you could poll everyone in the country, you would know the exact percentages. The more people you poll, the more accurate your results are. If the margin of error is 3%, that means that you can be 95% certain that the true answer is within 3 percentage points of what you determined. I think that the 95% is called the confidence level. You can set it to different values, but the higher the confidence level, the greater the margin of error.

Even if the margin of error is 3%, you never know for sure that the real answer is within 3% - you could happen to pick 200 Gore supporters at random. That’s why you have to say you are 95% sure.

It is also important to remember that the reported margin of error is the sampling error - the error reflected by which 200 people you happen to pick at random. But this is really only one of many types of errors in polls. Other types include poor or confusing questions, and not truly picking people at random. I think that the sampling error is less than other types of error put together. But this is never reported.

Of course, it’s been a long time since statistics class and I may have some of my terms wrong.

  • Stephen

A clarification: When I said 95% chance I was assuming the confidence level was 95% (corresponding to alpha .05 in statistics-speak). This is the usual confidence level for public opinion polls, according to one of my statistics textbooks.

bibliophage (notice lowercase ‘b’, I can be taught!) brings up an interesting point. When the news quotes a poll as having a 4% margin of error, but doesn’t tell you the confidence level, they’ve essentially told you nothing. It’s like saying it’s 45 degrees today, without mentioning if that’s in Fahrenheit or Celsius, or some other scale known only to themselves.

Thanks Greg. I have never seen a confidence level used on poll results either. Is the confidence level supposed to be obvious? Wouldn’t the difference between a 90% and a 99% confidence level be quite significant in terms of a “margin of error”?

Going from a 90% to a 95% level of confidence actually is very significant in terms of margin of error. When we try to determine a confidence interval for the population mean (or proportion, in this case) from a point estimate (the sample mean/proportion) provided by the sample, the difference in “margin of error” between 90% and 95% confidence is fairly substantial. Assuming the sample is randomly taken from the population, so that the distribution of all possible sample means/proportions is approximately normal (true for just about every population at these sample sizes), it’s the difference between 1.645 and 1.96 in the z-score used to calculate the error. In other words, you could have a sample with approximately 3 percentage points of error at the 90% level of confidence, and be 90% certain that the actual result will fall somewhere in that interval, or about 3.5 percentage points either way and be around 95% sure that the real result will fall somewhere in this (wider) interval.
That pollsters often don’t tell us what confidence level their statistics were calculated at is really a bit of a disservice; for all we know, it could be a sample of 5 people with a 10% level of confidence and still have an “error” that we’d consider acceptable with the information we’re given on the news!
Sorry if the wording’s a little vague. Just wanted to add on a little more information.
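To put numbers on the 90%-versus-95% comparison above, here is a minimal sketch (the 50% support level and the sample size of 800 are just illustrative assumptions, not from any real poll):

```python
import math

def margin_of_error(p, n, z):
    # Normal-approximation margin of error for a sample proportion
    return z * math.sqrt(p * (1 - p) / n)

p, n = 0.50, 800  # assumed support level and sample size (illustrative)
print(f"90% confidence: +/- {margin_of_error(p, n, 1.645) * 100:.1f} points")
print(f"95% confidence: +/- {margin_of_error(p, n, 1.960) * 100:.1f} points")
# Roughly +/- 2.9 points at 90% versus +/- 3.5 points at 95%
```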

I also think it’s more ethical of pollsters to reveal the sample base. For example, in a recent LA Times poll, they had the footnote “This poll is based on a random telephone sample of xxx voters…”

What amazes me is how statisticians can claim a 95% confidence level on polls that sample only several hundred or a thousand voters out of the millions that make up the US electorate.

And, of course, there’s that nagging question brought up in the last post–is the sample random enough? A telephone sample is NOT a proper random sample, but it’s closer than one of those voluntary mall polls. I have a hard time imagining, however, that polls people only respond to when they feel strongly about an issue, such as those that often pop up on AOL and whatnot, have even a shred of accuracy. After all, the only individuals who are going to respond to a poll like that are those who are opinionated about the issue in question and want to share. How can any kind of accurate estimate originate from THAT? Really, when I think about it, I’m not sure that the proper statistical confidence rules can be applied to just about any poll at all, since it’s so difficult to get a sample that even approaches proper randomness–in telephone polls, only certain people will be home AND want to respond, etc.

As for confidence interval estimates (“+ or - x%”), you’d be amazed how small a sample is needed to get a pretty small interval.
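A sketch of what’s behind that claim: solving the usual margin-of-error formula for the sample size n (using the worst case p = 0.5) shows how few respondents are needed, regardless of how large the electorate is.

```python
import math

def sample_size_needed(moe, z=1.96, p=0.5):
    """Smallest n giving the requested margin of error at the given z (worst case p = 0.5)."""
    return math.ceil((z / moe) ** 2 * p * (1 - p))

for moe in (0.05, 0.03, 0.01):
    print(f"+/- {moe:.0%} at 95% confidence needs about {sample_size_needed(moe)} respondents")
# Roughly 385, 1068, and 9604 -- independent of whether the population is a town or the whole US
```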

I thought that the + or - was for each candidate. Say Gore is at 45% with + or - 3 and Bush is at 48% with + or - 3 according to one pollster. Stated differently, Gore is at 42% to 48% and Bush at 45% to 51%. There is an overlap of 3 points, meaning it is too close to call.

Please correct me if I am wrong.

Not exactly. There are many variables here. There is an arbitrary confidence level, which they don’t tell you, but 95% is standard. So let’s say they say Bush has 49% with a 3% margin of error, at a 95% confidence level. That means you are 95% sure that Bush will get between 46% and 52%, inclusive. If you raise the confidence level, the margin of error goes up. If you raise the sample size, the margin of error goes down. To get a 100% confidence level and 0% margin of error, you need to poll everyone who is going to vote and make sure they don’t change their minds.
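A small sketch of both of those effects (the sample sizes here are purely illustrative), using the usual normal-approximation formula for a proportion:

```python
import math

Z = {"90%": 1.645, "95%": 1.960, "99%": 2.576}

def moe(p, n, z):
    # Normal-approximation margin of error for a sample proportion
    return z * math.sqrt(p * (1 - p) / n)

for n in (250, 1000, 4000):
    errors = ", ".join(f"{level}: +/- {moe(0.5, n, z) * 100:.1f} pts" for level, z in Z.items())
    print(f"n = {n:4d}  ->  {errors}")
# Raising the confidence level widens the interval; quadrupling n roughly halves it.
```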

The overlapping margin of error [even though it is at a 95% confidence level] still means that pollsters cannot say that Bush is leading. The race is:
• neck ‘n’ neck,
• too close to call,
• Nader is taking away from Gore,
• Buchananites are taking away from Bush, but that doesn’t mean much 'coz there are not that many of them,
• we decideds are all at the whim of the can’t-make-up-their-minds.

Now that is interesting fodder for all the political pundits after election day.

Although 1,000 people does not sound like enough to make up a decent sample of the entire US, it is based on pretty well-established statistical principles, assuming you have a truly random sample, which is nigh impossible. That is why you don’t even bother seeing percent errors on Internet or call-in polls, because those are not random samples. They suffer from what is generally called “sample selection bias”.
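If it still seems surprising that 1,000 is enough, a quick simulation makes the point (the 52% “true” support level is an assumed figure for illustration): draw many random samples of 1,000 and count how often the sample percentage lands within the roughly 3-point margin of error of the truth.

```python
import random

random.seed(0)
true_p = 0.52        # assumed "true" support in the electorate (illustrative)
sample_size = 1000
trials = 2000

within = 0
for _ in range(trials):
    # Each respondent is an independent random draw -- a good approximation
    # when the population is millions of voters
    supporters = sum(random.random() < true_p for _ in range(sample_size))
    if abs(supporters / sample_size - true_p) <= 0.031:  # ~95% margin of error for n = 1000
        within += 1

print(f"{within / trials:.0%} of the samples landed within the margin of error")  # roughly 95%
```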

There are a lot of the other problems people mentioned: wording of the question, not divulging the confidence level, and even the truthfulness of the respondents. I am not sure how it plays out in these presidential polls, but there is plenty of evidence that people lie about other things.

‘Sir, what magazines do you read regularly?’
‘Time, The Economist, Forbes, the New Yorker’
Sitting on his coffee table…Maxim, Pro Wrestling Weekly, the Recycler, Cat Fancy.

I always thought that ‘error is x%’ refers not to the subjects themselves but to the people who made the poll.

In other words, if one person who works for the poll calculates the statistics to be 70% and then another person (I’m sure they have more than one person doing it) doing this poll calculates it to be 74%, you would say the error is +/- 4%.