Polls and margin of error.

Can anyone explain, without citing a semester’s worth of statistics, how MOE is figured into a poll?

Let’s say you take a poll of 1000 likely voters and get 501 for one thing and 499 for the other. How do you get a MOE on that? A typical one I see is 3 or 3.5%. Is this taking into account the few percent of people who may be psychotic liars? Or is there a legit reason? The first thing that comes to mind is the wording of a polling question that may skew the results, but I’m not sure.

So how is the margin determined to a certain number, and why is it even needed?

Just thought of one more possibility. Could it be akin to a point spread in sports? Where a polling company gives its best estimate but covers its ass by building in a slight chance that the poll will be wrong, thereby maintaining credibility?

Here’s a good explanation of it:

http://www.survey-usa.org/methodology.html

It’s easy to read and understand. Here is an excerpt:

The margin of error takes into account the possibility that your sample wasn’t representative of the population you’re trying to test.

Say the state of Ohiowa has 1,000,000 voters. If you were perfectly omniscient, you would know that on election day 501,000 of them will vote for the Dempublican, while 499,000 will vote for the Republicrat. Since you aren’t perfectly omniscient, you poll a random sample of Ohiowans. The margin of error takes into account the possibility that, just by sheer bad luck, you happen to draw the names of 501 Republicrats and 499 Dempublicans out of the hat.

Generally, it’s expressed as something like “50.1% Dempublican to 49.9% Republicrat, and we’re 95% sure that the actual population is within plus or minus 2% of that”. Note that this means there’s a 5% chance that you’re even further off than the plus or minus 2% number you gave.

Note also that if you make the plus or minus number larger, you can get really high degrees of confidence. We can say with great confidence–99.999…%–that Ralph Nader will not make a stunning upset and be elected President of the United States next Tuesday, and that neither Bush nor Kerry will take 75% of the national popular vote. Of course, those predictions aren’t very useful either. You trade accuracy for precision. “It will rain in Iowa sometime in November” is almost certainly accurate, but not terribly precise; “It will rain in downtown Cedar Rapids after lunchtime next Wednesday” is very precise, but has a very good chance of being completely inaccurate.
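You can watch that sampling luck in action with a quick simulation. Here’s a sketch in Python (the 50.1% true split comes from this thread; the 3.2-point window, trial count, and seed are my own illustrative choices):

```python
import random

random.seed(42)

TRUE_P = 0.501      # the "omniscient" true support in Ohiowa
SAMPLE_SIZE = 1000  # people polled per simulated poll
TRIALS = 2000       # number of independent polls to simulate

within = 0
for _ in range(TRIALS):
    # Each respondent is an independent random draw from the population.
    dem_votes = sum(random.random() < TRUE_P for _ in range(SAMPLE_SIZE))
    estimate = dem_votes / SAMPLE_SIZE
    # Count polls landing within 3.2 points of the true value.
    if abs(estimate - TRUE_P) <= 0.032:
        within += 1

print(f"{100 * within / TRIALS:.1f}% of simulated polls were within 3.2 points")
```

Roughly 95% of the simulated polls land inside the window; the remaining few miss by more, which is the “5% chance that you’re even further off” in action.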

Also, the “margin of error” is taking into account random chance, not a flawed survey. (“Are you voting for our noble patriotic Dempublican incumbent Senator, or for his weaseling, freedom-hating, wife-beating, puppy-killing Republicrat opponent?”) Flaws can be more subtle than leading questions (and bad question design can be a lot more subtle than that, of course). For example, if you pick your sample by going through the phone book, you will miss people with unlisted numbers. Perhaps more Republicrats have unlisted numbers, because that party attracts rich people who don’t like to be bothered at home, and can afford to pay for an unlisted number. Or maybe single women are more likely to be Dempublicans, and also more likely to have unlisted numbers to stave off creeps and stalkers. (In practice, modern pollsters use automated dialing systems, and call people with unlisted numbers as well. One factor whose effect on this round of polls no one really knows is the growing number of people with only cell phones and no land lines, who don’t get called.)

There’s a whole lot more to poll-taking, which makes it in practice more of an art than a science. (How do you determine who’s a “likely” voter, anyway? How do you control your sample so that it truly represents the population at large?)

It has nothing to do with lying or poorly phrased questions. It has to do with the fact that the number of people surveyed is a relatively small subset of the population. In the example you give, the polling result would be 50.1%. But can we have any confidence that the “real” answer is not, say, exactly 50%? If it were, there would be an excellent chance of getting a deviation this large from 50:50 (flip a fair coin 1000 times and only rarely will you get exactly 500 heads). Well, what if the real answer were 55%? Could the poll give 50.1% by chance? How about 60%? We need to know in order to interpret the poll. Even if the real answer is 99%, there is a nonzero, albeit extremely small, chance of obtaining 501:499. Thus one computes the range of real answers for which the observed polling result is within, say, a central 95% chunk of the probability distribution.

Here’s a simplified version of the type of reasoning involved. With a thousand people in the poll and a real underlying probability of 0.5, the standard deviation of the poll result would be sqrt(0.5*(1-0.5)/1000) = 0.0158 or 1.58% (this is a known result for the binomial distribution, which describes things like repeated coin tosses). 95% of the time the result is within approximately two standard deviations. This is not precisely how a margin of error is computed, but it gives the flavor and yields an answer near 3%.
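That back-of-the-envelope calculation translates directly into code. A minimal sketch (the function name is mine; z = 1.96 is the usual cutoff behind “approximately two standard deviations”):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a poll of n people
    where a fraction p chose one option."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.5, 1000)
print(f"95% margin of error: {100 * moe:.1f} points")  # about 3.1 points
```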

Here’s a rule of thumb that’s mostly good enough. Take the square root of the number of people in the survey. That’s the margin of error. If you take a survey of 1000 people, the margin of error is 32 people, or 3.2%. 95% of the time, your poll will be within 3.2% of the real percentage. Notice then that to make the survey 10 times as accurate, you would have to survey 100 times as many people. If you did a survey of 100,000 people, your margin of error would then be 316 people, or .32%.

That’s a nice rule of thumb, and corresponds exactly to the approximation above. Note that it depends on a roughly 1:1 split. It works reasonably well over a broad range: 0.4*(1-0.4)=0.24, as compared to 0.5*(1-0.5)=0.25, and even for 0.3 the number is 0.21. However, using this, for example, to compute the margin of error for the fraction of Nader voters would give seriously misleading results.
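You can check how well the rule of thumb tracks the exact formula over that range (n = 1000 as in the thread; 1.96 is the standard 95% z-value):

```python
import math

n = 1000
rule_of_thumb = 1 / math.sqrt(n)  # sqrt(n) people out of n people
print(f"rule of thumb: {100 * rule_of_thumb:.1f} points")

for p in (0.5, 0.4, 0.3):
    exact = 1.96 * math.sqrt(p * (1 - p) / n)
    print(f"p = {p}: exact 95% margin = {100 * exact:.1f} points")
```

The exact margins stay close to 3.2 points for an even split and drift only slightly down through 40% and 30%, which is why the shortcut holds up until you get to lopsided Nader-style numbers.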

And just to beat a point to death, “margin of error” all by itself is meaningless. The full description includes both the margin of error and the confidence level.

Speaking just of margin of error without the confidence level is like talking about a fraction and only mentioning the numerator without disclosing the denominator.

It’s a statistical convention to assume the confidence level is 95% if unstated, but that’s just a convention.

Particularly in things political, there’s an awful lot of lying that occurs by telling the truth, but telling only half the story and leaving the audience to assume the other half. By careful arrangement of the two halves, they can simultaneously “tell the truth” and be totally misleading. Provides good plausible deniability while still giving all the spin desired.

Well, you are close on the “covers its ass” thing.

If you run a poll and are willing to be wrong one time in 20 when making a prediction based on the poll results, then you use a 95% confidence table to find the limits of error. For example, if your poll determines that candidate A gets 45% support, the table says that you will be right 95 times out of 100 if you state that A’s true percent support is between 42% and 48%.

If you want to be right 99 times out of 100 then your statement will be that A’s support is between 41% and 49%.

That is, the more you want to make correct statements the wider your limits have to be so that you can be more sure that the true support for A will be within the limits you state.
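The widening is mechanical: a higher confidence level means a bigger multiplier on the same standard deviation. A sketch, assuming the standard normal z-values (1.96 for 95%, 2.576 for 99%) that sit behind the confidence tables mentioned above:

```python
import math

def margin(p, n, z):
    # z standard deviations of the binomial proportion
    return z * math.sqrt(p * (1 - p) / n)

for label, z in (("95%", 1.96), ("99%", 2.576)):
    m = margin(0.45, 1000, z)
    print(f"{label} confidence: 45% plus or minus {100 * m:.1f} points")
```

That reproduces the shape of the example: roughly plus or minus 3 points at 95% confidence, widening to roughly plus or minus 4 at 99%.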

The 3% error only applies to results that are in the vicinity of 50-50 and for a sample size of 1000 poll responders. For example, if A only gets 30%, then your 95% confidence statement is that A’s support is between 27% and 33%, which is an error of 10% of the 30% figure.

If support for A is in the Nader zone, say 3%, then your statement would be that the true support is between 2% and 4%, an error of about 33% of the poll’s 3% figure.
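Putting numbers on that: the absolute margin actually shrinks as support moves away from 50%, but as a fraction of the estimate itself it balloons. A quick check with the same normal-approximation formula (n = 1000, z = 1.96):

```python
import math

n = 1000
for p in (0.5, 0.3, 0.03):
    moe = 1.96 * math.sqrt(p * (1 - p) / n)
    print(f"support {100 * p:g}%: plus or minus {100 * moe:.1f} points "
          f"({100 * moe / p:.0f}% of the estimate itself)")
```

At 50% the margin is about 6% of the estimate; at 3% it is more than a third of it, which is why a 1000-person poll tells you very little about the precise size of a Nader-zone vote.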

The statistical tables are based on the assumption of a sample that is actually representative of the general population and that the responders told the truth.