Statistics question with respect to poll margins of error

This question arises from a remark made on this past week’s West Wing episode. Suppose a poll were taken that showed your candidate 9 points down, and its margin of error was ±3. Suppose the same poll was repeatedly taken over the course of a week, and on all seven days, you were down 9 points. Does this mean that you are in fact 9 points down, and not 6 or 12 — i.e., that the margin of error no longer applies?

Since the normal course of polling is to do a different random sample each time, the samples are independent.* That means there’s no way to know if the 9 point differential is anything more than chance. The margin of error still applies.

If you’re a good pollster to begin with, you don’t even talk about a 9 point differential but always about a 6-12 point differential. But shorthand jargon happens even when people know it to be wrong.

*Technically, they’d be independent anyway, but it would be possible to ask more meaningfully if attitudes had changed in the previous week.

Presumably they polled n people to get a margin of error of +/- 3 (n ≈ 1100). If you polled n different people every day of the week, you’d have 7n total trials, so the margin of error would be less than +/- 3, but there would still be a margin of error. The only way to reduce it to 0 is to poll everyone in the country. If it’s the same n people every day, you’d still have a +/- 3 margin of error, assuming they gave the same reply each day.
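To put rough numbers on that claim, here is a quick sketch in Python, using the standard worst-case approximation ME = 1.96 * SQRT(0.25/n) for a 95% confidence level:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a
    simple random sample of size n (worst case at p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

one_day = margin_of_error(1100)         # a single day's poll
whole_week = margin_of_error(7 * 1100)  # all seven days pooled

print(f"n = 1100:  +/- {one_day * 100:.2f} points")   # ~3 points
print(f"n = 7700:  +/- {whole_week * 100:.2f} points")  # ~1.1 points
```

Pooling all seven days shrinks the margin by a factor of sqrt(7) ≈ 2.6, from about +/- 3 to about +/- 1.1, but it never reaches zero.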

That’s not right. Most people are surprised to learn that random sample size really isn’t dependent on the size of the population you are trying to represent. A poll of 1100 people can just as accurately represent the whole country or the whole world as it can your state or your hometown. It is the absolute, not the relative, size of the sample that matters. Aiming for big numbers won’t achieve anything on its own.

This calculator will show you exactly what I am talking about. On the top one, Confidence Level = 95%, Confidence Interval = 3 (+/- margin of error). Now enter a population. A population of 8000 gives you a sample size of 942. Now scale up to the world’s population (6,000,000,000) and you get a sample size of 1067.
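The calculator’s arithmetic can be reproduced directly. This is a sketch, assuming the usual sample-size formula with the finite population correction (95% confidence, worst case p = 0.5, +/- 3 points):

```python
import math

def required_sample_size(population, moe=0.03, z=1.96, p=0.5):
    """Sample size needed for the given margin of error, applying
    the finite population correction for population size N."""
    n0 = (z ** 2) * p * (1 - p) / moe ** 2   # infinite-population sample size
    return n0 / (1 + (n0 - 1) / population)  # finite population correction

print(round(required_sample_size(8_000)))          # a small town: 942
print(round(required_sample_size(6_000_000_000)))  # the whole world: 1067
```

The infinite-population figure is about 1067; only when the population is small relative to the sample does the correction pull the number down noticeably.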

That sample size of 1100 lets you model any population in the world as long as the sample is truly random. That is the trick, however, and it is the biggest source of error.

The problem is getting a truly random sample. This is the biggest source of margin of error. It is very difficult to get a truly random sample with polling. Certain types of people won’t talk to the pollsters at all and some will lie.

It doesn’t work that way under most circumstances. The margin of error takes into account flaws in the execution and design of the poll. Repeating the same poll should pick up on the same flaws each time. An exception is if there was bias in only one poll, such as a PETA convention being held in town one day.

If the poll is meant to predict real election-day results, the poll may not match reality for a variety of reasons. One is that some people will lie about who they are going to vote for if their real candidate isn’t socially popular. Repeating the same poll won’t help that.

In the simplest case, you use a simple random sample, in which subjects are drawn with replacement, with equal probability for each draw.

You’re estimating P, the proportion of subjects who have some attribute … for example, voting for your guy, or rather, saying that they will.

If you use n subjects, then

Pest = (number of subjects with the attribute)/n, and for a 95% confidence interval the form is

[Pest - ME, Pest + ME], where

ME = Margin of Error = 1.96 * (correction factor) * SQRT(Pest*(1-Pest)/n).

The ME reflects the amount of uncertainty due to the effect of using a random sample.
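Here is how those formulas play out in code. This is a sketch with hypothetical numbers (506 of 1,100 respondents saying they’ll vote for your guy), taking the correction factor as 1:

```python
import math

def confidence_interval(successes, n, z=1.96):
    """95% confidence interval for a proportion estimated from a
    simple random sample (no correction factor)."""
    p_est = successes / n
    me = z * math.sqrt(p_est * (1 - p_est) / n)
    return p_est - me, p_est + me

low, high = confidence_interval(506, 1100)
print(f"Pest = {506 / 1100:.3f}, interval = [{low:.3f}, {high:.3f}]")
```

With these numbers, Pest = 0.460 and the interval is roughly [0.431, 0.489], i.e. 46% +/- 2.9 points.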

To understand the meaning of the 95% confidence idea, you have to imagine a Family of Samples, each of size n, each drawn in the same random way. Imagine computing a separate confidence interval for each member of the Family of samples: that yields a Family of Intervals.

Approximately 95% of the intervals in the family actually contain the true proportion. The idea is to view single intervals as being drawn randomly from the Family of Intervals.
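The “Family of Intervals” picture can be checked by simulation. A sketch, assuming a true proportion of 0.45 and the normal-approximation interval used above:

```python
import math
import random

random.seed(42)  # fixed seed so the run is repeatable

TRUE_P, N, TRIALS = 0.45, 1100, 2000

def interval_covers(true_p, n):
    """Draw one sample of size n, build its 95% interval, and report
    whether the interval contains the true proportion."""
    hits = sum(random.random() < true_p for _ in range(n))
    p_est = hits / n
    me = 1.96 * math.sqrt(p_est * (1 - p_est) / n)
    return p_est - me <= true_p <= p_est + me

coverage = sum(interval_covers(TRUE_P, N) for _ in range(TRIALS)) / TRIALS
print(f"{coverage:.1%} of the intervals contain the true proportion")
```

Across 2,000 simulated polls, the printed coverage comes out close to 95%, as the confidence-interval construction promises.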

It is possible in the original example that the first interval is in fact correct, and the others are incorrect. Assuming independence, the probability is something like:

(.95)(.05)(.05)(.05)(.05)(.05)(.05) = (.95)(.05)^6 ≈ 1.5 × 10^-8

However, inference of this type tends to focus on one survey at a time. Polling trend data is usually more informative, with the trend yielding information about changes in voter preference.

But viewing voter preference as constant in time is not correct.

Gallup Polls FAQ

Confidence Interval Proportion

But you can get a narrower confidence interval with a greater sample size. Look at the second calculator. Input a sample size of 1100 with a population of 300 million. The confidence interval is +/- 2.95. Now increase the sample size to 7700. The confidence interval narrows to +/- 1.1. I understand that 1100 people can represent 300 million, but more people give more confidence. Greater sample size means less variance.

Of course getting a true random sample is difficult. My answer was implicitly assuming that the sample was truly random.

The simplest formula for standard error is

SQRT(P*(1-P)/n), which is maximised at P=.50=1/2, so, the worst precision in the simplest case is no worse than

SQRT((1/2)*(1/2)/n) = SQRT(1/4n), which indeed drops precipitously with increasing n.

However, most national surveys go with n at or near the 1,000 to 2,000 respondent range. The issue is cost and timeliness, as well as severely diminishing gains in precision.
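The “severely diminishing gains” are easy to see from that worst-case formula; 1.96*SQRT(1/4n) simplifies to 0.98/sqrt(n). A quick sketch:

```python
import math

# Worst-case 95% margin of error: 1.96 * sqrt(1/(4n)) = 0.98 / sqrt(n)
worst_case_me = {n: 0.98 / math.sqrt(n) for n in (250, 1000, 4000, 16000)}

for n, me in worst_case_me.items():
    print(f"n = {n:>6}:  +/- {me * 100:.1f} points")
```

Quadrupling the sample only halves the margin of error (from about 3.1 points at n = 1,000 to about 1.5 at n = 4,000), which is why pollsters rarely pay for samples much beyond the 1,000–2,000 range.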

One of the numbers generally left out of these things is the confidence level. The reason they generally leave it out is that it would be highly unusual for it to be anything other than 95%. There’s no inherent reason to cite stats for a 95% confidence level instead of, say, 87% or 99%; it’s arbitrary. But having a standard arbitrary confidence level makes stats comparable and lets people reading stats have some sense of <ahem> err, confidence, that they know what those numbers mean.

But IMHO it makes it clearer what’s being stated if you state the full phrase, including the confidence level: “Our sample population measured X% in favor of Ballot Issue 4, so we can say that, assuming our sample is representative of the real population, as a truly random sample would be, we have a 95% confidence that in the real population the support for Ballot Issue 4 is X% plus or minus 3.”

In other words, as the sheer mathematics of statistical sampling goes, picking a sample at random from the real population is going to give you a measure within 3 points of what the real population itself would measure at, 95% of the time. The other 5% of the time, by sheer coincidental happenstance, you’re going to be off by *more* than 3 points.

The other important factor is true random sampling. All statistics assume that the sample population is representative in the sense that the method for selecting the sample is as random as a lottery. In real life, samples are obtained from the willingly cooperative, from the available, from the reachable, from the subset of the larger true population that just happens to be willing to answer questions that others might resent as not giving them any good answers to choose from. Very few studies include the opinions of people who detest pollsters and habitually hang up on them. If none of these factors bear any relationship to a respondent’s attitude towards Ballot Initiative 4, such things make no difference, but you can’t know that. So statistics in real life are always a bit skewed from the pure ideal of a random sample population. Remember the concern in the '04 elections about the unpolled opinions of young folks who don’t have a land line telephone? Sometimes these things are relevant.

(There are other ways in which statistics have a lot in common with lies and damn lies but most of the rest have more to do with how questions are worded and available answers rather than the math of it. Although you could probably get me cranked up to rant about controlling for variables that are intrinsically related to dependent variables, or playing fast and loose with stepwise regression…never mind…)

I appreciate the responses. I’m still not sure I know the answer to the question though, not because you all haven’t explained it well, but because I’m so dense about statistics. In the final analysis, and given all the formulaic hubbub, does repeated polling with the same results mean that margins of error are moot?

Making basic assumptions that nothing wacky is going on (such as polling the same people each time, or a relevant event occurring during some of the polling), the margins of error will be reduced, but not eliminated. The way to look at it is that you are not repeating the polling, but extending the polling to a larger sample, which reduces the margin of error.

For the most part, the population size doesn’t matter. However, your model is never perfect unless you poll the entire population. If they polled all 6 billion people, except for Shagnasty, then the poll still has a margin of error, since they don’t know what Shagnasty would do. The margin of error is incredibly small and negligible for any intelligent purposes, but it is still there. And increasing the sample size always helps, just decreasingly so past a certain number.

No. Repeated polling ever so slightly reduces the margins for error, but it is NOT true that you can drive the margin to zero by a couple of repetitions.

And again, the margin for error is only half the story. A margin of error of, say, .0001 percent for a truly perfect poll would have the true results fall in that margin only 95% of the time. The other 5% of the time the sample might differ from reality by 5%, 10%, even 75%. The larger deviations are less likely, but they are not impossible.

Statistics is sort-of 3D. You have the sample, the margin, and the confidence. Talking about any two leaves out the rest of the story. It’s (very metaphorically) like trying to describe the shape of a cube in totally 2D language. Depending on how you’re looking at it, it may look like a square, a diamond, or an off-kilter hexagon. But it’s the same thing; it isn’t changing shape, just your point of view is making it look like it’s changing shape.

The question you asked was essentially: What shape is this cube in 2D? The answer is: It depends on how you look at it. There is NOT one simple answer.

I am not sure what you mean when you ask whether or not the margin of error “applies.” The margin of error is simply a restatement of the sample size, N. The fact that you repeat the poll with a newly drawn random sample does not affect the limits of statistical inference. There will always be variance due to sample size.

The margin of error statistic is useful only for comparing different polls. It should never be used to evaluate the likelihood of percentages reported within polls.

There is a reasonably straightforward calculation that you can do in the circumstances you suggest above. Suppose your guy is up 9 points and the margin of error is 3. It turns out that there is a 99.8% chance that your guy is actually leading his opponent, regardless of spread.
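That 99.8% figure can be reproduced. This is a sketch, assuming a two-candidate race where the reported +/- 3 applies to each candidate’s share, so the lead itself has roughly twice that standard error:

```python
import math

def prob_actually_leading(lead_points, moe_points, z=1.96):
    """Probability the true lead is positive, treating the estimated
    lead as normally distributed. SE of one share = moe/z; the SE of
    the lead is ~2x that, since the two shares move in opposition."""
    se_lead = 2 * (moe_points / z)
    z_score = lead_points / se_lead
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z_score / math.sqrt(2)))

print(f"{prob_actually_leading(9, 3):.1%}")  # roughly 99.8%
```

A 9-point lead is about 2.9 standard errors above zero, so “up 9, margin 3” leaves very little room for the race to actually be tied.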

In the context of the West Wing episode, then, it appears that the sentiment surrounding the verbal exchange was essentially from the perspective of a layman. Depressed at being consistently behind in the polling, the lady snapped when someone said, “Well, it could be that we improved to six points — don’t forget the margin of error!” Whereupon she deadpanned something to the effect of, “When you get these numbers day in and day out, the margin of error loses its meaning.”

Ok. In this case, this is a pretty serious abuse of the margin of error. It should never be interpreted in this way, though in the media, it very often is. The idea of a “statistical dead heat”, for example, is nonsensical.

Thanks, Maeglin. And again, thanks to all. I’m glad I asked the question.

Operationally speaking, I have problems with a lot of these answers.

For example, it’s a perfectly valid technique to combine polling results from polls taken over several days. In practice, however, you would never do so unless the poll had been designed that way from the inception. In the West Wing example, you’re talking about political tracking polls. Combining their results defeats the whole purpose of taking them.

And what is this purpose? Tracking trends. Not absolute numbers: trends. No experienced pollster would ever talk about the margin of error in a series of a tracking polls in the first place. The fact that the number is staying the same is operationally far more important. Tracking polls are used to guide resource allocation. A steady difference means a different media strategy than a rising or falling difference does.

Political polling is also different from other types of polling. For one thing, the world is not normally split into two close to equal groups. For another, at the end of the campaign you get an absolute count of the results. This makes the margin of error problem significant to the extreme. A statistical dead heat - as shown in almost any election cycle - means that you don’t know the outcome ahead of time, yet you have to continue your marketing. This is entirely different than a taste test polling campaign, in which you wouldn’t go ahead with your product launch unless you had clear indication that it is heavily preferred. In politics, there is no such luxury.

That’s why these small number tracking polls are only a tiny part of an overall polling strategy. In presidential politics there is little to be gained by a national media campaign. Campaigns are by states, groups, and media markets. To get at these, either much larger national polling must be done to make the segments of the poll large enough for statistical meaningfulness or, more realistically, a much narrower slice of the population is polled, both in area and in likelihood of voting.

Most of the tracking polls you read in the papers are just horse race journalism. If you check the articles carefully, you’ll usually find some campaign insider saying that they have information that these polls aren’t showing. Yes, they would of course say that in any event, but it happens to be true. Their polls are much more accurate for the slice of the voting population they’re referring to.

Not having seen the original West Wing dialog, I don’t know the context it was presented in, which makes for a lot of difference here. To be charitable, it probably was intended to mean that the lack of a change meant that the current campaign strategy wasn’t working and a change in tactics was needed, rather than a mathematical comment on the accuracy of the polling.

Not quite right. The population size is essentially irrelevant only if it is large when compared to the sample size. If the population size is only 3,000, the margin of error will be different if the sample size is, say, 1,100 vs. 1,200. There is something called the finite population correction factor that’s part of the formula. But if one assumes an infinite population (true for all intents and purposes for most polls), then the sample size to achieve a required margin of error will not depend on the population size.
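A quick sketch of that correction factor in code (worst case p = 0.5, 95% confidence, hypothetical population of 3,000):

```python
import math

def moe_with_fpc(n, population, z=1.96, p=0.5):
    """Margin of error with the finite population correction applied."""
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

# With N = 3,000, a modest change in sample size still shows up:
print(f"{moe_with_fpc(1100, 3000) * 100:.2f}")  # ~2.35 points
print(f"{moe_with_fpc(1200, 3000) * 100:.2f}")  # ~2.19 points
```

When the population is huge relative to the sample, the fpc term is essentially 1 and can be ignored, which is why it drops out of most national polling arithmetic.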

This is not correct. The margin of error actually assumes that the sample was done perfectly, that you have drawn a valid random sample. Given a desired confidence level (CL), the margin of error measures the expected error around the point estimate for the given sample size. It’s the sampling error. In other words, repeated samples of the same size will result in a point estimate that falls within the range determined by the margin of error CL% of the time. Mistakes in execution or design are part of non-sampling error.

Every poll will have some bias: too many Republicans, too many old folks, too few black voters. But having all these polls come in at nine, the bottom line is that you are losing. Even if the actual result is outside the 95% confidence interval, maybe you are down 2 or 3 points. You are still losing.

I think earlier in the conversation they mention that nine points at one point in the campaign would have been really great. So they have gained, but now momentum has stalled. And 6-12 points in presidential politics is a nightmare scenario, especially if the Republican is winning California.