Statistics question: polling margin of error

Smith has 49% and Jones has 46%. The margin of error is 3%.

Does this mean that Smith might actually have as much as 52% and Jones as low as 43%? Or does it mean that there can be no more than a 3% difference between them, with Smith having as low as 46% and Jones having as much as 49%?

Well here is the Wiki answer.

That’s right. The phrase also spawned the joke name of Car Talk’s statistician Marge Innovera.

The former. There is a 95% chance that Smith’s true level of support is between 46% and 52%, and Jones’s between 43% and 49%. Either a 46-49 or a 52-43 split would lie within those confidence intervals.

The former, with a caveat.

Wiki also has an article on margin of error, but if you want something a bit more formal, try this pdf from the American Statistical Association; see in particular p. 67 of that pamphlet.

The caveat, of course, is that the “margin of error” is typically defined as a 95% confidence interval: in other words, in your example, Smith would have from 46% to 52% support (49% +/-3%) with 95% certainty (which is what Whack-a-Mole was explaining). Theoretically, Smith’s support could be as low as 0% and as high as 100%.

The former, but remember that the 3% figure comes with a 95% confidence level attached, which in English means that the poll will still be off by more than 3% about one time out of twenty.

And of course that’s assuming there’s only statistical error (because the poll only asked a few people); it assumes that for the people who were polled, the pollster really did find out who each person supports. Obviously there are lots of reasons why a person might say something different to a pollster than what they actually think (and there are reasons why pollsters might try more or less hard to get to what people really think).
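To make the “one time out of twenty” part concrete, here’s a minimal simulation sketch (purely illustrative numbers of my own, and it ignores the non-sampling problems just mentioned): if Smith’s true support is 49% and each poll samples about 1,067 voters, which works out to roughly a 3% margin of error at 95% confidence, then about one simulated poll in twenty misses the true number by more than 3 points.

# Minimal simulation sketch (illustrative numbers only, pure sampling error).
import random

true_support = 0.49     # assume Smith's "true" support in the population
n_per_poll = 1067       # sample size that gives roughly a 3% MOE at 95%
n_polls = 5000
misses = 0

for _ in range(n_polls):
    smith_votes = sum(random.random() < true_support for _ in range(n_per_poll))
    if abs(smith_votes / n_per_poll - true_support) > 0.03:
        misses += 1

print(misses / n_polls)  # typically lands near 0.05, i.e. about one poll in twenty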

There are really two numbers that have to be considered, although the news always drops the second one, which is the confidence level.

So take the example in the OP. Smith has 49% and Jones has 46%. The margin of error is 3%.

That Wiki article assumes a confidence level of 95%. We don’t know that this poll used 95%, since the reported margin of error depends on both the confidence level chosen and the size of the sample. However, as a first step let’s accept it.

Therefore there is a 95% chance that the actual percentage in the population for Smith is somewhere from 46% to 52%. (Or, another way, there is a 1 in 20 chance that the number lies outside this range.) There is similarly a 95% chance that Jones has 43% to 49% of the population.

Jones could be leading Smith. Jones could be far behind Smith. And one in 20 times both numbers could be wildly wrong.

The margin of error is tied to sample size, so knowing the size of the sample is important. A larger sample - assuming that it is a good random sample - gives a smaller margin of error. And you can break a larger sample down into groups - male, female; black, white; young, old - with more confidence. That’s why the “real” Nielsen ratings don’t go by the 1000 or so homes that have meters but by the 10,000 that keep diaries. You can’t break down a sample of 1000 into dozens of smaller groups with any confidence.
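A rough back-of-the-envelope sketch of that point (my own illustrative numbers, not anything from the posts above): the 95% margin of error for a proportion is roughly 1.96 * sqrt(p*(1-p)/n), which is at its worst when p = 0.5.

from math import sqrt

def moe_95(n, p=0.5):
    # approximate 95% margin of error for a simple random sample of size n
    return 1.96 * sqrt(p * (1 - p) / n)

for n in (10000, 1000, 100):
    print(f"n = {n:5d}: about {100 * moe_95(n):.1f} points either way")
# n = 10000: ~1.0 points  (a diary-sized sample)
# n =  1000: ~3.1 points  (a typical national poll)
# n =   100: ~9.8 points  (one demographic slice of a 1,000-person sample)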

Of course, in the real world there is no such thing as a truly random sample. All real-world surveys are adjusted to make up for the groups that are underrepresented. Each survey outfit has its own proprietary algorithms for weighting samples, and some of those algorithms are better than others, which is why some outfits get better results. None work all the time.

Does all this make polls ridiculously unreliable? No, just plain ordinary unreliable. The real problem with political polls is that many, many people don’t make up their minds until they get into the voting booth, and not all of them admit to being undecided to a pollster. And even a 1% margin of error is useless if the final vote margin is less than one percent.

The secret to political polls is to watch trends among many separate polling companies. And even those are still vulnerable to massive mind-changing, as often happens in primary season.

Which is more than the question asked, but it’s hard to stop when this comes up.

What Freddy the Pig and others have said. But be aware that the notion that “there is a 95% chance that the true support is between x and y” is itself a statistician’s convention. (No, not the kind you attend in Las Vegas).

There is a lot of statistical math that is the way it is without any substantiating explanation for why it is done that way; it’s just “the way it is done”. Take standard deviation as an example.

The difference between a given data point and the mean is some number. At some point in history, statisticians decided that, when examining all the data points in a sample, it would be useful to have a number that indicated how far, on average, each data point varied from the mean. They could simply have taken the distance between each data point and the mean, summed those distances up, and divided by the number of sample records. Instead, they square the distance from each data point to the mean, add those squares up, divide by the number of sample records, and then take the square root of what they end up with.

Why? I dunno… the answer they give to incoming freshmen is “well if you subtract a number that is lower than the mean from the mean you get a positive number, whereas if you subtract a number that is higher than the mean from the mean you get a negative number, so we square the value to get an always-positive number”. Well yeesh, everyone who passed 4th grade knows what “absolute value” means. Whatever. It’s how it’s done.
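For the record, here’s what the two choices look like side by side on some made-up numbers (a sketch of my own, not anything from the thread):

from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)

# option 1: just average the absolute distances from the mean
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)

# option 2 (the standard deviation): square the distances, average, then square-root
std_dev = sqrt(sum((x - mean) ** 2 for x in data) / len(data))

print(mean, mean_abs_dev, std_dev)  # 5.0, 1.5, 2.0: two different answers to "how spread out?"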

There’s nothing magical about 95% confidence. We could quote “plus or minus x” for a confidence of 98% or 93% instead of 95%. It’s 95% because “we always do it that way”.

And the math that says “the actual value will be this value plus or minus x 95% of the time” is similarly based on a lot of rules that boil down to “that’s what we’ve come up with”.

None of which is to say that stats is bullshit (it’s not), just that the real meaning of statistics like these is emergent and self-referential: it tells you how Scenario A which you are studying compares to ten zillion prior scenarios in which similar stats were run.

Would the statement that “Smith and Jones are statistically tied” be correct? It doesn’t seem like it from what everyone is saying. It sounds to me like it is more likely that Smith is ahead.

No, not really, except to the extent that it’s understood shorthand for “a tie would lie within the confidence interval”. You’re correct that the reported poll result makes it more likely that Smith is ahead, assuming that the poll sample was randomly chosen from the relevant population.

Thanks to everyone for the explanations and resources.

One single individual poll putting Smith ahead of Jones by 5%, where the results are said to be plus or minus 3% (with 95% confidence yadda yadda), can be thought of as “a statistical tie”.

Four separate individual polls each putting Smith ahead of Jones by 5% where, again, the results are said to be plus or minus 3%, … things are not looking so rosy for ol’ Jones at this point. The chance of four different polls accidentally coming out showing Smith ahead by a smidge when, in the real-live population (as opposed to the sample), Jones is actually in the lead is a heckuva lot smaller than the chance of it happening once.

And going back to 95% confidence: if we can’t say on that one poll that Smith is ahead of Jones with 95% confidence (we can only say give or take 3%, which creates overlap), we can nevertheless say (for example) that Smith is ahead of Jones with oh I dunno 84.2% confidence plus or minus 1.9%, and lo and behold they don’t overlap any more! (But we’re just that much less confident of what we’re asserting.) That’s where the statistician’s convention kicks in: it just isn’t done that way. Sticking to 95% makes it easier to compare Rasmussen’s polls with Gallup’s polls and so on, and for folks like the guy who runs electoral-vote.com to do maps based on the average outcome of the last four polls in any given race or state.
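To put a rough number on the four-polls point (the 16% figure below is just an assumption of mine, picked as the flip side of the ~84% confidence mentioned above): if one poll shows Smith ahead “by accident” about 16% of the time when Jones really leads, four independent polls all making the same mistake is far less likely.

# Sketch with an assumed per-poll error rate; assumes the polls are independent.
p_one_poll_misleads = 0.16

for k in (1, 2, 3, 4):
    print(k, "poll(s) all misleading:", round(p_one_poll_misleads ** k, 5))
# 1 poll: 0.16;  4 polls: ~0.00066, i.e. roughly 1 chance in 1,500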

In some cases, maybe, but in many others there’s a very good reason for doing things a certain way that becomes obvious once you actually get into mathematical statistics. You can’t really teach that in an intro stats course, so things tend to look more arbitrary than they actually are. (And if you’re looking for examples of arbitrary things, you really couldn’t do worse than to pick the standard deviation. That’s a very natural measure of variation.)

I would assume that there is some rigorous math and study behind the 95% number and it is not just something they pulled out of their ass.

The Wiki link I quoted suggests that you can improve on the 95% with larger samples, but you hit a law of diminishing returns pretty fast, and it gets quite expensive and impractical to nudge it up a few percentage points.

So, on a cost/benefit analysis, the sampling needed to achieve 95% has been judged to be good enough for these purposes.

It still amazes me that a sample of 1,000 has been found to be statistically meaningful when polling something like the US population’s opinion on a candidate, but what do I know…they seem to think it works and it probably (mostly) does.
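For what it’s worth, here’s a rough sketch of why roughly 1,000 respondents is the standard size, and why pushing the margin of error much lower gets expensive (my own back-of-the-envelope numbers, using the worst case p = 0.5):

def sample_size_needed(moe, p=0.5):
    # approximate sample size for a given 95% margin of error;
    # note the population size (300 million or otherwise) doesn't appear
    return (1.96 / moe) ** 2 * p * (1 - p)

for moe in (0.03, 0.02, 0.01):
    print(f"{moe:.0%} margin of error: about {round(sample_size_needed(moe))} respondents")
# 3% -> ~1067, 2% -> ~2401, 1% -> ~9604: halving the margin quadruples the sample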

The issue here is that the phrase “statistical tie” in itself is rather misleading. The way it’s (normally) used, “statistical tie” means “we can’t tell, with X certainty (usually 95%), who is ahead.” It does not mean “the two are likely to actually be tied.”

In your example (Smith has 49% and Jones has 46%, with a margin of error of 3%), you could, if you wanted to, calculate the actual probability that either Smith or Jones is ahead (tossing in some reasonable assumptions). I’m not sure what it would be, but probably something like an 85% chance that Smith was actually ahead vs. 15% for Jones. Not a tie, just less than the 95% standard.

Linky-link
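For anyone curious, here’s a rough sketch of that calculation under a normal approximation. Two assumptions of mine: the 3% margin of error is a 95% figure, and the margin of error of the difference is about 1.7 times the individual margin (the ASA rule discussed further down the thread). It comes out in the same ballpark as the 85% guess.

from math import sqrt, erf

smith, jones = 0.49, 0.46
moe = 0.03                     # assumed 95% margin of error for each candidate
moe_diff = 1.7 * moe           # assumed margin of error of (Smith minus Jones)
se_diff = moe_diff / 1.96      # convert the 95% margin back to a standard error

z = (smith - jones) / se_diff  # Smith's lead, measured in standard errors
p_smith_ahead = 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF
print(round(p_smith_ahead, 2))                 # roughly 0.88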

I agree that the way it’s often explained leaves a lot to be desired. But there are many things that are done with the standard deviation that would not work (or would at least have to be adjusted) if it were defined differently.

One way to think about standard deviation, if you have the right mathematical background, is that it’s a sort of distance measurement, measuring how far a set of data is from its mean. You can think of it as the (Euclidean) distance, in n-dimensional space, between the point whose coordinates are your data points and the point where you would be if all those values were, instead, their mean, scaled down by a factor of sqrt(n). For example, in a set of three values {x_1, x_2, x_3}, with mean m, the standard deviation would be the distance in 3-dimensional space between the points (x_1, x_2, x_3) and (m, m, m), divided by sqrt(3).
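A quick sketch of that picture with the scaling made explicit (made-up numbers of my own):

from math import sqrt

x = [3.0, 7.0, 8.0]
m = sum(x) / len(x)                                      # mean = 6.0

distance = sqrt(sum((xi - m) ** 2 for xi in x))          # distance from (x1, x2, x3) to (m, m, m)
std_dev = sqrt(sum((xi - m) ** 2 for xi in x) / len(x))  # population standard deviation

print(distance / sqrt(len(x)), std_dev)                  # the same number, about 2.16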

…can be found in formula (3) on p.9 of this Census Bureau PDF.

It actually discusses standard errors, but it works just as well for margins of error.

What it boils down to is that:

MOE(X-Y) = sqrt[MOE(X)^2 + MOE(Y)^2 - 2r*MOE(X)*MOE(Y)].

If MOE(X) and MOE(Y) are the same, as they will be in a single poll, then we can let MOE(X) = MOE(Y) = M, and it simplifies to:

MOE(X-Y) = M*sqrt(2-2r)

which is a lot simpler.

The problem is r, the correlation between X and Y. When X and Y represent the support levels of two politicians running for the same office, they’re obviously negatively correlated. The question is, how strong is the correlation?

I’ve never seen a good rule for this, and I personally think the ASA’s 1.7 rule (for what to multiply M by to get MOE(X-Y)) is a bit weak - it represents a value of r that’s around -.445, and ISTM that the (negative) correlation between the two candidates’ support is much stronger than that, if the number of undecideds isn’t unusually large.

But WTF - I might use something in the 1.8 to 1.9 range, depending on the proportion of undecideds, but 1.7 is pretty close.
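To make the relationship concrete, here’s a small sketch using the simplified formula above, MOE(X-Y) = M*sqrt(2-2r), with the OP’s 3-point margin (the r values are just illustrative):

from math import sqrt

M = 3.0                                    # the OP's margin of error, in points
for r in (-0.445, -0.6, -0.8, -1.0):
    multiplier = sqrt(2 - 2 * r)
    print(f"r = {r:+.3f}: multiplier {multiplier:.2f} -> lead MOE about {M * multiplier:.1f} points")
# r = -0.445 gives the ASA's 1.7; r near -0.8 gives roughly 1.9. Either way the
# uncertainty on Smith's 3-point lead is 5-6 points, so a tie is well inside it.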

AHunter is right - that’s the part that has essentially nothing behind it but custom and habit, and not everyone’s customs are the same. The Census Bureau, for instance, uses 90%. Damned if I know why.

Are you saying that the “95%” number is just a guess? Kind of, “We think this all works out to be 95% valid but we have no idea really.”

Or is it more that, “We know the 95% number is mathematically sound…why we picked 95% to be good enough instead of a study that provides 90% or 98% we have no idea…just seemed a good number to us.”

If the latter, I would think it again comes down to a cost/benefit analysis. Going from 90% to 95% certainty is not overly difficult or expensive to achieve. Going from 95% to 98% is considerably more difficult for relatively little gain. 95% just looks like a good compromise between accuracy and effort.
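As a rough illustration of that trade-off (my own back-of-the-envelope numbers, worst case p = 0.5, holding the margin of error fixed at 3 points):

# z values are the usual standard normal quantiles for each confidence level
levels = {"90%": 1.645, "95%": 1.960, "98%": 2.326, "99%": 2.576}
moe, p = 0.03, 0.5

for level, z in levels.items():
    n = (z / moe) ** 2 * p * (1 - p)
    print(f"{level} confidence: about {round(n)} respondents")
# 90% -> ~752, 95% -> ~1067, 98% -> ~1503, 99% -> ~1843: each extra point of
# certainty costs progressively more interviews.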

This.