Florida polls: Now Bush is up by 1% in two different ones.

It looks to me like the candidate who wins two of the three big swing states (OH, PA, FL) wins the electoral vote. It’s possible to win with only one of three, but only if there are two or three surprises in smaller states. That wouldn’t surprise me, since these polls are so unreliable.

Thanks, Sam Stone, you explained that well.

The simple fact is that results lying within the poll’s MOE are not capable of predicting anything. They don’t draw a sharp line at 49%, they draw a nice fat crayon line centered on 49%. And the answer is statistically somewhere in that crayon mark.

Now, one can argue that poll results even within MOE are effective predictors of perfect data, but I’m not sure that’s entirely tenable mathematically.

That’s quite optimistic! Now I’m off to vote.

But the fat crayon line isn’t solid; it’s darker in the centre and lighter on the edges. It’s a bell curve.

Say a poll is taken: 50% Bush, 50% Kerry, MOE 3%. That means you have 95% confidence that Bush really has 47-53% support. However, it is more likely that he has 50% than 47%.
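To put a rough number on "more likely", here is a minimal sketch. It assumes the sampling error is approximately normal and that the 3-point MOE is the 95% half-width (about 1.96 standard errors); neither assumption is stated in the post above.

```python
from statistics import NormalDist

# Assumption: the poll's sampling error is roughly normal, and the
# 3-point MOE is the 95% half-width, i.e. about 1.96 standard errors.
z95 = NormalDist().inv_cdf(0.975)
standard_error = 0.03 / z95

# Bell curve centred on the polled 50% share.
curve = NormalDist(mu=0.50, sigma=standard_error)

# How much "darker" the crayon mark is at the centre than at the edge.
ratio = curve.pdf(0.50) / curve.pdf(0.47)
print(f"50% is about {ratio:.1f} times as likely as 47%")
```

Under those assumptions the centre of the interval is several times more plausible than its edge, which is the point of the bell-curve picture.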

That assumes you have a proper sample for the poll. As Sam described, there are all kinds of problems with getting a good representative sample for your poll, but those problems are separate from the statistical MOE that is included in the poll data.

That is simply not correct. If you have a collection of polls, then those results will begin to have a normal (bell curve) distribution. But on a single sample (poll) of a given population, the statistics offer no additional information inside the MOE (all values are equally likely). In fact, there is a 5% chance that the actual value is outside the confidence interval.

For example, let’s say you want to know if a coin is fair (when flipped, 50% chance of heads or tails). If you flip the coin 1000 times, and end up with 511 heads, does that tell you that the coin is probably biased towards heads, or that the coin is probably fair?
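The coin question can be checked with a short calculation. This is only a sketch using the normal approximation to the binomial; the choice of a two-sided test is an assumption, not something specified in the post.

```python
from math import sqrt
from statistics import NormalDist

flips, heads = 1000, 511

# Under the fair-coin hypothesis, heads ~ Binomial(1000, 0.5),
# approximated here by a normal curve.
expected = flips * 0.5
std_dev = sqrt(flips * 0.5 * 0.5)

# Distance of the observed count from the fair-coin expectation,
# measured in standard deviations.
z = (heads - expected) / std_dev
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2f}")
```

The observed count sits less than one standard deviation from 500, so on its own it cannot rule out a fair coin, even though 51.1% remains the single best estimate of the heads rate.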

I guess I may be wrong, but that makes no sense to me. Hopefully I can ask a question without hijacking the thread:

In my 50%/50% example, the pollsters would have generated their “3% MOE with 95% confidence” from their set of poll responses. Can they also calculate an “X% MOE with 98% confidence” from the same data, or would they need to do more polling? That is, from a single set of poll results, can they calculate a margin of error for any % confidence they like?
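As a rough illustration of that question: with the usual normal-approximation formula, the same set of responses yields a margin of error for any confidence level. The sample size of 1067 below is a hypothetical value, chosen only because it gives about 3 points at 95%, and `margin_of_error` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(share, sample_size, confidence):
    """Half-width of the normal-approximation interval for a polled share."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(share * (1 - share) / sample_size)

# Hypothetical sample size; 1067 respondents gives about 3 points at 95%.
n = 1067
for confidence in (0.90, 0.95, 0.98):
    moe = margin_of_error(0.50, n, confidence)
    print(f"{confidence:.0%} confidence: +/- {moe:.1%}")
```

No extra polling is needed in this sketch; asking for more confidence simply widens the interval, and asking for less narrows it.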

AZcowboy, I’m not sure if you’re wrong or I’m misunderstanding you, but here is a cite that explains MOE in more detail.

The basic gist of my original post was that differences between the candidates’ scores that are within MOE are still meaningful predictors of who will win the election. As the link shows, whoever is shown to be ahead in a statistically ideal poll, even within MOE, is still the most likely winner. If, in every election where the candidates are within MOE, you bet five dollars on the person who is behind, you will lose more often than you win.

I’ll mention once again that, in this election anyways, the polls appear to be far from statistically ideal.

AZCowboy:

I’m almost positive you’re wrong. As positive as I’m ever likely to be on a quantitative issue!

Let’s say we’re polling Iowa. It has a real population of registered voters. Ignoring for the moment the logistical problems with getting a representative sample, let’s simply assume we have one, so we now have a real population and a much smaller group, the representative sample. Now let’s say our study found that in our sample, 47% of the vote would go to Bush and 50% to Kerry. Then we extrapolate from that to the real population and say something like “the margin of error is 3 points”. What does that mean? The unspoken (because it’s an ironclad tradition, but an arbitrary one nonetheless) variable is that there’s a 95% confidence attached to that statement, so if I unpack it what’s being said is “In 95% of all hypothetical samples of this nature where these results are obtained, the actual real population will be within 3 points of these values. In the remaining 5% of those hypothetical samples, the real population will vary by more than 3 points.”

Now suppose I ignore tradition and go with a 90% confidence instead. I can now, with equal validity, say “the margin of error is 2 points”*. The unpacked version this time would read “In 90% of all hypothetical samples of this nature where these results are obtained, the actual real population will be within 2 points of these values. In the remaining 10% of those hypothetical samples, the real population will vary by more than 2 points.” Voilà, the margin of difference between the two candidates is now greater than the margin of error!


* No, of course I didn’t actually do the math. I didn’t even ferret out the true population and sample sizes.
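For anyone who does want the math, here is a hedged sketch of the rescaling. It assumes the normal approximation, under which a margin of error is proportional to the z-score for its confidence level; `rescale_moe` is a hypothetical helper name.

```python
from statistics import NormalDist

def rescale_moe(moe, from_confidence, to_confidence):
    """Convert a margin of error between confidence levels (normal approximation)."""
    z_from = NormalDist().inv_cdf(0.5 + from_confidence / 2)
    z_to = NormalDist().inv_cdf(0.5 + to_confidence / 2)
    return moe * z_to / z_from

# The Iowa example: 3 points at the conventional 95% level.
moe_90 = rescale_moe(3.0, 0.95, 0.90)
print(f"3 points at 95% is about {moe_90:.1f} points at 90%")
```

Under that assumption the 90% figure comes out nearer 2.5 points than 2, which is still smaller than the 3-point gap in the example.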

When you assert that statistics don’t “say anything” about measured differences when they are less than the margin of error, you’re misunderstanding how stats work.

Let’s switch our attention to Nebraska. Let’s say my sample says Bush is up 62% to Kerry’s 32%. Far and beyond the MOE as conventionally constructed, but there does exist some number such that, when used as the confidence figure, creates such a wide margin of error that the gap between the Bush and the Kerry figures is less than the margin of error. Let’s say it’s 99.999999727% for the sake of argument. I could say, accurately, “In our sample, Bush has a 30 point lead over Kerry, but it is not statistically significant at the 99.999999727% level. In 0.000000273% of all cases, the actual population will vary more than 30 points from the values found in the sample, so Kerry could actually be ahead.” Of course he could. But I wouldn’t wager lots of money on it.

Back to our Iowa example. Until you loosen the confidence figure to 50%, a difference found, but lying within the margin of error, is still more likely than not reflective of the ordinal value of the difference (i.e., who’s ahead) within the real population.
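A back-of-the-envelope version of that claim for the Iowa numbers is sketched below. The sample size of 1067 is hypothetical (it is what a 3-point MOE at 95% roughly implies). The calculation also treats the sampling distribution as if it were a probability statement about the real population, which is a simplifying assumption; `prob_leader_ahead` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def prob_leader_ahead(leader, trailer, sample_size):
    """Approximate chance the sample leader also leads in the population."""
    gap = leader - trailer
    # Standard error of the difference of two shares from the same sample.
    se_gap = sqrt((leader + trailer - gap ** 2) / sample_size)
    return NormalDist().cdf(gap / se_gap)

# Hypothetical sample size of 1067, matching a 3-point MOE at 95%.
p = prob_leader_ahead(0.50, 0.47, 1067)
print(f"Kerry ahead with probability of about {p:.0%}")
```

With these assumptions the leader's chance of really being ahead is well above a coin flip even though the gap is inside the margin of error, and it only falls to 50% when the gap is zero.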

OK NOW… Another factor, which statistical math does not in any fashion address, is the extent to which the sample is a representative sample to begin with. They never are in the pure sense (if nothing else, you’re going to miss the opinions of people who always hang up on pollsters; they may not differ in relevant opinion from the people who are willing to be polled but neither the poll answers nor statistics will tell you that one way or the other). Here is where cell phones, phrasing of poll questions, and whether or not Republicans are more likely to have working phone lines in Florida can mess with your data. And from a general sense of this sloppiness you may feel justified in saying “A spread of 1.5% in a poll with a 4% margin of error is totally meaningless”. And probably with good reason.

Just not with good statistical reason.

hah! Looks like even Roger Ailes is giving up on Bush.