Stick a fork in her: Clinton's done

There are two things commonly referred to as “margin of error.” One is a confidence interval for individual statistics within a poll. The other is an overall error. My understanding is that the media usually reports the latter, not the former. In this case, I assumed SurveyUSA reported the latter because they themselves said the overall difference was outside the MoE. Can you explain where I’ve gone wrong in that reasoning?

I didn’t major in statistics, so I’d like someone to confirm my understanding:

If a poll has A at 53% and B at 43% with a MoE of 5%, doesn’t that mean that A is most likely between 48% and 58% and B is most likely between 38% and 48%? So A’s lead over B could be anywhere from 0 to 20 points? Or am I totally wrong?

The contribution questions gave me pause, too. Because the charge card is in my name but I don’t personally have anything resembling taxable income, my husband earns all our money and he’s a foreign national and gah… :confused: I figure it’s OUR money. We’re not talking about a lot of money, here, either. I’m not in trouble, am I?

Looks like John Lewis’s on-again, off-again flip from Hillary to Obama is on again.

Got it.

TY.

There is no such thing as an “overall error”. The “margin of error” has a very specific meaning and can be calculated at various levels of confidence for each random variable. Typically, when the media reports a margin of error, it is calculated at the 95% level of confidence. The random variables captured by the survey are the inferences that you draw from the sample, for example, that 45% of the country supports a particular candidate.

The margins of error about each of the survey estimates are the same because the margin of error is a function of the sample size. Obviously, the sample size is the same regardless of the preferences of the individuals surveyed.
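
In symbols, for a sampled share p out of n respondents, the 95% margin of error is roughly 1.96 × √(p(1 − p)/n). A minimal Python sketch (assuming simple random sampling; the 45% share and n = 1,000 are made-up numbers for illustration):

```python
import math

def moe(p, n, z=1.96):
    """Margin of error for a sampled share p out of n respondents,
    at the confidence level implied by z (1.96 for ~95%)."""
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical candidate polling at 45% in a sample of 1,000:
print(f"{moe(0.45, 1000):.1%}")  # ~3.1%
```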

Again, the margin of error just reflects the sampling error inherent in the estimate, not any other type of error like selection bias.

No, this is not what it means.

There is a true population value out there somewhere. You do a survey because you can make inferences about the true value using a sample as long as you follow a few rules. The true value is not the variable; the number estimated by the survey is the variable. The true value is theoretically fixed, but the survey value can change depending on how you cut the sample. This is the essential definition of sampling error. You are always going to get outliers, you are always going to get abnormalities, you are always going to get noise.

What the margin of error tells us is that it is 95% likely that the confidence interval 48% to 58% falls about the true value based on the vagaries of sampling alone. This says absolutely nothing about the difference between A and B, and it certainly does not mean that if you sampled the same population 100 times, the results for a particular candidate would be distributed uniformly about the confidence interval. You could do the same poll 100 times and get almost exactly the same results. The fact that margin of error exists does not in any way invalidate the results of the poll. Almost no additional information is conveyed when candidates lead each other “within the margin of error”.

In order to reduce the margin of error, all you have to do is increase the sample size. However, there are seriously diminishing marginal returns. There is a huge reduction in the margin of error when you increase the sample from 500 to 1000, but a relatively modest one when you go from 1000 to 10,000.
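
To put rough numbers on that (same standard formula, worst case p = 0.5, 95% level; a sketch, not any particular pollster’s math):

```python
import math

# Worst-case (p = 0.5) 95% margin of error at several sample sizes
for n in (500, 1000, 10000):
    print(f"n = {n:>5}: MoE = {1.96 * math.sqrt(0.25 / n):.2%}")
# n =   500: MoE = 4.38%
# n =  1000: MoE = 3.10%
# n = 10000: MoE = 0.98%
```

Going from 500 to 1,000 respondents buys you about 1.3 points of precision; going from 1,000 to 10,000 buys only about 2.1 more despite costing nine times as many interviews.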

I think that a huge amount of confusion and misinterpretation of survey results, and accordingly individual voting logic, is driven by misunderstanding of these concepts.

Of course, MoE has a particular mathematical definition. But as applied to a poll, there are different MoE’s for different parts.

Margin of error - Wikipedia

The one being reported by SurveyUSA, as I understand it, is the overall MoE as calculated based on the number of LVs (as opposed to the MoE of one of the crosstab statistics).

What hasn’t been explained to me sufficiently is why we’re not talking about a single, binary random variable: do you support BHO or HRC?

Missed the edit. Let me word my question more precisely: Why is this not just one sample in which you draw a certain number of Obamas and a certain number of Clintons, thus yielding one MoE that is the one reported?

In this case, these would be exactly the same thing, as I mentioned above. The number of respondents is the same. These numbers only differ if the number of respondents to particular questions or parts of the survey differ. With respect to interpretation, it is a distinction without a difference.

Also from the wiki:

This is not an “overall error” that is in any way different than a point estimate margin of error. This is just sloppy, misleading reporting.

I do not understand your confusion. This has nothing to do with how many survey questions you ask. The purpose of the survey is to infer from the sample how many people support each candidate. You can do this with one survey question; you can do this with ten. That does not change the survey’s output, which is, obviously, two numbers.

Each is a random variable because each is subject to change if you recut the sample and repeat the survey.

Read the rest of the Wikipedia article. It is actually pretty informative.

And this is absolutely correct provided that the number of respondents is the same. This will be true in a poll like this almost by construction. But in surveys with multiple questions, lines of inquiry, etc, you can have a lot of people refusing to answer certain questions. This makes the margin of error different for inferences driven by those questions.

Ok, so if it is absolutely correct, why is it wrong to say that a 4 point difference is outside the 3.8 MoE?

ETA: I’m not disagreeing, just trying to get educated.

“Totally irrelevant” is wrong.

Poll results represent point estimates because of inherent sampling fuzziness. In reality, we should really be speaking in terms of confidence intervals instead of treating these things as absolutes as we tend to do. When two CIs overlap, we really cannot conclude a significant difference exists between them and keep our credibility. The reason is that there is not enough distance between them to say with 95% confidence that the true difference is greater than zero. We shouldn’t be discarding the importance of this certainty by declaring it “totally irrelevant” when we want to. None of us knows the “true value” these point estimates are supposed to represent.

I’ll illustrate my point by taking the example you gave with the 48% (A) and 46% (B) point estimates and a 4% MoE. You say that there’s a 75% chance that there’s a significant difference between these two point estimates. **But this conclusion only makes sense if the true values represented by these point estimates are random.** But that’s not the case. A “true” value is what it is and isn’t determined with probabilities.

Earlier you said:

This is essentially correct. But it’s important to remember that we have no way of knowing precisely how our results will distribute before we send out these surveys. Since true values are not random variables, there should be no expectation that they will distribute randomly within the confidence interval. If the true value is 49%, it’s possible that results will skew towards the right.

No problem. If I am not understanding you, please feel free to clarify.

It is not so much that it is “wrong” as that it is uninformative about the actual state of the world. The difference between two random variables is also a random variable. Its margin of error is calculated quite differently. As it turns out, it is much harder to call the spread than it is the winner, and a larger confidence interval around the difference reflects that. Handily, the wiki addresses this:

The fact that a spread falls within the margin of error does not tell us very much because we do not know how the point estimate errors are distributed in the first place. Maybe if you resample 99 more times you get almost the same results. Maybe if you sample 99 more times you get a large diversity of results, distributed uniformly about the interval. Maybe they skew one way or another. Since we do not know any of these things, we cannot infer that a difference inside the margin of error means that the survey is any muddier, or, worse, that it is somehow a “statistical tie”.
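
To make the 4-point/3.8-MoE example concrete, here is a rough sketch. Because two shares from the same poll are negatively correlated (a respondent choosing A cannot also choose B), the variance of the difference is (p_A(1 − p_A) + p_B(1 − p_B) + 2p_Ap_B)/n. The 47/43 split is my own made-up stand-in for a 4-point lead, and n = 665 is roughly the sample size a 3.8% worst-case MoE implies:

```python
import math

z = 1.96
n = 665              # sample size implied by a ~3.8% worst-case MoE
pA, pB = 0.47, 0.43  # hypothetical 4-point lead

# Same-poll shares are negatively correlated, so the difference's
# variance is (pA*qA + pB*qB + 2*pA*pB) / n
moe_diff = z * math.sqrt((pA*(1-pA) + pB*(1-pB) + 2*pA*pB) / n)
print(f"MoE of the difference: {moe_diff:.1%}")  # ~7.2%
```

So the 95% CI for the lead is 4% ± 7.2%, which straddles zero even though the 4-point spread is “outside” the 3.8% single-estimate MoE.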

It is a tie if they both receive an identical percentage of support. Aside from that, there is no tie.

This thread went from sticking a fork in Hillary Clinton to one of the most dreaded courses to take in college history: statistics. Maeglin is technically right - however, MoE should be considered with a grain of salt.

From here. Good read if you have a second to skim it.

Ok, give me a little room for hyperbole here. All the same, I do not entirely agree with the below.

That may be true, but not because it makes sense. Very often the margin of error for the difference is so large that, as the wiki says, it can be larger than the confidence interval around either of the point estimates.

When two CIs overlap, you can pretty easily calculate the probability that one candidate is actually leading the other within the context of the survey. Even with a small spread, if your sample size is around 1000, you can quantify the impact of sampling error and determine how fuzzy the results are.

I am not convinced that with most close political surveys, there is enough distance to say with 95% confidence that the true difference is greater than zero, even when the difference is not within the margin of error of the point estimates. My argument is that the margin of error for a point estimate is not relevant to interpreting the difference, since this difference has its own margin of error and a probability of A leading B can be calculated using the point estimates and the sample size alone.
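
As a sketch of that calculation (normal approximation, same same-poll variance as above; the 48/46 split and n = 1,000 are just the figures floated earlier in the thread):

```python
import math

def prob_a_leads(pA, pB, n):
    """Approximate probability, given one simple random sample of size n,
    that A's true share exceeds B's (normal approximation)."""
    se = math.sqrt((pA*(1-pA) + pB*(1-pB) + 2*pA*pB) / n)
    z = (pA - pB) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

print(f"{prob_a_leads(0.48, 0.46, 1000):.0%}")  # ~74%
```

Note that this calculation never touches the individual 95% margins of error at all.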

While this is true, I am not sure what practical difference is being made here. We are already in a world where we do not know the true value. We can make inferences based on a sample and use them to predict actual outcomes. I don’t think we are really confusing true values for point estimates here: the probability that A is ahead of B is still conditioned on the survey itself. We don’t know what the true value is, and we do know that estimates are affected by sampling error that creates some fuzziness around them. This probability just quantifies how fuzzy these estimates really are and does not really draw any inferences about true values.

Absolutely agreed, of course.

Heh. I could have made the joke about you being an old softy. :wink:

John D. Holum, who served in the Clinton Administration, first as Director of the U.S. Arms Control and Disarmament Agency and then as Under Secretary of State for Arms Control and International Security, is endorsing Barack Obama.

These kinds of defections have got to be killing the Clintons.

Wow that’s got to sting! :smiley:

I wonder what all these defections are actually doing to her morale and campaign?

Ack, that just killed me.

Nononononono.

There is a 95% chance that the interval 56% to 64% falls about the true value.

A small thing, fine, but if you are trying to teach people the basics of interpreting statistics, this is kind of a key issue.

I don’t see that it matters much, though. You’re talking apples and oranges. The MoE of the difference would have to be a larger spread, because it has to “account” for the extremes of the two CIs being compared, right? (That’s how I conceptualize it, at least.) I don’t see why that changes anything. If the CI of the difference has a negative lower limit, you can infer non-significance. If the CIs of two point estimates overlap, you can also infer non-significance.

Using CIs as a statistical test is done all the time in science, which is why I’m wondering why you think they aren’t meaningful when it comes to comparing poll results. The methodology is no different.

How? Doesn’t that assume that the true values are random and not fixed?

Yes, the difference has its own MoE, but it’s influenced by the same things as the MoEs of the percentages; namely, sample size. Which means these numbers aren’t produced without respect to one another; they are just different ways of reaching the same conclusion. If two CIs overlap, then I can’t see how the CI of the difference should NOT contain zero. Maybe I’d understand your position if you could provide an example of there being a disconnect between the MoE of the difference and the MoEs of the two point estimates.
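
For what it’s worth, here is a rough numerical example of the disconnect, with made-up numbers (A at 48%, B at 43%, n = 1,000, 95% level, same-poll variance as discussed above):

```python
import math

z, n = 1.96, 1000
pA, pB = 0.48, 0.43  # hypothetical 5-point lead

moe_A = z * math.sqrt(pA*(1-pA) / n)
moe_B = z * math.sqrt(pB*(1-pB) / n)
moe_diff = z * math.sqrt((pA*(1-pA) + pB*(1-pB) + 2*pA*pB) / n)

print(f"MoE(A) = {moe_A:.1%}, MoE(B) = {moe_B:.1%}")  # ~3.1% each
print(f"MoE(A - B) = {moe_diff:.1%}")                 # ~5.9%
```

The 5-point lead is bigger than either single-estimate MoE, yet the CI for the lead (5% ± 5.9%) still includes zero. Going the other way, moe_diff is smaller than moe_A + moe_B here, so if the two CIs don’t overlap at all, the difference is significant; it’s the in-between zone where the two readings come apart.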