Stick a fork in her: Clinton's done

There are two things commonly referred to as “margin of error.” One is a confidence interval for individual statistics within a poll. The other is an overall error. My understanding is that the media usually reports the latter, not the former. In this case, I assumed SurveyUSA reported the latter because they themselves said the overall difference was outside the MoE. Can you explain where I’ve gone wrong in that reasoning?

I didn’t major in statistics, so I’d like someone to confirm my understanding:

If a poll has A at 53% and B at 43% with a MoE of 5%, doesn’t that mean that A is most likely between 48% and 58% and B is most likely between 38% and 48%? So A’s lead over B could be anywhere from 0 to 20 points? Or am I totally wrong?

The contribution questions gave me pause, too. Because the charge card is in my name but I don’t personally have anything resembling taxable income, my husband earns all our money and he’s a foreign national and gah… :confused: I figure it’s OUR money. We’re not talking about a lot of money, here, either. I’m not in trouble, am I?

Looks like John Lewis’s on-again, off-again flip from Hillary to Obama is on again.

Got it.

TY.

There is no such thing as an “overall error”. The “margin of error” has a very specific meaning and can be calculated at various levels of confidence for each random variable. Typically, when the media reports a margin of error, it is calculated at the 95% level of confidence. The random variables captured by the survey are the inferences that you draw from the sample, for example, that 45% of the country supports a particular candidate.

The margins of error about each of the survey estimates are the same because the margin of error is a function of the sample size. Obviously, the sample size is the same regardless of the preferences of the individuals surveyed.
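
In symbols, for a sampled share p out of n respondents, the 95% margin of error is roughly 1.96 × √(p(1 − p)/n). A minimal Python sketch (assuming simple random sampling; the 45% share and n = 1,000 are made-up numbers for illustration):

```python
import math

def moe(p, n, z=1.96):
    """Margin of error for a sampled share p out of n respondents,
    at the confidence level implied by z (1.96 for ~95%)."""
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical candidate polling at 45% in a sample of 1,000:
print(f"{moe(0.45, 1000):.1%}")  # ~3.1%
```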

Again, the margin of error just reflects the sampling error inherent in the estimate, not any other type of error like selection bias.

No, this is not what it means.

There is a true population value out there somewhere. You do a survey because you can make inferences about the true value using a sample as long as you follow a few rules. The true value is not the variable; the number estimated by the survey is the variable. The true value is theoretically fixed, but the survey value can change depending on how you cut the sample. This is the essential definition of sampling error. You are always going to get outliers, you are always going to get abnormalities, you are always going to get noise.

What the margin of error tells us is that it is 95% likely that the confidence interval 48% to 58% falls about the true value based on the vagaries of sampling alone. This says absolutely nothing about the difference between A and B, and it certainly does not mean that if you sampled the same population 100 times, the results for a particular candidate would be distributed uniformly about the confidence interval. You could do the same poll 100 times and get almost exactly the same results. The fact that margin of error exists does not in any way invalidate the results of the poll. Almost no additional information is conveyed when candidates lead each other “within the margin of error”.

In order to reduce the margin of error, all you have to do is increase the sample size. However, there are seriously diminishing marginal returns. There is a huge reduction in the margin of error when you increase the sample from 500 to 1000, but a relatively modest one when you go from 1000 to 10,000.
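
To put rough numbers on that (same standard formula, worst case p = 0.5, 95% level; a sketch, not any particular pollster’s math):

```python
import math

# Worst-case (p = 0.5) 95% margin of error at several sample sizes
for n in (500, 1000, 10000):
    print(f"n = {n:>5}: MoE = {1.96 * math.sqrt(0.25 / n):.2%}")
# n =   500: MoE = 4.38%
# n =  1000: MoE = 3.10%
# n = 10000: MoE = 0.98%
```

Going from 500 to 1,000 respondents buys you about 1.3 points of precision; going from 1,000 to 10,000 buys only about 2.1 more despite costing nine times as many interviews.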

I think that a huge amount of confusion and misinterpretation of survey results, and accordingly individual voting logic, is driven by misunderstanding of these concepts.

Of course, MoE has a particular mathematical definition. But as applied to a poll, there are different MoE’s for different parts.

Margin of error - Wikipedia

The one being reported by SurveyUSA, as I understand it, is the overall MoE as calculated based on the number of LVs (as opposed to the MoE of one of the crosstab statistics).

What hasn’t been explained to me sufficiently is why we’re not talking about a single, binary random variable: do you support BHO or HRC?

Missed the edit. Let me word my question more precisely: Why is this not just one sample in which you draw a certain number of Obamas and a certain number of Clintons, thus yielding one MoE that is the one reported?

In this case, these would be exactly the same thing, as I mentioned above. The number of respondents is the same. These numbers only differ if the number of respondents to particular questions or parts of the survey differ. With respect to interpretation, it is a distinction without a difference.

Also from the wiki:

This is not an “overall error” that is in any way different than a point estimate margin of error. This is just sloppy, misleading reporting.

I do not understand your confusion. This has nothing to do with how many survey questions you ask. The purpose of the survey is to infer from the sample how many people support each candidate. You can do this with one survey question; you can do this with ten. That does not change the survey’s output, which is, obviously, two numbers.

Each is a random variable because each is subject to change if you recut the sample and repeat the survey.

Read the rest of the Wikipedia article. It is actually pretty informative.

And this is absolutely correct provided that the number of respondents is the same. This will be true in a poll like this almost by construction. But in surveys with multiple questions, lines of inquiry, etc, you can have a lot of people refusing to answer certain questions. This makes the margin of error different for inferences driven by those questions.

Ok, so if it is absolutely correct, why is it wrong to say that a 4 point difference is outside the 3.8 MoE?

ETA: I’m not disagreeing, just trying to get educated.

“Totally irrelevant” is wrong.

Poll results represent point estimates because of inherent sampling fuzziness. In reality, we should really be speaking in terms of confidence intervals instead of treating these things as absolutes as we tend to do. When two CIs overlap, we really cannot conclude a significant difference exists between them and keep our credibility. The reason is that there is not enough distance between them to say with 95% confidence that the true difference is greater than zero. We shouldn’t be discarding the importance of this certainty by declaring it “totally irrelevant” when we want to. None of us knows the “true value” these point estimates are supposed to represent.

I’ll illustrate my point by taking the example you gave with the 48% (A) and 46% (B) point estimates and a 4% MoE. You say that there’s a 75% chance that there’s a significant difference between these two point estimates. **But this conclusion only makes sense if the true values represented by these point estimates are random.** But that’s not the case. A “true” value is what it is and isn’t determined with probabilities.

Earlier you said:

This is essentially correct. But it’s important to remember that we have no way of knowing precisely how our results will distribute before we send out these surveys. Since true values are not random variables, there should be no expectation that they will distribute randomly within the confidence interval. If the true value is 49%, it’s possible that results will skew towards the right.

No problem. If I am not understanding you, please feel free to clarify.

It is not so much that it is “wrong” as that it is uninformative about the actual state of the world. The difference between two random variables is also a random variable. Its margin of error is calculated quite differently. As it turns out, it is much harder to call the spread than it is the winner, and a larger confidence interval around the difference reflects that. Handily, the wiki addresses this:

The fact that a spread falls within the margin of error does not tell us very much because we do not know how the point estimate errors are distributed in the first place. Maybe if you resample 99 more times you get almost the same results. Maybe if you sample 99 more times you get a large diversity of results, distributed uniformly about the interval. Maybe they skew one way or another. Since we do not know any of these things, we cannot infer that a difference inside the margin of error means that the survey is any muddier, or, worse, that it is somehow a “statistical tie”.
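
To make the 4-point/3.8-MoE example concrete, here is a rough sketch. Because two shares from the same poll are negatively correlated (a respondent choosing A cannot also choose B), the variance of the difference is (p_A(1 − p_A) + p_B(1 − p_B) + 2p_Ap_B)/n. The 47/43 split is my own made-up stand-in for a 4-point lead, and n = 665 is roughly the sample size a 3.8% worst-case MoE implies:

```python
import math

z = 1.96
n = 665              # sample size implied by a ~3.8% worst-case MoE
pA, pB = 0.47, 0.43  # hypothetical 4-point lead

# Same-poll shares are negatively correlated, so the difference's
# variance is (pA*qA + pB*qB + 2*pA*pB) / n
moe_diff = z * math.sqrt((pA*(1-pA) + pB*(1-pB) + 2*pA*pB) / n)
print(f"MoE of the difference: {moe_diff:.1%}")  # ~7.2%
```

So the 95% CI for the lead is 4% ± 7.2%, which straddles zero even though the 4-point spread is “outside” the 3.8% single-estimate MoE.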

It is a tie if they both receive an identical percentage of support. Aside from that, there is no tie.

This thread went from sticking a fork in Hillary Clinton to one of the most dreaded courses to take in college history: statistics. Maeglin is technically right - however, MoE should be considered with a grain of salt.

From here. Good read if you have a second to skim it.

Ok, give me a little room for hyperbole here. All the same, I do not entirely agree with the below.

That may be true, but not because it makes sense. Very often the margin of error for the difference is so large that, as the wiki says, it can be larger than the confidence interval around either of the point estimates.

When two CIs overlap, you can pretty easily calculate the probability that one candidate is actually leading the other within the context of the survey. Even with a small spread, if your sample size is around 1000, you can quantify the impact of sampling error and determine how fuzzy the results are.

I am not convinced that with most close political surveys, there is enough distance to say with 95% confidence that the true difference is greater than zero, even when the difference is not within the margin of error of the point estimates. My argument is that the margin of error for a point estimate is not relevant to interpreting the difference, since this difference has its own margin of error and a probability of A leading B can be calculated using the point estimates and the sample size alone.
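
As a sketch of that calculation (normal approximation, same same-poll variance as above; the 48/46 split and n = 1,000 are just the figures floated earlier in the thread):

```python
import math

def prob_a_leads(pA, pB, n):
    """Approximate probability, given one simple random sample of size n,
    that A's true share exceeds B's (normal approximation)."""
    se = math.sqrt((pA*(1-pA) + pB*(1-pB) + 2*pA*pB) / n)
    z = (pA - pB) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

print(f"{prob_a_leads(0.48, 0.46, 1000):.0%}")  # ~74%
```

Note that this calculation never touches the individual 95% margins of error at all.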

While this is true, I am not sure what practical difference is being made here. We are already in a world where we do not know the true value. We can make inferences based on a sample and use them to predict actual outcomes. I don’t think we are really confusing true values for point estimates here: the probability that A is ahead of B is still conditioned on the survey itself. We don’t know what the true value is, and we do know that estimates are affected by sampling error that creates some fuzziness around them. This probability just quantifies how fuzzy these estimates really are and does not really draw any inferences about true values.

Absolutely agreed, of course.

Heh. I could have made the joke about you being an old softy. :wink:

John D. Holum, who served in the Clinton Administration, first as Director of the U.S. Arms Control and Disarmament Agency and then as Under Secretary of State for Arms Control and International Security, is endorsing Barack Obama.

These kinds of defections have got to be killing the Clintons.

Wow that’s got to sting! :smiley:

I wonder what all these defections are actually doing to her morale and campaign?

Ack, that just killed me.

Nononononono.

There is a 95% chance that the interval 56% to 64% falls about the true value.

A small thing, fine, but if you are trying to teach people the basics of interpreting statistics, this is kind of a key issue.

I don’t see that it matters much, though. You’re talking apples and oranges. The MoE of the difference would have to be a larger spread, because it has to “account” for the extremes of the two CIs being compared, right? (That’s how I conceptualize it, at least.) I don’t see why that changes anything. If the CI of the difference has a negative lower limit, you can infer non-significance. If the CIs of two point estimates overlap, you can also infer non-significance.

Using CIs as a statistical test is done all the time in science, which is why I’m wondering why you think they aren’t meaningful when it comes to comparing poll results. The methodology is no different.

How? Doesn’t that assume that the true values are random and not fixed?

Yes, the difference has its own MoE, but it’s influenced by the same things as the MoEs of the percentages; namely, sample size. Which means these numbers aren’t produced without respect to one another; they are just different ways of reaching the same conclusion. If two CIs overlap, then I can’t see how the CI of the difference should NOT contain zero. Maybe I’d understand your position if you could provide an example of there being a disconnect between the MoE of the difference and the MoEs of the two point estimates.
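
For what it’s worth, here is a rough numerical example of the disconnect, with made-up numbers (A at 48%, B at 43%, n = 1,000, 95% level, same-poll variance as discussed above):

```python
import math

z, n = 1.96, 1000
pA, pB = 0.48, 0.43  # hypothetical 5-point lead

moe_A = z * math.sqrt(pA*(1-pA) / n)
moe_B = z * math.sqrt(pB*(1-pB) / n)
moe_diff = z * math.sqrt((pA*(1-pA) + pB*(1-pB) + 2*pA*pB) / n)

print(f"MoE(A) = {moe_A:.1%}, MoE(B) = {moe_B:.1%}")  # ~3.1% each
print(f"MoE(A - B) = {moe_diff:.1%}")                 # ~5.9%
```

The 5-point lead is bigger than either single-estimate MoE, yet the CI for the lead (5% ± 5.9%) still includes zero. Going the other way, moe_diff is smaller than moe_A + moe_B here, so if the two CIs don’t overlap at all, the difference is significant; it’s the in-between zone where the two readings come apart.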