Statistical Dead Heat

As we enter the election season, and are inundated with polls from all sides, we hear the term “statistical dead heat” more and more often. This is explained to mean that the candidates are within the margin of error for that poll. Newspeople and analysts frequently assert that this means that the candidates are essentially tied.

I believe that this is an error. Anyone studying statistics learns that a given statistical test is assigned a significance level, which is to say that the null hypothesis is not considered disproved until it is disproved with the given level of probability. Typically this is 90% or 95%. A result which falls below the significance level does not mean that the test has proved the null hypothesis, or even that the test says nothing about the data - merely that the null hypothesis cannot be disproved with 90% or 95% certainty.

Thus if Candidate A is shown to be 4% ahead of Candidate B, and the “statistical margin of error” for that survey is 4%, this does not mean that the survey shows the two candidates to be essentially tied. It merely means that it cannot be said with 90% certainty (if this is the standard used) that Candidate A is ahead. There is a probability of around 10% that the candidates are tied. There is also a probability of about 10% that Candidate A is actually 8% ahead. All in all, candidate A is in better shape, though not by a whole lot, obviously.

Furthermore, the margin of error only applies to the given poll itself. If other independent polls also show that candidate to be ahead by a few points, it lessens the possibility that the race is actually a tie, even if the lead is within the margin of error for each poll on its own. This was recently the case with the Hillary Clinton/Rick Lazio race for Senate in NY. (The most recent poll shows a true dead heat.)
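
To make the point about independent polls concrete, here is a rough sketch of inverse-variance pooling. It assumes each poll reports a lead and a margin of error on that lead, that the polls are independent and unbiased, and that the margin equals 1.96 standard deviations. The function name `pool_polls` and the numbers are hypothetical, not taken from any real poll.

```python
import math

def pool_polls(polls, sds_per_margin=1.96):
    """Combine independent polls given as (lead, margin_of_error_on_lead).

    Assumption: each margin of error equals `sds_per_margin` standard
    deviations of that poll's lead estimate.
    """
    # Weight each poll by 1 / variance, where sd = margin / sds_per_margin.
    weights = [(sds_per_margin / moe) ** 2 for _, moe in polls]
    total = sum(weights)
    pooled_lead = sum(w * lead for w, (lead, _) in zip(weights, polls)) / total
    # The pooled standard deviation is 1 / sqrt(total weight).
    pooled_moe = sds_per_margin / math.sqrt(total)
    return pooled_lead, pooled_moe

# Three hypothetical polls, each showing a 3-point lead with a 4-point margin.
lead, moe = pool_polls([(3.0, 4.0), (3.0, 4.0), (3.0, 4.0)])
print(f"pooled lead {lead:.2f}, pooled margin {moe:.2f}")
```

Under these assumptions the pooled margin shrinks by a factor of the square root of the number of polls, so a lead that sits inside each individual margin can fall outside the combined one.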

I am therefore positing that the use of the term in politics is misleading, and came about as a result of a misunderstanding of statistical terms by non-statisticians.

I think it depends on who is using the term ‘dead heat.’ I am inclined to believe that the media uses it to heighten the tension of the race whereby it broadens its audience. Political groups put their own spin on every poll under the sun, so I don’t trust their analyses of polls anyway. The term is probably willfully misused rather than misunderstood.

Just my HO.

MR

I basically agree with the point made. The language of a “statistical dead heat” is misleading. A better phrase might be something which simply says that there is not evidence strong enough to convince us that we have identified the winner.

I will, however, pick a nit. In the OP, IzzyR does make a common mistake about probabilities when he says “there is a probability of around 10% that the candidates are tied. There is also a probability of about 10% that Candidate A is actually 8% ahead.” The actual standing of Candidate A vis a vis Candidate B is a matter of truth. It has one value (which I will call v). We don’t know what it is, but most standard understandings of probability would tell us that there is a probability of 1 that A is ahead of B by v, and a probability of 0 that A is ahead (or behind) B by any other amount. (If you want to get into Bayesian probability statements, then you need to start specifying prior expectations, etc., and I don’t really want to start down that road.)

Just had this beat into my head by stats profs, so I thought I would share the favor.

IzzyR wrote:

> Thus if Candidate A is shown to be 4% ahead of Candidate
> B, and the “statistical margin of error” for that survey
> is 4%, this does not mean that the survey shows the two
> candidates to be essentially tied. It merely means that
> it cannot be said with 90% certainty (if this is the
> standard used) that Candidate A is ahead. There is a
> probability of around 10% that the candidates are tied.
> There is also a probability of about 10% that Candidate A
> is actually 8% ahead. All in all, candidate A is in
> better shape, though not by a whole lot, obviously.

Not quite. Let’s go through this slowly. When it’s said that the margin of error in a poll is X, what that actually means is that X equals 2 standard deviations. What’s a standard deviation? Well, all you need to know is that if you say that something equals Y with a standard deviation of Z, then the probability that the real value is in the range (Y-Z, Y+Z) is approximately 68.27%. Furthermore, the probability that the real value is in the range (Y-2Z, Y+2Z) is approximately 95.45%. That’s why, rounding slightly, one margin of error (= two standard deviations) is said to be 95% certainty.
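
The coverage figures for a normal distribution can be checked with a few lines of Python. This is only a sketch: `normal_coverage` is a made-up helper name, and it assumes the estimate is normally distributed.

```python
import math

def normal_coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# 1 and 2 standard deviations, plus the multipliers usually quoted
# for 90% (1.645) and 95% (1.96) intervals.
for k in (1, 1.645, 1.96, 2):
    print(f"within {k} SD: {normal_coverage(k):.4%}")
```

Note that exactly 95% corresponds to about 1.96 standard deviations, so "two standard deviations" is itself a slight rounding.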

Say that you are told that candidate A has 48% of the vote in the poll and that candidate B has 52% of the vote in the poll and that there’s a margin of error in the poll of 4%. Then you know that there’s a 95.45% probability that A’s percentage of the vote is between 44% and 52% and there’s a 95.45% probability that B’s percentage of the vote is between 48% and 56%.

So what’s the probability that candidate A’s share of the vote is really greater than B’s share? I don’t know. It’s a messy calculation. I tried to do an approximation of it just now, and I think that the probability that A’s share of the vote is really greater than B’s is somewhere between 10% and 20%.
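
One way to approximate that calculation, offered as a sketch rather than a definitive answer: assume only two candidates whose shares sum to 100%, so the lead is twice one share minus 100 and its standard deviation is twice the standard deviation of a single share. It also treats the true lead as normally distributed around the observed lead, which is the informal (effectively Bayesian, flat-prior) reading questioned elsewhere in this thread. The name `prob_a_ahead` is hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_a_ahead(share_a, share_b, margin_of_error, sds_per_margin=2.0):
    """Approximate chance that A truly leads B, given poll shares in percent.

    Assumptions: two candidates only, margin = `sds_per_margin` standard
    deviations of one share, normal approximation centred on the poll result.
    """
    sd_share = margin_of_error / sds_per_margin
    sd_lead = 2 * sd_share  # lead = 2 * share_a - 100 when shares sum to 100
    return normal_cdf((share_a - share_b) / sd_lead)

print(f"{prob_a_ahead(48, 52, 4):.1%}")
```

With these assumptions the result is about 16%, which falls inside the 10% to 20% range estimated above.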

In any case, if the candidates’ share of votes in the poll differs by less than the margin of error, you’re just saying that it’s no longer true that the two candidates’ probabilities of winning are different enough that a statistician would be willing to say that one will win rather than the other. It doesn’t mean that a statistician would predict that the vote will be a tie. A tie in something like a Presidential race where there are tens of millions of votes is extremely improbable.

Let’s go through it even slower.

  1. All this stuff about 2 standard deviations is referring to a case in which the statistician has decided to give the test a 95% degree of certainty. In some instances, 90% is the standard used. The 2 standard deviations rule is also only valid for the normal distribution (which is frequently used as an approximation for other distributions). Other distributions do not follow this rule.

  2. Wendell, you are erring in looking at the percentages of support for the two candidates as two independent variables, and giving margins for error to each individually. Actually, in a political campaign, the levels of support for the candidates are interdependent, and one variable (p minus q) will be tested. If this variable is greater than zero with a degree of certainty greater than 95% (or 90%, depending on the level of significance chosen), then the null hypothesis (that the candidates are tied) will have been disproved. Thus the chance that the candidates are actually tied is equal to the “p value” for the test. If the candidates are at the border of the level of significance for this test (as in 4% apart with a 4% margin of error in a 90% significance test) then there is a 10% chance (or very close to it) that they are actually tied.
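
A minimal sketch of a test on the single variable p minus q, assuming a simple random sample of size n and a normal approximation. The variance formula accounts for the negative correlation between the two shares in the same sample. The function name `lead_z_test` and the sample size are hypothetical. Strictly speaking, the p-value it returns is the probability of seeing a lead at least this large if the race were tied, which is not the same as the probability that the race is tied.

```python
import math

def lead_z_test(p_hat: float, q_hat: float, n: int):
    """Wald z-test of the null hypothesis that two candidates are tied.

    p_hat, q_hat: sample proportions for candidates A and B (they need not
    sum to 1 if some respondents are undecided). n: sample size.
    """
    lead = p_hat - q_hat
    # Var(p_hat - q_hat) = [p + q - (p - q)^2] / n for shares from one sample.
    se = math.sqrt((p_hat + q_hat - lead ** 2) / n)
    z = lead / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Hypothetical poll: 52% vs 48% among 1,000 respondents.
z, p = lead_z_test(0.52, 0.48, 1000)
print(f"z = {z:.2f}, two-sided p-value = {p:.3f}")
```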

dorkbro, you have to learn how to avoid these professors when they are attempting to beat things into your head. I am aware of the issue that you describe, but it is of interest only to statistics professors (and to those taking courses given by statistics professors). For purposes of this discussion, my terminology suffices.

IzzyR wrote:

> In some instances, 90% is the standard used.

I’ll take your word for it on the rest of what you say, but every poll I’ve ever looked at said that it had a 95% margin of error.

Since you understand the statistics quite well (it appears), why are you quibbling in the OP about how the public understands the meaning of the poll? They know perfectly well (within their mathematical ability, that is) what the term “statistical dead heat” means. They know that it means that the two candidates are pretty close, so close that the pollster doesn’t wish to commit himself to one or the other winning. What else is there that the average person should know?

That there is an overwhelming likelihood that the candidate in the lead is actually ahead. This is not how it is presented by the media.

> That there is an overwhelming likelihood that the
> candidate in the lead is actually ahead. This is not how
> it is presented by the media.

Except, of course, considering all those “undecided”, those answering the poll who are not actually registered to vote (despite what they say), etc.

Unless there is some reason to believe that those who are undecided or misleading the pollsters differ in some way from the rest of the sample, then the poll is still valid. (Admittedly a poll showing a lot of undecideds is somewhat weakened because of the possibility of the undecided sample being different.)

This brings up another interesting issue. People are always complaining about the low voter turnout…30% or whatever. However, statistically, IF the 30% who vote are randomly distributed (a very big if), a higher voter turnout would have a very low likelihood of changing the outcome. This is assuming that the “new” voters are also randomly distributed.
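
A back-of-the-envelope check of the turnout claim, under the same very big "if": voters are treated as a simple random sample of the electorate, and a normal approximation is used (the finite-population correction is ignored). The function name `prob_same_winner` and the numbers are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_same_winner(true_share: float, n_voters: int) -> float:
    """Chance that a random sample of n_voters gives the majority to the
    candidate who holds `true_share` (> 0.5) of the whole electorate."""
    se = math.sqrt(true_share * (1 - true_share) / n_voters)
    return normal_cdf((true_share - 0.5) / se)

# A 51-49 electorate sampled by 100 voters versus one million voters.
for n in (100, 1_000_000):
    print(n, prob_same_winner(0.51, n))
```

With a hundred random voters the result is close to a coin flip, but with a million it is essentially certain, so adding more randomly drawn voters changes almost nothing. The whole argument stands or falls on the randomness assumption.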

Seeing as how you are devoting an entire thread to a nitpick of others’ language, I don’t see why you are misusing statistical language so badly. First of all, as dorkbro alluded to, the probability, whether it is 90% or 95% or whatever, refers not to the probability that the true value lies within the indicated range, but to the probability of getting a range that includes the true value. If you don’t understand what the difference between these two statements is, then you really don’t know enough about statistics to criticize others’ discussions of it.

Second of all, you have to decide beforehand what probability you’re going to use. You can’t decide on using 95%, then find that you only know with 90% certainty, and say “oh, well, that’s close enough”. If you consider 95% to be overwhelming evidence, and 94.9% not to be, then 90% is not overwhelming evidence. Trying to decide what would be overwhelming certainty after you already know what the certainty is is very bad statistics.

Furthermore, I do not see anything inappropriate about the term “statistical dead heat”. Suppose I view a horse race from a viewpoint that leaves me with a 1 m uncertainty about where each horse is. And suppose that the two fastest horses come within 0.9 m of each other at the finish line. It is quite accurate for me to say that as far as I know, they’re in a dead heat. The term “dead heat” does not mean that the two horses actually arrived at the finish line at the same exact time, within one Planck time of each other. It just means that we were unable to determine with adequate certainty which one was first.

The Ryan

I shall attempt to respond to this ridiculous post in simple terms, so that even you might understand it. But I have to request that you pay careful attention to what you are reading, something that you have evidently not done to the rest of this thread.

dorkbro, to his great credit, said nothing of the sort. The probability of a range including a value and the probability of a value being within a range are the exact same thing. What dorkbro was saying was that the probability is not ITSELF a range - it has one true value. For example, suppose a test is being performed on a biased coin. The results of the test indicate that the probability of its coming up heads is between 60% and 70% with 95% probability. This does not mean that the true probability is “somewhere between 60% and 70%”. It only means that our test has failed to pin it down to a narrower range than the one given. The true value is a single point, not a range. Statisticians are fond of pointing this out to those seeking to understand the science of statistics. It is not relevant to this discussion. It might have helped had you understood what this discussion is about. Then again, it might not have.

I can’t fathom where you might have seen in anything I, or anyone else, has written, that you could decide after the test which standard to use. Again, I urge you to pay more attention before typing out all sorts of nonsense.

This ties in with your earlier remark that the purpose of this thread was to nitpick others’ language. I will now explain the purpose of this thread to you, so pay close attention.

Statistics is a science. It uses rigorous mathematical tests to reach conclusions. In this context there are levels of significance assigned to tests to determine the level of conclusiveness that they give. A significance level is chosen, and if it is met, then the null hypothesis is considered to be “disproved” based on the test. If it is not met, then the test has failed to meet the scientific standard assigned, and from this standpoint is inconclusive.

In the real world, such rigor is not used. When a person listens to the radio to hear whether it will rain, he is not interested in knowing whether there is a 95% likelihood that it will rain or any other scientific standard. If the weather guy is 80% sure that it will rain, he does not go out and say “I have no idea if it will rain or not - my model shows that it may rain or may not rain”. Instead, he says that it is likely to rain. It has not been scientifically proven - big deal.

So too in politics. Political handicappers are constantly predicting the outcome of races, based on an assortment of factors. None of these predictions are being made to a scientific level of precision. This is not the language used in politics. It is also not relevant to politics. No one cares if such and such a candidate is “proved” to be ahead by some scientific standard. There is no threshold at which a political poll becomes significant.

Now, and listen closely here, when people hear the term statistical dead heat, many people are not aware that the term refers to some scientific standard of proof. People interpret this in terms of political language commonly used, as if to say that based on polling data one candidate is not more likely to be ahead. The media present it this way as well. This is not a language issue - it is a legitimate issue of two different meanings being confused with each other. It is this issue which the post addressed.

Are you unaware of the meaning of the word “allude”?

Well, they are equally meaningless. If you had actually paid attention, as you hypocritically exhorted me to do, you would have noticed that I said “probability of getting a range that includes the true value”, not “probability of a range including a value”.

Yes, it is.

If you have failed to make your thoughts on this matter clear, how is that my fault?

In response to Wendell Wagner’s question:

You replied:

Sure sounds like you decide on the standard of certainty after knowing the level of certainty to me.

This is a completely different situation.

Okay, so in terms of politics, it’s not a dead heat. So I can see why you’d have a problem with it being called a “political dead heat”. What I don’t understand is why you object to the term “statistical dead heat”. This phrase makes it quite clear that it is the statisticians’ standards that are being used, and I think that most people are aware that this is not the same as knowing absolutely nothing.

The fact that people don’t pay attention does not mean that it is irrelevant.

Well, I do.

Sure there is: at the significance level.

I think that the vast majority of people are aware that statisticians have very different standards than the normal populace, even if they don’t know what they are.

Your OP did not make a clear distinction between the term itself and the use of the term. If, in your experience, the latter is flawed, then I will not challenge your subjective experience. But if you believe that the former is incorrect, then I disagree.

The Ryan,

Careful analysis of your latest post reveals that you are under the mistaken impression that your previous post was not completely silly. That’s great. Any further debate would only concern what I/you said/meant, a pointless exercise, in my opinion. So we can leave it here.

Tretiak,

I think much of the complaining has to do with the lack of civic awareness and responsibility, more than the possibility that the outcome would have been different. However, it is also thought that the turnout among blacks and other minorities is lower than among the overall population, which can affect the outcome.

I’m glad I read this, because it refreshed my memory of some of my old stats classes. OK, so if someone is shown to be ahead, regardless of the margin of error, they are most likely ahead. What would be really useful is if they told us how often their polls are correct.

PeeQueue

PeeQueue

It would be difficult, if not impossible, to do this. A poll only measures public sentiment on the day it was taken. Public sentiment can shift over the last few days, especially in races to which the public has not been paying too much attention during the campaign. Therefore, when the poll fails to correctly predict the results of the election, it is hard to sort out how much of the discrepancy is due to the poll being off the mark, and how much is due to last-minute shifts in public opinion.

Part of the reason that a newspaper or TV network wouldn’t want to call a result that’s within the margin of error anything other than a dead heat is that they would have to explain afterwards if the candidate that they didn’t pick wins the election. If the probability that candidate A will win the election is 95%, they don’t consider that good enough to make an outright call that A will win. They don’t want to take the heat afterward in the 5% of the cases where they are wrong. They might say that A will probably win, but it’s too close to be certain. On the other hand, if the probability is 99.99% that candidate A will win, they’re willing to risk the .01% of the time when they’re wrong. In some sense, this is cowardliness on the part of the media, but think of how much people would complain if the polls were wrong some substantial portion of the time.

Hey IzzyR-maybe you can clear 2 things up for me.

  1. I always thought that a null hypothesis was ‘not accepted’ instead of ‘disproved’.

  2. Quote: “The probability of a range including a value and the probability of a value being within a range are the exact same thing.”

Introduction to Biostatistics, Sokal and Rohlf, 2nd Ed., Page 106:
“When we have set lower and upper limits (L1 and L2, respectively) to a statistic, we imply that the probability that this interval covers the mean is, for example, 0.95, or, expressed in another way, that on the average 95 out of 100 confidence intervals similarly would cover the mean. We CANNOT STATE that there is a probability of 0.95 that the true mean is contained within a pair of given confidence limits, although this may seem to say the same thing.”

Maybe I’m not reading you right, but it seems the guys who wrote this text would differ with you.

Uh oh. See what happens when you say something, then ignore a thread.

I think IzzyR and TheRyan were kind of talking past each other here. The basic point I tried to raise is that a statement about probabilities can only be made about the sample statistics. The statement “90% of the time we take a sample, a range constructed following these rules will include the true value of the population parameter” is valid. (I think this is a long way of describing the notion of a “confidence interval”.) If we say that “There is a 90% chance that the true population value is X”, we seem to be making the probability statement about the population parameter, not the sample statistic, and therefore most statisticians I have heard see it as invalid. It looks to me like both IzzyR and TheRyan basically see the important point, but are getting caught up in language games about how to phrase the statistically valid statement. The same point is made again in 647’s quote.
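
The "90% of the time we take a sample" reading can be illustrated with a small simulation. This is a sketch under stated assumptions: simple random sampling, a fixed true proportion chosen by us (0.52 here, purely hypothetical), and the usual normal-approximation interval with a 1.645 multiplier for 90%. The function name `coverage_rate` is made up.

```python
import math
import random

def coverage_rate(true_p, n, trials, z=1.645, seed=0):
    """Fraction of simulated polls whose confidence interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one poll of n respondents and compute the sample proportion.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= true_p <= p_hat + half_width:
            hits += 1
    return hits / trials

print(coverage_rate(true_p=0.52, n=400, trials=2000))
```

The fraction printed should land near 0.90. The true value never moves; what varies from run to run is the interval, which is exactly the distinction being drawn above.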

On the whole, I am not sure how the picky language here really relates to the substance of IzzyR’s main point - or at least what I understood it to be. I would agree that “dead heat” to most readers probably looks a lot more like “we believe that the margin is VERY small” than “we only want to make a statement when the evidence is strong, and we aren’t ready to make a statement yet.” The range of circumstances when statistical theory leads us to the second statement is probably much wider than the range of circumstances when folks would expect people to use the first. And, of course, we know that there will be occasions when the first is true, but we don’t say the second.

(I will admit to some sympathy with the “one shouldn’t nitpick others’ language when you are sliding over a few things yourself” point - but use of language that is statistically legitimate is really awkward, and doesn’t really affect the main point above.)

Maybe we would be better off if the press would say something like – “Candidate A received the support of 47% of our sample, and Candidate B received the support of 44%, with 9% undecided. Since the margin of error for this poll is 4%, we are too hung up with our reputation to risk ruining it by saying that we are sure who is ahead.” But somehow, I don’t think that is going to happen any time soon.

In addition, I think a case could be made that the press is calling the race too easily in many cases. Most people are far more interested in the question “Who will win in November?”, than “Who is ahead today?” That is probably how an awful lot of the readers will interpret the story - and if that is the case, they are using entirely the wrong error value. Given some of the dramatic swings in these sorts of polls over a campaign season (see, in particular, the polls during the Dukakis-Bush campaign), the error associated with using any poll result from January-September as a predictor for the fall election is MUCH bigger than just the sampling error for the one poll. If the press is really worried about Wendell Wagner’s point, they should be using a whole different criterion.

And on PeeQueue’s point - we have a sort of answer to this question. For identifying the precise value of a population parameter (accurate to infinitely many decimal places), our sample statistic has a probability of dang near zero (got to love that technical language) of being right. In 90% of our samples, a range constructed by following the rules for the 90% confidence interval will include the true population parameter. If we use a p=.10 threshold for our significance tests, in 10% of the cases in which the null hypothesis is true, we will reject the null hypothesis. We cannot determine, without further assumptions, how often we would falsely accept the null hypothesis.
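
To show what "further assumptions" buys us, here is a sketch that computes how often a two-sided test at the 10% level would fail to reject a tie, once we assume a specific true lead. It uses a normal approximation; the function name `miss_rate` and the example numbers are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def miss_rate(true_lead: float, se_lead: float, z_crit: float = 1.645) -> float:
    """Probability that the observed lead stays inside +/- z_crit * se_lead,
    i.e. that the test fails to reject a tie, given an assumed true lead."""
    shift = true_lead / se_lead
    return normal_cdf(z_crit - shift) - normal_cdf(-z_crit - shift)

# Margin of error of 4 points on the lead at the 90% level.
se = 4 / 1.645
print(miss_rate(0, se))  # true tie: the test correctly stays quiet about 90% of the time
print(miss_rate(4, se))  # true lead equal to the margin: missed about half the time
```

Under these assumptions, a candidate whose true lead equals the margin of error would be reported as a "statistical dead heat" in roughly half of all polls.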

647,

  1. The typical language used is “accepted” or “rejected”.

  2. Imagine a coin which has been altered so that the likelihood of its coming up heads is 60%. As we stand, ready to flip the coin, we can say that there is a 60% probability that it will come up heads. Suppose we have flipped the coin and not yet looked at the result. A bettor would give 60% odds that it is indeed heads. But suppose another person has seen the results and knows which one it is. From that person’s perspective, there are no probabilities - it is either heads or tails.

A statistician would say that even supposing no one else had seen the coin it is still not accurate to say that there is a 60% probability of its being heads. That terminology is reserved for the situation prior to the toss, when it could come up either way. After the toss, it is either one or the other - it is merely from the perspective of the viewer who lacks knowledge of the facts that the question exists.

So too it is with confidence intervals. The true parameter is one number. It exists. It is not valid (from a statistical standpoint) to speak of the probability of the parameter being this number or that number, as if it may become any of these numbers. Thus, once the test is done, a statistician would not say that the parameter has this or that probability of being in the range - it is either in or out. For this reason, the concept of confidence intervals is phrased in terms of the percentages of such tests that will include the parameter value, or the prior probability of getting the parameter value in the range.

But as a practical matter, as with a gambler who gives odds on the results of a coin toss that has already been tossed, there’s no difference.