Survey Response Error: Variance v. Bias (Another Election Thread)

Adding further confusion - a researcher at Yale has published preliminary results of a study suggesting that the media’s ‘early call’ of Florida may have resulted in Bush losing as many as 10,000 votes on the Panhandle (I don’t have a cite - I heard it this morning on CNN). If true, then we have another bias with a definable cause.

And in Wisconsin, which is very close, there are a number of allegations surfacing regarding vote fixing by Democrats (for instance, there is strong evidence that Democrats paid homeless people in cigarettes to vote for Gore).

Any way you look at it, this election was screwed up. We’ll never know the will of the people because of the legal wrangling, the media screw-up, etc. Gore got lucky in that one of the screwups happened to be in an area where he was able to mount legal challenges, and where the electoral vote was very close.

Regardless of who is elected, we’ll never know who the real choice of the people was. There are just too many irregularities surrounding a very thin margin for that to happen.

This sounds nuts. From what I’ve heard, the media did not call it until 7:49, about 10 minutes before the voting ended. Anyone planning to vote would have to have been at or almost at the polling place already. The idea that masses of people who already made the effort to get out and vote would head home and ignore both the presidential and (uncalled) local races is, to my mind, absurd. I’d be surprised if it cost Bush 10 votes.

Anyway, there is no comparison between bias in network reporting (which, as we know, is there anyway) and bias in election balloting, which must be unbiased or lose the integrity of the voting process. (FTR, I don’t buy into the idea of overturning the PBC election on bias grounds. I am merely responding to your specific arguments).

Regarding the OP, I occupy a middle ground between Uncle Beer and RTF in that I think the ballot was not biased against Gore voters, but was biased against Gore himself.

I don’t think it can be counted as bias against the voters, because anyone who put a minimal amount of time and thought into what he was doing would figure it out. I don’t think a person can claim that the election board is required to make sure that he does not have to think at all about what he is doing when he votes.

But the ballot was biased against Gore. An equivalent percentage of Gore voters and Bush voters were likely to be confused or lazy, but this ballot was designed such that the confused and lazy Gore voters would be misled, while the equally confused and lazy Bush voters would not.

Having said that, I don’t see how you could overturn an election on such grounds, as the bias was so small and was clearly not intentional fraud.

My point is that we are indeed being selective in our scrutiny. I have read elsewhere on this board that Duval County, a heavily Republican county, had over 26,000 votes disqualified in this election. This, like Palm Beach County’s results when considered against its results in the '96 election, is not necessarily the statistical aberration it seems at first glance. But I don’t expect Jesse Jackson to be leading any rallies there anytime soon over the poor souls who are being deprived of their voice in the democratic process. And by ignoring Duval while ranting over the “injustice” obvious in Palm Beach County, we are undoubtedly introducing a bias in the overall results.

You can argue that it’s not Gore’s responsibility to pursue the correction of this bias, and perhaps you’d be right. But it still doesn’t support the notion that “fixing” Palm County to the exclusion of the rest of the state–where unintentional biases undoubtedly exist–applies anything but a false precision. Why wouldn’t probability suggest that unintentional bias is more-or-less evenly distributed? Why wouldn’t it be? You state that the known bias in Palm Beach County has no relationship to any bias in the rest of the state. You go on to state that the unknown biases likely cancel each other out–so long as we correct the Palm Beach bias. Please explain that bit of statistical magic to me.

Just to clarify–the 19,000 double votes in Palm Beach have frequently been compared to the 15,000 votes thrown out in 1996. In fact, the 15,000 was all of the votes thrown out, not just the doubles. There were 30,000 thrown out altogether this time. So it is an aberration.

Flip 12 coins. You see two of them that are heads and take them away. Are the remaining coins more likely to be mostly tails? No, they’re not.

Ideally, the unintentional biases do cancel one another out. But just because there’s a glaring error on one side doesn’t mean there has to be a corresponding error on the other side. That said, I would support a full state recount, and I wish Bush had called for his own recounts instead of trying to buy the pot early on.

Dr. J

That bit of statistical magic is known as ‘statistical independence’. Events A and B are independent if one happening doesn’t make the other any more or less likely: that is, if P(A) = P(A|B). There is no reason to expect that there’s any causal connection between what happened in Palm Beach and the possibility of bias in any other FL county. So we assume independence, unless you can produce an argument to the contrary.

So let B = event of bias in PB county, and
let A1 thru A66 be the event of bias in each of FL’s other 66 counties. P(A1|B) = P(A1), P(A2|B) = P(A2), …, P(A66|B) = P(A66). Got it, so far? Now, we could even put some multipliers on here: Di = direction of bias (+1 = for Gore, -1 = for Bush) in county i; Xi = strength of bias in county i. So the total bias of the system, excluding PB county, is the sum S = D1X1P(A1) + D2X2P(A2) + … + D66X66P(A66).

Now, is there a reason in the world why the mathematical expectation of S, E(S), should be anything but zero at this point?

I mean, we don’t really know enough to have any reasonable discussion of expectation, but to the extent that we try, we have to have equal expectation that S will be positive or negative, because we don’t have a clue to the contrary. The only reasonable null hypothesis, if we know nothing about the possible biases in county i, is that E(DiXiP(Ai)) = 0.

Right now, assuming I’m correct about the bias in Palm Beach County, our expectation for Florida is negative: the expectation of the sum is the sum of the individual expected values, and all of 'em are zero except PBC, which is biased toward Bush. You need to fix that bias in order to bring the expectation back to zero for the state as a whole.
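To make the expectation argument concrete, here is a rough simulation sketch. Everything numeric in it is made up for illustration (the 10% chance of a bias per county, the 0 to 1000 vote strength, the 2000-vote known bias); the only thing taken from the argument above is that an unknown bias is equally likely to point either way, while the one known bias has a known direction.

```python
import random

def simulate_total_bias(n_counties=66, p_bias=0.1, max_strength=1000.0,
                        known_bias=0.0, trials=20000, seed=0):
    """Average total bias over many trials (positive = pro-Gore, negative = pro-Bush).

    All parameter values are hypothetical; they are not estimates of anything real.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = known_bias  # the one bias whose direction we claim to know
        for _ in range(n_counties):
            if rng.random() < p_bias:                       # event A_i: county i is biased
                direction = rng.choice((1, -1))             # D_i: unknown, so a coin flip
                strength = rng.uniform(0.0, max_strength)   # X_i, in votes
                s += direction * strength
        total += s
    return total / trials

# Unknown biases only: the average should hover near zero.
mean_unknown_only = simulate_total_bias()
# Add one known pro-Bush bias of 2000 votes (hypothetical size): the average shifts by that amount.
mean_with_known = simulate_total_bias(known_bias=-2000.0)
print(round(mean_unknown_only), round(mean_with_known))
```

Note that this only speaks to the expectation. Any single run of the 66 counties can still land well away from zero, which is the other side's point about the noise in the data.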

This is a total hijack (of my own thread, yet ;)), but I’ve been wondering for a while how the typeface occasionally shifts between the standard typeface (which appears on my screen as some sort of sans-serif typeface, perhaps Arial) and a very different one (which I see as Times New Roman or one of its close relatives - like the font I see in the little box where we type our posts).

Since it just happened in my last post, and I have no idea how I did it - anyone got any clues?

Now, back to our statistical debate. :slight_smile:

Firefly:

This isn’t as tough as you are making it out. If you toss the coin six times, and it comes up heads six times in a row, the chances for the 7th toss are still 50/50. We know this (most of us.)

But, if you are tossing the coin 100,000 times, sometimes you catch it, sometimes you let it hit the floor, the coin rolls under a desk so you have to use another one, your right hand gets tired so you flip with the left, etc etc. What you have is a crappy sample.

In analyzing the data you can’t just decide, well there were a lot of heads between tosses 5,500 and 6,000 and do them over.

If you do the tosses on a shag rug, and sometimes the coin is caught, but other times it lands in the rug kind of on edge, and you toss those results out, you can’t just decide that those were the reasons for the discrepancy and add those results back in just for tosses 5,500-6,000. You are contaminating the results.

When you work with large amounts of data you know that statistical error is built in. You cannot selectively wash the data. If you are going to apply a filter, or add something into the calculation you have to do it to the entire sample to maintain integrity.

Statistically, you probably shouldn’t play with the Florida data. Surely you can’t just play with the PBC data.

This is basic stuff.

Now the Gore suggestion to handcount everything in Florida is a better idea than just PBC, but undesirable to Bush, and also undesirable to the countrywide vote since the standard is not applied equally.

I must confess I’m puzzled about that one, Sam. I mean, poll closing time in FL was 7pm, and 7pm in the CST portion of the FL panhandle meant 8pm Eastern. They didn’t call FL until shortly before 8pm EST. For the call to have an effect, Panhandle voters would have pretty much had to hear the news on the way to the polls, and turn around and head home.

Any bias there was would have been pro-Gore, since as it was, the CST counties cast about 350K votes, of which 240K were for Bush. 10,000 votes represents 3% of what would have been the total, and I think it’s reasonable to assume that they may have had that many voters coming in, in the last 10 minutes before closing. But to lose 10,000 votes would presuppose a much larger number attempting to vote during those last ten minutes. I’m not saying that no votes were lost as a result; I’m just saying that, without seeing their work, I’d have to question that the effect was nearly as pronounced as these guys say.

But given the ratio of Bush to Gore votes up there, for every 1000 votes lost, that’s a net loss to Bush of about 400.
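The arithmetic behind those two figures can be sketched as follows. The 350K and 240K inputs are the rounded numbers from the post; treating every non-Bush vote as a Gore vote is a simplifying assumption of mine, so the net figure is a rough one.

```python
cst_total_votes = 350_000   # approximate votes cast in the Central-time counties (from the post)
cst_bush_votes = 240_000    # approximate Bush votes there (from the post)

bush_share = cst_bush_votes / cst_total_votes
other_share = 1 - bush_share  # assumption: all remaining votes counted as Gore's

def net_loss_to_bush(lost_votes):
    """Net votes Bush loses if lost_votes voters stay home and split like everyone else."""
    return lost_votes * (bush_share - other_share)

claimed_loss = 10_000
share_of_total = claimed_loss / (cst_total_votes + claimed_loss)

print(f"Bush share: {bush_share:.1%}")
print(f"10,000 votes as a share of the would-be total: {share_of_total:.1%}")
print(f"Net loss to Bush per 1000 lost votes: {net_loss_to_bush(1000):.0f}")
```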

Scylla - of course you can’t decide, ‘oh, there were too many damned heads between flips 5500 and 6000’ and toss them, unless there were some clear factor that was causing the coins to come up heads more often during that sequence.

For instance, if someone switched coins with me at flip 5500, so that I was flipping a coin weighted to come up heads 90% of the time, I could certainly toss out the flips from 5500 to whenever I noticed that the coin was weighted. (Which I’d notice well before flip 6000, thanks. In 100 flips, 90 heads is 8 standard deviations above the mean for an honest coin.) Then I’d go back to my experiment, starting with flip 5501 again.
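For anyone who wants to check the "8 standard deviations" figure, a quick sketch using the usual binomial mean and standard deviation (the function name is just mine):

```python
import math

def z_score_heads(heads, flips, p=0.5):
    """How many standard deviations `heads` sits above the mean for a coin with P(heads) = p."""
    mean = flips * p
    sd = math.sqrt(flips * p * (1 - p))
    return (heads - mean) / sd

# 90 heads in 100 flips of an honest coin: mean 50, standard deviation 5.
z = z_score_heads(90, 100)
print(z)
```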

RTFirefly:

Even if you did what you say, you still risk further contamination. I’d probably accept the results though.

But, that’s not what’s being done in PBC Florida. They’re reexamining discarded results, to correct a perceived error that supposedly threw off the data. While these two things are likely, you just can’t go adding in data.

Suppose 50 other people conducted this experiment, and only you tried to correct your data?

Some of those other guys may have done some flips with biased coins. Unless you reexamine those and add in their discards in exactly the same way you’re doing yours, you are throwing the final results off.

You just can’t do it, pal.

But we do know about the possible biases outside of Palm Beach County! Does not the existence of 26,000+ disqualified votes in heavily Republican Duval County at least suggest that E(S) is not zero? If the 19,000 disqualified votes and the percentage voting for Buchanan inclines you to believe a bias exists in Palm Beach County, how difficult is it to conduct the same level of analysis across all counties? If mathematical precision is the goal, why not re-count all the counties?

There seems to be only one reason why Palm Beach County is getting such attention from the Dems, and it ain’t because its bias is so marked in comparison to other counties (at least as far as disqualified ballots go). It’s because there are only particular biases that they’d like eliminated–a sound strategy, I think, but it doesn’t exactly make them the patron saints of statistical precision. Again, I’ll rethink my position when I see Jesse leading the rally in Duval County.

Yes, they are. Prior to taking out the two heads, there is about a 40% chance that there are more tails than heads. After taking out the two heads, there is about a 60% chance that there are more tails than heads in the remaining coins.

Math follows:

Possible different results with 12 flips: 4096.
Possible results with # of tails:
0: 1
1: 12
2: (12 * 11) / 2 = 66
3: (12 * 11 * 10) / (3 * 2) = 220
4: (12 * 11 * 10 * 9) / (4 * 3 * 2) = 495
5: (12 * 11 * 10 * 9 * 8) / (5 * 4 * 3 * 2) = 792
6: (12 * 11 * 10 * 9 * 8 * 7) / (6 * 5 * 4 * 3 * 2) = 924
7: (12 * 11 * 10 * 9 * 8) / (5 * 4 * 3 * 2) = 792
8: (12 * 11 * 10 * 9) / (4 * 3 * 2) = 495
9: (12 * 11 * 10) / (3 * 2) = 220
10: (12 * 11) / 2 = 66
11: 12
12: 1
Chance that there are at least two heads in the twelve flips: 4083/4096
Chance that there are seven or more tails when there are at least two heads in the twelve flips: 1573/4083
Chance that there are six or more tails among the remaining ten coins after taking out two heads: 2497/4083
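Counts like these are easy to get slightly wrong by hand, so here is a brute-force check that simply enumerates all 4096 sequences. It assumes the same reading of the puzzle as the math above: "you see two heads and take them away" is modeled as conditioning on there being at least two heads among the twelve.

```python
from fractions import Fraction
from itertools import product

# All 2**12 = 4096 equally likely head/tail sequences, reduced to their tail counts.
tails = [seq.count("T") for seq in product("HT", repeat=12)]

# Before removing anything: more tails than heads means 7 or more tails.
more_tails_before = sum(1 for t in tails if t >= 7)

# Condition on at least two heads, i.e. at most 10 tails.
at_least_two_heads = [t for t in tails if t <= 10]
more_tails_before_given = sum(1 for t in at_least_two_heads if t >= 7)

# After removing two heads, ten coins remain; more tails than heads means 6 or more tails.
more_tails_after = sum(1 for t in at_least_two_heads if t >= 6)

p_before = Fraction(more_tails_before, len(tails))
p_before_given = Fraction(more_tails_before_given, len(at_least_two_heads))
p_after = Fraction(more_tails_after, len(at_least_two_heads))

for label, p in [("before", p_before),
                 ("before, given two heads", p_before_given),
                 ("after removal", p_after)]:
    print(f"{label}: {p} = {float(p):.3f}")
```

Either way you slice it, the "about 40% before, about 60% after" conclusion holds up.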

Getting back onto the original subject, I feel that it does appear that there is both anecdotal evidence and a statistically significant likelihood that there was a bias towards Buchanan in the PBC vote, and that the votes mostly came out of votes for Gore.

As for what to do about it, I feel Florida law should be followed, and I am content to let the Bush and Gore lawyers argue it out in the Florida State court system.

Personally, I am mostly disturbed by reports that, when questioned by voters, the election officials and/or volunteers at some polling places told them they couldn’t get a new voting form if they had made a mistake.

Maybe we did. Maybe ballots differed, one from another, in their placement in the spine with the holes. Maybe there was even ‘play’, with the relative positions of paper and holes not solidly fixed. I don’t know.

Why not?

We’re not arguing over whether the voters should have been confused; we’re debating whether they could have been confused. I take it you’re conceding that one.

Same point as above. Lots of people probably don’t know, right now, that you don’t cast one vote for each, but vote for them as a ticket - despite the fact that they did this just a week ago. There’s no law saying you have to be smart to vote.

How you correct it with ballot design is pretty obvious: you make it so it doesn’t appear that there are a pair of holes opposite the Pres/VP ticket. No huhu there.

I’ve heard (but haven’t checked, myself) that there was an unusually high Socialist vote count in PBC as well.

It’s quite possible that Bush voters were equally cavalier. The problem, as I explained in the OP, is that - unlike Gore voters - the ballot design didn’t make them pay for their lack of care. That’s the bias.

I’ve posted to this a half dozen times in the past five or six days, but once more into the breach. So what if a Democrat approved it? That wasn’t the problem. The problem was that there are apparently no standards in this field, so the lay people who approve this stuff, regardless of party, have nothing but common sense to help them understand whether or not they’re introducing an element of bias into the ballot. And once you’ve seen the incredibly minute things that can swing the responses in a questionnaire by 30% or so, you start to realize that common sense isn’t enough.

Well, no. It means it’s not the strongest evidence in the world. But fact is, we can remember stuff, and if any memory floating around our brains were equally likely to be true or false, then we wouldn’t be able to find our way to work in the morning without a map.

Anecdotal evidence is evidence. I’m willing to concede that it’s a weak form of evidence, but if it’s no good at all, then we’d better toss out our criminal justice system, which regards eyewitness testimony rather highly, and start over.

In this instance, we appear to have had at least a middling handful of people arrive, independently of one another and before the story broke, at the conclusion that they misvoted in a particular manner. (After that, of course, the numbers mushroomed.) Testimony of independent witnesses who all experienced the same thing means something. Now if we look more closely and find that one person thought this happened to him, and everyone else who claimed to have this experience overheard the first guy’s story before they told anyone else, then you’d have a point.

Bob, what you seem to be saying is that the 26K disqualified ballots in Duval constitute evidence of bias, by themselves. If you are, you need to explain on what basis you can make that claim. I am not saying that about the disqualifieds in PBC.

I’m saying that, given that we already have other evidence of bias in PBC, we ought to count the various permutations of double-punched, disqualified ballots in PBC to see if they produce a more definitive piece of evidence (which I’ve already described). That’s very different.

I believe it’s related to an excess </b> token in the text stream. When you’re editing the post, you’ll see the < and > replaced by the square braces.

I agree that that’s not what they’re doing with the manual recount. I’m not trying to defend that; I go back and forth about whether I can justify that to myself, let alone to sharp critics like you and Unc. But that’s another story, too long for this thread when it’s way past my bedtime.

Same defense as with FL itself works: those other 50 guys may have had biased coins snuck in on them, but until we know of a particular for-instance of bias, we don’t know who the bias would favor, or how strong it might be. In the absence of any actual information, the expectation of the total bias in any given state is zero: could be positive, could be negative; we dunno.

Can too. :wink: :: D&R ::

Damn…I meant to check up on the formula for the Poisson distribution today at work and forgot to. Okay, well, I am confident about the first part in deciding how many sigma it is away:

What you have are N=431,000 trials with a probability of an event to occur of p = 0.0026. Now, the mean number of occurrences is, of course, simply Np or 1120. The standard deviation for a Poisson-distributed event is sqrt[Np*(1-p)] = 33.5 (which, in this case, since p is so small, is basically just the square root of the mean…and, in fact, people often roughly talk of random events as being subject to sqrt[n] fluctuations where n = number of occurrences). Since 3400 - 1120 = 2280, then the event is (2280/33.5)*sigma = 68*sigma away from the mean.
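The sigma count above can be reproduced in a few lines. The inputs are the rounded figures from the post; the sqrt[Np*(1-p)] form is strictly the binomial standard deviation (a pure Poisson would use sqrt[Np]), but with p this small the two are nearly identical.

```python
import math

n_voters = 431_000     # approximate number of PBC voters (from the post)
p_buchanan = 0.0026    # assumed per-voter probability of a Buchanan vote (from the post)
observed = 3400        # approximate Buchanan votes actually recorded

mean = n_voters * p_buchanan
sd = math.sqrt(n_voters * p_buchanan * (1 - p_buchanan))
z = (observed - mean) / sd

print(f"mean = {mean:.0f}, sd = {sd:.1f}, z = {z:.1f} sigma")
```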

Now, where I get a little fuzzy is converting this 68*sigma into a probability. If we believe we can approximate things by a Gaussian, then we have some idea of how a Gaussian behaves (e.g., a 3*sigma event is roughly a 0.3% chance while a 6*sigma event is roughly 2 in a billion). Clearly, the improbability increases very quickly with the multiplier in front of the sigma.
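Gaussian rules of thumb like these can be checked with the complementary error function in Python's standard library. This sketch reports the two-sided tail, i.e. the chance of landing at least that many sigma from the mean in either direction; the one-sided figure is half of it.

```python
import math

def two_sided_tail(z):
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

tail_3_sigma = two_sided_tail(3)
tail_6_sigma = two_sided_tail(6)

print(f"3 sigma: {tail_3_sigma:.4%}")
print(f"6 sigma: {tail_6_sigma:.2e}")
```

Don't bother plugging in 68 here: the result is too small for ordinary floating point and comes back as zero, so you have to work with logarithms that far out.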

However, I forget in what limit the Gaussian is a valid approximation to a Poisson and think it might not be so here where you get way out in the tails. So, I tried to remember what the Poisson distribution is and I believe it to be P(x) = (x/sigma)*exp(-x/sigma) where x = number of occurrences. Thus, to find the probability of being more than 68*sigma out in the tail of this distribution, you end up doing the integral of u*exp(-u)du from 68 to infinity, which gives 69*exp(-68) = 2*10^(-28).
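For comparison, here is a sketch that uses the textbook Poisson probability, P(k) = lam^k * exp(-lam) / k!, and sums the upper tail in log space so nothing underflows. It assumes the count really is Poisson with mean 1120 (the rounded figure from this thread), which is of course exactly the "nothing unusual happened" hypothesis being tested.

```python
import math

def log10_poisson_upper_tail(k_min, lam, n_terms=2000):
    """log10 of P(X >= k_min) for X ~ Poisson(lam).

    Sums n_terms terms of the tail with the log-sum-exp trick; the terms shrink
    roughly geometrically once k is well above lam, so the truncation is harmless here.
    """
    log_terms = [k * math.log(lam) - lam - math.lgamma(k + 1)
                 for k in range(k_min, k_min + n_terms)]
    biggest = max(log_terms)
    log_sum = biggest + math.log(sum(math.exp(t - biggest) for t in log_terms))
    return log_sum / math.log(10)

log10_tail = log10_poisson_upper_tail(3400, 1120.0)
print(f"P(at least 3400 votes) is about 10^{log10_tail:.0f}")
```

Under that assumption the tail comes out hundreds of orders of magnitude smaller than the figure above, so the qualitative conclusion (this didn't happen by chance) only gets stronger.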

I really don’t know what your point is here. Yes, some improbable things happen if you look at enough events. But people have analyzed the digits of pi and found that it behaves statistically like one expects a random sequence to behave. Statistics does not rule out improbable events…it simply tells you how often a specific improbable event is likely to occur. For example, you would expect to see an occurrence of the numbers 1-9 in sequence when you look at a string of 10^9 random digits. (I can’t quite figure out if you would expect to see it approximately once or approximately 10 times and won’t bang my head against the wall trying to figure that one out any longer!) Are you sure about the 27 eights in a row though?..That one does sound a bit outside the envelope!
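On the "once or ten times" question, a back-of-the-envelope sketch, assuming every digit is independent and uniform: each starting position has a 10^-9 chance of beginning the run 123456789, and there are just under 10^9 starting positions.

```python
import math

n_digits = 10**9
pattern = "123456789"

positions = n_digits - len(pattern) + 1           # possible starting positions
expected_hits = positions * 10.0 ** -len(pattern)  # linearity of expectation

# Rough chance of seeing it at least once, using a Poisson approximation (an assumption).
p_at_least_once = 1 - math.exp(-expected_hits)

print(f"expected occurrences: {expected_hits:.3f}")
print(f"chance of at least one: {p_at_least_once:.2f}")
```

So "approximately once" is the answer, with something like a one-in-three chance of not seeing it at all.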

Another thing you do have to be careful about is what you define as an improbable event. For example, I could calculate the probability of getting exactly 3407 Buchanan votes out of our sample of 431,000-odd voters and report that…But this would be an abuse of statistics because there is nothing special about this 3407 number and the question I really want to know the answer to is how likely it is to get at least that many votes…not precisely that many votes! (Sort of like if you had a nine-digit lottery in your state and one day they picked the number “317825489” and you said, “Wow…how funky…there is only a 1 in a billion chance they would pick that particular number and yet they did!” Of course, when the lottery is drawn, there are 10^9 equally improbable outcomes and one of them is going to happen!)

In the November 1996 general election, Palm Beach County accounted for 7.49% of the voters in Florida, but only 6.35% of the votes for Perot [Reform Party Candidate] in Florida. (To put it another way, Perot got 9.1% of the total vote in Florida and 7.7% of the vote in PBC.)

By the way, the similar figures for the March 12, 1996 Republican primary were that PBC had 6.40% of the voters but only 5.40% of the votes for Buchanan.

By contrast, in last week’s election, PBC had 7.26% of the voters but 19.6% of the votes for Buchanan!!!

All figures above are derived from numbers I downloaded from the Florida Department of Elections website. They are not second-hand.
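One way to compare the three elections is to divide PBC's share of the candidate's statewide votes by PBC's share of all voters. A ratio near 1 means the county voted for that candidate at about the statewide rate. This sketch uses only the percentages quoted above; the labels are mine.

```python
# (PBC share of Florida voters %, PBC share of the candidate's Florida votes %)
shares = {
    "1996 general (Perot)": (7.49, 6.35),
    "1996 primary (Buchanan)": (6.40, 5.40),
    "2000 general (Buchanan)": (7.26, 19.6),
}

ratios = {label: candidate / voters for label, (voters, candidate) in shares.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f} times the statewide rate")
```

In both 1996 contests PBC supported the candidate at a bit under the statewide rate; in 2000 it comes out well over two and a half times the statewide rate.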