Suppose that I wanted to be assured of the accuracy of electronic voting. If, as a test, every nth person in line at a polling place were asked to fill out a paper ballot at the same time that they voted electronically, and assuming that they filled out the ballot honestly, what would n have to be in order to have a reliable integrity check of the voting machines? In other words, how do you determine a valid sample size?
Let’s assume that 1000 people were voting today at this precinct.
This question is way too big and complicated as stated to have any real-world use. Dishonesty is a showstopper on its own. A surprising number of people lie about their vote in any type of exit poll.
The basic question seems to be how big a sample you need to accurately reflect a population of 1000 voters. For most applications, the sample wouldn’t need to be very big. The trouble is that it is very hard to get the margin of error down to an acceptable level for this purpose. Even a 3% margin of error would be gigantic and would render the study pointless, because the outcome might hinge on a single vote, and it is almost certainly small errors that you are looking for. The techniques needed to get to, say, a 99.5% confidence level are very different from those for a typical 95% or 97%, and there doesn’t seem to be a good way to do it in this real-world case.
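To put rough numbers on that, here’s a quick back-of-the-envelope sketch (Python; the confidence levels and margins are just illustrative targets) of the sample size the standard formula demands, with a finite-population correction for a 1000-voter precinct:

```python
# Rough sketch: sample size needed for a given margin of error under
# simple random sampling, assuming worst-case p = 0.5. The
# finite-population correction matters because the precinct is small.
from scipy.stats import norm

def sample_size(margin, confidence, population):
    z = norm.ppf(1 - (1 - confidence) / 2)    # two-sided critical value
    n0 = (z ** 2) * 0.25 / margin ** 2        # infinite-population size
    return n0 / (1 + (n0 - 1) / population)   # finite-population correction

for conf in (0.95, 0.995):
    for moe in (0.03, 0.01):
        n = sample_size(moe, conf, 1000)
        print(f"{conf:.1%} confidence, within {moe:.0%}: n ≈ {n:.0f} of 1000")
```

At a 1% margin you’d have to poll over 900 of the 1000 voters either way, at which point you might as well just count everyone.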
The only real way to do it would be to have some kind of double voting that is blind to those who might manipulate it, and then compare the results.
Exit polling almost always differs from the actual results of the vote, and we can’t just conclude that every election has fraud and methodological problems. It isn’t really surprising why that would be. For the poll to be accurate, it has to be done by random sampling, and random sampling of this type reliably turns up yahoos. Some of them just want to screw with the pollsters for seeming nosy, others feel guilty about their vote, and still others are accompanied by friends and family whose political opinions may secretly differ from their own. Others may have no idea who they just voted for.
This isn’t that big of a problem with other types of studies, because there is room for some error, but what you are really looking for here is a very accurate comparison count, which just can’t be done through polling.
More than the real proportion of yahoos in the general population? :dubious:
I dunno, your cite doesn’t actually provide any proof that people lie more on political polls than on any other kind of survey. Blumenthal lists lying as #5 on his list of problems with polling, but doesn’t give data or provide a citation for data that show how frequently people lie to exit pollsters.
I’m with you on being skeptical of exit polls, especially those produced by political parties and activist groups, but your citation doesn’t demonstrate your claim that lying by Joe Q. Voter is a huge problem.
I’m also baffled by your claim that we need a 99.5% confidence level. Why isn’t 95% or 97% adequate? Many, perhaps most, elections (not recent ones for US President, of course, but that’s a tiny sample of all elections) are resolved by margins of greater than 3%.
Along these lines, I expect that you get a lot of people who don’t want to answer honestly because “their people” mostly vote for the other party. For example, a black person might be worried that they’d be harassed if they admitted to voting for the Republican candidate, while a white Southerner might be hesitant to admit that he voted for the Democrats.
The major issue here is that you need a standard to compare your inputs and outputs against, and exit polling can’t do that accurately enough to be of any use in this context. Any true errors in the real voting equipment and reporting (the equivalent of a hanging-chad problem) would be washed out by the very margins of error these studies carry. It doesn’t do much good to have a study with a ±5% margin of error looking for problems in the 0%-4% range (simplified).
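To see why, compare the poll’s own sampling noise to the size of the discrepancy you’re hunting for (illustrative sketch; the sample sizes are made up):

```python
# Illustrative only: the 95% margin of error of a sample proportion
# versus the small machine-error rates an audit would care about.
import math

def margin_of_error(n, p=0.5, z=1.96):   # 95% two-sided, worst-case p
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 250, 500):
    print(f"n = {n}: margin of error ≈ ±{margin_of_error(n):.1%}")

# Even at n = 500 the noise band is about ±4.4%, so a real 2%
# discrepancy between poll and machine totals is invisible.
```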
In theory it could find large deviations between the actual and polled results, but only if it were conducted perfectly, and that cannot be assumed. The poll has more potential weaknesses than any real voting system in use in the U.S., so in the case of a discrepancy the prime suspect will be the poll itself, and it will likely remain that way unless overwhelming evidence proves otherwise.
This is a difficult problem in general, and it comes up often in the kinds of data problems I work with every day. You can’t really prove that one set of outputs is wrong when it comes from the source with the best information available, by comparing it against another set that could potentially have many more types of problems.
This is a difficult issue and harder to solve than most people initially think. One workable solution is an electronic voting system that also prints a receipt for the voter to review and sign after the electronic selections have been made. The receipt then has to be signed and submitted back into the machine for archival and auditing purposes before the vote is counted. That produces two independent records that should always match in theory and can be audited at any time with near-perfect accuracy.
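The audit step itself is conceptually simple: tally both records and flag any mismatch. A minimal sketch (the record format and names are made up for illustration):

```python
# Toy audit: compare the electronic tally against the paper-receipt
# tally and report any candidate whose counts differ.
from collections import Counter

def audit(electronic_votes, paper_receipts):
    e_tally = Counter(electronic_votes)   # e.g. ["Smith", "Jones", ...]
    p_tally = Counter(paper_receipts)
    diff = {c: e_tally[c] - p_tally[c]
            for c in set(e_tally) | set(p_tally)
            if e_tally[c] != p_tally[c]}
    return diff or "records match"

print(audit(["Smith", "Jones", "Smith"], ["Smith", "Smith", "Jones"]))
```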
But there’s also the statistical accuracy issue you ask about, and that does have an answer. I think you are asking about Student’s t statistic. This is most often described in terms of how many measurements you need to compare two groups and decide whether they are different, but a more basic application is to ask how many measurements you need to get a certain level of confidence that the average of the entire population is within some range you can state.
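A minimal sketch of that basic application (Python with scipy; the toy vote data is invented):

```python
# Sketch: t-based confidence interval for a population mean from a
# sample. With 0/1 votes the mean is just candidate A's vote share.
import numpy as np
from scipy import stats

def t_interval(sample, confidence=0.95):
    n = len(sample)
    mean = np.mean(sample)
    sem = stats.sem(sample)                              # standard error
    half = stats.t.ppf(1 - (1 - confidence) / 2, n - 1) * sem
    return mean - half, mean + half

votes = np.random.binomial(1, 0.52, size=200)  # made-up sample
print(t_interval(votes, 0.95))
```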
I think we may be talking about different things here: the accuracy of the voting technique or process, and the accuracy of a particular actual election. What you’ve said above is correct for determining the accuracy of a particular election and the exit polling results therein; we are barred by law from directly observing the actual balloting, so we cannot establish the rate of lying to exit pollsters. However, if we want to establish the accuracy of a voting technique, it would be a non-trivial, but conceptually straightforward, exercise to create an experimental election in which you can observe the actual votes of individuals and compare them to their statements to exit pollsters to discover what the rates of lying might be. The applicability of this experiment to real elections is debatable, but we do have very large samples of real elections and their exit polling data to compare the outcome of the experiment against.
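Simulating that experiment is easy enough; here’s a toy version (the 5% lying rate is an invented parameter, exactly the quantity the real experiment would measure):

```python
# Toy version of the experimental election: we can see both the true
# vote and the exit-poll answer, so the lying rate is observable.
import random

random.seed(1)
N = 1000
LIE_RATE = 0.05  # assumed for the simulation, not a measured value

true_votes = [random.random() < 0.52 for _ in range(N)]
poll_answers = [not v if random.random() < LIE_RATE else v
                for v in true_votes]

lies = sum(t != p for t, p in zip(true_votes, poll_answers))
print(f"Observed lying rate: {lies / N:.1%}")
```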
I’m uneasy with the accuracy of the results of many experiments in the social sciences (it’s so much easier with animals and plants that have no knowledge of what you might be doing to them), but statistically, there’s no reason this experiment would need a confidence level of 99.5% instead of 95% or 97%.
From what I’m reading of the OP, it seems like s/he’s asking about finding the accuracy of the technique, not of a particular election.
I’m certainly with you there, and it would be interesting to have two such data sets to compare. Has any voting area implemented this technique yet?