This article claims statistical evidence of vote flipping in favor of Romney in the Republican primaries.
I’m extremely skeptical of this analysis. I don’t like computerized voting and I’m uneasy about the possibility of undetectable manipulation, but the idea that this kind of manipulation could occur over a large number of precincts, in a number of different states, using a number of different voting systems, strains my credulity.
So, what’s the flaw in the analysis? Why would Romney’s percentage increase with the size of the precincts, except for those with paper ballots? Is it even true that that’s the case? Is this just a case of being selective about the data?
It seems to lack a little rigor. One would expect larger precincts, presumably in more densely settled neighborhoods, to have different voting patterns. It’ll be interesting to hear if anything comes of it.
I’d like to see a Federal statute calling for the death penalty for people who mess with voting or registration. It’s tantamount to treason.
ETA: This “They argue that the probability of this happening by chance alone is so small it exceeds the capability of statistical packages to handle” seems like total bullshit.
Well, I don’t think it is “bullshit” in a strict sense; the probability claim is almost surely true. However, that doesn’t mean the deviation from pure randomness is due to some malicious factor. There could be perfectly valid, non-malicious reasons why larger precincts would tend to be more pro-Romney than smaller ones (for example, larger precincts are generally less rural, and Romney was less popular in socially conservative rural districts).
What it shows is that there are statistically significant deviations from a model in which votes in precincts of different sizes are completely random, i.e., in which there is no correlation between precinct size and voting pattern. But I imagine you would find that to be true in lots of elections, for the sort of reasons I mentioned. I wonder whether the original paper tried to control for such things.
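Just to be concrete about what such a test involves, here is a rough sketch (mine, not the authors’ method) of checking whether a candidate’s vote share is correlated with precinct size, assuming a hypothetical CSV with per-precinct columns total_votes and candidate_votes:

import csv
import random

def load_precincts(path):
    # hypothetical format: one row per precinct with total_votes and candidate_votes
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    sizes = [int(r["total_votes"]) for r in rows]
    shares = [int(r["candidate_votes"]) / int(r["total_votes"]) for r in rows]
    return sizes, shares

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def permutation_p_value(sizes, shares, trials=10000):
    # How often does shuffling the shares give a correlation at least as strong
    # as the observed one? A tiny p-value only rules out "no relationship at all";
    # it says nothing about WHY the relationship exists (fraud vs. demographics).
    observed = abs(pearson(sizes, shares))
    shuffled = shares[:]
    hits = 0
    for _ in range(trials):
        random.shuffle(shuffled)
        if abs(pearson(sizes, shuffled)) >= observed:
            hits += 1
    return hits / trials

The point is that a tiny p-value here is exactly what an ordinary urban/rural split would produce too, which is why controlling for demographics is the real question.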
What they question is why the size of the precinct seems to directly correlate to the pro-Romney effect. Buried within the links, there was a study that they did to test the theory of rural vs less rural areas that seems to disprove at least that hypothesis. In the material that I read, they ask for other theories and provide contact links.
The statement about a statistical probability of this happening by chance being so small that the statistical package (Excel) cannot handle it was about the direct correlation between precinct size and pro-Romney effect.
No other candidate in the primaries had this same effect in any of the states and precincts that they tested. Only Romney benefited from this anomaly.
The pro-Romney effect as the total number of votes counted rose did not appear in any primary precinct that was manually counted and tabulated. Those precincts showed the same flat-line effect seen in the historical elections they provided for comparison. Again, buried in the links is an explanation of why the flat line is the expected trajectory as the number of votes counted increases. They also provided an example from a non-US election to show that the flat-line effect is the expected norm in other countries as well. Buried within the links is a discussion of how news organizations use this flat-line effect to predict vote outcomes in advance of the final count, and how a high number of prediction errors caused some researchers to look into the cause of those errors.
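To make the “flat line” concrete: the charts plot a candidate’s running share of the vote as precincts are added from smallest to largest. Here is roughly that calculation (my own sketch with made-up numbers, not their code):

def cumulative_share(precincts):
    # precincts: list of (total_votes, candidate_votes) pairs
    precincts = sorted(precincts, key=lambda p: p[0])  # smallest precincts first
    running_total = running_candidate = 0
    curve = []
    for total, candidate in precincts:
        running_total += total
        running_candidate += candidate
        curve.append(running_candidate / running_total)
    return curve

# Made-up numbers: the running share settles near 0.40 and stays there, which is
# the expected "flat line". A share that keeps climbing as the bigger precincts
# come in is the anomaly being described.
print(cumulative_share([(50, 20), (120, 47), (400, 161), (900, 360), (2500, 1001)]))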
So, the symptoms they show are:
High (direct?) correlation between precinct size and the pro-Romney vote.
Lack of the classic flat line as the total number of votes counted increases.
No candidate besides Romney experienced a positive effect from this anomaly.
This (the Romney upswing as vote counts increase) is a new effect, not previously seen in polling data.
These effects correlate strongly with the use of automated voting equipment and have not been observed in the manually counted precincts they’ve looked at.
From a purely technical (technology) perspective, a vote-flipping ‘device’ would be trivial to design and manufacture. It could be a modified camera, an Android device… there are lots of possibilities.
The most complex part would be the statistics involved in designing a central controller to keep the effect low-level across a wide group of states and precincts, because they would need to ‘tweak’ the counts enough to achieve the desired result while not taking it so far as to set off alarms.
With a good design, the ‘tweaking’ devices could be updated with new data targets over the internet throughout the day, so the central controller could actually respond in real time to vote tallies in non-target precincts.
DavidM - The data from these machines are stored on either SD memory cards or thumb drives (depending on the manufacturer). All you need is a device that can read and re-write the data on that media. With a properly designed “tweaker” device, it would take only seconds, maybe a minute, to alter the data. I suggest a camera only because it is a small device, easily carried; the ‘guts’ could be ripped out and replaced with a simple processor and multiple storage readers (e.g. thumb drive, SD card, etc.). The processor could be triggered by a button and provide an LED readout of results (e.g. process complete).
Pseudo-code-ish example:
The tweaker is deployed to Precinct X. They connect to the central controller to download the expected machine type and the target data levels (e.g. Candidate X gets 43% of the vote).
JoeBloe, carrying the tweaker, is working at the precinct and volunteers to log in the data cards as they arrive from the polling sites. For all intents and purposes, he’s supposed to open an envelope, verify a serial number, write some stuff down, then hand the data card to someone who processes it. But he is also slipping it into his tweaker, pressing a button, and waiting for a green light before he passes it on.
Pseudo-ish processing description:
if Candidate X has 43% or more of the votes: no action
if Candidate X has less than 43% {
    determine the number of votes (N) needed to put Candidate X at 43%
    for 2/3 of N {
        find a vote for Candidate Y
        flip it to Candidate X
    }
    for 1/3 of N {
        find a vote for Candidate U
        flip it to Candidate X
    }
    update the totals on the storage card to reflect the new counts
}
A simple Perl script could process thousands of records in seconds.
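Something like this is what I have in mind, as a sketch in Python rather than Perl (the ballot record format here is entirely made up, since the real storage layouts are proprietary):

import math

def flip_votes(ballots, target="X", major_rival="Y", minor_rival="U", target_share=0.43):
    # ballots: a list with one candidate ID per ballot record; returns a modified copy
    ballots = ballots[:]
    needed = math.ceil(target_share * len(ballots)) - ballots.count(target)
    if needed <= 0:
        return ballots  # already at or above the target share: no action
    # take roughly 2/3 of the flips from the major rival and 1/3 from a minor one
    quota = {major_rival: round(needed * 2 / 3)}
    quota[minor_rival] = needed - quota[major_rival]
    for i, vote in enumerate(ballots):
        if quota.get(vote, 0) > 0:
            ballots[i] = target
            quota[vote] -= 1
    return ballots  # per-candidate totals would then be recomputed and rewritten to the card

Even on a card holding tens of thousands of ballot records, that loop would finish in well under a second.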
The key in this scenario is that
(a) They need access to the storage devices at some point after the voting but before the ‘official tabulation’.
(b) They need access to the voting machine design specs to create the tweeker code.
IF they didn’t change the data on the voting machine storage media, that would leave them open to easy discovery of the manipulation. So I think that they would have to alter the data on the storage device itself.
As pointed out, vote flipping has the potential to result in a negative vote count, so very small precincts are risky. Some of the polling stations they talked about had only 10 voters. It would be easy for that community to discover at the next picnic that no one actually did vote for Candidate X.
That is why the direct correlation is presented as a symptom of automated manipulation. Theoretically, the larger the number of voters, the more they can tweak, because a larger anomaly can be lost in the size of the voting group.
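A back-of-the-envelope illustration of why precinct size matters (my numbers, not theirs):

def share_gain(total_votes, flips):
    # each ballot flipped to the beneficiary raises his share by 1/total_votes
    return 100.0 * flips / total_votes  # percentage points gained

for total in (10, 100, 1000, 10000):
    print(f"{total:>6} voters, 5 flips -> +{share_gain(total, 5):.2f} points")
# 10 voters: +50 points (absurd, instantly noticed); 10,000 voters: +0.05 points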
If the assertion is correct, there have been oddities in the past that may have been trial runs. Back in 2004 (I think) there were machines that counted more votes than actual voters. It was written off to random error or chance at the time. But it would make sense that if you’re going to run an operation like this, you would start with feasibility tests, then a slightly larger test run. The only problem is that you get only so many ‘live voting’ windows to test in. The results being highlighted in the articles are from the primaries. I would expect them to update the central controller to add more randomness so that the direct-correlation effect does not appear in the general election in November.
Well sure, if they have someone with physical access to the equipment. But how many people would that involve?
I could accept manipulation at a county or two here and there. I could even accept machines intentionally programmed for cheating at the factory (1 person replacing the standard drive image with altered code).
But being able to train and place that many completely trustworthy people in the right positions all over the country, or being able to sabotage a number of different companies’ software at the factory starts to sound like tinfoil hat conspiracy theory.
Some things I’d like to know:
How many different brands of tabulators were in use in the counties that showed this pattern?
Were updates applied remotely? Were these tabulators internet accessible, or accessible via dialup? Via WiFi?
Beyond the points others have made here, it’s worthwhile to consider the old saying about statistics: “if you torture the data enough, it will confess to anything”.
There is a virtually unlimited number of ways you can cut and slice the data and analyze it for anomalies, and with the power of modern computers you can find them. Among those ways will be some which are extremely unlikely to have happened by chance alone.
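A toy demonstration of that saying (my own, purely illustrative): run enough tests on purely random data and some of them will come out looking “significant”.

import random

random.seed(1)
trials = 1000      # stand-in for 1000 different ways of slicing the same election data
flagged = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(200))  # 200 fair coin flips
    if abs(heads - 100) > 14:  # roughly a two-standard-deviation departure
        flagged += 1
print(f"{flagged} of {trials} purely random slices looked 'significant'")
# expect a few dozen hits, none of which mean anything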
Here is a copy of the 2007 audit on Ohio machines. There were 3 vendors in use in Ohio. I’m not sure that there are very many more vendors out there.
The PDF file gives complete descriptions of the machines (including pictures) how they’re set up and their weaknesses. This document has been publicly available for some time… so everyone has had access to it if they were interested in the topic.
They also claim that John McCain seemed to have benefited from the anomaly both in the primaries and against Obama. During the 2012 primaries, the effect was seen in every state except Utah and Puerto Rico.
They claim the effect was observed in South Carolina, and subsequently re-appeared virtually everywhere else in a systematic manner. I don’t think this is a case of data mining.
It could be a case of fabrication or bad faith (or fooling oneself, by seeing patterns where they don’t exist and reporting the cases where the patterns are clear). Given the allegations, the matter deserves review by a statistician.
The authors maintain: “This is not a large conspiracy involving a complex network of perpetrators. Such an alleged election fraud could be accomplished by only a single, highly clever computer programmer with access to voting machine software updates.”
He would have to have access to the updates for machines from a number of different manufacturers. Unless this effect is only seen in counties using one or a small number of brands of machines.
I haven’t had a chance to look at Enkel’s PDF yet, but he says that it discusses machines in Ohio. The authors are claiming that this effect occurs in a number of states. Are those same three manufacturers in use in all of the states that show this pattern?
I’m not trying to dismiss this out of hand. It’s extremely troubling. I work in IT and I DO NOT think that we should be voting in this manner. But the first step in investigating is determining whether or not what we’re seeing is real.
I do not want to believe that these findings are valid. I have been scouring the web for evidence that it is simply a tin hat conspiracy. But so far, no one has effectively discounted this disturbing information or the concomitant conclusion that there is some kind of tampering going on.
Many have suggested alternative explanations that the authors have already ruled out:
-they have already shown that the pattern is not due to differing demographics for big vs. small precincts
-they have already shown that precinct size is unrelated to “republican-ness”
-they have already shown that the pattern does not emerge in other elections–it is not some rare but still possible “glitch”–it only occurs when a central tabulator is used and never in elections involving only Democrats.
It seems clear that something weird must be happening, if the “N” remains the same for a precinct and just the distribution of the votes is changing systematically within that N.
I am desperately seeking some very smart person to point out the logical flaw in the conclusion that this is the result of intentional malfeasance. Even some benign alternative explanation for that weird effect would be welcome (but one that has not already been ruled out).
Essentially, my question is, if we assume that the data are real and the analyses sound (these might be big IF’s but…) what could explain this effect, other than vote flipping? Are there persuasive logical explanations that do not require intentional tampering?
It could be a combination of underestimating the chance of it happening by chance, along with the hundreds, if not thousands, of chances for anomalies like this across America every year. If there is a 0.1% chance of something happening by chance, then each election season there will probably be an anomaly somewhere in America that looks awfully suspicious but actually happened by chance.
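To put rough numbers on that (mine, purely illustrative):

p_single = 0.001       # chance that any one contest/slice produces a scary-looking fluke
opportunities = 2000   # hypothetical number of independent contests or slices per cycle
p_at_least_one = 1 - (1 - p_single) ** opportunities
print(f"Chance of at least one fluke somewhere: {p_at_least_one:.0%}")  # about 86%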
Then again, we could eliminate even the appearance of this by implementing more secure systems. It would increase the electorate’s confidence in the outcome of the election.
The problem I have with that explanation is that you’re essentially saying that, in cases involving large numbers of data points, statistically unlikely patterns aren’t indicative of anything.
I can accept that, but then how do we tell a real anomaly from a statistical fluke? What are the hallmarks, if any, of a real anomaly? What needs to be present before we decide that something is worth investigating? These are sincere questions, not attempts at rhetorical points.