I have put up a simple website with estimates of the probabilities of Obama or Romney winning the election: prespredict.com
Basically, in 2008, I did some calculations to estimate the probability of Obama or McCain winning the election, using state poll results from electoral-vote.com, and plotted the probabilities vs time, right up to the election. The results were quite interesting. So, this year, I’m doing the same for the 2012 election, and have put it all up on a quick website.
I include both 2012 and 2008 results, and make an attempt to see how they correlate with campaign events and national news stories. There is also a section where I describe the methodology, in case you’re interested.
So far, the 2012 results are somewhat uninteresting (except for the fact that Obama’s estimated probability of winning is higher than I would have thought). For a more interesting “roller-coaster” graph see the 2008 results.
The biggest problem I see is that state probabilities are treated as independent.
Suppose an external event causes Romney to get a higher vote percentage in Pennsylvania than present polls predict. That event will cause Romney’s vote percentage to increase in Ohio and other states as well. Does your model consider this?
(ETA: The variance of a sum of independent variables is much less than that of a sum of positively correlated variables.)
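To put a rough number on that, here is a minimal sketch. It assumes every state's error has the same standard deviation `sigma` and every pair of states shares the same correlation `rho`; both are simplifying assumptions for illustration, not values taken from the site.

```python
def variance_of_sum(n, sigma, rho):
    """Variance of the sum of n variables that each have standard
    deviation sigma and a common pairwise correlation rho.

    Var(sum) = n*sigma^2 + n*(n-1)*rho*sigma^2
             = n * sigma^2 * (1 + (n - 1) * rho)
    """
    return n * sigma**2 * (1 + (n - 1) * rho)


# Illustrative numbers only: 50 states, 3-point standard deviation each.
print(variance_of_sum(50, 3.0, 0.0))  # independent: 450.0
print(variance_of_sum(50, 3.0, 0.5))  # correlated:  11475.0
```

With these made-up inputs, a correlation of 0.5 inflates the variance of the total by a factor of 25.5, which is why treating the states as independent can make the overall result look far more certain than it is.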
As an example of a potential problem, I see where you wrote “…on May 9th, Obama came out in support of gay marriage, and this seems to have caused a strong decline in his chance of winning, going from 95% on May 9th to 82% on June 16.” Other than the fact that these events happened at about the same time, you offer no evidence that his gay marriage support is what resulted in his chance of winning going down. I’d be careful with broad statements such as this.
Your method of calculating the probability of a state going one way or another is kind of weird. Where did you get it from? Why didn’t you use the usual method of assuming the actual probabilities were normally distributed around the poll results?
The dependence between states is something that I definitely want to add to the model, but as it stands I don’t think the main plots are affected by it much.
That is, if the polls stay as they are, and if we assume an error in the way the polls estimate the underlying percentages in different states, then this poll error is likely independent from state to state. So my model of translating the existing poll difference to a probability of winning a given state (shown in the second figure in the “Methodology” section), and then using that to calculate the probability of a candidate having N electoral votes, should hold.
So it should be OK as an estimate of the probability of winning, if the election were held today.
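For anyone curious how the "probability of N electoral votes" step can be done when the states are treated as independent, here is a minimal sketch. It is one standard way to do it (adding the states in one at a time), not necessarily the exact code behind the site, and the function names and the toy state list are made up for illustration.

```python
def electoral_vote_distribution(states):
    """states: list of (electoral_votes, p_win) pairs.

    Returns a list dist where dist[n] is the probability of winning
    exactly n electoral votes, ASSUMING the states are independent.
    """
    total = sum(ev for ev, _ in states)
    dist = [0.0] * (total + 1)
    dist[0] = 1.0  # before any state is counted, 0 votes with certainty
    for ev, p in states:
        new = [0.0] * (total + 1)
        for n, prob in enumerate(dist):
            if prob == 0.0:
                continue
            new[n] += prob * (1.0 - p)  # candidate loses this state
            new[n + ev] += prob * p     # candidate wins this state
        dist = new
    return dist


def win_probability(states, needed):
    """Probability of reaching at least `needed` electoral votes."""
    dist = electoral_vote_distribution(states)
    return sum(dist[needed:])


# Toy example with three hypothetical toss-up states (made-up inputs).
toy_states = [(20, 0.5), (18, 0.5), (29, 0.5)]
print(win_probability(toy_states, needed=34))
```

For the real map you would pass one pair per state (plus DC) and `needed=270`.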
It’s only if we want to use it in building an estimate of the probability of winning on election day that we have to take into account potential changes in the per-state percentages, and when we do that, it is more accurate to take into account correlations between state results.
As I said, this is something that I am planning on adding to the model, but the current estimate should be quite accurate as a present-day estimate, and hopefully decent as an election-day estimate.
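A minimal sketch of one way the correlation could be added: a Monte Carlo simulation in which every state gets the same random "national" shift plus its own independent shift. This is an illustration of the idea rather than the planned implementation; the function name, the normal-error assumption, and the parameters `sigma_state` and `sigma_national` are all assumptions.

```python
import random


def simulate_win_probability(states, needed, sigma_state, sigma_national,
                             trials=20000, seed=0):
    """states: list of (electoral_votes, poll_margin) pairs, where a
    positive margin (in points) means the candidate currently leads.

    Each trial draws one national shift shared by all states (this is
    what correlates the state results) and one independent local shift
    per state, then counts the electoral votes won.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        national = rng.gauss(0.0, sigma_national)  # same for every state
        ev = 0
        for votes, margin in states:
            local = rng.gauss(0.0, sigma_state)    # state-specific noise
            if margin + national + local > 0:
                ev += votes
        if ev >= needed:
            wins += 1
    return wins / trials


# Made-up inputs: three hypothetical states with small leads.
toy_states = [(20, 1.0), (18, 2.0), (29, -0.5)]
print(simulate_win_probability(toy_states, needed=34,
                               sigma_state=3.0, sigma_national=2.0))
```

Setting `sigma_national` to zero recovers the independent case, so the same function can be used to see how much the shared shift changes the answer.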
I agree, and that’s why I provided some caveats:
[ul]
[li]In the “Observations” section I said: “We have tried to find events that were turning points in the election campaigns. If there are other events that you think caused some of the turning points in the election, please let us know via the contact page.”[/li]
[li]Also, in the quote you mention, I did say “this **seems** to have caused”; that is, I’m not stating it as 100% fact, but that it’s a correlation I have noticed.[/li]
[/ul]
I didn’t use the usual method of assuming the actual probabilities were normally distributed around the poll results because I don’t think that model accurately represents the odds of winning the state.
For a comparison, in this image I plot both the Gaussian model of predicting a state winner (i.e. normally distributed poll errors) and my model. They’re similar, but my model has a flat region in the middle to signify the fact that if the poll results are very close, either of the candidates could win.
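For anyone who wants to play with the difference, here is a minimal sketch of the two shapes. The Gaussian version is the standard normal-error model. The flat-middle version is only an illustrative stand-in with a hypothetical `dead_zone` parameter; it is not the exact formula used on the site, which is described in the Methodology section.

```python
import math


def gaussian_win_prob(margin, sigma):
    """Probability of winning a state under the usual model: the true
    margin is normally distributed around the poll margin with
    standard deviation sigma (both in percentage points)."""
    return 0.5 * (1.0 + math.erf(margin / (sigma * math.sqrt(2.0))))


def flat_middle_win_prob(margin, sigma, dead_zone):
    """Hypothetical flat-middle variant: any poll margin within
    +/- dead_zone is treated as a pure toss-up (0.5). Outside that
    band, the Gaussian curve is applied to the margin measured from
    the edge of the band."""
    if abs(margin) <= dead_zone:
        return 0.5
    shifted = margin - math.copysign(dead_zone, margin)
    return gaussian_win_prob(shifted, sigma)


# Compare the two curves at a few poll margins (illustrative values).
for m in (-6, -2, 0, 1, 3, 6):
    print(m,
          round(gaussian_win_prob(m, sigma=3.0), 3),
          round(flat_middle_win_prob(m, sigma=3.0, dead_zone=2.0), 3))
```

Both curves agree that a tied poll is a coin flip; they differ in how quickly a small lead turns into a confident prediction.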