Probability and statistics are very difficult for many people to wrap their minds around. I am a manufacturing engineer who deals with them every day, and I took an upper-division math course on the subject when I was getting my degrees. It’s not intuitive. Sadly, a lot of engineers never grasp it.
The higher his confidence is, the more likely his model is to be “correct.” The ones that his model missed were ones where he gave a likelihood of something like 51%. His model is going to be “wrong” a great deal of the time at that confidence level.
You have to look at all of his forecasts as a whole to see how good his model is. He should be right most of the time (but not all of the time) when the forecast shows a high number, and less often as the number drops. Looking at one or two cases does not tell you anything about how good the model is. You need a lot of data points.
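That “look at all of his forecasts as a whole” test is what forecasters call calibration: events he gives ~70% to should come true about 70% of the time. Here’s a minimal sketch of the check, with made-up forecasts and outcomes rather than anything actually from 538:

```python
# Minimal calibration check: group forecasts by stated probability and
# compare against how often the favored side actually won.
# All forecasts/outcomes below are invented for illustration.
from collections import defaultdict

forecasts = [0.51, 0.55, 0.62, 0.68, 0.74, 0.85, 0.91, 0.93, 0.97, 0.99]
outcomes  = [0, 1, 1, 0, 1, 1, 1, 1, 1, 1]   # 1 = the favored side won

buckets = defaultdict(list)
for p, won in zip(forecasts, outcomes):
    buckets[round(p, 1)].append(won)          # bucket into ~10% bands

for p in sorted(buckets):
    hits = buckets[p]
    print(f"stated ~{p:.0%}: came true {sum(hits) / len(hits):.0%} ({len(hits)} forecasts)")
```

A well-calibrated model’s “misses” should cluster in the low-confidence buckets, which is exactly the point about the 51% calls.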
This is wrong from start to finish. If you look at the front page of the 538 site you’ll see a map called State-by-State Probabilities. On it he predicts each individual state’s chances of going to one side or the other. Every single state has more than a 55% chance of going to one candidate, and only two are under 60%. Those are the real forecasts he’s making. It’s not the case that he can be right because he’s calling a state 50.1-49.9. And it’s absolutely not the case that he could be wrong in enough states that Romney could get 330 EV (a possibility to which he gives a tiny fraction of 1%) and still be “right” because he was close.
I read RickJay’s post as describing a hypothetical election in which a large number of states are near-tossups. In that case, even if you have a lot of data showing the probabilities for those states are all around 55-45, you’re going to get a fair number of them wrong just going with the most likely prediction for each state. In a case like that, the model can “call” a lot of states incorrectly and still be making good forecasts.
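Just to put a number on that hypothetical (the state count, the probabilities, and the independence between states are all assumed for illustration):

```python
# 50 independent states, each a true 55-45 race. Even a perfectly
# calibrated model that calls every state for the favorite will miss
# about 45% of them on average.
import random

random.seed(1)
TRIALS, N_STATES, P_FAVORITE = 10_000, 50, 0.55

total_misses = sum(
    sum(random.random() >= P_FAVORITE for _ in range(N_STATES))
    for _ in range(TRIALS)
)
print(f"average wrong calls: {total_misses / TRIALS:.1f} out of {N_STATES}")
```

That comes out to roughly 22-23 “wrong” states per election, from a model that is making perfectly good forecasts.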
I believe RickJay was speaking about a theoretical race where many of the states were very close, and not about the specific case of where the model is today.
To be fair, Colorado was at one point 52.something percent likely to vote for Obama and I’ve seen a couple of other states approach that number for one candidate or the other as well over the course of this thing.
There is no real world in which this is going to happen. Why even bring it up? You might as well invent a zombie virus that only affects Democrats that hits the day before the election.
OK, so maybe not in an election context, but in more general applications of probabilistic forecasting it can definitely happen. That makes it worth discussing, IMO.
I’m in the middle of chapter 4. Recommended.
Silver is actually sympathetic to qualitative analysis – as long as it adds value. He notes that following the publication of Moneyball the Oakland A’s increased their scouting budget. Today, the stat nerds and the old school baseball scouts work in tandem. What I found interesting is that the A’s say they don’t do anything on the basis of their gut. Instead they try to systematically blend information from their statistical models with insights from the scouts. It’s not numbers vs. instincts, it’s a model that provides a baseline vs. scouts who can report and evaluate a player’s medical records, throwing speed, mental toolkit, etc.
The problem with political pundits is that their blather is vapid: Silver notes that the predictions made on the McLaughlin Group, a Sunday talk show, are equivalent to tossing a coin (or worse -mfm). And there’s no accountability, no penalty for saying something crazy. Quite the opposite: there’s a larger market for ideological provocateurs than for analysts (IMHO). Ref: Morris, Dick. That said, Silver incorporates qualitative info into some of his modelling, specifically the House evaluations of Charlie Cook’s group.
Nate Silver tweets: Tetlock’s 20-year study of political “experts” found the more often they went on TV, the worse their predictions were. http://bitly.com/UbnAUz And here was an interesting exchange today: Dylan Byers @DylanByers
Honest Q: People are aware that RCP poll aggregate also predicted 49 of 50 states in 2008 & also missed Indiana, right? RealClearPolitics - November 2: RCP No Toss Up Count
Nate Silver replies
.@DylanByers: We have a lot of readers because we use data to cut through the drivel that you obsess over. Not because we make predictions. Brad DeLong summarizes:
(Duck Duck Go didn’t help here, but Google did.) Nate Silver: “Whether you like the FiveThirtyEight forecast model or not, one advantage it has is that we don’t change the rules as we go along. The forecasts that you see today are from a program that we designed in the spring, before knowing how the election would play out.” Oct. 10: Is Romney Leading Right Now? - The New York Times
And yet I don’t recall liberals saying “well, he’s obviously biased if he’s predicting Republican gains.”
There are a number of possible explanations for this, not least that it happened and I’ve forgotten it, and not all of which cast Democrats in a positive light and/or Republicans in a negative one.
I think NS will take an unjustified hit if his predictions are off, but that’s the flip side of the unjustified credit he’s getting now for his predictions being accurate. As people have noted, there is a true variance that no model can account for in the short term, and so far Silver’s reputation is based on short-term accuracy.
Also significant: some of the accurate predictions that NS made - along with all the other pollsters - are really no big deal and shouldn’t count in the W column. If a modeler or pollster gets 49 of 50 states correct (as NS did in 2008), that’s pretty good, but not as good as it may appear. The vast majority of those results were not all that close, and most of the remaining ones could probably be predicted accurately by cruder methods. The only way to rate a pollster or modeler is by how much of an increase in accuracy his polls or models provide, and I don’t see this being done, at least in Silver’s case.
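For what it’s worth, there is a standard way to do that comparison: score the model and the cruder method on the same races with a proper scoring rule like the Brier score. A sketch, with every number invented:

```python
# Brier score = mean squared error of the probability forecasts;
# lower is better. Compares a probabilistic model against a crude
# method that just makes hard calls. All numbers are invented.
def brier(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

outcomes    = [1, 1, 0, 1, 0, 1]                    # 1 = Democrat carried the state
model_probs = [0.92, 0.80, 0.30, 0.65, 0.45, 0.97]  # hypothetical model output
crude_probs = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0]        # crude method: call it for the poll leader

print(f"model Brier score: {brier(model_probs, outcomes):.3f}")
print(f"crude Brier score: {brier(crude_probs, outcomes):.3f}")
```

The “increase in accuracy” the poster is asking about would just be the gap between those two scores, measured over enough races.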
This doesn’t seem correct to me. Silver’s model is supposed to account for the possibility that the polls are off. This is something he explicitly says he measures, based on historical data.
What seems to me to be the bigger GIGO issue is that his model seems very much to be a black box. He describes in very general terms what he is doing. He says the model accounts for this and it accounts for that, but he doesn’t describe precisely how it accounts for these things, and his assumptions in general - and their basis - are very unclear. For example, he weights economic factors less and less heavily as he gets closer to the election. This makes perfect sense, but how much weight does he assign to economic factors versus polls at a given point, and on what basis? And so on, for any number of other assumptions and methodologies.
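Since Silver has only described the blend in general terms, here’s a purely toy version of the idea - the linear schedule and the cap are invented, not his:

```python
# Toy polls-vs-fundamentals blend. The idea (economic fundamentals get
# less weight as election day nears) matches Silver's description; the
# weighting schedule itself is made up.
def blended_forecast(poll_avg, econ_prior, days_out, horizon=300, max_econ_weight=0.5):
    w = max_econ_weight * min(1.0, max(0.0, days_out / horizon))
    return (1 - w) * poll_avg + w * econ_prior

# e.g. polls imply a 51.5% two-party vote share, fundamentals imply 50.2%
for days in (250, 100, 30, 1):
    print(f"{days:>3} days out: {blended_forecast(51.5, 50.2, days):.2f}%")
```

The complaint stands either way: without knowing the real schedule, you can’t tell how much work an assumption like this is doing.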
From my experience in my own field (actuarial), I’ve seen a lot of this. You can make a very sophisticated model with all sorts of complicated formulas and methods, but the answer can be very sensitive to a few key assumptions that ultimately have very little basis (which is when the value of having letters after your name comes in handy).
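A concrete illustration of that sensitivity, using one assumption that matters enormously in election models: how correlated the state polling errors are. Every figure below is invented:

```python
# Vary the size of a shared national polling error (which shifts all
# states together) while keeping per-state noise fixed, and watch the
# headline "win probability" swing. All margins and sigmas are invented.
import random

random.seed(2)
MARGINS = [1.5, 2.0, 3.0, 0.5, 4.0, 1.0, 2.5]   # hypothetical poll leads, in points

def win_prob(shared_sigma, state_sigma=3.0, trials=20_000):
    wins = 0
    for _ in range(trials):
        national_err = random.gauss(0, shared_sigma)
        carried = sum(m + national_err + random.gauss(0, state_sigma) > 0
                      for m in MARGINS)
        wins += carried > len(MARGINS) // 2      # "win" = carry a majority of states
    return wins / trials

for sigma in (0.5, 2.0, 4.0):
    print(f"shared-error sigma {sigma}: win probability {win_prob(sigma):.2f}")
```

Same polls, same model structure - the answer moves a long way on one soft assumption.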
In sum, NS seems to be doing a very thorough job and he adds a lot to the field (and I agree that most of the criticism he currently receives is sour grapes from conservatives), but while I suspect that his model is more accurate than just averaging polls or the educated guesses of political junkies, I think the jury is very much still out at this time.
That only works by ignoring that his methods also pointed out the bad apples among the pollsters.
IMHO no one should be relied on 100%. Silver is just better than many of the people dealing with this, and he does show how others can be pulling a fast one. The main point here is that he has the means to take others to task and has done so in a nonpartisan way.
No, I agree with F-P on this. Nate did really well in 2008 and 2010, but not significantly better than simpler models. I believe Sam Wang at Princeton and the RCP average were pretty much spot on as well in 2008.
In 2010 the simple polling averages were also pretty much right-on, and Nate only did marginally better than Sam did in predicting the House results.
I believe back in 2008 Nate even had a post about how his “special sauce”, such as it is, really doesn’t change things that much. In anything other than the closest of elections it’s not going to appreciably change the results. This may be one of those closest elections (particularly if the RCP average shows a Romney popular-vote win but Sam and Nate both predict an Obama popular-vote win).
ETA: I’d like to add the caveat that I’m only talking about “final” predictions. I do believe Nate and Sam both provide much better information in the earlier days of an election, when “narratives”, “swings”, and “momentum” get a lot of play in the media. 538 has been invaluable in showing that there have really only been two inflection points in this campaign (the DNC and the first debate) and that the rest has been regression towards the stable O+1.5.
Look, I absolutely adore Nate Silver and the work he does. If this election was going the other way, with Romney showing a big lead in his forecast - I’d be pretty sad about it, but I’d trust it.
Look at Silver like a weatherman. You can be the best meteorologist in the world, and still have people pissed off when it rains on their bbq.
Uh, that reply did not deal with what I said. I only pointed out other reasons (besides the ones you mention at the end) why he is more reliable than others; indeed, it is because of his previous efforts at dealing with GIGO that he is, IMHO, more reliable than others.
The point here is that this item is very important when comparing which sources out there are more reliable: do they make an effort to check how reliable their own sources are?
That may be true in a world where there were good pollsters and bad pollsters, and the good pollsters only put out good polls and the bad pollsters only put out bad polls. (By “good” and “bad” I mean something tied to accuracy and bias.)
But the reality is that good pollsters sometimes put out bad polls - it’s just part of the job. And, conversely, sometimes bad pollsters will put out good polls.
It’s not yet clear that attempting to correct for this makes much of a difference. This is a pretty young field, and a competitive one, so we just don’t have quite enough data yet, IMO, to say that something like (for example) weighting pollsters based on house effects or past accuracy is more robust than just averaging them together. It certainly feels like it should be, but so far it hasn’t been proven.
It’s also tricky because when we try to analyze pollster accuracy we are, by definition, looking at just one poll out of the potentially hundreds they release over the cycle (the last one before the election). House effects can be measured by seeing how far from the consensus they are throughout the race, but accuracy depends on a good pollster not having a bad poll right at the end of the race.
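A bare-bones version of both operations described above - house effect as average deviation from the consensus, plus one (invented) way of downweighting by it:

```python
# House effect = a pollster's average deviation from the consensus of
# all polls. The downweighting scheme is invented; whether it actually
# beats the simple average is exactly the open question here.
from statistics import mean

polls = {                              # pollster -> (Obama - Romney) margins, invented
    "Pollster A": [2.0, 1.5, 2.5],
    "Pollster B": [-1.0, -0.5, 0.0],   # leans Romney relative to the field
    "Pollster C": [1.0, 1.5, 0.5],
}

consensus = mean(m for ms in polls.values() for m in ms)
house = {name: mean(ms) - consensus for name, ms in polls.items()}

weights = {name: 1.0 / (1.0 + abs(h)) for name, h in house.items()}
weighted = sum(weights[n] * mean(ms) for n, ms in polls.items()) / sum(weights.values())

for name, h in house.items():
    print(f"{name}: house effect {h:+.2f}")
print(f"simple average {consensus:+.2f} vs house-weighted {weighted:+.2f}")
```

Note that the two averages barely differ here, which is consistent with the point that the adjustment hasn’t been proven to buy much.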
I object even to that. People are having more than enough trouble understanding the real-world example we’re pointing to. There may be some systems somewhere in which diametrically opposed forecasts could both turn out to be right because of the closeness of large numbers of subsets, but the voting in individual states is not one of them. Even if some bizarre statistical anomaly struck the earth and caused it, every other aspect of a Silver-type forecast would change accordingly (assuming his system would even be possible). You could not get 48 states with over a 60% probability toward one side in such a world. (In fact, today it’s 49, with Florida at 59.3%.) In your example, the forecasts would look and behave totally differently.
Remember that your original post made explicit reference to the possibility of Silver’s predictions and the Colorado predictions both being right. That is not the same as the theoretical world that you switched to. Those specific predictions about our world cannot both be right.
You’re confusing a badly confused audience. It’s not helpful. A thread about the mechanics of forecasting and what could happen at theoretical extremes might be useful, but that should be separate from this one.
OK, it is clear that you did not read the Esquire article and the links in it. Yes, good pollsters sometimes put out bad polls, but if you think Silver has not taken that into account, or has not investigated who is more accurate and who is falsifying data, you are ignorant of a lot of what Silver has done before.
Silver has a column today comparing his and six other aggregate/forecasting sites and they’re all pretty much on board where it counts.
Of course, Silver goes into lengthy exposition on the polls, whereas RCP doesn’t. So critics go into lengthy exposition of how biased and liberal and fake Silver must be, and content themselves with “unskewing” the polls presented on RCP until they get an answer they like.