Trying to make sense of the polling

I trust Nate Silver and his predictive models. Over the past several election cycles he’s been anywhere from 95-100% correct in his analyses of how particular elections would turn out and how the states would end up voting in Presidential elections.

But last night he and 538.com were horribly, terribly wrong. Trump had been trending upward these past few weeks, but topped out at a 28.6% chance of securing the victory. Roughly 1 in 4? Those aren’t terrible odds. But even that doesn’t tell the full story.

538 was wrong about Pennsylvania (where it gave Trump a 23% chance), Wisconsin (16.5%), Michigan (21.1%), Florida (44.9%), North Carolina (44.5%), and a portion of Maine (17.3%).

If we trust Nate’s numbers to be accurate, the odds that all six states would swing away from what the polling data suggested are 0.028%, if we treat the six outcomes as independent and multiply the probabilities together. That’s one time in every 3,500 Presidential elections.
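For anyone who wants to check the arithmetic, here’s a quick back-of-the-envelope sketch in plain Python, using the state probabilities quoted above and multiplying them as if they were independent events:

```python
# Multiply 538's quoted Trump win probabilities for the six states,
# treating the six outcomes as independent events.
probs = [0.230, 0.165, 0.211, 0.449, 0.445, 0.173]  # PA, WI, MI, FL, NC, ME

joint = 1.0
for p in probs:
    joint *= p

print(f"{joint:.5%}")            # 0.02768%, i.e. roughly .028%
print(f"1 in {1 / joint:,.0f}")  # 1 in 3,613, call it 1 in 3,500
```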

So this leads to one of two conclusions:

  1. Nate Silver and the 538 team are really bad at analysis and we shouldn’t trust their reports. I don’t actually believe this, however, which leaves the alternative:

  2. Nate Silver’s conclusions were correct, but the polls conducted, across the board, were shit. I’m kinda thinking it’s this one.

Look at Michigan’s data in particular: Michigan | 2016 Election Forecast | FiveThirtyEight

They have probably 50 sets of polling data from a dozen different firms spanning a year’s time. All but three of them put Clinton as the winner. In the past two months there was exactly one poll that put Trump as the winner; the rest went to Clinton. I’m sure if I looked into the other five states I’d find the same thing.

There seems to be a fundamental problem with just about every poll, and I’m interested to know why. I’m sure Nate is as well. But this circles back around to the first question I have: if every poll got it wrong, then how can we trust the analysis, superior as it may be, that results from it? Garbage in, garbage out?

It’s not just 538. The Princeton Election Consortium totally got it wrong too.

I quit believing 538 because they got several primary races totally wrong. I can’t remember them all, but they gave Hillary a 90% chance in my state and Bernie actually won by 5 points. There were other states where the outcome was decided by only a few percentage points.

Why was polling so effective in 2008-2014 but so terrible in 2016?

IIRC polling really underestimated the good night the Republicans had in 2014.

The danger in any sampling is whether the sample represents the broader population well enough that any conclusions reached from the sample will hold for the broader population. I’m not a polling expert by any means, but I think something has entered the population (if I’m going to hazard a guess, I would say some unidentified effect from cellphones or social media) that is causing the method for obtaining the samples to no longer represent the broader population. Somebody will figure that out.

Before we point too many fingers at Silver, we should keep in mind that he had been giving Trump a much better chance of winning than most other forecasters were. Granted, he was wrong in predicting Clinton as a clear favorite, but he was giving her odds of around 65% when most others were giving her 95% or higher.

This.

I work in market research (though thankfully not in political polling and not in the USA), in house for a large company, so I’m not at one of the places that actually does the questionnaire design and administration; I commission those firms to address the business/consumer issues we have. The industry needs to address a few things, but the most pervasive is that response rates have been falling for years, so there is an increasing likelihood that the people who will participate in surveys are not representative of the population overall.

Put even more simply, consider this: the first question every respondent to every survey of any type has to answer is “Can I be bothered to do this survey?” The number of people answering yes has been falling for a while, and it has probably created a skewed population of survey takers who are unlike the population overall. What is worse, just how unlike the main population these people are, especially attitudinally, is essentially unknowable. Estimating who and how many people of different opinions and characteristics we’re missing is in the realm of guesswork.
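To put some entirely made-up numbers on that: even a modest gap in willingness to respond skews the topline, and nothing in the sample itself tells you it happened. A toy sketch, with hypothetical response rates:

```python
# Toy nonresponse-bias sketch, hypothetical numbers throughout.
# The population is split exactly 50/50, but side A is slightly
# more willing to answer surveys than side B.
supporters_a = 1_000_000
supporters_b = 1_000_000
response_rate_a = 0.10   # 10% of A supporters take the survey
response_rate_b = 0.07   # only 7% of B supporters do

responses_a = supporters_a * response_rate_a
responses_b = supporters_b * response_rate_b

polled_share_a = responses_a / (responses_a + responses_b)
print(f"true share for A: 50.0% | polled share: {polled_share_a:.1%}")
# polled share comes out to 58.8%, and because the nonresponders'
# views are unobserved, the error is invisible from inside the sample
```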

There are things exacerbating this with respect to technology, being able to reach under-sampled populations, etc, but ultimately, if fewer and fewer people want to do surveys and those that do are “weird”, what use is the survey, however it is administered? It’s the most worrying aspect of my job, and something that I have to compartmentalise, otherwise I’m not going to trust anything I get back when I commission this stuff. At least we can track sales data after we do the things recommended by our research, preferably running small-scale trials in small markets to see whether something is going to work, and verify things that way to mitigate risk. Needless to say, you don’t get many opportunities to do that in political polling.

The problem with your math is that you’re treating those states as independent of one another. But suppose, taking Snowboarder Bo’s argument about Trump’s win as the correct one, that the problem is that a lot of blue-collar jobs have been lost to competition with China. Then it’s a question of how many blue-collar workers there are in a state and how easy they are to poll. If they’re hard to poll as a class, and yet fairly common in certain geographies (which need not limit themselves to state borders), then if state A with a large body of blue-collar workers goes Trump, other states with the same demographic will as well.

Basically, there was a fundamental issue afflicting a large percentage of the population, and that issue is going to hit everyone exposed to it, regardless of where they live. If it affects Pennsylvanians, then it also affects Wisconsinites and Michiganders. Either all of them are going to go together or none of them are, because it’s not the state that matters, it’s the economic conditions.

The way that Silver got to 1 in 4 was, presumably, by linking the states based on demographics and their history of voting together (and, possibly, by going even further and matching voting patterns to demographics). So it may actually have been more likely that they’d all flip than that only a few of them would, based on the demographics.
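As a rough illustration of how much the correlation matters, here’s a toy Monte Carlo sketch. It uses the six state probabilities quoted upthread and ties the states together with a single shared “national polling error” term; the 0.5 correlation is an assumption pulled out of the air, not anything from 538’s actual model:

```python
# Compare P(all six states flip) when polling errors are independent
# versus when they share a national component (a Gaussian-copula toy).
import random
from statistics import NormalDist

# 538's quoted Trump win probabilities
probs = {"PA": 0.230, "WI": 0.165, "MI": 0.211,
         "FL": 0.449, "NC": 0.445, "ME-2": 0.173}

nd = NormalDist()
# Threshold each state's latent score must clear for a Trump win,
# picked so the marginal win probability matches the quoted number.
thresholds = {s: nd.inv_cdf(1 - p) for s, p in probs.items()}

def all_flip_prob(rho, trials=500_000):
    """Fraction of trials in which every state flips, given correlation rho."""
    hits = 0
    for _ in range(trials):
        shared = random.gauss(0, 1)  # one nationwide polling error per trial
        if all(rho**0.5 * shared + (1 - rho)**0.5 * random.gauss(0, 1)
               > thresholds[s] for s in probs):
            hits += 1
    return hits / trials

print(f"independent: {all_flip_prob(0.0):.4%}")  # ~0.028%, the 1-in-3,500 figure
print(f"correlated:  {all_flip_prob(0.5):.4%}")  # a few percent, ~100x likelier
```

The exact numbers depend entirely on the assumed correlation, but the point stands: a shared error makes “they all flip together” far less freakish than the independent multiplication suggests.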

To his credit, he also acknowledged the possible distortion effect from significant unknowns. The problem then, though, is that he’s kind of saying “I think I’m right unless, for unpredictable reasons, I’m not,” and at that point, what value does he add?

Cumbrian has good points.

As for me, I’m surveyed out. EVERYBODY seems to want me to complete a survey, especially after I’ve received a service. (My physicians are the worst about this.) I simply reject all requests for surveys, especially by telephone. My wife and I have refused at least two or three dozen survey requests this election season.

Another issue I saw brought up last night is estimating how many of the people who say they are planning to vote actually vote. ISTR it was at 538, somewhere in the swamp of stuff I read last night, but I’m not digging for it. They pointed out that many polls heavily discount people who didn’t vote in the last election when deciding who counts as a likely voter. When you’ve got a candidate with a lot of appeal to disaffected voters, like Trump, that’s a recipe for a poor estimate.

You should have paid more attention. Silver said over and over and over again that (A) there was more uncertainty than usual, and (B) if one of the Rust Belt states went for Trump, others almost certainly would, too.

From last week:

In general, I’d argue that the issues that concern those from Michigan might not be the ones that affect those from Florida or North Carolina, as they’re in three very distinct areas of the map. But overall, your point still stands, and it’s one that I agree with: something fundamentally shifted the actual results away from what the polling found.

My argument, though, wasn’t about whether the states were linked in some way. It was that if we assume the polling was accurate, the odds of this result happening across these six states were 1 in 3,500.

Given that this is exactly what occurred, we can either conclude that this was the fluke 1-in-3,500 chance…or we can conclude that the polling was horribly off.

I choose to believe the latter. Now it’s just a matter of determining WHY the polling was off, and how we can ever trust “good” analysis again when it’s based upon crap polling.