I made a post last week in an MPSIMS thread on the topic, which I’ll repost here.
(Professional market researcher here)
Exactly so. All of the major political polling companies are market research companies; they do all sorts of other survey research in addition to political polls. They use political polling as a way to build awareness of their brand names, and demonstrate that they’re good at what they do.
Gallup has been the “name brand” in political polling for decades, but they decided to sit the 2016 elections out entirely. Why? Because all of the conventional wisdom on how to conduct political polling (and how to interpret it) has gone out the window. And, no, it’s not just a function of Donald Trump.
In the past, political polling was a pretty exact science, with some well-established norms on how to conduct it and how to model it. It was conducted exclusively via telephone interviews, and on home (landline) telephones, at that.
In a poll (or in any market research), it’s a known fact that you generally don’t need a very large sample of respondents to get fairly accurate results – but there is a very important assumption in that statement. The assumption is that you (the researcher / pollster) are able to construct a sample which is representative of the entire population – that is, you’ll have the right proportions by demographics (age, sex, income, etc.), as well as by political leanings (liberal, conservative, etc.).
If you’re able to get that sample correct, then you only need to interview 300-500 people to get a very reliable result. In political polling, you’ll often see polls with sample sizes of 1,500 or so, but this is because the pollster wants to be able to show their results among sub-groups within the sample (men, registered Democrats, etc.), and they want to have sufficiently large samples within those sub-groups.
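To put rough numbers on the sample sizes above, here is a minimal sketch using the standard margin-of-error approximation for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst case of a 50/50 split. The function name `margin_of_error` and the 30% sub-group share are my own, purely for illustration; real pollsters adjust for weighting and design effects, which this ignores.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n.

    Assumes a 95% confidence level (z = 1.96) unless told otherwise.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Overall sample sizes mentioned in the post
for n in (300, 500, 1500):
    print(f"n = {n:>4}: +/- {margin_of_error(n) * 100:.1f} points")

# Hypothetical sub-group: 30% of a 1,500-person sample (an invented share)
subgroup_n = int(1500 * 0.30)
print(f"sub-group n = {subgroup_n}: +/- {margin_of_error(subgroup_n) * 100:.1f} points")
```

Under those assumptions, 300-500 respondents works out to roughly 4 to 6 points either way, and 1,500 to about 2.5 points. The hypothetical sub-group of 450 lands back around 4.6 points, which is why the overall sample has to be that large if you want to report on sub-groups.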
Over the past 15-20 years, market research, as a whole, has really struggled with how to ensure that its samples are representative of the entire population, for a variety of reasons:
- We used to use telephone interviewing a lot (and, as noted above, the political polling models were all based around it). Caller ID is now close to ubiquitous in the U.S., and many people won’t answer a call from someone they don’t know.
- Fewer than half of U.S. households now have a landline phone. The demographics for people who don’t have landlines generally skew younger, as well, meaning that only surveying people with landlines (as political polls have traditionally done) will yield a sample of people who won’t necessarily look like the broader population.
- Surveying via cell phones is certainly possible, but it isn’t necessarily as simple as substituting cell phone calls for landline phone calls. One big issue is cell phone number portability – if I’m a pollster, calling a landline with a 312 area code, I know I’m calling a household / respondent in Chicago. Someone who has a 312 area code on their cell phone could literally be living anywhere – they may have gotten that number years ago, when they lived in Chicago, but they can now keep that number, regardless of where they move to. This makes creating a representative sample (from a geographic standpoint) with cell phones very challenging. In addition, polling on landlines had decades of validation testing behind it, something that cell phone polling doesn’t (yet) have. And, cell phones have the same problem that landlines do, from the standpoint of people screening their calls.
- Most market research (i.e., for consumer products, which is what I work on) has shifted to online research. This has its own problems on representativeness regarding people who aren’t online (particularly senior citizens), but more broadly, the bigger issue is that the demographics of people who are willing to participate in online research likely aren’t representative of the entire U.S. population, and the way in which many market research companies get people to participate in online studies (i.e., offer them some sort of points/incentive) undoubtedly attracts a particular type of respondent (i.e., those who are easily incentivized in this way).
For political polling, as I understand it, there hasn’t been much validation work done to assess how accurate online polling is (the same gap that exists for cell phones), and political polling has traditionally not used respondents who receive incentives to participate.
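To make the representativeness problem concrete, here is a small arithmetic sketch of the landline example from the list above. Every number in it is invented for illustration (the age groups, the population shares, the landline-sample shares, and the support levels for a hypothetical "Candidate A"); none of it is real polling data. It just compares what the whole population thinks with what a sample that skews older would report.

```python
# All figures below are made up for illustration; they are NOT real polling data.

# Share of each age group in the (hypothetical) full population
POPULATION_SHARE = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Share of each age group among (hypothetical) landline-only respondents,
# which skews older because younger people are less likely to have landlines
LANDLINE_SAMPLE_SHARE = {"18-34": 0.10, "35-64": 0.50, "65+": 0.40}

# Hypothetical support for Candidate A within each age group
SUPPORT_FOR_A = {"18-34": 0.60, "35-64": 0.50, "65+": 0.40}

def expected_support(shares, support):
    """Overall support implied by a given mix of age groups."""
    return sum(shares[group] * support[group] for group in shares)

true_support = expected_support(POPULATION_SHARE, SUPPORT_FOR_A)
landline_estimate = expected_support(LANDLINE_SAMPLE_SHARE, SUPPORT_FOR_A)

print(f"Population support:      {true_support * 100:.1f}%")
print(f"Landline-only estimate:  {landline_estimate * 100:.1f}%")
print(f"Gap from skewed sample:  {(landline_estimate - true_support) * 100:+.1f} points")
```

With these invented numbers, the landline-only sample comes in 4 points low (47% vs. 51%), and interviewing more people from the same skewed pool wouldn't close that gap. That's the sense in which the small-sample rule of thumb depends entirely on getting the sample's composition right.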
The people who’ve built predictive models around polling (Nate Silver, Sam Wang, etc.) have done so looking at decades of polling data, (nearly) all of which was collected using the old paradigms and old methods of political polling…all of which are now up for grabs.
I should also note the L.A. Times poll, which has been mentioned. They took a different approach to a traditional poll, in that they recruited one sample of respondents, and then surveyed them repeatedly over the course of the campaign. This is called a “longitudinal sample”, and it’s not how traditional polling has been done (most polls recruit a new “one and done” set of respondents with each wave of a poll). The L.A. Times poll consistently looked different from most of the other polls (it was nearly always higher for Trump), and folks like Nate Silver had a hard time fitting it into their models (because of its different methodology). As it turned out, that poll may have been closer to the “truth”, at least this time out. It’ll be interesting to see if their approach becomes more widely used.
And, yes, there may well have been a phenomenon of “bashful Trump voters.” But, with Clinton also widely disliked, it’s entirely possible that there were “bashful Hillary voters,” too. I’m not sure that we’ll ever really understand how much of a factor they played in the polls.
This went on longer than I intended! In short: political polling is really, really difficult to get right today, for several reasons. I don’t think that polling is going to go away; I anticipate that there’ll be a lot of analysis that gets done, and a lot of experimentation, in an effort to address the issues. But, if there were easy fixes to be made to get polling “right”, they likely would have been done already.