To expand on that…here’s how a poll is basically conducted.
Call person in XXX area code
No answer…next number
Person hangs up…next number
Person answers…ask if they live in precinct XX, if no…hang up, next number
Person is in the target precinct, ask if they plan to vote, if no…hang up, next number
Person is a likely voter, are they in a demographic the pollster has already maxed out, if yes…hang up, next number
Repeat until you’ve successfully reached a complete, statistically representative sampling of the state.
So basically anyone who doesn’t fit, including unlikely voters, gets excluded so they don’t poison the sample. Anyone who would push a given group into over-representation gets excluded so they don’t bias the sample.
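Just to make that loop concrete, here’s a rough sketch in Python. Everything in it (the precinct numbers, the cohort quotas, the answer and hang-up rates) is made up for illustration; the point is only the screen-and-fill-quotas logic, not how any real pollster implements it.

```python
import random

TARGET_PRECINCTS = {"12", "34"}  # hypothetical target precincts
QUOTAS = {"under_30": 150, "30_to_64": 450, "65_plus": 200}  # invented per-cohort caps

def simulate_respondent():
    """Stand-in for dialing a random number in the target area code."""
    if random.random() < 0.6:  # no answer or immediate hang-up
        return None
    return {
        "precinct": random.choice(["12", "34", "56", "78"]),
        "plans_to_vote": random.random() < 0.7,
        "cohort": random.choice(list(QUOTAS)),
    }

def run_poll():
    sample = []
    counts = {cohort: 0 for cohort in QUOTAS}
    while sum(counts.values()) < sum(QUOTAS.values()):
        r = simulate_respondent()
        if r is None:                                   # no answer…next number
            continue
        if r["precinct"] not in TARGET_PRECINCTS:       # wrong precinct…next number
            continue
        if not r["plans_to_vote"]:                      # unlikely voter…next number
            continue
        if counts[r["cohort"]] >= QUOTAS[r["cohort"]]:  # cohort already maxed out
            continue
        counts[r["cohort"]] += 1
        sample.append(r)
    return sample
```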
There are only a few ways that polling can fail:
1. Respondents lie in an unaccounted-for way
2. The sample is not actually statistically representative, i.e. they’re missing or undercounting a cohort
3. The sample sizes are too small
4. The questions are asked in a biased way
High turnout should not, by itself, have any impact on the mix of voters assuming the polling is done correctly. If there’s abnormally high turnout in one cohort, that can mean that #2 was wrong. But that’s not the same thing as misidentifying a “likely voter”.
Also, this is clearly not true. As an example, it’s easier to see who’s calling on a mobile phone than on a landline, and therefore easier to screen calls. So mobile phone owners may be more likely to hang up on what looks like a spam call. If there’s a political lean to mobile phone vs. landline users (which there is), then it will affect the results.
I think that pollsters try to correct for these effects, but either way they’re affecting the projection.
No they aren’t. It is much harder to get a response from a mobile number, this is true, but it doesn’t bias the result because they never make it into the result. Assuming a pollster is doing his job, they must get a representative sample of all voters, and this includes a representative mix of mobile and landline numbers. If mobile numbers are harder to reach, that just means they need to call WAY more of them in order to arrive at a representative sample. If they are lazy and cutting corners, then sure, they would undercount the voters with mobile numbers, but there’s no indication that this is happening. Mobile numbers certainly do not swing red or rural.
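To put rough (and entirely invented) numbers on that: if mobile respondents complete interviews at a third the rate of landline respondents, you simply dial about three times as many mobile numbers per completed interview, and the finished sample still has the intended mobile/landline mix.

```python
# Back-of-the-envelope only; the quotas and response rates are invented.
quotas = {"landline": 300, "mobile": 700}            # target completes per frame
response_rate = {"landline": 0.09, "mobile": 0.03}   # assumed completes per dial

for frame, need in quotas.items():
    dials = need / response_rate[frame]
    print(f"{frame}: ~{dials:,.0f} dials to get {need} completes")

# Mobile needs roughly 3x the dials per complete, but the finished sample
# still contains the intended 70/30 mobile/landline split.
```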
That’s the very assumption that’s being questioned. My question is whether the LV questions themselves might be a greater source of bias.
The Gallup LV questions are on a scale (one of them just being a 10-point “are you likely to vote?”). Even if the people being polled are telling the absolute truth, it may be that there’s bias in how they interpret the question.
Also, voting is ultimately a binary decision, and so is the LV calculation. The pollsters have to set a cutoff, but they might pick the wrong point. If you have a bunch of 10-pointers vs. 5-pointers, then it matters a whole lot whether you set the LV threshold at 6 or at 4. An actual difference in enthusiasm may not matter much if it’s at “sufficient” levels either way.
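Here’s a toy illustration of that cutoff sensitivity. The ten respondents are invented; the point is only that the exact same raw answers produce a different topline depending on whether the likely-voter cutoff sits at 4 or at 6.

```python
respondents = [
    # (self-reported likelihood 1-10, candidate preference) -- invented data
    (10, "A"), (9, "A"), (10, "A"), (8, "B"), (7, "B"),
    (5, "B"), (5, "B"), (6, "B"), (4, "B"), (10, "A"),
]

def topline(threshold):
    """Candidate A's share among respondents who clear the LV cutoff."""
    lv = [cand for score, cand in respondents if score >= threshold]
    return sum(1 for cand in lv if cand == "A") / len(lv)

print(f"A's share with cutoff >= 4: {topline(4):.0%}")  # 40%
print(f"A's share with cutoff >= 6: {topline(6):.0%}")  # 57%
```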
Treating them as separate (and specifically including mobile numbers) is something they had to start doing precisely because not doing so would bias the results. Differential hang-ups were one such way of affecting the results.
High turnout can affect the accuracy of the polls. Most pollsters don’t do a sample and then print the raw data. They adjust it to try to fit the demographics of the electorate (I think likely voters). For example, if for some reason their sample is 50% black, they’ll weight their numbers to correct for that skew. One of the big post-mortems of 2016 was that pollsters didn’t adjust enough for education level, something they thought they fixed this time.
So, assuming they adjust their numbers based on expected voters, if a lot more people vote, that could shift the voting demographics, which will in turn affect the polling accuracy.
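A stripped-down version of that adjustment (post-stratification weighting, with invented numbers) looks something like this: the raw sample over-represents one group, so each group’s result gets re-weighted to the share the pollster believes that group holds in the actual electorate.

```python
# Unweighted sample: each group's share of the sample and its support for
# candidate A. All numbers are invented.
sample = {
    # group: (share_of_sample, support_for_A_in_group)
    "group_x": (0.50, 0.80),
    "group_y": (0.50, 0.40),
}
# What the pollster believes the electorate actually looks like.
population_share = {"group_x": 0.13, "group_y": 0.87}

raw = sum(share * support for share, support in sample.values())
weighted = sum(population_share[g] * support for g, (_, support) in sample.items())

print(f"raw topline:      {raw:.1%}")       # 60.0%
print(f"weighted topline: {weighted:.1%}")  # 45.2%
```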
Important to mention that this method is specific to Gallup. Other pollsters will have methods closer to binary. Personally, I think asking a voter a simple Yes/No question about their propensity to vote may be as reliable as this more complicated method.
The only way your thesis plays out is if a huge percentage of voters who said they support Trump also said they were less likely to vote. And because “less likely” is a weighted response, that proportion would need to be very high to move the overall numbers by that much. A much simpler explanation is that they just lied on the top-line question of who they would vote for. Remember, this is a “high turnout” election, but that boost is probably only on the order of 4% over 2008. It wouldn’t explain the polling variance.
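You can put rough numbers on that claim. Suppose the pollster counts a “less likely” respondent at 0.3 of a full vote and the race is truly 50/50 (both numbers invented); the sketch below shows how far the weighted topline moves as the fraction of one candidate’s supporters who self-report as “less likely” grows.

```python
def weighted_share(f, low_weight=0.3):
    """One candidate's weighted share when a fraction f of their supporters
    (and none of the opponent's) get the low likelihood weight."""
    own = 0.5 * ((1 - f) + f * low_weight)
    opponent = 0.5
    return own / (own + opponent)

for f in (0.05, 0.10, 0.20, 0.30):
    print(f"{f:.0%} of supporters 'less likely' -> topline {weighted_share(f):.1%}")

# With these assumed numbers, even 10% of one side's supporters reporting
# themselves as "less likely" only moves the topline by about two points.
```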
In either case, there are only two likely causes: respondents lied about who they were voting for, or the sampling was really poor. I think it’s probably the first one.
This requires that you know all the characteristics that might make one more or less likely to make it through the survey process and that you know to correct for them. And there’s no possible way to know them all.
You can poll rural middle-aged church-going left-handed Jets fans, but if you don’t realize that people who like to eat chicken sandwiches with pickles on them are biased toward the Republican party and also less likely to answer their phones (all that pickle juice on their fingers), then you’re not going to account for that in your demographics.
That’s percentage points, not percent. Since the baseline is ~60%, that’s really more like 6%, and might be more like 8% when it’s all said and done.
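Quick arithmetic on that, using the ~60% baseline and the ~4-point bump being discussed (both approximate):

```python
baseline = 0.60   # approximate prior turnout
increase = 0.04   # approximate increase, in percentage points (as a fraction)
print(f"relative increase: {increase / baseline:.1%}")  # ~6.7%
```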
Going back to my original question: did polls predict the high turnout? Because regardless of possible bias, if their LV data didn’t show any difference between 2016 and 2020, it hints at a problem in their LV methodology.
What I think you’re missing is that it’s not always obvious which factors can influence votes, so the pollsters don’t always adjust for all of them. Not just that, but it may not be possible to properly adjust for all factors, since if you adjust the percentage of one, then you’re also adjusting the percentages of others which happen to be correlated in your particular sample.
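Here’s a toy example of that correlation problem, with made-up respondents: weight the sample so the education split matches a target, and the age split moves with it, because the two are correlated within the sample.

```python
respondents = [
    # (education, age_group) -- correlated on purpose: non-college skews older here
    ("college", "young"), ("college", "young"), ("college", "old"),
    ("college", "young"), ("non_college", "old"), ("non_college", "old"),
]

# Suppose the target electorate is 50% college / 50% non-college.
target_college = 0.5
n = len(respondents)
n_college = sum(1 for edu, _ in respondents if edu == "college")

# Weight each respondent so the education margin matches the target.
weights = [
    target_college / (n_college / n) if edu == "college"
    else (1 - target_college) / ((n - n_college) / n)
    for edu, _ in respondents
]

def weighted_old_share():
    """Weighted share of 'old' respondents after the education adjustment."""
    return sum(w for (_, age), w in zip(respondents, weights) if age == "old") / sum(weights)

unweighted_old = sum(1 for _, age in respondents if age == "old") / n
print(f"unweighted 'old' share: {unweighted_old:.1%}")       # 50.0%
print(f"weighted 'old' share:   {weighted_old_share():.1%}")  # 62.5%
```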
I’ve been polled numerous times over the years. (Of note, I live in NJ - not a battleground state.)
But in recent years I’ve taken to declining to answer. Not because of any “shyness”, but because IME these pollsters take a long time. It can literally be 15 minutes on the phone (they ask a lot of questions about you, and then a lot of questions about your opinions) and I am not motivated enough for that.
I was polled in 2016, on a cell phone even (mostly about local ballot measures), and because I happened to be waiting for soccer class to end and had nothing better to do, I went through with it, and it was the longest 35 minutes of my life.
“How much do you support Proposition 25? Not at all? A little bit? Some? A lot? Very much? Completely?”
“How likely are you to change your support for Proposition 25 in the future? Not at all? A little bit? Some? A lot? Very much? Completely?”
“How likely are you to vote? Not at all likely? Somewhat likely? Very likely? Definitely?”
On and on, with the person insisting they read EVERY SINGLE valid response to EVERY SINGLE question. It was the worst and I would definitely never do it again.
I’ve already opined that I’m sure there are plenty of liar/troll respondents out there, and I’m sure the fever swamps are encouraging people to troll anyone who polls them. But I’m not sure how much of an effect that is.
And it also doesn’t comport with what we’ve seen from Trump voters. The whole gimmick of Trump, and Republicans, and conservatives in general, is to puff themselves up and make themselves look bigger than they are. Even at this late hour with the election almost called, you put a mike in front of them and they’re cheerily proclaiming that they have all of America behind them.
The adjectives “shy” and “retiring” don’t fit Trump voters and never have. They’re trolls, but history suggests they’re pretty stupid and ineffective ones.
That’s fair, but IMO there’s a big spectrum of legitimacy. In my experience, pollsters give correct caller ID information. They also exist for a legitimate purpose. The nasty spammers spoof their numbers and the only purpose is identity theft.
This is one way to go about it. You can fudge the numbers to try to account for people you couldn’t reach…or you can reach out to more people until you find a representative sample. The former is far more circumstantial and would represent a “bad” poll.
I don’t know that I’m missing it. I’m basically arguing that polling is so broken that it’s useless. Clearly there are factors that no one properly understands. That’s sort of the whole argument.
The minutiae that we’ve gotten into are around more esoteric aspects of the projections, and not about what may be a case of bad data in, bad data out. I don’t think the pollsters are simply applying the wrong corrections…I think they are being actively undermined.