So will anybody ever trust the polls again?

Doesn’t this invalidate the “shy Trump supporter” hypothesis? If anything, shy Trumpers would be more prevalent in blue states (since they’re surrounded by people who would presumably look down on them for supporting Trump) and so the polls would be least accurate in blue states and most accurate in red states.

The fact that you found exactly the opposite means that the polling errors are due to other reasons, not shy Trump supporters.

Not sure about that. I think the “shy” Trump voter may be somewhat overstated, but in every poll there’s a block of presumed undecideds that broke for Trump. Are they “shy”? I don’t know. I do think there’s an actively trolling Trump voter who intentionally fucks with the pollster in the reddest of districts.

I was, once. I answered honestly.

It’s not that I don’t think there are “shy” Trump voters, it’s just that I don’t think they are located where they make much difference.

New York went way more heavily towards Trump than anticipated, about 60-40 IIRC. I think that is a shy Trump voter effect.

But in less deep blue areas, I don’t think the effect is that strong and I don’t think it’s limited to Trump. I think it’s mostly people keeping their opinion quiet when it conflicts with the prevailing opinion of whatever group they are with at the moment. I don’t think a Biden voter in a MAGA workplace is going to be any less shy than a Trump voter in a liberal workplace.

And I really don’t think Trump voters are such fragile snowflakes that they’d lie to a pollster because they were afraid the pollster would yell at them or think they were racist. I’m more likely to think they are lying just to lie. I may be wrong and maybe they really are that fragile, but if you’re scared to reveal your beliefs to a random stranger you will never have contact with again because you think they will think you are horrible, maybe you should re-examine your beliefs.

But I’m in the camp that thinks there is just something wrong with the methodology. It wasn’t just a couple of polls that were off by the margin of error. This was not random variation. There were hundreds of polls that were ultimately wrong in exactly the same way - even accounting for the fact that they were all predictions of the same event.
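For what it’s worth, that “not random variation” intuition is easy to sanity-check with a quick simulation (all the numbers here are made up for illustration): if hundreds of polls had only independent random error, their misses would split roughly evenly in direction, whereas a shared systematic bias makes nearly all of them miss the same way.

```python
import random

random.seed(0)

def simulate_errors(n_polls, shared_bias, noise_sd):
    """Signed polling errors: a common systematic bias plus independent noise."""
    return [shared_bias + random.gauss(0, noise_sd) for _ in range(n_polls)]

# Purely independent errors: misses split roughly 50/50 in direction.
random_errors = simulate_errors(200, shared_bias=0.0, noise_sd=3.0)
same_direction = sum(e > 0 for e in random_errors)

# A shared 4-point systematic bias: nearly every poll misses the same way.
biased_errors = simulate_errors(200, shared_bias=4.0, noise_sd=3.0)
biased_same_direction = sum(e > 0 for e in biased_errors)
```

The point of the sketch is just that seeing hundreds of polls err in the same direction is evidence of a shared methodological bias, not bad luck.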

Well, it looks like I owe a small, proverbial apology to 538. Assuming things hold up, the ECV count will be in their window. A number of states were off by quite a bit, but others, like GA, were pretty darn close. It looked much worse on Tuesday night.

But the point is that unless you know to check for that characteristic, there’s no way to know your sample was not representative until after the polled event.

Trump supporters lie to pollsters so that they can bitch about the polls being wrong. It’s that simple.

I agree. I know people think there was no organized effort, and I never saw evidence of a highly organized one. But I used to join/follow some of the Trumper Facebook groups and hashtags out of curiosity, and there were plenty of people bragging about lying to pollsters.
I was never sure whether I believed them; most of the stories sounded bullshitty because they always talked about how the pollster cheered when they said Biden. I never saw an organized movement to say “let’s all do this”, but the idea was out there and firmly planted.

I think the missing ingredient for the polling error this year will be first-time white male voters. I can’t imagine pollsters gave them enough weighting, and Trump really seemed to squeeze rural counties for every bubba vote out there. That said, given that partisanship now casts all media, and by extension pollsters, as enemies of the people for at least 40% of Americans, I’m not sure traditional methods are going to work any longer. It might be time to turn to data analytics (Big Data) for a more accurate assessment of voter opinion and participation. Those tools seem far harder to deceive and more granular than traditional polling.

It looks a LOT like the “what if everything is off like it was last time?” chart. The only real exception is GA, and I think the answer there may be Stacey Abrams and her whole campaign. She shifted the needle on who the likely voters were, and the polls didn’t.

One thing that stood out to me reading on FiveThirtyEight before the election was the statement that the pandemic itself could wind up messing up the polling, but I didn’t catch anything where he elaborated on that.

Sure, anything door-to-door would be absent, but I thought most polls of any import were done over the phone even in normal times.

I think it was more about the pandemic throwing off the adjustments for who was a likely voter.

I’m personally done with polls. I thought of their methods as antiquated already but recently some of the misses have been huge, and even if the excuses are correct, it just means next time it’s “polls say X but they could be completely wrong because reasons”.

I think next cycle I’ll pay more attention to the betting markets plus whatever deep learning looks most promising at that time.

Why do you ask this every time? Do you understand how statistical sampling works and therefore why your question is completely and utterly irrelevant? If not, I minored in statistics and would be happy to explain.

Also, yes I was polled. But again that is totally irrelevant.

How accurate are the betting markets? I would think there would be some bias there, as the people who participate in the betting markets are a self-selected group with certain traits that might not be reflected in the population as a whole. But maybe that group of people is good at analysis and picking who the winner will be.

ISTM that the betting markets (at least PredictIt, which is the one I followed) were pretty bad this cycle. They consistently overestimated Trump’s chances.

They had Biden/Trump at approximately 65%/35% on election day, which was much more favorable to Trump than any other analysis. But you could defend that based on the ultimate results. However, later in the evening, they had Trump as the big favorite, while Nate Silver was saying that, assuming Biden held a couple of states he was likely to hold, he thought the likelihood was 88%/12% for Biden. Struck by the contrast, I looked at some individual states and it seemed to me that Silver was at least closer to the mark, but the fact that people were putting money on the other side of the coin made me think perhaps I was missing something. Turns out the betting markets were.

To elaborate on what @Babale is saying, the number of people you need to get a good sample is essentially independent of the size of the population. To get a margin of error of a few percent, you need to survey about a thousand people. So if you’re doing a good poll of the mayoral race in a town of 10,000 people, you’ll need to call 1,000 people, and a sizeable percentage of the population will end up being called. And if you’re doing a good poll of a state of 10,000,000 people, you’ll need to call 1,000 people, and only a very tiny percentage of the population will end up being called.
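That “about a thousand people” figure falls out of the standard margin-of-error formula for a proportion, which (for any reasonably large population) doesn’t reference the population size at all. A quick sketch in Python:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sampled proportion.

    n is the sample size, p the assumed proportion (0.5 is the worst case),
    z the critical value for the confidence level. Note that the size of
    the underlying population never appears in the formula.
    """
    return z * math.sqrt(p * (1 - p) / n)

# ~1,000 respondents gives roughly a ±3-point margin, whether the
# population is a town of 10,000 or a state of 10,000,000.
moe = margin_of_error(1000)
```

One nuance: to cut the margin in half you have to quadruple the sample, since the error shrinks with the square root of n.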

I put together a pretty long post on this in a different thread. PredictIt was a superior predictor of electoral vote awarding entities the entire cycle. PredictIt in May was better than 538 in November.

If you looked at the lean of all 56 PredictIt markets for electoral vote awarding entities at the end of May and then went into an induced coma until mid-November, you would be mildly surprised by Georgia. The other 55 were correct.

The flaws with the betting markets have already been shown. Deep learning, on the other hand, requires a ton of data to be accurate. Presidential elections happen every four years, so we have few data points, and the set of features that affect the outcome likely changes drastically quite often. You might see some sort of machine learning applied on top of polls, which isn’t drastically more advanced than what 538 is doing already, but I don’t see deep learning coming into play any time soon. As an example, a deep learning model isn’t any good at identifying cats after looking at 10 cat pictures; it becomes good by looking at thousands or often millions of cat pictures.

Polls remain the best way to do this, because a poll is supposed to be nothing but a miniaturized dry run of an election, but we currently suck at polling. As I’ve stated elsewhere, I think the primary problem is correctly picking the subpopulations to account for when doing stratification. The problem with that is if you forget to include a necessary feature to identify these new subpopulations, you get to wait four years to try again. If Biden lost Florida due to second-generation Cuban-Americans and those within two degrees of separation from them (just a hypothetical here), then we need to know how large that group is and account for it when sampling. But then in four years, it turns out that one of the candidates is allergic to dogs and that dog lovers have a negative response to that. Modeling is hard. Modeling rare events (like major elections) is harder.
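To make the stratification point concrete, here’s a toy post-stratification sketch (the groups, shares, and support numbers are all invented for illustration): if a known subgroup is under-represented in your sample, you can reweight its responses back to its true share of the electorate - but only if you thought to measure that subgroup in the first place.

```python
# Each stratum: (group, share_of_electorate, share_of_sample, candidate_support)
# The rural group is under-sampled: 20% of voters, only 10% of respondents.
strata = [
    ("urban",    0.40, 0.50, 0.70),
    ("suburban", 0.40, 0.40, 0.50),
    ("rural",    0.20, 0.10, 0.20),
]

# Naive estimate: average support weighted by who actually answered the phone.
raw = sum(sample_share * support for _, _, sample_share, support in strata)

# Post-stratified estimate: reweight each group to its known electorate share.
weighted = sum(pop_share * support for _, pop_share, _, support in strata)
```

Here the raw sample overstates the candidate by five points purely because of who picked up; reweighting fixes that. But if pollsters never identified “rural” as a stratum worth tracking, there’d be nothing to reweight - which is exactly the forgot-a-feature failure mode described above.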

To paraphrase someone from my Facebook feed: I don’t know about all those fancy numbers, but I don’t know any Democrats who live in Arizona, so the election was stolen.