So will anybody ever trust the polls again?

AIUI, that data was considered something of a win for the betting markets.
A week before the election, PredictIt was giving Biden only 60% odds to win, while 538 was giving him 87%. The closeness of the subsequent election suggests the former was the better estimate (yes, I know, any outcome is consistent with either probability. But if I were to say the chance of an event is 1 billion to 1, and you say the chance is 50:50, and the event happens… AIUI, the odds are that your estimate was better than mine).
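That intuition has a standard formalization: after the event, the likelihood ratio between the two forecasts measures how strongly the observed outcome favors one forecaster over the other. A minimal sketch:

```python
# After observing one outcome, the likelihood ratio says how strongly
# the evidence favors forecast A over forecast B.

def likelihood_ratio(p_a: float, p_b: float, event_happened: bool) -> float:
    if event_happened:
        return p_a / p_b
    return (1 - p_a) / (1 - p_b)

# "You say 50:50, I say a billion to one" and the event happens:
print(likelihood_ratio(0.5, 1e-9, event_happened=True))   # ~5e8 in your favor

# The real case is far weaker: PredictIt's 60% vs 538's 87%, Biden wins:
print(likelihood_ratio(0.60, 0.87, event_happened=True))  # ~0.69, mildly favoring 538
```

Note that on the bare win/lose outcome, Biden winning actually favors the 87% forecast slightly. The argument above leans on the margin being close, which a binary likelihood ratio doesn't capture.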

Yes, and I don’t know. I especially don’t know how this plays out in the US, where it may be categorized as illegal betting. But, in general, it seems that these prediction markets often perform well.

Mean squared error across all 56 electoral vote awarding entities (50 states, 5 congressional districts, and the District of Columbia) over time. Lower error is better.


PredictIt in mid-April, and at every other time, was better than 538 even on 538's best day.
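For reference, the mean squared error metric in that chart works like this (a sketch with made-up probabilities and a three-state subset; the real comparison would use both sites' published forecasts across all 56 entities):

```python
# Mean squared error of win probabilities against realized outcomes,
# averaged over electoral-vote-awarding entities. Lower is better.
# All probabilities below are illustrative placeholders, not real data.

def mse(probs: dict[str, float], outcomes: dict[str, int]) -> float:
    """probs: P(Biden wins entity); outcomes: 1 if Biden won, else 0."""
    return sum((probs[e] - outcomes[e]) ** 2 for e in probs) / len(probs)

outcomes = {"AZ": 1, "FL": 0, "GA": 1}  # hypothetical subset
fivethirtyeight = {"AZ": 0.67, "FL": 0.69, "GA": 0.58}
predictit = {"AZ": 0.60, "FL": 0.55, "GA": 0.50}

print(mse(fivethirtyeight, outcomes))
print(mse(predictit, outcomes))
```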

This is the problem, isn’t it? Even if we throw aside all privacy concerns, if we could give a deep learning network an absolutely ludicrous amount of data - say, all the info Google, Facebook, Twitter, our credit card companies, etc. collect on us…

Theoretically there should be more than enough info there to predict how you’ll vote. Vastly more.

But if we only have an election every four years, how is the system supposed to build a model and refine it?

I guess we could split the electorate into a bunch of smaller, randomly assigned groups and let the model practice on those. But then the data it is learning from is limited – and that has consequences. Maybe you could train a neural network to look at the purchasing habits of voters during 2015-2016, and come up with a fairly accurate prediction of how they voted in the 2016 election. Given the purchasing data of voters in 2019-2020, the model is likely to become very confused, because purchasing habits will have shifted radically without a corresponding shift in politics. Thanks, pandemic!
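The drift you describe can be sketched with toy numbers (everything here is invented for illustration): train a spending-threshold classifier on pre-pandemic habits, then watch it fall apart when spending shifts but votes don't.

```python
import random

random.seed(0)

# Toy concept-drift demo: a classifier keyed to 2015-16 purchasing habits
# breaks when the habits shift (pandemic) but the politics don't.

def make_voters(n, spend_shift=0.0):
    """Each voter is (monthly_dining_spend, votes_blue). In the training
    era, blue voters happen to dine out more; spend_shift models the
    pandemic crushing everyone's dining spend equally."""
    voters = []
    for _ in range(n):
        blue = random.random() < 0.5
        spend = random.gauss(300 if blue else 200, 40) + spend_shift
        voters.append((spend, blue))
    return voters

def accuracy(voters, threshold=250.0):
    """Predict 'votes blue' iff spend > threshold (tuned on the training era)."""
    correct = sum((spend > threshold) == blue for spend, blue in voters)
    return correct / len(voters)

print(accuracy(make_voters(10_000)))                    # ~0.89 on 2015-16-like data
print(accuracy(make_voters(10_000, spend_shift=-150)))  # ~0.50 after the shift
```

The labels never changed; only the feature distribution moved, and the model degrades to a coin flip.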

I think you are right, elections are too infrequent and too dependent on too many factors that are unique to that election for a machine learning model to accurately predict. Maybe we can continue doing polls and train the model on those polls, but then you’re back to inaccurate polls causing problems.

My broader point was that PredictIt also showed Trump with much higher chances than 538 when that prediction was incorrect. This suggests that the higher odds on Trump during most of the race weren’t due to greater accuracy but rather to a general tendency to overrate Trump’s chances. Trump coming closer than expected seemed to validate that, but that’s not real accuracy.

What do people think of the idea I’ve seen floating around, that the pre-election news reports of record (mail-in) ballots favoring registered democrats motivated republicans to double down on turning out?

PredictIt is not better than the polls. The bets skew towards cult candidates. The only reason PredictIt did better than the polls this time around was because the cult candidate exceeded expectations.

Right now, it will cost you 8 cents to buy “Will Trump Win the Popular Vote?”. To buy Biden as the presidential election winner will only cost you 89 cents at the moment. If you thought last month that Trump would get 280 electoral votes MORE than Biden, it would cost you 4 cents.
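Those cent prices read almost directly as probabilities: a PredictIt "Yes" share pays out $1.00 if the event happens, so (ignoring fees and bid-ask spread) the price in cents is the market's implied probability. A minimal sketch:

```python
# A PredictIt "Yes" share pays $1.00 if the event happens, so its price
# in cents is (ignoring fees and spread) an implied probability.

def implied_probability(price_cents: float) -> float:
    return price_cents / 100.0

# The prices quoted above, read as probabilities:
print(implied_probability(8))   # Trump wins popular vote -> 0.08
print(implied_probability(89))  # Biden wins presidency   -> 0.89
print(implied_probability(4))   # Trump beats Biden by 280+ EVs -> 0.04
```

Reading prices this way is what lets the MSE comparisons upthread put PredictIt and 538 on the same 0-to-1 scale.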

Bernie Sanders also had higher prices than he deserved. So did Hillary this time around, but from the Hillary haters and conspiracy nuts. “Will Hillary Run?” was close to 10 cents all the way up to October. PredictIt is more a gauge of what the extremists are thinking.

There was no point in this cycle when 538 had a lower mean squared error across all electoral vote awarding entities than PredictIt. 538’s best day was worse than PredictIt’s worst day.

This is certainly possible. Got anything to back it up?

What polling error?

538 said Biden would win, and he did- with a comfortable margin. He said a landslide for Joe was unlikely.

538 said the Dems had only a 60% chance of taking the senate, and although that outcome now looks unlikely, it is coming close.

He said the Dems would retain control over the House- and they did.

Yeah, the people complaining about poll accuracy seem upset that the polls didn’t predict that Trump would lead early but that this lead would fade away. But they did predict exactly this.

Plus, as others have noted, Republicans were listening to the news too; they heard about record turnouts and, because the news continuously talked about how early votes would lean D, put two and two together, realized this meant a D lead, and went out to vote.

While all the (preliminary count) errors seem to be within “normal” polling error (using that term both statistically and generally), they were also correlated (in one direction). Such correlation was expected by the aggregate models of 538 and the Economist, but it’s worth asking why, and whether we’ll always have to add/subtract 3% in a certain direction to get a truer picture. Especially since it’s the same error direction as the last presidential election that the polls supposedly corrected for this time.

It is pretty clear- racists don’t answer pollsters.

The rest of my post. Right now Donald Trump to win the presidency is at 13 cents. Six cents to win the popular vote. These numbers are not realistic.

I have a dataset of over 200 million people, updated monthly, that is supposed to consist of all adults in the US and I have hundreds upon hundreds of data points (including things like ethnicity, primary language spoken, interest scores in certain hobbies, number of grandchildren under 12, whether you have a premium credit card, etc.) for each individual, including the political party you belong to. What I don’t have is how you voted in any single election. Even if I did, the next election could have different factors. For instance, my database doesn’t have a score for how racist someone is, their level of misogyny, how long they’re willing to stand in a line, what flag they are waving on their front porch, etc.

That’s fair. Does it hold true for state level data, other elections, etc? If so, I’m on board with testing it out historically (assuming the data is readily available without having to do a ton of data entry).

Here’s the post from Nate Silver that I’m recalling. From the election night live blog thread 11/4 12:34 AM:

A lot of states have been called on some networks and not on others. If you plug in all the states where any network has called the state for Biden or Trump, it shows Biden at 88 percent to win the Electoral College, Trump at 6 percent, and a 6 percent chance of a tie. But that depends on Biden winning Arizona and Minnesota, which ABC News hasn’t projected yet.

You could quibble with the methodology of “any network has called”, but networks tend to be cautious, and if using the “any network” approach you’re at 88%, then you’re clearly a very solid favorite. At that exact time (right when I went to sleep :slight_smile: ), PredictIt was showing Trump as the solid favorite (over 60%) and the contrast was striking.

So I think PredictIt was simply skewed to Trump.

This is incorrect. 538 gave the Ds a 75% chance of taking the senate.

Yeah but my point is that even if you linked each record in your database to their voting record (obviously there are plenty of reasons to make this impossible. But ignoring those – let’s say we so desperately want to fix polling we are willing to do away with privacy), I don’t think we’d have enough info for the model to work. The factors that matter to people each election cycle are just too unpredictable.

That is correct, which is what I was trying to say with the end of that post starting with “Even if…”.

Ah, sorry, yes we are on the same page :slight_smile:

I wonder if the one thing that might grant enough data, especially as the electorate is getting younger, is internet data? You could potentially do this anonymously, looking at say search trends among unique users in a given geographic area. A deep learning network with access to, say, Google’s data in real time, given a few elections to practice, MIGHT be able to keep a pulse on the electorate…?

Potentially through some variation on sentiment analysis (e.g. positive vs negative tweets about Trump) or search trends, but that’s tricky as well.

If the nurse at the hospital tweets “Kate Beckinsale is fine”, that means something different than if I tweet “Kate Beckinsale is fine.”
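A toy lexicon-based scorer (a hypothetical sketch, not any real library) makes the problem concrete: both tweets get the identical score, because the scorer only sees the word "fine", not which sense of it is meant.

```python
# Toy lexicon-based sentiment: sum per-word scores. It cannot distinguish
# "fine" meaning "medically stable" from "fine" meaning "attractive".

LEXICON = {"fine": 1, "great": 2, "awful": -2}  # hypothetical word scores

def naive_sentiment(text: str) -> int:
    return sum(LEXICON.get(word.strip('."'), 0) for word in text.lower().split())

nurse_tweet = "Kate Beckinsale is fine"   # status update: she's stable
fan_tweet = "Kate Beckinsale is fine."    # compliment: she's attractive

print(naive_sentiment(nurse_tweet), naive_sentiment(fan_tweet))  # 1 1
```

Disambiguating those two readings needs context about the speaker, not just the words - which is exactly why tweet-level sentiment is a noisy proxy for voter opinion.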