Yeah, I don’t think the stupid fox mascot’s gonna help.
538 has clearly stated they realize vote suppression and cheating are possible factors and that they can’t accurately account for them, so it’s not as if they haven’t admitted this deficiency.
But the reason he doesn’t say that is because that would be false.
It doesn’t tell me anything I can’t see just by looking at the poll data itself, and it doesn’t move any differently than the polls do. It’s superfluous.
I have posted this in response to you before, but I’ll try again since I guess I’m a slow learner…
There are well-known methods of evaluating probabilistic forecasts. The Brier score is one such method. One can easily compare 538’s forecast to any other forecast with this method to determine which had more value.
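For what it’s worth, that comparison is easy to sketch in a few lines of Python. The forecast probabilities and outcomes below are made up purely for illustration, not taken from 538 or any real forecaster:

```python
# Minimal Brier-score sketch: compare two probabilistic forecasts
# against the same set of binary outcomes (1 = event happened).
# All numbers below are invented for illustration.

def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and actual outcome."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

# Hypothetical state-level "Candidate A wins" probabilities from two forecasters.
forecast_a = [0.85, 0.70, 0.55, 0.30, 0.10]   # a model that sticks its neck out
forecast_b = [0.60, 0.60, 0.50, 0.45, 0.40]   # a vaguer competitor

actual     = [1,    1,    0,    0,    0]      # what actually happened

print("Forecast A Brier score:", brier_score(forecast_a, actual))  # lower is better
print("Forecast B Brier score:", brier_score(forecast_b, actual))
```

Run over enough races, whichever forecast ends up with the lower average score demonstrably had more value.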
Again, that’s only if you look at each race as a “Yes/No” outcome. Look at say, the race for Pennsylvania:
He’s predicted Biden to win, with a range of vote shares. If the final vote is 60% Biden, he’s still wrong even though Biden wins, because it’s way outside the range.
Of course he can. You can simply compare the probabilities he comes up with against performance. You can compare his vote total estimates against performance.
There’s some truth to this. I think of 538 as a sophisticated, well-informed means to help me process the polls as they come out — to better understand what the polls mean, especially regarding what’s likely or not likely to happen on November 3.
This is so incorrect, I don’t even know where to begin.
Sigh. The human brain just isn’t very good at digesting probability.
ETA: I think I misunderstood your point. Let’s try to be clear about this (and to RickyJay, too): Best not to use “right” or “wrong” in this context. If Trump got every single vote in Pennsylvania (or Biden did), Nate would still not be “wrong.”
Such a bizarre outcome would be a point against him though.
The effectiveness of Silver’s models absolutely can be judged to be right or wrong, but not based on one result; they can be judged based on many results. For instance, suppose you took all his state-level predictions for Presidential elections and then judged them against results, and found that almost all of them fell within his predicted confidence interval (which in fact is the case). That demonstrates his models are right. If instead you found that his predictions in most cases were far off, that would demonstrate his model is wrong. ONE miss wouldn’t show that, but many misses would.
AlsoNamedBot’s example is actually quite right. If on Election Day, Biden is predicted to get 51% of the popular vote to Trump’s 46, and instead it’s 60-36, Silver’s model didn’t work well at all. That would be way outside his confidence interval. (I think you erroneously believe Also was referring to a 60% probability of victory. He was not.)
Or else how would YOU judge the effectiveness of his models?
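To make that concrete, here’s a rough sketch of the kind of coverage check being described, with invented intervals and results rather than Silver’s actual output: collect many interval predictions, count how often the real result landed inside, and compare that hit rate to the interval’s nominal level.

```python
# Sketch of a coverage check for interval forecasts: if a model publishes
# 80% intervals, roughly 80% of actual results should fall inside them.
# Intervals and results below are invented for illustration.

predictions = [
    # (predicted low %, predicted high %, actual %) for some candidate's vote share
    (48.0, 54.0, 52.1),
    (44.5, 51.5, 46.0),
    (55.0, 62.0, 63.5),   # a miss: the result landed above the interval
    (40.0, 47.0, 44.2),
    (49.0, 56.0, 50.3),
]

hits = sum(1 for low, high, actual in predictions if low <= actual <= high)
coverage = hits / len(predictions)

print(f"Observed coverage: {coverage:.0%} ({hits}/{len(predictions)} inside the interval)")
# Roughly matching the nominal level over many races supports the model;
# consistently falling short of it counts against the model.
```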
DraftKings should get into this biz and set up a competitor site touting a buncha ‘big data secret sauce derp derp’ and just take 538’s probabilities and change them by 1 point.
No, I got it.
It comes down to semantics, I suppose. I agree with your wording — that such an outcome would show Nate’s model to be “problematic,” or “have issues,” or maybe even “faulty” — but still not WRONG. Maybe it’s just me, but I prefer to reserve that word for things that simply do not apply in any probabilistic approach.
I like it! (Even better, soften it to “SOME modeling assumptions might be incorrect”… but even that’s not quite right. A lot of it is about how much UNCERTAINTY the model expresses, and his models always account for at least SOME. So a very small degree of expressed uncertainty could be one of those “incorrect assumptions,” but not this year: due to COVID and other factors, Biden’s chances in the model will never be super-high, almost whatever the polls say, even on Nov 2.)
I agree that this year is a very bad year to evaluate Nate’s predictive accuracy. His model by necessity ignores some potentially very large effects, because they are unique to this year and can’t be accounted for (even Hari Seldon couldn’t foresee the Mule). But putting up a website whose title is “We don’t know squat” is not going to generate a lot of clicks, so he wisely gives us what he can and acknowledges his limitations.
Regarding the use of the Brier score, it is a good way of comparing competing models, but it can’t necessarily be used to say whether an individual model in and of itself is correct. A pure “we don’t know squat” model, one that leaves all outcomes equally likely, can’t be disproved no matter what the data show. However, it can be shown to be inferior to a model that actually takes the risk of making some probabilistic predictions and having them pan out.
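As a toy illustration of that last point (all numbers invented): a know-nothing model that calls everything 50/50 scores exactly 0.25 on the Brier scale no matter what happens, so it can never be shown “wrong,” but a model that sticks its neck out can still beat it.

```python
# Toy comparison (invented numbers): a "we don't know squat" model that
# assigns 50% to everything versus a model that makes real predictions.

def brier_score(probabilities, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

outcomes     = [1, 1, 0, 1, 0, 0]          # what actually happened
know_nothing = [0.5] * len(outcomes)       # always 50/50: score is 0.25 regardless
risk_taker   = [0.9, 0.7, 0.2, 0.8, 0.4, 0.1]

print("Know-nothing model:", brier_score(know_nothing, outcomes))  # always 0.25
print("Risk-taking model: ", brier_score(risk_taker, outcomes))    # beats 0.25 here
```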