Doubting the statistical analysis at 538

This is not certain at all. Covariance is a rigorously defined mathematical concept. It’s a fairly straightforward calculation.

If it is only covariance that we are concerned about, I can add that into my model, as I mentioned before. What would be a reasonable swing … 10 points in either direction? I am guessing even with covariance factored in, Trump’s chances won’t improve much. I’ll give it a try.

What? I am no statistician, but I have a hard time believing that. Being rigorously defined does not make it a straightforward calculation in this situation. A reasonable calculation of the covariance between states and national polling would look at how the ratios of white/black/Latino/old/young/educated voters compare. ISTM, anyway.

Here’s a simplified covariance calculation using the RCP average margin of Clinton over Trump in four-way polling in OH and PA during August.


Date	OH	PA	OH * PA
8/1	1.40	3.50	4.90
8/2	1.40	3.50	4.90
8/3	1.40	8.30	11.62
8/4	1.40	8.30	11.62
8/5	1.40	8.30	11.62
8/6	1.40	8.30	11.62
8/7	1.40	8.30	11.62
8/8	2.20	8.70	19.14
8/9	2.20	8.70	19.14
8/10	2.20	8.70	19.14
8/11	2.20	8.70	19.14
8/12	2.20	8.70	19.14
8/13	2.20	8.70	19.14
8/14	2.20	8.70	19.14
8/15	2.20	8.70	19.14
8/16	3.30	8.70	28.71
8/17	3.30	8.70	28.71
8/18	3.30	8.70	28.71
8/19	3.30	8.70	28.71
8/20	4.00	8.70	34.80
8/21	4.00	8.70	34.80
8/22	4.00	10.00	40.00
8/23	4.00	10.00	40.00
8/24	4.00	10.00	40.00
8/25	4.00	10.00	40.00
8/26	4.00	10.00	40.00
8/27	4.00	10.00	40.00
8/28	3.20	8.80	28.16
8/29	3.20	7.60	24.32
8/30	3.20	7.60	24.32
8/31	3.20	6.80	21.76
Average	2.75	8.42	24.00
			
Covariance	0.80		
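A minimal Python sketch of the same arithmetic, using the daily values from the table. The 0.80 figure is the population covariance (mean of the products minus the product of the means, dividing by n = 31); dividing by n - 1 instead gives about 0.82, and the corresponding correlation is about 0.55. Note that this only measures how the two daily averages moved together over one month, and consecutive daily averages are not independent observations, so treat it as an illustration of the calculation rather than a claim about how 538 estimates state correlations.

```python
from statistics import mean, pstdev

# Daily RCP four-way averages of Clinton's margin over Trump (points),
# August 1-31, as listed in the table.
oh = [1.40] * 7 + [2.20] * 8 + [3.30] * 4 + [4.00] * 8 + [3.20] * 4
pa = ([3.50] * 2 + [8.30] * 5 + [8.70] * 14 + [10.00] * 6
      + [8.80] + [7.60] * 2 + [6.80])

def population_covariance(xs, ys):
    """Average of the products minus the product of the averages (divides by n)."""
    return mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)

cov = population_covariance(oh, pa)
sample_cov = cov * len(oh) / (len(oh) - 1)  # n - 1 (sample) version
corr = cov / (pstdev(oh) * pstdev(pa))      # Pearson correlation
print(f"cov={cov:.2f} sample_cov={sample_cov:.2f} corr={corr:.2f}")
```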

There’s a lot of data there, but the calculation is straightforward. I’m pretty sure Nate Silver does something substantially similar. It would be very un-Nate-like to cram a bunch of extra variables in there instead of just calculating how states are correlated, using state-level polling as a proxy.

Trump is still winning only 1% of the scenarios in my model, even with an additional variable that shifts all state polls by up to 10 points in either direction. The covariance seems to hurt him just as often as it helps him. I’m not sure how to model the covariance accurately without undermining the randomization that determines the winner in each state.

I upped the covariance to ridiculous levels just to see if that would make a difference, and with swings of 30 pts in either direction, Trump now wins 18 scenarios out of 400, so about 4.5%.
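One thing worth keeping in mind with a few hundred runs: the simulated percentage carries its own sampling noise. A minimal Python sketch using the usual binomial standard error, sqrt(p(1 - p)/n), on the 18-out-of-400 result gives roughly 1 percentage point, so a rough normal-approximation 95% range is about 2.5% to 6.5%. This only measures simulation noise, not whether the model's assumptions are right.

```python
import math

def simulated_share_with_error(wins, trials):
    """Share of winning scenarios and its binomial standard error."""
    p = wins / trials
    return p, math.sqrt(p * (1 - p) / trials)

p, se = simulated_share_with_error(18, 400)
low, high = p - 1.96 * se, p + 1.96 * se  # rough normal-approximation 95% interval
print(f"{p:.1%} (standard error {se:.1%}, roughly {low:.1%} to {high:.1%})")
```

More runs shrink the error in proportion to 1/sqrt(n), so 4 times as many scenarios roughly halves it.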

OK, so the covariance is calculated straightforwardly from the observed polling variance? I guess I was misled a little by hearing that states with similar demographics show similar polling shifts.

The only way to get up to 538’s level of a 20%-plus chance for Trump to win is to put all states into play AND assume all polls could be off by a large margin.

I’m going to need to see the details of your model. You’re using the word covariance, but I’m pretty sure you’re actually doing something else.

Here’s how my simple model would work.

  1. Start with a random variable that models the national popular vote margin of Clinton over Trump. This requires calculating a mean and variance of such a beast. Using a normal distribution is not perfect but probably good enough. Call this x.

  2. For each state, do a linear regression of that state’s margin on the national popular vote margin. Call this a_i * x + b_i.

  3. For each state, calculate the mean and variance of the error (residual) between your state-level regression equation and the actual state margin. Use these to create a normally distributed random error variable for each state. Call this e_i.

  4. Randomly roll x and all the e_i’s. Then calculate each state margin as y_i = a_i * x + b_i + e_i.

  5. High five the next person you see.

This is not a perfect model, and I haven’t tested it in any way, but it should give a general picture of how these things work.
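A minimal Python sketch of steps 1-4, assuming you already have matching historical series of national and state margins to fit against (none are given here). fit_state and simulate are hypothetical helper names, and the toy numbers in the usage lines are made up, not real polling data. Because the e_i are drawn independently of x and of each other, this model implies Cov(y_i, y_j) = a_i * a_j * Var(x) for any two different states, which is where the covariance between states comes from.

```python
import math
import random
from statistics import mean

def fit_state(national, state):
    """Steps 2-3: least-squares fit of state ~ a * national + b, plus the
    spread of the residuals. Needs at least three data points."""
    mx, my = mean(national), mean(state)
    sxx = sum((x - mx) ** 2 for x in national)
    sxy = sum((x - mx) * (y - my) for x, y in zip(national, state))
    a = sxy / sxx
    b = my - a * mx
    residuals = [y - (a * x + b) for x, y in zip(national, state)]
    # With an intercept in the fit the residuals average to (about) zero,
    # so only their spread is kept; n - 2 degrees of freedom for a and b.
    resid_sd = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    return a, b, resid_sd

def simulate(national_mean, national_sd, states, trials=10_000, seed=0):
    """Steps 1 and 4. `states` maps name -> (a, b, resid_sd, electoral_votes).
    Returns the share of trials in which the candidate whose margin is
    modeled wins a majority of the listed electoral votes."""
    rng = random.Random(seed)
    needed = sum(s[3] for s in states.values()) // 2 + 1
    wins = 0
    for _ in range(trials):
        x = rng.gauss(national_mean, national_sd)  # step 1: national margin
        ev = 0
        for a, b, resid_sd, votes in states.values():
            y = a * x + b + rng.gauss(0.0, resid_sd)  # step 4: y_i = a_i x + b_i + e_i
            if y > 0:
                ev += votes
        if ev >= needed:
            wins += 1
    return wins / trials

# Usage with made-up numbers (hypothetical states, not real data):
toy_states = {
    "State A": (1.0, -2.0, 3.0, 10),
    "State B": (1.2, 1.0, 4.0, 20),
    "State C": (0.8, 3.0, 2.5, 15),
}
print(simulate(national_mean=4.0, national_sd=3.0, states=toy_states))
```

With margins measured as Clinton over Trump, simulate returns Clinton's share of runs; Trump's share is one minus that (ties aside).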

Also, I should add that I didn’t just make this up. Here’s a link.

Yeah, I’m not actually modeling the covariance. I’m simulating it with an additional random variable, which applies to all the states. It’s like saying “Let’s assume all polls are underestimating Trump by X%” and X is the same in every state.
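A minimal Python sketch of that shared-swing approach, assuming the swing X is drawn uniformly from [-10, +10] (the post doesn't say which distribution is used) and adding independent per-state noise with a hypothetical standard deviation of 5 points. The base margins are the August OH and PA averages from the table. Because the same X is added to every state, it induces a covariance between any two states' simulated margins equal to Var(X), which is 10^2 / 3, about 33.3, for this uniform swing. Since X is symmetric around zero, it moves the states toward Trump exactly as often as toward Clinton.

```python
import random
from statistics import mean

def draw_margins(base_margins, noise_sd, max_swing, rng):
    """One simulated election: a single swing X shared by every state,
    plus independent noise for each state."""
    swing = rng.uniform(-max_swing, max_swing)  # same X for every state
    return {state: m + swing + rng.gauss(0.0, noise_sd)
            for state, m in base_margins.items()}

def sample_covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

base = {"OH": 2.75, "PA": 8.42}  # August averages from the table
rng = random.Random(42)          # fixed seed so runs are repeatable
runs = [draw_margins(base, noise_sd=5.0, max_swing=10.0, rng=rng)
        for _ in range(20_000)]
oh_sim = [r["OH"] for r in runs]
pa_sim = [r["PA"] for r in runs]

# Should land near Var(X) = 10**2 / 3, i.e. about 33.3.
print(round(sample_covariance(oh_sim, pa_sim), 1))
```

Raising max_swing raises the cross-state covariance quadratically, which is why very large swings are needed before the shared term changes the overall odds much.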

“Black swans” are not unpredictable Acts of God. In fact the classic examples of “black swans” are ordinary events, but with a dispersion greater than expected under Gaussian models. IIRC the very term was coined in a very simple context when it was realized that certain financial parameters were not Gaussian, but were better modeled with a power law.
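A small Python sketch of what "dispersion greater than expected under Gaussian models" means in practice. It compares how fast the tail shrinks when the threshold doubles, for a standard normal and for a Pareto (power-law) distribution; the tail index alpha = 3 and x_min = 1 are arbitrary illustrative choices, and the two distributions are not calibrated to each other, so only the rate of decay is comparable. Going from 3 to 6 makes the Gaussian tail over a million times rarer, while the power-law tail shrinks by only 2^alpha = 8.

```python
import math

def normal_tail(k):
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def pareto_tail(x, alpha, x_min=1.0):
    """P(X > x) for a Pareto (power-law) variable with tail index alpha, x >= x_min."""
    return (x_min / x) ** alpha

# How much rarer is an outcome twice as extreme?
gaussian_ratio = normal_tail(3) / normal_tail(6)
power_law_ratio = pareto_tail(3, alpha=3) / pareto_tail(6, alpha=3)
print(f"Gaussian: {gaussian_ratio:.3g}x rarer, power law: {power_law_ratio:.3g}x rarer")
```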

If you’re unwilling to give examples of the most likely “black swans” you envision, your comments cannot be taken seriously.

Not sure where you get your concept of “Black Swan events,” but *by definition* they are NOT ordinary events. They are extraordinary events: without precedent, rare, unpredictable, with extreme impact, and making sense only in retrospect. The metaphor refers to something felt not to exist until it occurs. Yes, those sorts of unprecedented events are better modeled with a power law than with the normal curve, but the key aspect of them is that no one sees them coming or knows what they will be.

As the WSJ describes them:

The very latest national poll from CNN shows Trump 45, Clinton 43, Johnson 7, Stein 2. Clinton is up only 1 point in the latest Virginia poll. Betfair puts the GOP’s chance of winning the White House at 30%.

I certainly hope you optimists are all right and Hillary wins in a landslide. It’s very hard to imagine where Trump can be getting the votes to lead 45-43. But this is the most important election of our lives; let’s not fall victim to wishful thinking and complacency.

I’m glad you mentioned the fall of LTCM as the source of the “black swan” analogy. Expecting a complex financial variable to fit a simple Gaussian model was quite short-sighted — I don’t care if Nobel Prize-winners fell victim to this blunder.

In the decade before the collapse of LTCM, I had several friends who were top experts in information analysis. Many of them were well aware that many real-world series should be modeled with power laws rather than fitted to a Gaussian.

Of course this is all tangential to some extent. As shown in my last post, Trump may emerge victorious without any swan, black or otherwise.

I’m an optimist, but I already predicted, in polls and other posts, that the election was going to be close. Still a Clinton win, but one that teaches the Republicans a very sad lesson: that they came close to winning once they stopped pretending they merely tolerate the bigots and the ignorant in their ranks. And that the establishment will become more willing to normalize the bigotry and ignorance of the people it used to merely pander to.

It won’t be only members of the Republican Party who suffer a lot for it in the future.

And now she’s down to +2.4 in a two-way race and +3.3 in a three-way.

How about, before we slam Silver’s model, we first predict where the race settles? Is this the settling point, or should we expect a little more tightening first?

Now TPP and Clinton’s pretty obvious lying about her position are back in the news, thanks to the President pushing it hard. She’s had some pretty vicious news cycles lately, while Trump has been unusually quiet.

You mean “quiet” ever since his stunning diplomatic triumph in Mexico?

Well, I’m on record: polls showing Clinton +5, with swings of +/- 3 around it, and any aggregated results outside that range regressing toward that mean fairly quickly.

Do you have a prediction to share?

Relatively speaking he’s been very quiet, and his trip to Mexico was clearly a big success. That combined with Clinton not running a sensational campaign is why he’s doing better in the polls lately.