Polling: Unskewed Polls and comments on polling (moved from Harris Thread)

Of course it isn’t arbitrary. It is the product of a statistical formula. If anyone has the formula currently used by an A-rated pollster, please post it.

It seems to me that weighting by known demographic characteristics improves on what you would get by applying a simple margin-of-error formula from an intro statistics textbook. Weighting certainly makes the polls more accurate, and perhaps the resulting error can be calculated with some more complex formula.
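For concreteness, here is a minimal sketch of both ideas: the textbook margin-of-error formula, and post-stratification weighting in miniature. The group names and shares below are made-up illustrative numbers, not from any actual poll.

```python
import math

def simple_moe(n, p=0.5, z=1.96):
    """Intro-textbook margin of error for a simple random sample,
    as a proportion (multiply by 100 for percentage points)."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(100 * simple_moe(1000), 1))  # 3.1 points for n = 1000

# Post-stratification weighting in miniature: scale each respondent's
# weight by (population share of their group) / (sample share of it).
sample = {"age_18_29": 100, "age_30_plus": 900}        # respondents per group
population = {"age_18_29": 0.20, "age_30_plus": 0.80}  # known census shares
n_total = sum(sample.values())
weights = {g: population[g] / (sample[g] / n_total) for g in sample}
print(weights)  # young respondents counted double, older ones down-weighted
```

Note that the weighting step corrects for *known* demographic imbalance; it does nothing about the unobserved "who stays on the line" effect discussed next.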

But such a complex formula is blown out of the water when you have to call 100 registered voters to get one or two respondents. That makes it essentially an opt-in sample. If whether or not you opt in made little difference to which candidate you prefer, no biggie. But opting in to a survey likely DOES have a significant relationship to which candidate you prefer, making these non-probability samples. And calculating the margin of error for a non-probability sample, even a really good non-probability sample, is impossible.

P.S. To be a little more concrete about this non-probability sample thing: Suppose you get a call from Rasmussen. To most, the name means nothing. And the great majority will hang up on any pollster. However, a few Republicans will have heard good things about Rasmussen and stay on the line. This could explain the GOP bias of Rasmussen polls without assuming any lack of professionalism on their part. It neither increases nor decreases their margin of error – it just makes it incalculable.

The statistical margin of error does not capture the true margin of error of polls. I’ll quote from the Pew Research article I linked at the beginning of this thread:

The real margin of error is often about double the one reported. A typical election poll sample of about 1,000 people has a margin of sampling error that’s about plus or minus 3 percentage points. That number expresses the uncertainty that results from taking a sample of the population rather than interviewing everyone. Random samples are likely to differ a little from the population just by chance, in the same way that the quality of your hand in a card game varies from one deal to the next.

The problem is that sampling error is not the only kind of error that affects a poll. Those other kinds of error, in fact, can be as large or larger than sampling error. Consequently, the reported margin of error can lead people to think that polls are more accurate than they really are.

There are three other, equally important sources of error in polling: noncoverage error, where not all the target population has a chance of being sampled; nonresponse error, where certain groups of people may be less likely to participate; and measurement error, where people may not properly understand the questions or misreport their opinions. Not only does the margin of error fail to account for those other sources of potential error, putting a number only on sampling error implies to the public that other kinds of error do not exist.

Several recent studies show that the average total error in a poll estimate may be closer to twice as large as that implied by a typical margin of sampling error. This hidden error underscores the fact that polls may not be precise enough to call the winner in a close election.

I see it as more of an issue with the modeling.

A model can give you a 63.2% chance, but what does that mean, given the relatively large margins of error on your inputs (the polls themselves) and the fact that the output depends on your model parameters being accurate (even a small error can swing a state or two)?
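A quick Monte Carlo sketch makes the sensitivity concrete. All numbers here are illustrative assumptions (no actual model's parameters): a total polling error that is normal with an sd of 3 points, and leads of 0, 1, and 2 points.

```python
import random

random.seed(42)  # reproducible sketch

def win_prob(poll_lead, total_error_sd=3.0, trials=100_000):
    """Chance the true margin is positive, assuming polls miss by a
    normally distributed total error (sd in points, an assumption)."""
    wins = sum(random.gauss(poll_lead, total_error_sd) > 0 for _ in range(trials))
    return wins / trials

# A 1-point shift in the input (a plausible polling bias) moves the
# output from a coin flip to roughly 63/37; the headline probability
# looks far more precise than the inputs justify.
for lead in (0.0, 1.0, 2.0):
    print(f"lead {lead:+.1f} -> win prob ~{win_prob(lead):.2f}")
```

The point is not the specific numbers but the shape: tenths-of-a-percent precision in the output is meaningless when a one-point input error swings it by double digits.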

We give weather forecasters a hard time for doing a very hard job, but even with vastly superior data and modeling they still limit many rain forecasts to the nearest 5% or 10%. They realize there’s only so much detail they can really offer at that level, yet people try to read a lot more into the numbers than is prudent or realistic.

Right now, the modeling tells us that Harris probably has a slight edge, but it’s far from certain and nowhere close to comfortable. You can say “there’s a difference between 63% and 55%,” but this is the classic accuracy-vs.-precision issue you should have learned in school.

Yes, model error and parameter error are always more important than stochastic error in situations like this. The polls’ “margin of error” is just stochastic error.

One thing that stands out to me in contrasting Silver and 538 is that Silver stands by his fundamentals favoring Trump, while 538’s fundamentals alone have Harris up by at least 3.2 in the popular vote. As they each ratchet down the weight of the fundamentals factor, they should converge.

I find it interesting that Silver keeps his model secret, but oddly and strangely, he had Harris UP before Republicans bought him, and now he says Harris is down. Hmmm.

IMHO, Silver was bought and sold. His forecasts are as valuable as Trump’s own.

I keep thinking Silver has a financial incentive to keep promoting the horse-race angle.

I signed up for a month to be able to see his forecast and frankly I will probably unsubscribe before it renews. I get his reputation as the “polling guru” but frankly I don’t see what 538 is doing that means their model should be discounted just because Silver has issues with them.

FWIW I only do the free Silver Bulletin, but today it shows states converging back to 50/50. 538 is 61 Harris. Not a huge difference, and one that will, as stated above, diminish as their difference in fundamentals view decreases as a factor. I wouldn’t pay for his punditry. But I see these bought-and-sold bits as not much better than “they’re eating their dogs.” YMMV.

What I find interesting about 538’s numbers is that they have Harris overall at 61%, but only at 55% for PA. This goes along with their having PA at only a 16.9% chance of being the tipping-point state. They seem to factor in some amount of flexibility for states moving in different directions. In other words, if I understand their model correctly, they are saying there’s a 6% chance Harris can win without PA. My guess is that states moving separately is probably a small effect, if present at all. That leads me to suspect that the 55% number is closer to the actual picture than 61% (assuming that 55% is accurate for PA).
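The 6% figure follows from total probability, under the simplifying assumption (mine for the arithmetic, not 538’s) that winning PA essentially guarantees Harris the election:

```python
# 538's published numbers (as quoted above):
p_overall = 0.61  # Harris overall win probability
p_pa = 0.55       # Harris win probability in Pennsylvania

# Total probability, assuming winning PA essentially guarantees the win:
#   p_overall = p_pa * 1 + (1 - p_pa) * P(win | lose PA)
p_win_and_lose_pa = p_overall - p_pa
p_win_given_lose_pa = p_win_and_lose_pa / (1 - p_pa)

print(round(p_win_and_lose_pa, 2))    # 0.06: chance Harris wins without PA
print(round(p_win_given_lose_pa, 3))  # 0.133: win prob conditional on losing PA
```

If states really move in near-lockstep, that conditional 13% chance of winning after losing PA looks generous, which is the basis for suspecting 55% over 61%.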

Laying aside the “who bought who” issue for the nonce and invoking DSeid’s point about the way Silver accounts for “fundamentals”:

In mid-August, Silver had Harris up to 59%. What were the fundamentals then? Did they strongly favor Trump, and if not, are those fundamentals so different one month later?

As an aside, Silver also likes to drop mentions of “She should’ve picked Shapiro as VP” with annoying frequency. Enough already.

Given that we’re in a very novel cycle with the Biden-Harris switch, it seems legit that there would be more disagreement between aggregators on aspects of fundamentals, like whether Harris is an “incumbent” or a “challenger”. Many of the “fundamentals” analyses of past elections (e.g. 538 in previous elections) were based on regressions that factored in time-to-election, convention bumps etc., and none of those conditions hold this year. I think this puts most of these decisions on fundamentals this year far closer to punditry/opinion than analysis (even if all of the analysts have “pure” motives), so personally I’m looking at “polls-only” outputs much more this year.

Yes, the fundamentals of a Million $ + of GOP money to Silver.

I don’t do the paid Silver Bulletin, does he do a polls only output now?

Yes, and Silver’s polls-only output is always free to view. At this link (which should keep working for the duration), scroll down to “Who’s ahead in the polls?” and you’ll see Silver’s polls-only chart. You can also click on several individual battleground states to view single-state results.

FWIW, Silver and 538 are generally very close just looking at their polling analysis – Silver has Harris ahead 2.9 points, 538 has her up by an even 3.0 right now.

Thanks. That’s helpful and is very consistent with my belief that their pundit guesswork fundamentals assessments were what was driving the difference. And who knows on that?

Salon, quoting MeidasTouch News’ Brett Meiselas, recently offered a more considered take on Nate Silver’s involvement with Polymarket. They take it as not Republican puppeteering, but as a clear conflict of interest for Silver:

Silver’s now being scrutinized for a potential conflict of interest after joining the crypto-based gambling company Polymarket as an advisor in July, and pushed his model while promoting election betting opportunities.

“Feels like it should be a bigger deal that Nate Silver is employed by Polymarket, a site that allows you to bet on political outcomes, and also runs a “prediction model” that has the ability to directly affect betting behavior,” journalist Brett Meiselas wrote on X.

It may appear to be a science, but polling is really an art.

I’m pretty miffed that people think 538 is more credible than Nate. 538 had some very internally inconsistent numbers under Biden. Around the time Biden dropped out, they rebuilt their model. They have not been forthcoming at all (reportedly at ABC’s direction) about what went wrong, what they changed, and how things work now.

Nate Silver has always been very transparent with his model. A lot of it is behind paywalls now, but it is all there. One of the nice things about Nate is that you can see his assumptions, so you can mentally adjust if you have different ones. For instance, the model is relatively low on Harris because she lost ground in polling at the time the model expected her to get a convention bounce. If you think the weird circumstances of this election meant there would be no bounce, then you can add a point or two back to Harris. The model uses six economic indicators for fundamentals. They have generally been about neutral, with real disposable personal income being the main drag and the S&P the main positive for Harris. The problem for Harris is that she loses a neutral environment. She likely needs to win by at least 2 or 3 points nationally to take the EC.
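That last claim can be sketched numerically. Assuming (my illustrative numbers, not Silver’s) that the tipping-point state runs about 2.5 points to the right of the national popular vote and that total polling error is roughly normal with an sd of 3 points:

```python
from statistics import NormalDist

def ec_win_prob(national_lead, ec_bias=2.5, error_sd=3.0):
    """P(carrying the tipping-point state), assuming that state runs
    ec_bias points behind the candidate's national margin and that
    total polling error is normal with the given sd. Both parameter
    values are illustrative guesses, not Silver's."""
    return 1 - NormalDist(mu=national_lead - ec_bias, sigma=error_sd).cdf(0.0)

# Under these assumptions a dead-even national race is a likely EC loss,
# and even a 2.5-point national win is only a coin flip:
for lead in (0.0, 2.5, 5.0):
    print(f"national lead {lead:+.1f} -> EC win prob ~{ec_win_prob(lead):.2f}")
```

The exact bias figure is debatable, but any positive EC bias produces the same qualitative conclusion: a narrow national popular-vote win does not secure the Electoral College.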

I’d be interested to read more about this. Can you check through your browser history and link to what you’ve read?

While Biden was still in the race, FiveThirtyEight’s G. Elliott Morris did explain why his model had Biden as a very slight favorite as of mid-July. Succinctly, it was because Morris’ model was heavily – very heavily – weighing a set of fundamentals (state of the economy, incumbency, etc.) over aggregate poll numbers. If interested, here’s Morris’ July 14th breakdown in detail.

Now, at the time of that explanation (7/14/2024), Morris had also already publicly explained that aggregate polling numbers would gradually overtake fundamentals in FiveThirtyEight’s model (discussed in this subforum in other threads, can find link upon request) as the election drew closer.

If Morris rebuilt FiveThirtyEight’s model, perhaps it was to adjust the weighting of fundamentals against polling. Would be interested to read more.

Here you go:

"…Our model now gives more weight to polls

At a high level, the version of the model we published before Biden dropped out of the race allowed for a dynamic rather than static, explicit weighting on the polls. When we launched the forecast in June, this was not an apparent issue; the model was generating estimates that were only slightly closer to our “fundamentals” indicators about the election — such as economic growth and whether an incumbent president is on the ballot — than to the polls, which had a lot of uncertainty in early June.

In July, however, the two sets of indicators started to diverge significantly. As the polls moved away from the fundamentals, the model did not react as expected to the new data and therefore generated estimates that deviated from the polls…

Our model’s overemphasis on the fundamentals stemmed from the way we explored future uncertainty in the election…"