Stats Question About Polls

OK, help a guy whose statistics skills are old and decrepit.
If we have a large number of polls, all showing (say) a 5-8% lead for one candidate, but all having margins of error in a similar range, to what extent can we aggregate these results and draw a conclusion without actually knowing the details of how the polls were conducted, with the caveat that we do know they all used different methodologies?

In stats, it’s called “meta-analysis,” where you can (with some safeguards) treat a number of polls as one really big poll. This can reduce the overall expected margin of error…but not a whole lot.
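For the sampling-error part alone, the arithmetic is simple. Here is a minimal sketch, assuming simple random samples and made-up numbers (a 52% share, polls of 1,000 people each), of how far the nominal 95% margin of error would shrink if several polls really could be pooled into one big sample:

```python
import math

def moe_95(p: float, n: int) -> float:
    """95% margin of error for a proportion p from a simple random sample of size n."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# One poll of 1,000 respondents with the candidate at 52%:
single = moe_95(0.52, 1000)        # ~0.031, i.e. about +/-3.1 points

# Ten such polls naively pooled as if they were one sample of 10,000:
pooled = moe_95(0.52, 10 * 1000)   # ~0.010, i.e. about +/-1.0 point

print(f"single poll MOE: {single:.3f}, pooled MOE: {pooled:.3f}")
```

That is the best case, where sampling noise is the only error. The caveats below are the reasons the real-world gain is smaller.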

(The procedure is controversial, of course. To begin with, the different small polls may have phrased the question slightly differently, or have used different methods of choosing participants. We can’t be sure that the individual little biases cancel each other out. There are competing mathematical models of how to sum up smaller studies and how to aggregate the expected errors. It can get incredibly messy.)

IMHO the most reliable example of this is FiveThirtyEight. Nate Silver writes extensively on his methodology here:

He doesn’t really delve into the specific math involved, but he does discuss necessary adjustments in general terms. It’s a fascinating read.

It is not quite the same as a meta-analysis, though. In a meta-analysis you need to define ahead of time which methodologies you will accept, so that you can say the studies are of similar enough design to pool their data sets. The case of polls is more like a variety of experiments testing a hypothesis in a variety of ways, all leading to the same conclusion. Their data sets cannot be pooled together, but the fact that different designs come to the same conclusion lends extra strength to that conclusion.

Three of the more popular aggregators illustrate the different approaches.

538 uses a complex system that weights the reliability of each polling house and follows trendlines within each pollster's results serially. Silver's current polls-only popular vote margin is 5.8%, using three- and four-way polling results when available.

RealClearPolitics just uses a rolling average. In a two-way race they currently show Clinton +5, +4.7 in a three-way, and +4.5 in a four-way.

PEC (Wang) prefers using a median of recent polls, which he feels automatically discounts the outliers. The method relies much more heavily on state data than on national polling figures. He believes that the errors within the houses will neutralize each other. His Clinton advantage (labelled the “metamargin” in his technique) is +5.4.

So only a 1.3% range amongst the three methods of aggregation.
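Purely as a cartoon of those three philosophies (not the aggregators' actual models, which are far more elaborate), here is roughly what they reduce to on a made-up list of recent poll margins with invented pollster weights:

```python
import statistics

# Hypothetical recent margins (Clinton minus Trump, in points) from several pollsters;
# the weights are made-up "pollster ratings" standing in for 538-style house adjustments.
polls   = [6.0, 5.0, 3.5, 7.0, 4.5, 5.5]
weights = [1.0, 0.8, 0.5, 0.9, 0.7, 1.0]

# 538-style (very loosely): a weighted average that favours better-rated pollsters.
weighted_avg = sum(m * w for m, w in zip(polls, weights)) / sum(weights)

# RealClearPolitics-style: a plain average over a recent window of polls.
rolling_avg = statistics.mean(polls[-4:])   # last four polls as the "window"

# PEC/Wang-style: the median, which automatically discounts outliers.
median = statistics.median(polls)

print(f"weighted: {weighted_avg:.2f}, rolling: {rolling_avg:.2f}, median: {median:.2f}")
```

The point is only that a rating-weighted average, a plain rolling average, and a median are three different ways of collapsing the same noisy list into one number, which is why their answers land close together.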

If they could actually be handled as a meta-analysis, then the MOE of the reported average state of the race for the time periods looked at would be modified (in which direction depending on the results). But I do not think they can be.

It’s worth noting that the margin of error for most polls is actually greater than the advertised number. This is because the nominal MOE reflects the possibility that chance alone would produce the observed result when the true number is something else, but it assumes that the only source of error is random sampling fluctuation. To the extent that the poll itself rests on some sort of flaw, shaky assumption, or data manipulation, those in turn have their own margin of error, which is pretty much an unknown quantity and can’t be accounted for.
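To make that concrete with an admittedly invented figure: if the extra, non-sampling error were somehow known (say, worth 2 points) and independent of sampling error, it would inflate the effective margin roughly as below. In reality that 2-point number is exactly the unknown quantity just described.

```python
import math

# Nominal 95% sampling MOE for a poll of 1,000 at roughly 50/50:
sampling_moe = 1.96 * math.sqrt(0.5 * 0.5 / 1000)   # ~0.031 (3.1 points)

# Pretend the methodological error were worth another 2 points and
# independent of sampling error (in reality its size is unknown).
methodological_moe = 0.02

# Independent error sources add in quadrature, so the effective MOE is larger:
effective_moe = math.sqrt(sampling_moe**2 + methodological_moe**2)   # ~0.037

print(f"nominal: {sampling_moe:.3f}, effective: {effective_moe:.3f}")
```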

And if a bunch of pollsters all share the same flaw and/or shaky assumption, then they could all be off for the same reason.

Another point - courtesy of 538.com - is that polls are not necessarily completely independent of each other. Since, as above, pollsters have some leeway in manipulating their data, it’s thought that they tend to gently guide their results in the direction of the consensus. As a result, a whole bunch of polls showing similar results are not independent validation of each other, but rather rely on each other to some extent (in some cases).
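A quick sketch of why that lack of independence matters: if each poll's error is positively correlated with the others' (herding, shared assumptions), the error of the average stops shrinking the way it would for truly independent polls. The correlation value here is a pure guess for illustration:

```python
import math

# Standard deviation of a single poll's margin estimate (sampling error alone),
# and an assumed correlation between pollsters' errors. Both numbers are illustrative.
sigma = 0.03   # ~3-point standard error per poll
n = 10         # number of polls being averaged

def se_of_average(rho: float) -> float:
    """Standard error of the mean of n equally-correlated estimates:
    Var(mean) = sigma^2 * (1/n + (1 - 1/n) * rho)."""
    return sigma * math.sqrt(1 / n + (1 - 1 / n) * rho)

print(se_of_average(0.0))   # fully independent polls: ~0.009, error shrinks a lot
print(se_of_average(0.5))   # substantially herded polls: ~0.022, much less gain
```

With no correlation the averaged error falls roughly with the square root of the number of polls; with substantial correlation it levels off well above that, no matter how many polls you add.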

Bottom line is that while a whole bunch of polls showing roughly comparable results has a smaller margin of error than any one poll, the reduction is not as large as one might suppose, and there’s no real way to quantify it mathematically.

Thanks everyone

The part that was tough to wrap my head around was the fact that different polls would have wildly different methodology. I mean, how do you mathematically account for questions worded in different ways, even when ostensibly measuring the same thing?

So I was wondering if, given a vast enough quantity of data, those things start to recede into irrelevance somehow.

And of note, today’s 538 piece deals with the circumstance that different designs do not seem to be coming to the same conclusions: live telephone interviews and online polls are giving systematically different results, but not all favoring one direction. It depends on the state, and nationally it translated to a two-point difference for Trump when online polling is used as opposed to live-interview surveys, which conversely favor Clinton. Which one is “right”? For now they split the difference.

Both these things have some validity to them.

Basically, the more polls telling the same basic story, the stronger that story becomes. But - due to the issues outlined - not in any mathematically quantifiable way.