How can a poll of 1,000 people accurately reflect which candidate is in the lead?

There are about 300,000,000 people in the US. I don’t see how 1,000 people can possibly give an accurate picture of “who would win if the election were held today.”

If those 1,000 people were spread out evenly (geographically and by population), they’d be spread so thinly that I think it would be useless.

I would even seriously doubt that a poll of 1,000 people could accurately reflect the feelings of voters in a mid-sized city, much less an entire nation.

Can someone explain this to me?

The math behind statistics is actually very beautiful. Sampling, as long as one gets a good representative sample, gives mathematically a nigh perfect interpretation of the whole.

And although the sample size does affect the confidence levels, the sample size is an absolute, not proportional to the size of the universe.

A sample size of 1000 works about equally well no matter whether the universe is 10,000, 10,000,000, or 10,000,000,000.
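To put rough numbers on that, here is a minimal Python sketch. It assumes a simple random sample, the worst-case split p = 0.5, a 95% critical value of 1.96, and the standard finite population correction sqrt(1 - n/N). The function name `margin_of_error` is made up for illustration; none of this comes from the post above.

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from a
    simple random sample of n people drawn from a population of N."""
    fpc = math.sqrt(1 - n / N)  # finite population correction (assumed form)
    return z * math.sqrt(p * (1 - p) / n) * fpc

for N in (10_000, 10_000_000, 10_000_000_000):
    print(f"N = {N:>14,}: +/- {100 * margin_of_error(1000, N):.2f} points")
```

Under those assumptions the three margins land within a couple of tenths of a point of each other, all close to 3 points.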

Getting a good representative sample is extremely difficult, and most real life polls will weight the results to better reflect the real world population. And segments of the population may be inadequately represented in a small sample - black female independents, e.g.

But while sampling is imprecise, statistics are mathematically defined. I’ll let the resident mathematicians come in with the formulas.

It takes a whole course in statistics to really grasp how we know that a poll accurately represents nationwide opinion, but the principle is not hard to grasp.

It is true that when attempting to select participants nationwide, you get a rather thin spread. For instance, if the population of the USA is about 300,000,000 then in a poll of sample size 1,000, a city such as Lexington, Ky (pop. 300,000) would be expected to be home to only one participant. So how can that one participant tell the pollster what to expect from the voters in Lexington? He/she can’t.

But now consider all the cities in the country with demographics similar to Lexington. Let’s estimate that one fifth of the nation’s population lives in such cities. Then 200 of the 1,000 participants would be in such cities. And that’s enough to give a reasonable estimate of how people in such cities will vote.

Gallup, Zogby, and CNN place tremendous effort into making sure that their samples are truly representative and don’t discriminate by any measure.

I don’t think a statistician would say that the poll accurately reflects the views of the population.

If 48% of a random sample of 1000 in the population says that they would vote for candidate A, a statistician would say something like: “At 95% confidence the vote in favor of A is between 45% and 51%.” And this means that about one out of every 20 such statements would be wrong.
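The 45%-to-51% range can be reproduced with the usual normal-approximation interval for a proportion. This is only a sketch: the function name `proportion_ci` and the critical value 1.96 are assumptions added here, not something stated in the post.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = proportion_ci(0.48, 1000)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

The half-width comes out at roughly three points, which is where the "between 45% and 51%" statement comes from.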

It is all based on having a random sample of a well-mixed population, i.e. no concentration of items having certain characteristics in certain places. If you have such conditions, such as machine screws coming off a manufacturing line, then mathematicians can show, with a specified level of confidence, that the failure rate in a random sample matches the failure rate in the population to within a stated tolerance. That’s why the example above said that the vote would be between 45 and 51%.

The difficulty is in getting the correct sample and lots of test runs are made to try to make sure that they are not getting their sample of the US population from a particular place, economic stratum, social group etc., etc., etc.

If it is well done, such polling will give results accurate within the stated plus or minus 3%, but it must also be remembered that you know going in that you will probably be wrong in 1 out of 20 such statements.

While I realize it does seem counterintuitive, it’s still true. 1,000 people is a HUGE sample, assuming you pick your sample correctly. Just do the math.

It works if you look at the math involved. Let us suppose you have a situation where 55% of all people want to vote for Smith, and 45% want to vote for Jones.

If you were to poll ten people, you could not get an accurate sampling at all - no fraction of 10 is 55%.

If you were to poll TWENTY people, you could get an accurate sampling (11 for Smith, 9 for Jones.) However, it is quite obvious that you could by random chance happen to get an inaccurate sample. As a matter of fact, the odds of hitting 55% right on the dot are not good - only about 18%. It’s 82% likely your poll will have 10 or fewer Smith voters, or 12 or more… either being 5 points or more off the real picture. It is thus 82% likely your poll will be at least 5 points off.
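For anyone who wants to check the twenty-person figure, the exact binomial probability is a one-liner. The helper name `binom_pmf` is just for illustration, and the sketch assumes each respondent independently favors Smith with probability 0.55.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

exact = binom_pmf(11, 20, 0.55)
print(f"P(exactly 11 of 20 for Smith) = {exact:.3f}")
print(f"P(any other count)            = {1 - exact:.3f}")
```

The first figure comes out near 0.177, so the poll misses the true split more than four times out of five.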

But the likelihood of being far from 55% out of a thousand people is just absurdly remote. It simply cannot happen. If 55% of the population wants to vote for Smith, and you get a proper sample of 1,000 people, you are not going to get 77% of them voting for Jones by random chance, not if you hold that poll every day until the Sun burns out.

In fact, it is EXTREMELY likely you will get very, very close to 55%. The odds of being within two percent of Smith 55 - Jones 45 (e.g. no closer than 53-47 and no further than 57-43) are about eighty percent. The odds of being within five percent (e.g. no closer than 50-50, no further than 60-40) are pretty much 100% - it’s roughly a tenth of a percent likely you’ll miss by more than five points. You are about 95% likely to be within about three percent of the correct number, which is why they so often say “this poll is accurate within three percent nineteen times in twenty.”

The odds of being WAY off - say, ten percent off - is so small that I crashed the HTML calculator I used to come up with these numbers. If you ran polls every day for a million years it would still probably not happen.
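These tail odds can be computed without crashing anything by working in log space. The sketch below assumes an ideal random sample of 1,000 from a population that is exactly 55% for Smith, and it counts a result as a miss only when it is strictly more than the stated number of points away. The function names are invented for this example.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes, computed in log space
    so the huge coefficients and tiny powers never overflow or underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def prob_between(k_lo, k_hi, n, p):
    """P(k_lo <= X <= k_hi), endpoints included."""
    return sum(binom_pmf(k, n, p) for k in range(k_lo, k_hi + 1))

n, p = 1000, 0.55
for points in (2, 3, 5, 10):
    lo, hi = 550 - 10 * points, 550 + 10 * points
    # Sum the two tails directly rather than subtracting from 1,
    # which would lose the very small probabilities to rounding.
    miss = prob_between(0, lo - 1, n, p) + prob_between(hi + 1, n, n, p)
    print(f"miss by more than {points:>2} points: {miss:.3g}")
```

The probabilities drop off very quickly as the miss gets larger, which is the whole point of the argument.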

It doesn’t matter whether the population is 300 million or 3 million. What DOES matter is that the odds of polling one thousand people and being far from the true overall percentage are one in a zillion. You just cannot get that unlucky.

You can do a quickie thought experiment to prove this. Take a quarter out of your pocket and start flipping it. There’s a good chance that early on you’ll have a funny split - you’ll start off with two heads and eight tails or something. But after 50 flips you’ll be a lot closer to even. After 100, more even still. After 1000, you sure as hell are not going to have 600 heads or 600 tails, unless you’re cheatin’ - you will be getting close to 50-50 all the time, and by the time you hit 1000 flips, you’ll be within three percent. The odds of being near 60% heads or tails after 1000 flips are insanely small.
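If you don't have a quarter handy, the same thought experiment can be run as a small simulation. This is a sketch only: the seed is arbitrary and fixed so the run is repeatable, and the function name `heads_share_by_flip` is made up.

```python
import random

def heads_share_by_flip(n_flips, rng):
    """Flip a fair coin n_flips times; return the running share of heads."""
    heads = 0
    shares = []
    for flip in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1
        shares.append(heads / flip)
    return shares

shares = heads_share_by_flip(1000, random.Random(2004))
for checkpoint in (10, 50, 100, 1000):
    print(f"after {checkpoint:>4} flips: {shares[checkpoint - 1]:.1%} heads")
```

Try a few different seeds: the early checkpoints bounce around, while the 1000-flip figure stays close to 50%.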

When polls do fail, it’s because either

  1. The population was in fact not solidly decided and many people change their minds, or

  2. The sample was badly drawn.

Here is a site that has the formulas for calculating this stuff, and a calculator that does it for you:

http://faculty.vassar.edu/lowry/VassarStats.html

[Aside]
A quick question - why do polls often have not exactly 1000 participants? You’ll see polls in the newspapers that involved 1007 or 1012 people instead of a round grand.
[/Aside]

I suspect it is because they send the pollsters out to more than 1,000 people, since some won’t be available or are otherwise invalid. Since they can’t coordinate with each other, they won’t stop at exactly 1,000, and throwing away some of the answers introduces the possibility of bias.

I have my trusty Sampling Theory book at work which I saved from being discarded from a Bell Labs library. I love sampling theory - I can post the equation tomorrow if anyone is interested.

Well, in Statistics 101, certain assumptions are made. One of the more important assumptions is that an arbitrarily large population will form a “bell curve” in which most people are concentrated in the middle, with fewer and fewer people placing extremely high or extremely low. The key has always been to take a good guess at the highest point of the curve, or the “average”.

The most common tool for doing so is the so-called t-distribution. This method consists of taking a sample consisting of some fixed number of observations and using them to draw a bell curve that (hopefully) is a good approximation of the actual. Ideally, the sample would consist of the entire population (i.e. a census), but the amount of effort involved is prohibitive, and it turns out that if an approximation is enough, a sample is sufficient. In fact, increasing your sample size more than necessary adds expense and difficulty without significantly improving your result.

One of the standard calculations for a t-distribution sample size is as follows:

n = Z[sup]2[/sup]p(1-p) / e[sup]2[/sup]

where:[ul][li]n is the sample size[/li][li]Z is the critical value from the normal distribution[/li][li]p is the proportion of success[/li][li]e is the sampling error permitted[/li][/ul]
For more detailed explanations:
[ul][li]Z: Statisticians have tables, much like logarithms, where they can look up critical values. In this case, you have to decide how “confident” you want to be in your result. A typical value is 95%, meaning you believe that the average value of your t-distribution will be, 95% of the time, close to the actual value. You can go for stronger confidence, say 99%, but this increases the required size of your sample. On the appropriate table, the critical value for 95% is 1.96. We will be 95% confident that our t-distribution average will be within plus or minus 1.96 standard deviations of the actual average. The standard deviation of a population is a measure of how much variation we can expect. If every member of a population had a value of 5, for example, the standard deviation would be zero. If individuals have values ranging from 0 to 10, the standard deviation would be larger.[/li][li]p: For the purposes of the question at hand, the goal is to determine, say, what percentage of the population will be voting for Bush in November. We don’t actually know the proportion, so we must make a guess. It is critical that we not underestimate the spread p(1-p), which peaks at p = 0.5, but being too generous increases our sample size more than may be necessary. The safest move is to set p at 50% (0.5), but if we were confident (based on previous surveys and educated guesses) that Bush would get 60%, we would use that as p and the end result would be lower. If we believed he would get 45%, we could set that as p. For this example, I’ll go with 50% since this roughly matches other polling data and the 2000 results, more-or-less.[/li][li]e: Since we know our sample isn’t going to paint a perfect picture of the actual population, we have to decide how much “error” is acceptable. Let’s say we want our approximation to be plus or minus 1% (0.01) of the actual value. Naturally, the smaller we want the error to be, the larger our overall sample size will be.[/li][/ul]
Plugging in these values, we get:

n = 1.96[sup]2[/sup] * 0.5(1-0.5) / 0.01[sup]2[/sup]

= 3.8416 * 0.25 / 0.0001

= 9604

This is a pretty large sample, too large to be practical. As a result, we’ll allow a little more “error”, and be satisfied with plus or minus 2.5%:

n = 1.96[sup]2[/sup] * 0.5(1-0.5) / 0.025[sup]2[/sup]

= 1536.64, or 1537. This is more manageable.

The end result is this: If we sample 1537 random people, we can be confident that our result will be within 2.5% of the real value, 95% of the time (i.e. 19 times out of 20). From my observation, these are the typical figures for surveys published in magazines and newspapers. If a poll doesn’t state the error and the confidence level, critical information is missing.
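The two worked examples above can be checked with a few lines of Python. The function name `sample_size` is just a label for the formula n = Z²p(1-p)/e². The rounding step is my own addition to keep floating-point noise from bumping an exact answer up by one.

```python
import math

def sample_size(z, p, e):
    """Required sample size n = z^2 * p * (1 - p) / e^2, rounded up."""
    raw = z**2 * p * (1 - p) / e**2
    # Round off floating-point noise first so an exact 9604 is not pushed to 9605.
    return math.ceil(round(raw, 6))

print(sample_size(1.96, 0.5, 0.01))   # 1% error
print(sample_size(1.96, 0.5, 0.025))  # 2.5% error
```

Plugging in other values of Z, p, and e is a quick way to see how each one moves the required sample size.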

We can find ways to tweak the values, and be less fussy. We could decide:
[ul][li]9 times out of 10 is acceptable (a confidence level of 90%). This would drop the sample size in the second example from 1537 to 1083[/li][li]A 3% error is okay (reducing 1537 to 1068)[/li][li]We estimate p as something other than 50%. If p was 40%, for example, the result of p(1-p) would be 0.4*0.6 = 0.24. In the second example, this would drop our sample size from 1537 to 1476[/li][/ul]
It all comes down to how much time and energy you have for gathering samples and how accurate you want your results to be. These decisions are arbitrary, but the math is solid.
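The same trade-off can be read the other way around: fix the sample size you can afford and ask what error you end up with. The sketch below rearranges the formula to e = Z·sqrt(p(1-p)/n) and looks up Z from the confidence level using `statistics.NormalDist` from the standard library. The function names are assumptions made for this example.

```python
import math
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical value from the normal distribution, e.g. 0.95 -> ~1.96."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def margin_of_error(n, p=0.5, level=0.95):
    """Sampling error e implied by a sample of size n."""
    return z_for_confidence(level) * math.sqrt(p * (1 - p) / n)

for n in (500, 1000, 1537, 9604):
    print(f"n = {n:>5}: +/- {100 * margin_of_error(n):.1f} points at 95%")
print(f"n =  1083: +/- {100 * margin_of_error(1083, level=0.90):.1f} points at 90%")
```

Going from 1537 to 9604 respondents buys only a point and a half of extra accuracy, which is why published polls rarely go that large.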

On reflection, the above math isn’t really reflective of a t-distribution. The t-distribution uses tables for samples smaller than 150 or so (when the sample size is small, the t-distribution tables contain a “correction” factor which ends up increasing the sample size but gives better results). Anything higher than 150, though, and the correction factor becomes insignificant. The results are indistinguishable from the “normal” distribution and that’s when the Z critical values are used.

Surely it matters WHERE you take the poll? I’d have thought a poll of 1,000 people in a strongly Republican area would produce a very different result to a poll in a Democrat state or to one in a ‘floating’ state?

Of course it does. That’s where the idea of a representative sample comes from. You wouldn’t expect an election poll taken at either of the national conventions to be accurate, would you? Neither do statisticians. A big part of statistics is learning how to construct representative, random samples.

If I want to find out how voters from Illinois feel about the upcoming election, and I decide to poll 1,000 people living in Chicago and no one from downstate, that will probably not be an accurate measure of how the people from the state of Illinois feel. Likewise if I were to poll 1,000 people from downstate and no one from Chicago.

The point is that you first must determine what, exactly, you’re trying to find out. If you’re wanting to know what the people of Illinois feel about the upcoming election, you’d need to have a representational mix of people from the entire state of Illinois in your sample.

There are several ways of selecting samples. Using my example, you could use simple random sampling, in which you would, say, instruct a computer to randomly generate 1,000 social security numbers originating in Illinois (and then generate more as needed to replace those who have moved out of state, who are too young to vote, etc.). This would (assuming your random number generator is sound) generate a purely random sample of citizens living in Illinois who have a social security number.

Another method is called “cluster sampling.” In this method, again using my example, you would divide the state up into parts. You could use counties. You would then take a random sample of those counties (by, say, assigning each one a number and generating ten or fifteen numbers), and in the counties you selected, you would do your survey. The disadvantage to this is that the distribution of counties may not be representative of the distribution of people. If there are 5 heavily populated counties and 95 rural counties, odds are I would randomly pick few if any heavily populated counties. Thus the sample would not be representative.
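The odds in that 5-versus-95 example are easy to put a number on. This sketch assumes counties are drawn without replacement and with equal probability; the function name `prob_no_heavy` is invented for illustration.

```python
from math import comb

def prob_no_heavy(total_counties, heavy, picked):
    """Chance that a random draw of `picked` counties contains none of the
    `heavy` (heavily populated) ones."""
    return comb(total_counties - heavy, picked) / comb(total_counties, picked)

for picked in (10, 15):
    print(f"{picked} counties drawn: "
          f"{prob_no_heavy(100, 5, picked):.0%} chance of no big county at all")
```

With ten counties drawn, the chance of missing every heavily populated county is better than even, which is exactly the weakness described above.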

Yet another method of sampling is “stratified random sampling.” In this method, you again attempt to break up the population into meaningful bits which would likely have different opinions about whatever it is you’re trying to find out. Perhaps for my example we could split Illinois into low-income, medium-income, and high-income groups. In each of these groups we could do a simple random sample, and that way we could be sure that each group is equally represented. Aha! You say, “What if each group is not equal in the overall population?” Then you should proportion your sample to reflect that. If Illinois has a 20% high-income, 50% mid-income, and 30% low-income population, your sample should reflect that. Randomly select 200 people from the high-income group, 500 from the mid-income group, and 300 from the low-income group. In this way, each of those three groups would be accurately represented within your sample.
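A proportional stratified draw like the 200/500/300 split can be sketched in a few lines. The population here is a toy list of labels, and `stratified_sample` is a hypothetical helper, not a standard library function.

```python
import random

def stratified_sample(strata, total, rng):
    """Draw `total` people, giving each stratum a share proportional to its size.

    `strata` maps a stratum name to the list of its members. Quotas are simply
    rounded, so with awkward proportions they may not sum exactly to `total`.
    """
    population = sum(len(members) for members in strata.values())
    return {
        name: rng.sample(members, round(total * len(members) / population))
        for name, members in strata.items()
    }

# Toy population of 10,000 matching the 20% / 50% / 30% split in the example.
strata = {
    "high": [f"high-{i}" for i in range(2000)],
    "mid": [f"mid-{i}" for i in range(5000)],
    "low": [f"low-{i}" for i in range(3000)],
}
sample = stratified_sample(strata, 1000, random.Random(0))
print({name: len(people) for name, people in sample.items()})
```

Who gets picked inside each group is random, but the group sizes are fixed at 200, 500, and 300 by construction.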

Finally, there is “Multistage sampling.” This is exactly what it sounds like: You combine the previous methods to refine your sample further.

For example: Determine whether each county in Illinois is high-population, medium-population, or low-population (stratified sampling). Randomly select 20 counties, in proportion (so if 3/5 of Illinois counties are medium-population, however you define that, 12 counties should be medium-population, etc.). In each of these counties, determine income levels and randomly sample 50 people from each county, in proportion with their income (stratified sampling again). This method of sampling would ensure that your final sample was proportionally distributed among heavily-populated and sparsely-populated counties, and that within those counties, your sample was proportionally distributed across income levels. Basically, people from both dense and sparse areas are fairly represented, as are people of various income levels. That’s an example of multistage sampling using stratified sampling twice. There are other ways of doing it.
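Here is one rough way the two-stage scheme could look in code. Everything about the data is invented (100 toy counties, identical income mixes), and `proportional_quotas`, `multistage_sample`, and `residents_of` are hypothetical names, so treat it as a sketch of the idea and not as how any polling firm actually works.

```python
import random

def proportional_quotas(sizes, total):
    """Split `total` across groups in proportion to `sizes`,
    handing leftover slots to the largest fractional remainders."""
    grand = sum(sizes.values())
    raw = {group: total * size / grand for group, size in sizes.items()}
    quotas = {group: int(value) for group, value in raw.items()}
    leftover = total - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda g: raw[g] - quotas[g], reverse=True)
    for group in by_remainder[:leftover]:
        quotas[group] += 1
    return quotas

def multistage_sample(counties_by_tier, residents_of, n_counties, per_county, rng):
    # Stage 1: counties are stratified by population tier and picked in proportion.
    tier_quota = proportional_quotas(
        {tier: len(names) for tier, names in counties_by_tier.items()}, n_counties)
    chosen = [county
              for tier, names in counties_by_tier.items()
              for county in rng.sample(names, tier_quota[tier])]
    # Stage 2: inside each chosen county, residents are stratified by income.
    people = []
    for county in chosen:
        by_income = residents_of(county)
        quota = proportional_quotas(
            {level: len(group) for level, group in by_income.items()}, per_county)
        for level, group in by_income.items():
            people.extend(rng.sample(group, quota[level]))
    return chosen, people

# Toy state: 100 counties, 3/5 of them medium-population (all numbers invented).
counties_by_tier = {
    "high": [f"H{i}" for i in range(20)],
    "medium": [f"M{i}" for i in range(60)],
    "low": [f"L{i}" for i in range(20)],
}

def residents_of(county):
    # Every toy county gets the same 20% / 50% / 30% income mix.
    return {
        "high": [(county, "high", i) for i in range(200)],
        "mid": [(county, "mid", i) for i in range(500)],
        "low": [(county, "low", i) for i in range(300)],
    }

chosen, people = multistage_sample(
    counties_by_tier, residents_of, 20, 50, random.Random(0))
print(len(chosen), "counties,", len(people), "people")
```

With these toy inputs, 12 of the 20 chosen counties are medium-population and the final sample is 1,000 people, matching the proportions in the description.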

As I said, a big part of statistics is learning to construct representative samples. This is a bit simplified (and possibly wrong on some details, as I’m not a statistician, I’ve just studied it), but the general idea is there.

To sum up, yes, it matters a GREAT DEAL where you sample from, which is why great efforts are undertaken to ensure representative, random sampling.

Also, the fact that political polls are generally voluntary makes the statistician’s job a whole lot harder. Different segments of the population are likely to reject polling at different rates. There was the famous case of a presidential election a long time ago where all the newspapers were predicting Candidate A, who looked like a shoo-in. In the end, Candidate A lost because all the polls were done via the telephone and a significant number of poorer people did not own a telephone, which skewed the sample.
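A toy simulation shows how badly that kind of skew can hurt even with 1,000 respondents. All of the numbers are invented for illustration: half the electorate owns a phone, phone owners favor Candidate A 65% of the time, and everyone else only 30% of the time.

```python
import random

rng = random.Random(1936)  # arbitrary fixed seed so the run is repeatable

# Hypothetical electorate: (has_phone, votes_for_A) for each of 200,000 voters.
population = []
for _ in range(200_000):
    has_phone = rng.random() < 0.5
    votes_a = rng.random() < (0.65 if has_phone else 0.30)
    population.append((has_phone, votes_a))

def share_for_a(people):
    """Fraction of the given voters who favor Candidate A."""
    return sum(votes_a for _, votes_a in people) / len(people)

truth = share_for_a(population)
random_poll = share_for_a(rng.sample(population, 1000))
phone_owners = [person for person in population if person[0]]
phone_poll = share_for_a(rng.sample(phone_owners, 1000))

print(f"true share for A:      {truth:.1%}")
print(f"random poll of 1000:   {random_poll:.1%}")
print(f"phone-only poll, 1000: {phone_poll:.1%}")
```

The phone-only poll is just as large as the random one, but it measures the wrong population, so a bigger sample would not fix it.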

Ummm, no. You can’t use a bell curve when computing confidence intervals for things like presidential polling. You need to use a binomial distribution, or more accurately a distribution of n classes. Cochran[1] p. 60 notes that you can do this by computing confidence intervals when the n classes are split into 2 - for instance Bush vs Nader and Kerry, neglecting don’t cares and don’t knows for the moment.

The equation for confidence interval (Cochran p. 57) is (how come no equation formatting option?)

p +/- [ t sqrt(1-f) sqrt(pq/(n-1)) + 1/(2n) ]

n is the sample, N is the population, f is n/N, p is the percentage of observations in the smallest class, q = (1-p) and t is the normal deviate corresponding to the confidence probability, 1.96 for 95%.

You see that for large N n/N goes to 1, so sqrt (1-f) goes to 1 - so the big modifier is the 1/(2n) term, which is small for n > 1000.

This of course assumes a “fair” sample.
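For the curious, the bracketed term in that formula is simple to evaluate numerically. The sketch below transcribes it as written above; the function name `cochran_half_width` and the example inputs (p = 0.48, n = 1000) are my own choices for illustration.

```python
import math

def cochran_half_width(p, n, N, t=1.96):
    """Half-width of the interval: t*sqrt(1-f)*sqrt(pq/(n-1)) + 1/(2n)."""
    f = n / N          # sampling fraction
    q = 1 - p
    return t * math.sqrt(1 - f) * math.sqrt(p * q / (n - 1)) + 1 / (2 * n)

for N in (1_000, 10_000, 1_000_000, 300_000_000):
    hw = cochran_half_width(0.48, 1000, N)
    print(f"N = {N:>11,}: p +/- {100 * hw:.2f} points")
```

Once N is more than a few times n, the half-width barely moves as N grows.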

[1] Cochran, William G., Sampling Techniques, John Wiley & Sons, 1963.

Except, it can happen. That’s why you have the “We’re 95% confident that…”.

I’m uncertain which election Shalmanese is referring to, but the most famous example of the polls, press and pundits getting it wrong is Dewey Defeats Truman.

Shalmanese was referring to the Literary Digest’s prediction that Franklin D. Roosevelt would lose the 1936 presidential election to Alf Landon. One of the worst polling mistakes of all time, caused, as he said, by a horribly biased sample.

Ed

Not really, and not even according to the very page you cited.

The Chicago Tribune was a stridently (this being GQ we’ll leave it at that) Republican paper, and was backing Dewey to the hilt. The famed headline was an aberration even at the time.

Once probability sampling started, it has been extremely rare for the polls to not accurately predict the winner, state by state, office by office, except in the most exceptionally close races with swing voters making up their minds as they enter the voting booth.

Not entirely true, as may be seen by examining an extreme case. If I have a population of 1000, and I survey a random sample of 1000 individuals (chosen without replacement), then I can be certain that my results have no margin of error whatsoever. However, if I have a population of a million, and I sample 1000, then I will have a nonzero margin of error. So clearly, the margin of error is not entirely independent of the population size.

Incidentally, RickJay’s coin-flipping example is correct, but it’s worth noting that although the ratio of heads to tails will approach 1 with many flips, the difference between the numbers of heads and tails will not approach zero. That is to say, the difference between the numbers of heads and tails can be expected to grow with time, but the total number of flips grows much faster, so that the relative difference becomes insignificant.
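That distinction can be checked exactly instead of by simulation. The sketch below computes the expected value of |heads - tails| for a fair coin straight from the binomial distribution, working in log space to avoid overflow. The function name `expected_gap` is made up for this example.

```python
import math

def expected_gap(n):
    """Exact expected value of |heads - tails| after n fair coin flips."""
    log_half = math.log(0.5)
    total = 0.0
    for heads in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(heads + 1)
                   - math.lgamma(n - heads + 1) + n * log_half)
        total += abs(2 * heads - n) * math.exp(log_pmf)
    return total

for n in (100, 1_000, 10_000, 100_000):
    gap = expected_gap(n)
    print(f"{n:>7} flips: expected gap {gap:7.1f}, "
          f"as a share of all flips {gap / n:.3%}")
```

The gap itself keeps growing, roughly like the square root of the number of flips, while the gap as a share of all flips keeps shrinking.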

If you have an election where only one of two outcomes is possible, you could poll just one person. He/She would have a 50% chance of being right. It doesn’t matter how large the overall voting population is, the margin of error is still + or - 50%.

Isn’t one of the problems in the last presidential election the fact that the difference in votes for the two major candidates fell within the margin of error? An election is really just a poll of a very large sample. Regardless of how we think our elections are air-tight, there will always be errors. If you do the same thing 100 million times, a certain percentage will be done wrong. There is no way to get it right 100 million times. (The pop of the US may be around 300,000,000, but not all of those millions are eligible to vote – children, non-citizens, felons, etc.)

Another way polls can be wrong is if they affect the outcome. That is, what if a poll predicts that Mr. Smith will win over Mr. Jones. The margin projected is 75-25 in favor of Smith. Therefore, armed with this knowledge, a large number of Smith voters decide to stay home.

I would think that in a country spanning several time zones, it would be very hard to motivate voters the farther west you go. The election will truly be decided before Californians even get home from work. Our citizens in Hawaii and Alaska must really feel excluded around election time.

Actually, for large N, n/N goes to 0, which is why sqrt(1-f) goes to 1.

In any case, when a random sample is large enough, it takes on characteristics comparable to the population as a whole. To make meaningful statements with reasonable accuracy, the minimum sample taken from an arbitrarily large population is over a thousand, but not much more.