Margin of error = 4.5%

Why does there have to be a margin of error? And how do they know what it is, exactly? Furthermore, if they’re able to calculate that this poll will have a small margin of error and that poll will have a large one, why can’t they do whatever is necessary to ensure that there will be no margin of error (or at least an insignificant one)?

As an example, I have polled the people in my immediate office, and 18 say they will vote for candidate A and 2 say they will vote for candidate B. That gives A 90% of the vote with no margin of error. Did I do something wrong?

Margin of error is based on sample size. The only way for a poll to have no margin of error would be if every single voter was polled.

It is based on sample size, and it is expensive to increase the sample size. Think how much time it would take to poll a hundred offices like yours.

Also note that, unlike most physical measurements, margin of error from sampling is not a hard and fast metric. While it is highly likely that with a sample size of x and a total population of 10,000x the margin of error will be y%, it is possible (but unlikely) that your sample is totally unrepresentative of the general population. I am betting that if you took a random 100 people at the Gore campaign headquarters, the poles would show him far ahead of Bush.

So is the sample size compared to the size of the general population or to the number of people who are actually going to vote? And how can they possibly know whether or not the person polled is going to vote? It seems like a few steps away from reading tea leaves.

I wonder who wants to see polls anyway. If I were a polling firm, I’d be worried that I’d lull a certain section of voters into a calm by announcing that their candidate has a firm lead in the polls, resulting in lower voter turnout.

I’m glad this only happens every four years.

Actually, it’s sample size, period. Not in relation to anything. Obviously, if your sample size is 100% of the population, there’s 0 margin of error no matter what. I believe that a poll of 1100 RANDOMLY SELECTED people yields a margin of error of +/- 3%. That’s what most polls shoot for. In order to double the accuracy of the poll (i.e., +/- 1.5%) you need to quadruple the sample size. These margins of error are accurate whether your population is 150,000 or 150,000,000. It doesn’t matter.
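Jman’s figures can be checked against the usual textbook formula. This is only a sketch, assuming the standard 95% critical value z = 1.96 and the worst case p = 0.5 (which is what pollsters quote when they give a single margin of error):

```python
import math

def margin_of_error(n, z=1.96):
    """Worst-case (p = 0.5) margin of error at roughly 95% confidence."""
    return z * math.sqrt(0.25 / n)

print(round(margin_of_error(1100) * 100, 2))  # about 2.95, i.e. roughly +/- 3%

# Quadrupling the sample size exactly halves the margin of error,
# because n sits under a square root.
print(round(margin_of_error(4 * 1100) * 100, 2))
```

Note that the population size appears nowhere in the formula, which is the point Jman is making.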

Jman

I had a tough time with this concept before I realized that margin of error has nothing to do with whatever it is you are trying to find out.

Read that again - I had to.

What the margin of error in an opinion poll tells you is that, if we ask n people a question, we will be within x% of the answer that everyone in the population would give, if we could:
[ul]
[li]ask them all,[/li][li]get the truth from every one of them and[/li][li]record that answer properly.[/li][/ul]

When people do opinion research, margin of error is only one of the major factors that determine sample size. The other major factor is the desired confidence level, which determines Z, a critical value taken from the normal distribution.

So, to use your office poll example, you might say:

“How many people do I have to ask to be 95% sure that the results will be within 2.5% of the total population’s answer?”

This means that your confidence level is 95% and your margin of error is 2.5%; Z is the value on the normal curve that leaves 2.5% of the area in each tail… 1.96. (I know that I didn’t explain that so well… I am sure someone else will one-up me.) I was an English major - I just work with statistics. :slight_smile:
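That 1.96 figure can be recovered with Python’s standard library: for 95% confidence, 2.5% of the normal curve lies in each tail, so the critical value is the 97.5th percentile of the standard normal distribution.

```python
from statistics import NormalDist

# 95% confidence leaves 2.5% in each tail, so look up the 97.5th percentile.
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))  # 1.96
```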

Your original question has to do with a poll in a limited population - the people who work in your office:

You did nothing wrong… except you didn’t think “big-picture” enough. What good is it to know what the 20 people in your office think? Your methods don’t allow us to use your findings to project to a larger population!

A good rule of thumb is that any sample smaller than 30 will produce useless data. So, go hire 10 more people!


What you are describing are known as “non-sampling errors.” In other words, you’re right, but there isn’t anything math can do about it.

All we can do is hope that the same percentage of non-voters that exists in the population is represented in our sample.

And, in theory it is!

See why a large sample is better than a small one? As your sample grows, so do your chances of getting good data!

(Whoops! Didn’t see that post!)

Jman:
I don’t think you’re entirely accurate.

Let’s say we’re discussing what percentage of a population supports a particular candidate (or is left-handed, or has a big nose, etc.). Let P represent that proportion of the population and let Q be everyone else (1 - P).

If p (little ‘p’) is the part of a sample (n) that supports candidate A, the formula for the confidence interval is:

p+/- (Z * sqrt(PQ/n))

Got that? :slight_smile:

Of course, we don’t really know what P is - that’s what we are trying to determine! Statistics tell us that we can best-guess it by setting P and Q equal to each other… but that’s another story.

The point is that n (sample size) is in the denominator. This obviously means that, as n gets larger, your margin of error (the part after the ±) gets smaller.
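Sticking with the office-poll example from the start of the thread (18 of 20 for candidate A), the formula above can be sketched in Python; as the post suggests, the sample proportion p is plugged in for the unknown P:

```python
import math

def confidence_interval(p, n, z=1.96):
    """p +/- Z * sqrt(PQ/n), estimating the unknown P by the sample p."""
    moe = z * math.sqrt(p * (1 - p) / n)
    return p - moe, p + moe

# The office poll: 18 of 20 respondents favor candidate A.
lo, hi = confidence_interval(18 / 20, 20)
print(round(lo, 3), round(hi, 3))  # 0.769 1.031
```

Notice the upper limit exceeds 100% - a sign that the normal approximation behind this formula breaks down for samples this small, which is the spirit of the “at least 30” rule of thumb given earlier.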

So, your last point is not true.

Just to add a little more detail, the plus or minus 4.5% forms what is called a confidence interval. That means that some percentage of the time the general population will follow the sample within that interval. So if 45% of the sample say they favor Gore, you can guess that in the general population 40.5% to 49.5% favor him. Since you’re only talking to a sample though, this is only a guess. The larger the sample, the more confident you are that the guess is right. (The sample must be truly random for any of this to work.)

Confidence is expressed mathematically as a percentage. That 4.5% figure might form a 90% confidence interval or 95% or 98%. Without the confidence figure (which for some reason they never report), the margin of error is basically meaningless. It’s like saying it’s 25 degrees today without saying if it’s Fahrenheit or Celsius.
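To illustrate the point, here is a sketch of how one and the same sample (a hypothetical n of 500, with the worst case p = 0.5 assumed) yields a different margin of error at each confidence level:

```python
import math
from statistics import NormalDist

def moe(n, confidence):
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-tailed critical value
    return z * math.sqrt(0.25 / n)                  # worst case: p = 0.5

# Hypothetical sample of 500: the margin widens as the confidence level rises.
for conf in (0.90, 0.95, 0.98):
    print(f"{conf:.0%} confidence: +/- {moe(500, conf):.1%}")
```

So quoting “+/- 4.4%” without saying “at 95% confidence” omits half the information.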

See also a lengthy (and somewhat heated) discussion of these issues here

BTW sdimbert, you are not actually disagreeing with Jman, as far as I can see. It seems to me that he only claimed that the size of the population is irrelevant - not that the sample size is irrelevant.

The “margin of error” is almost a scam, because it only measures one source of error. There are at least three types of polling error:

  1. Sampling error, which is well-described in some of the posts above.

  2. Error in the selection of the sample.

E.g., in sampling for an upcoming election, if you use All Registered Voters, you get more Democrats. (This problem affected a poll taken by a major news magazine just before Labor Day, which showed Al Gore with a sudden 10 point lead.) If you take Likely Voters, you can get a more accurate prediction. But, how do you define Likely Voters? Different pollsters have different criteria.

Also, consider the non-response rate. A significant percentage of those sampled do not respond. You don’t know how they’ll vote. For all we know, those who do respond may be more likely to favor one party or the other. Pollsters make efforts to adjust, but these involve judgment.

  3. Finally, what percentage of respondents lie?

IMHO, #3 is probably small, but #2 is quite large, and #1 is in the middle. The pollsters’ “Margin of Error” refers only to #1, so it is a deceptive measure of accuracy.

Note that Sampling Error can always be reduced by using a larger sample, but the other two errors stay just as large. This may be one reason that samples are so small (often about 1000). Why spend a fortune reducing Error #1, when Error #2 is already much larger?

By comparison, consider the Exit Polls. These have the same problems with errors #1 and #3, but less from #2. They are highly accurate. The accuracy of Exit Polls is evidence that the biggest error is #2. So, a “margin of error” based on #1 only is pretty useless.

opus,

For a full treatment of these issues that doesn’t get too bogged down in complex mathematics, I recommend “the little red book:” Darrell Huff’s How to Lie With Statistics.

It’s fantastic.

I’m sorry Engineer Don. :o

You posted ‘I am betting that if you took a random 100 people at the Gore campaign headquarters, the poles would show him far ahead of Bush.’

So Gore is popular in Eastern Europe!

I’m 99% (plus / minus 1%) sure most people won’t find this funny…

The Margin of Error is, I believe, what scientists and mathematicians call the Variance. It is, in the limit as the population approaches infinity, the Standard Deviation of a population. You can find these terms and their formulas in any good book on statistics. (A good math-based book is Philip Bevington’s “Data Reduction and Error Analysis for the Physical Sciences”, but there are a LOT of texts, at varying levels of complexity, available. There’s the wonderfully named “How to Lie with Statistics”, for instance.)

The Variance is a measure of how much the individual elements of a population vary from the average value. If your distribution of measurements forms a “Bell-Shaped curve”, the classic distribution (called a Gaussian by scientists and mathematicians), then 2/3 of the results will be within one Standard Deviation (= “Margin of Error”) of the average (the actual width is plus or minus a Margin of Error, so the total width being considered is two Margins of Error). 95% of the results fall within two Margins of Error, and 99% within three Margins of Error, a total width of six Margins of Error. The symbol used for the Margin of Error is the Greek letter “sigma” (the Greek “S”), so virtually all results fall within a band six Margins of Error wide centered on the average result, called “Six Sigma” in the jargon of statisticians.
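Setting aside the terminology (the replies below sort that out), the coverage figures in this post are the standard “68-95-99.7 rule,” and the exact values can be checked with Python’s standard library:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution (mean 0, sigma 1)
for k in (1, 2, 3):
    coverage = nd.cdf(k) - nd.cdf(-k)  # probability within k sigma of the mean
    print(f"within {k} sigma: {coverage:.1%}")  # 68.3%, 95.4%, 99.7%
```

So “2/3”, “95%”, and “99%” above are rounded versions of 68.3%, 95.4%, and 99.7%.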

Here is an interesting article from the New York Times on a related issue.

CalMeacham: “The Margin of Error is, I believe, what scientists and mathematicians call the Variance.”

I’m afraid you are confusing two types of measurements: measurements of population means (e.g., height of 11th grade students) and population proportions (e.g., proportion with driver’s licenses). Variance and standard deviation apply to both population and sample means but what is being discussed in this post are population and sample proportions.

Greg Charles is correct: the term “margin of error” is meaningless without quantification. They are presumably using it to mean the confidence limits, but they don’t tell you if they mean the 95% confidence limits, 90%, or whatever.

Just for fun (aren’t some of us weird?) I ran a few numbers to illustrate some points:

If the true proportion is around 50%, a sample of around 1100 will give you a 95% confidence interval of + or - 3%. Here are the sample sizes needed for various population sizes:
sample of 1067 for population of 10,000,000
sample of 1066 for population of 1,000,000
sample of 1056 for population of 100,000
sample of 964 for population of 10,000
sample of 516 for population of 1,000

For a 95% confidence interval of + or - 5%, a sample of only 384 will do for a population of 1,000,000.
For a 95% confidence interval of + or - 10%, a sample of only 96 will do for a population of 1,000,000.
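These figures are consistent with the standard finite-population correction applied to the z = 1.96, p = 0.5 formula (whether the poster computed them exactly this way is an assumption on my part). A sketch that reproduces the table:

```python
def sample_size(population, moe, z=1.96):
    """Sample size for a proportion near 50%, with finite-population correction."""
    n0 = z ** 2 * 0.25 / moe ** 2  # sample size for an effectively infinite population
    # Finite-population correction shrinks n when the population is small.
    return round(population * n0 / (population + n0 - 1))

for N in (10_000_000, 1_000_000, 100_000, 10_000, 1_000):
    print(f"sample of {sample_size(N, 0.03)} for population of {N:,}")
```

Note how little the answer changes until the population drops near the sample size itself, which is Jman’s point about 150,000 versus 150,000,000.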

The sample size and confidence limits are smaller if the proportion is not close to 50%. For example, if you wanted to estimate Pat Buchanan’s vote and you found that just 4 people out of a sample of 124 supported him, you could say that the proportion of people supporting him was 0 to 6% with 95% confidence.
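That Buchanan figure can be checked against the earlier p +/- Z*sqrt(PQ/n) formula. This is only a sketch: with just 4 “successes” in the sample, the normal approximation is shaky, and an exact binomial interval would be more defensible.

```python
import math

p = 4 / 124                                # 4 of 124 sampled support the candidate
moe = 1.96 * math.sqrt(p * (1 - p) / 124)  # 95% margin of error
print(f"{max(0.0, p - moe):.1%} to {p + moe:.1%}")  # roughly 0% to 6%
```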

No, my last point is true… that’s why I said that in order to double the accuracy, you need to quadruple the sample size. Note my last point says it’s accurate whether the POPULATION is 150,000 or 150,000,000, not the sample size. Obviously, increasing the sample size would increase accuracy, as I noted earlier in my post.

Jman

CalMeacham wrote:

Not exactly. There’s an easily-grasped way to calculate the variance (which is related to the standard deviation, but not the same). You know how, in order to get the average, you add up all the measurements and divide by the number of them?

To get the variance, you first find the mean, then you go back, and for each value you take the difference between it and the mean, and square it. Then add up all these squared differences, and divide by the number. That’s the variance. It gives you an idea of how all the values differ from the mean, because the differences are squared, making them all positive, and making the greater differences matter more.

But the variance isn’t very good for getting an intuitive feel, because its units are the square of the base units. If you take the square root of the variance, that’s called the standard deviation. With normal, bell-shaped distributions, around 70% of the values will be within one standard deviation of the mean.

So far, this has all been about standard deviation and variance of entire populations. If you are just taking a sample, and want to know how close you can expect the real population mean to be to your sample mean, then you can use the same kind of calculations, except for the little trick of dividing by the number of samples minus one, instead of the total number of samples. It turns out that this is more accurate, although for large samples there’s negligible difference.
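The recipe in the last few paragraphs, written out in Python (with the stdlib statistics module as a cross-check; the data values are just an arbitrary example):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)

# Population variance: the average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)  # population standard deviation

# Sample forms divide by n - 1 instead (Bessel's correction), as described above.
sample_variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# The stdlib agrees with the by-hand calculation.
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))
print(variance, std_dev)  # 4.0 2.0
```

For this small sample the n - 1 version (about 4.57) is noticeably larger than the population version (4.0); as the post notes, the gap becomes negligible for large samples.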

Also, december missed an important source of polling inaccuracy in her list. You need to consider how the question was asked, and what questions might have been asked just before. For example, if you first ask “Did you know that excerpts from Al Gore’s book Earth In The Balance are virtually indistinguishable from excerpts from the Unabomber’s Manifesto?,” and then follow that up with “Which candidate will you vote for?,” then you’ll get different results than you would if you just asked the second question by itself.