How is margin of error calculated for polls?

Law of large numbers

My doubt has less to do with the statistics, and more to do with our ability to word a survey correctly and to truly find a random sample.

Do I believe that a flipped coin has a 50% chance of coming up heads, even if the last 10 flips were also heads? Yes, as long as the coin is truly unbiased.

Do I believe that I can pull 1000 tokens out of a bag of 150 million mixed tokens, and the distribution of colors will be close to representative of the whole? Yep.

Do I believe that I can walk into the local shopping mall, survey 1000 people and get a good idea of who the next dog catcher is? Not so much.
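
The contrast between the bag of tokens and the shopping mall can be shown with a small simulation. This is a minimal sketch under stated assumptions, not a model of any real poll: the names `TRUE_SHARE` and `MALL_SHARE` and their values are hypothetical (50% of the whole population favor A, but 55% of mall shoppers do), and each draw is treated as independent, which makes no practical difference with 150 million tokens. It counts how often a 1,000-person sample lands within 3 percentage points of the true value.

```python
import random

TRUE_SHARE = 0.50   # hypothetical share of the whole population favoring A
MALL_SHARE = 0.55   # hypothetical share among mall shoppers (a biased sampling frame)
SAMPLE_SIZE = 1000
TRIALS = 1000


def sample_share(rng, share, n):
    """Fraction of n independent respondents who favor A, each with probability `share`."""
    return sum(rng.random() < share for _ in range(n)) / n


def hit_rate(rng, frame_share, tolerance=0.03):
    """How often a sample drawn from the frame lands within `tolerance` of TRUE_SHARE."""
    hits = 0
    for _ in range(TRIALS):
        estimate = sample_share(rng, frame_share, SAMPLE_SIZE)
        if abs(estimate - TRUE_SHARE) <= tolerance:
            hits += 1
    return hits / TRIALS


rng = random.Random(42)  # fixed seed so reruns give the same numbers
random_hits = hit_rate(rng, TRUE_SHARE)  # the well-mixed bag
mall_hits = hit_rate(rng, MALL_SHARE)    # the shopping mall
print(f"random sample within 3 points: {random_hits:.0%}")
print(f"mall sample within 3 points:   {mall_hits:.0%}")
```

The random sample should land within 3 points the large majority of the time, while the mall sample only occasionally does. Raising `SAMPLE_SIZE` would not rescue the mall: it tightens the mall estimates around 55%, not around the true 50%.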

But of course, published poll results are not the raw outcomes of the polls; they are weighted to match the population more accurately. This weighting is based on information that is available for the whole population (income levels, age, gender, ethnicity/race, etc.). Of course, it is only an assumption that these factors actually help explain political choice; if they don't, you're shit out of luck. In other words, if it so happens that all blond people vote A and all non-blonds vote B, and you have no data on the distribution of blonds over the population, then you can weight all you want but it is not going to make the poll more accurate.
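
The simplest form of this weighting, usually called post-stratification, can be sketched in a few lines. The real formulas pollsters use are not public, so treat this as an illustration only; the numbers and the names `SAMPLE` and `POPULATION_SHARE` are hypothetical. Here a sample of 1,000 happens to contain 600 women and 400 men while the population is assumed to be split 50/50, and each group's result is re-weighted to its population share (equivalent to giving each respondent a weight of population share divided by sample share).

```python
# Hypothetical sample: group -> (respondents, respondents favoring A)
SAMPLE = {
    "women": (600, 360),  # 60% of women in the sample favor A
    "men": (400, 160),    # 40% of men in the sample favor A
}

# Hypothetical population shares, e.g. taken from census data
POPULATION_SHARE = {"women": 0.5, "men": 0.5}


def raw_share(sample):
    """Unweighted share favoring A: total yes answers over total respondents."""
    respondents = sum(n for n, _ in sample.values())
    yes = sum(y for _, y in sample.values())
    return yes / respondents


def poststratified_share(sample, population_share):
    """Weight each group's result by that group's share of the population."""
    return sum(population_share[g] * (y / n) for g, (n, y) in sample.items())


print(f"raw:      {raw_share(SAMPLE):.1%}")                               # 52.0%
print(f"weighted: {poststratified_share(SAMPLE, POPULATION_SHARE):.1%}")  # 50.0%
```

This only helps because, in this made-up example, support really does differ by sex. If support depended on something that cannot be weighted on, such as hair color, the within-group results would themselves be off, and re-weighting the groups would not repair them.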

An additional problem is that people who say they are going to vote might not actually do so in the end, and the probability of this happening is not necessarily equal for all groups in society. Unfortunately, non-voters are also notoriously unlikely to take part in electoral research, so political scientists and pollsters know fairly little about them.
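
A small worked example shows how unequal turnout alone can shift a result by a couple of points. Everything here is hypothetical: the two age groups, their support for A, and especially the turnout rates, which a pollster would have to estimate because non-voters are so hard to study. The name `GROUPS` is illustrative only.

```python
# Hypothetical groups: name -> (share of respondents, share favoring A, turnout rate)
GROUPS = {
    "younger": (0.4, 0.6, 0.5),
    "older": (0.6, 0.4, 0.8),
}


def share_if_everyone_votes(groups):
    """Support for A assuming every respondent actually votes."""
    return sum(size * support for size, support, _ in groups.values())


def share_among_voters(groups):
    """Support for A among those who actually turn out."""
    voters = sum(size * turnout for size, _, turnout in groups.values())
    votes_for_a = sum(size * support * turnout for size, support, turnout in groups.values())
    return votes_for_a / voters


print(f"everyone votes:   {share_if_everyone_votes(GROUPS):.1%}")  # 48.0%
print(f"turnout-weighted: {share_among_voters(GROUPS):.1%}")       # 45.9%
```

The respondents are the same in both lines; only the assumption about who shows up changes, and that alone moves the estimate by about two points.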

Another thing we know very little about is the actual weighting formula used by pollsters. It is typically a closely guarded secret, because if it were revealed, anyone could just pick up the phone, start calling randomly selected people (random only to the extent that they all have a phone and agree to pick it up and answer questions) and publish results. The weighting formula is the special ingredient that makes pollsters' published results stand out from just any poll, ostensibly (hopefully) because it makes them more reliable.

That’s why no poll in the world is based on data that are collected in this way. It seems that your mistrust of polls is the result of a lack of understanding of how pollsters actually work.

There’s a margin of error calculator online here. I can’t vouch for its accuracy; it’s just something that came up in a Google search. It doesn’t explain the calculation, but it’s fun to plug in numbers and see what the result is.
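
For a simple random sample, the usual margin of error for a reported percentage is z·√(p(1−p)/n), where p is the observed proportion, n the number of respondents, and z about 1.96 for 95% confidence. Pollsters commonly quote the worst case, p = 0.5. The sketch below (the function name `margin_of_error` is just illustrative) assumes a simple random sample and ignores the extra variance that weighting adds; the optional finite-population correction only matters when the sample is a sizeable fraction of the population.

```python
import math
from statistics import NormalDist


def margin_of_error(p=0.5, n=1000, confidence=0.95, population=None):
    """Half-width of the confidence interval for a proportion from a simple random sample.

    p: observed proportion (0.5 gives the largest, most conservative margin)
    n: number of respondents
    confidence: two-sided confidence level
    population: optional population size for the finite-population correction
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite-population correction: negligible unless n is a large share of the population
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


for n in (250, 1000, 4000):
    print(f"n = {n:>4}: +/- {margin_of_error(n=n):.1%}")  # 6.2%, 3.1%, 1.5%
```

Quadrupling the sample only halves the margin, which is one reason many polls settle on around 1,000 respondents. For the bag of 150 million tokens, `margin_of_error(n=1000, population=150_000_000)` is essentially identical to the uncorrected value: what matters is the size of the sample, not the size of the population.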

But that is NOT what you want.

You don’t want “every member of the population”, but every member of your target population. So in most electoral polls, you are only concerned with voters, not the whole population. And you want actual voters, not ‘claimed’ voters. So you check the voting records to see whether they voted in the past, rather than believing what they say.

This is one of the reasons ‘approval rating’ polls are often far off: they tend to sample the whole adult population instead of just voters.

If you are doing commercial polling, like for a new laundry detergent, your target leans more toward women & housewives. If it’s a new beer ad, you want a target of men, especially younger men, and middle class or lower economically.

I was using “population” in the statistical sense, which is what you meant by “target population”.

But how you get to that population can be a problem. For example, how can you limit your sample to only those who will vote in a particular election? Do exit polls? But that’s too late: you want to estimate the result in advance.

So, in practice, you’ll sample a larger population, e.g., those who might be eligible to vote, and discard:
(1) Those who are not eligible, e.g., because they aren’t citizens.
(2) Those who say they don’t intend to vote.
(3) Those who intend to vote, but who haven’t decided who to vote for.
(4) Those who don’t answer the question, either because you can’t reach them or because they refuse to answer.
And that might mean you need a sample of 2,000 from a larger population to get a real sample of 1,000 from your target population. The people in category (4) undoubtedly introduce a bias into your sample, but you hope the bias is small enough not to matter.
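
The arithmetic behind “2,000 to get 1,000” is a chain of retention rates. The rates and the names `RETENTION` and `contacts_needed` below are hypothetical, one per category in the list above, and the sketch assumes the screens act independently. With these made-up rates roughly half of all contacts survive. Note that the margin of error is computed from the people who remain, and it says nothing about the bias category (4) may introduce.

```python
import math

# Hypothetical share of contacts that survive each screen in the list above
RETENTION = {
    "eligible": 0.90,         # (1) e.g. citizens
    "intends_to_vote": 0.80,  # (2)
    "has_decided": 0.85,      # (3)
    "responds": 0.80,         # (4) reachable and willing to answer
}


def contacts_needed(target, retention):
    """Number of people to approach so that, on average, `target` remain after all screens."""
    keep = math.prod(retention.values())
    return math.ceil(target / keep)


target = 1000
needed = contacts_needed(target, RETENTION)
# The margin of error uses the screened sample, not the number of people contacted
moe = 1.96 * math.sqrt(0.25 / target)
print(f"approach about {needed} people; margin of error +/- {moe:.1%}")
```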

We have strayed very far from the point of the OP. The point is that margin of error is a well-supported statistical concept with a very definite meaning.

Like any kind of measurement or experiment, it rests on assumptions. Here the assumptions are about good experimental design, where the statistical process is arranged to give a meaningful answer. That includes sampling randomly from the target population, or as close to it as you can arrange.