Statisticians: Can We Replace General Elections With Sampling?

The cost and time involved in the US presidential elections are enormous…and we are subjected to the candidates traipsing around the country, giving endless, meaningless speeches, consuming forests of trees, and generally pandering to everybody.
Since the science of statistics is firmly established, I don’t see why we couldn’t replace the current election with a judiciously sampled poll. This would have many advantages:
-it would be fast…perhaps a sample of 1 million voters would be enough
-it would be accurate: the laws of statistics permit an extremely LOW error rate…probably BELOW the errors caused by hanging chads, misread ballots, etc.
-it would be honest: no more dead voters in places like Boston, or Cook County (in a recent election in Boston, more than 100 people who died before 1925 were found to be still on the voting rolls).
For you statistics experts: what would be the maximum error expected in a presidential election, if conducted by a sample of 1 million? :confused:

A survey by sample, no matter how carefully your sample population is defined demographically, does not take individual opinion into account well enough. I’ve seen a lot of samples where I was a part of the target population though not one of the persons sampled (only twice have I ever been a part of the sample on any issue, and both times were telephone polltakers) – and whenever such a sample is created, I find that it does not adequately reflect the POV of myself or people in the target population whose opinions I know.

In addition, Ralph, expensiveness and irregularities are not grounds for barring the single function that keeps us a democratic republic. And, of course, general elections do not just choose the President, but also a plethora of state and local officials. Do you think that it’d be cost effective for a village of 58 registered voters to conduct a survey to choose their mayor and city council, as opposed to having them vote?

Unless the identities of the members of the sample were public knowledge in advance, how would it save in campaign costs?

You would need a random sample of the population, so you would still need to keep electoral rolls of all the possible voters (you’d need those rolls for state, county and local elections as well, of course).

And the sample would have to be random – partisan officials should not have a chance of tilting the sample towards voters more likely to vote their way.

I would also say that this question (while interesting) sounds more like a Great Debates question.

How would you select such a sample? We’d need assurance that the selection process was uncorrupted and above reproach. Then, how would you get the “votes” of those you selected? Wouldn’t this be subject to many of the same problems as accurately collecting and counting the votes in a general election (misread ballots, etc.)—and perhaps some additional ones involved in making sure it was only the individuals sampled who voted? For instance, if you can’t keep dead voters’ votes from being counted in an election, how can you be sure to keep them out of your sample? Plus I don’t see how this would affect the amount of time candidates spend campaigning, “traipsing around the country,” etc.

If your sample is randomly selected from among all citizens, or all registered voters, it will reflect the preferences of this group, which are NOT necessarily the same as what you get in an election: the preferences of the subset who bothered to go out and vote. (It is, of course, a matter of debate whether this would be a good or a bad thing.)

Last but not least, there are the psychological objections to such a system. People would say, “My vote wasn’t counted—I didn’t get any say in selecting my leaders!” In fact, the way we have now of choosing the president via the electoral college is kinda sorta like the sampling process you propose, and plenty of people have objections to that.

To answer the GQ part of your question, let us simplify by assuming the election is a yes/no choice, with no third party candidates. The poll respondent must choose A or B. Assume the “true” chance of the voter choosing A is “p”, that is, a fraction p of the full electorate prefers A. If you poll “n” people, the number of respondents choosing A will follow a binomial distribution with expected value np and variance np(1-p).

In your example, you are proposing to poll a million people, and we can assume that p is fairly close to 0.5, that is, the electorate is roughly evenly split. The variance of your output distribution will be 250,000 and the standard deviation will be 500, or 0.05% of one million.

For large samples, the binomial distribution approaches the normal. For a normal distribution, 95% of the time the output will be within 1.96 standard deviations of the “true” value. So if your poll shows, for example, that 49% of respondents favor A, we can be 95% sure that, of the full electorate, between 48.9% and 49.1% prefer A.

You could cut the size of the poll down to a thousand, and the confidence interval would widen only to about 6 percentage points (roughly plus or minus 3%).
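For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. It assumes a simple random sample, a two-way race, and the normal approximation; the function name `margin_of_error` is just an illustrative label, not anything from a polling library.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation interval for a sample proportion.

    n: number of people polled
    p: assumed true share preferring candidate A (0.5 is the worst case)
    z: number of standard deviations (1.96 for 95% confidence)
    """
    return z * math.sqrt(p * (1 - p) / n)

# Compare the million-person poll with a thousand-person poll.
for n in (1_000_000, 1_000):
    moe = margin_of_error(n)
    print(f"n = {n:>9,}: plus or minus {100 * moe:.3f} percentage points")
```

With p = 0.5 this gives roughly 0.1 points either way for a million respondents and roughly 3.1 points either way for a thousand, which matches the figures quoted above.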

The remainder of your post is more appropriate for GD, but I will point out a few of the obvious flaws: the impossibility of constructing an unbiased sample, the impossibility of assuring impartiality among the poll takers, issues of polling procedure (how often do you try if somebody isn’t home?), the fact that the candidates would still have to campaign (is this a bad thing?), and the fact that voting in an election requires that you not only hold an opinion but be motivated enough to do something about it.

I don’t see how that could be possible – the sample would have to be large enough that measuring the opinions of the sample members would run into the same logistical issues as conducting a general election (and would therefore have the same errors in addition to the errors introduced by the sampling process).

Ever read “Franchise” by Isaac Asimov? :wink:

The main objection is that it’s a basic right, the defining difference between democracies and tyrannies, kingdoms, oligarchies, etc. Many have died for this right alone.

Heck, there were lots of objections when the Census Bureau proposed going to sampling. There’s no chance sampling will replace voting.

I don’t know the size of the target groups you refer to, but I respectfully submit that you and people you know (or I and people I know, for that matter) are not a random sample and might not be as representative as you think. In fact, I sometimes hear objections from the losing side in an election that their “voices weren’t heard”, which is similar to what you contend.

Of course there can be problems with sampling. Elections aren’t totally foolproof, either.

  1. Sample selection would be impossible as explained above.

  2. In a close enough election (such as the last one), you’d need the entire population in order to select the winner. The confidence intervals on a sample can never be tight enough to call an election like Bush/Gore. Do you really want to be only 95% confident in your results for an election? You can never set your power to 100%; that would mean a complete population count.
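To put a rough number on the close-election worry, here is a small Python sketch of how often a sample would name the wrong winner. It assumes a two-candidate race, a perfectly random sample, and the normal approximation; the vote shares used are hypothetical illustrations, not actual results.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def prob_wrong_winner(true_share, n):
    """Chance that a random sample of n shows the true leader below 50%.

    true_share: the leader's actual share of the two-way vote (above 0.5)
    """
    se = math.sqrt(true_share * (1 - true_share) / n)
    return normal_cdf((0.5 - true_share) / se)

# Hypothetical leads of 2 points, 0.2 points and 0.02 points.
for share in (0.51, 0.501, 0.5001):
    risk = prob_wrong_winner(share, 1_000_000)
    print(f"true share {share:.4f}: chance of wrong call = {risk:.4f}")
```

Under these assumptions a million-person sample almost never misses a 2-point lead, misses a 0.2-point lead a little over 2% of the time, and is not far from a coin flip when the lead is 0.02 points.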

[QUOTE=ralph124c]
-it would be fast…perhaps a sample of 1 million voters would be enough[/QUOTE]
Where are you going to get these voters from?

Are you advocating doing away with the electoral college distribution and simply going for a “popular vote” type of approach to sampling? Or do you advocate taking 50 separate samplings (one for each state) and then allocating electoral college votes accordingly?

Also, as others have suggested, your system doesn’t address the issue raised by the American system of voluntary voting. Would the people sampled by your system only be those who expressed an intention to vote, or would you just do random sampling among the whole population? In a country like Australia, where voting is compulsory, sampling might not make much difference. But in a place like the US, where who turns up to the polls is actually more important than which candidate is the most popular, you might run into problems.

And how do you prove that the person being sampled is even eligible to vote? People don’t generally carry their registration cards or US passports around with them. I’ve only been in the US for a few years, and i’m not a citizen, but i can pull off a pretty passable American accent. And plenty of American citizens don’t have American accents anyway. What happens if one of your samplers stops me (or some other non-citizen) on the street and asks me for my opinion?

The Census Bureau does use a form of sampling called imputation. When the 2000 census data came out, the state of NC barely beat out Utah for the right to add a Congressman. (NC added 1 and Utah did not.) Utah sued over the imputation issue and lost the case. Then Utah turned around and tried to sue over the issue of not counting Mormon missionaries who were living in foreign countries - and they lost on that issue too. At that point they gave up.

I take your point, rowrrbazzle – but it was not the “idle bitch” that “my voice ain’t heard” – rather, it was a practical statement that sampling on political issues is far more difficult than it might at first seem. I’m thinking of my former home town, with a population of 25,000 and a probable maximum of 12,000 eligible voters (many residents are military stationed at the nearby army base who have kept their legal residence at their home town elsewhere, and there is also a disproportionate number of under-18s, for a variety of reasons). “I and people I know (well)” would constitute a base of around 250, or just over 2%. I am fairly certain that none of that 250 has been sampled in polls, and that the results of such polls generally do not show the particular perspective of that 250, which is not a particularly odd subset of the population. In other words, that we (the 250) might be 80:20 in favor of a proposition would not be a fair sample of the general population, which might be 70:30 against it – but when the sample of the general population shows something like 95:5 against, that small sample group is obviously not being adequately represented in the main population, because it’s not an extreme set, and fair elements of the general population appear by other, anecdotal means (casual conversation, letters to the editor, etc.) to be divided in a loose similarity to the 250.

All of which is a long way of saying that the sampling methods are skewed, because, while that group I indicated may not be particularly representative, neither is it an outlying set at the extreme, and therefore some approximation of its views should show up in polls.

In many situations one can get equally or more accurate results by sampling a portion of the votes cast than by trying to count every vote. That is, in many cases, human or machine error will be greater than sampling error.

Any sample, however, would have to come from the population being sampled, and that’s where it would get really sticky. When we vote, we vote for Congress, for local officials and for president all on the same ballot. All congressional votes would have to be in the congressional sample, all local votes in the local sample, and presidential votes in a particular state would have to be included within that pool.

This is simply impossible based upon our present and perhaps any future system.

Asimov’s “Franchise” is a pale shadow of Borges’s “The Lottery in Babylon”.

Even assuming perfectly random sampling, unless the sample size is a significant fraction of the population, the BEST we can do at a 95% confidence interval is + or - 3%. Given the number of swing states that hinge upon far less than this percentage, sampling cannot give us enough certainty to be able to conduct an election. Furthermore, knowledge of the outcome of the sample may affect the outcome of a vote if the sample is deemed to not be accurate enough so I don’t see a way around it.

Irrespective of merits it’ll never fly.

However, we have elections by sample now. It is a self-selected sample and anything but random. In many off-year elections only a relatively small fraction of eligible voters bother to vote. The percentage of eligible voters who voted in the 1996 presidential election was 62% according to this site. And in 1992, 22% of eligible voters on a nationwide average elected the House member.

And as we saw from the last election, vote counting is anything but an exact procedure. I’m pretty sure that Florida was more typical of the actual process than it was an aberration. The thing that magnified it was that the Florida election was exceedingly close, and crucial to the outcome.

I suspect a properly selected stratified sample might very well reflect the views of the country better than the self-selection that is the present mode. Of course one can always argue that if you don’t care enough to vote you shouldn’t be represented.

But - merits be damned! This particular pig will never fly.

Not usually true.

Take for example the 2000 vote in Missouri. Bush beat Gore by about 3.5% with total votes cast of roughly 2,359,000. Bush’s percentage was 50.42.

If we had taken a random sample of 16,524 votes (about 0.7%), 99% of the time we could expect our sample results to have been between 49.42 and 51.42. The vast majority of the time Gore’s upper limit would have been less than 49.42.
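The Missouri figures can be checked with a short Python sketch. It uses only the numbers quoted in the post (total votes, sample size, Bush’s share) and assumes a simple random sample of the ballots; the finite population correction is an optional refinement added here for illustration.

```python
import math
from statistics import NormalDist

TOTAL_VOTES = 2_359_000  # approximate total votes cast, as quoted in the post
SAMPLE_SIZE = 16_524     # sample size quoted in the post
BUSH_SHARE = 0.5042      # Bush's share, as quoted in the post

def margin(n, p, confidence, population=None):
    """Normal-approximation margin of error for a sample proportion.

    If population is given, apply the finite population correction.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

moe = margin(SAMPLE_SIZE, BUSH_SHARE, 0.99)
moe_fpc = margin(SAMPLE_SIZE, BUSH_SHARE, 0.99, TOTAL_VOTES)
print(f"99% margin: {100 * moe:.2f} points")
print(f"99% margin with finite population correction: {100 * moe_fpc:.2f} points")
```

Both versions come out at about one percentage point, which is where the 49.42 to 51.42 range comes from.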

While this isn’t exactly what would be needed if we were to sample ballots to determine winners, it will give you an idea of how sample size relates:

http://www.surveysystem.com/sscalc.htm

In 2000, it’s likely sampling of a very small percentage of total vote in each of 40+ states could have accurately determined the true winner within each.

I don’t bet on things, but I’ll bet that if you have 20 million marbles, some red, some blue, some white, some green, you will come a lot closer to the true distribution by proper sampling than by trying to count them all.

There is an illusion that actually counting things gives the right answer. When the things to be counted run into the millions and the counting is done by heterogeneous and relatively untrained temporary counters this isn’t really true.

“Proper sampling” is the important thing here. If I had a deep container, such as a well, and someone had dumped 5 million red marbles into there, then 5 million blue, then 5 million white, then 5 million green, then when I started counting them, the first 4,900,000 would all be green. At that stage you might say: “We’ve taken a 24.5% sample, and they’re all one colour – doesn’t that mean there is a good chance that the rest are all the same?”
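The well example is easy to simulate. The Python sketch below is scaled down to 5,000 marbles per colour (an assumption made to keep it quick; the proportions are the same) and compares taking marbles off the top with drawing the same number at random.

```python
import random
from collections import Counter

PER_COLOUR = 5_000  # scaled down from 5 million per colour for speed

# Marbles in the order they were dumped in; the last colour ends up on top.
well = (["red"] * PER_COLOUR + ["blue"] * PER_COLOUR
        + ["white"] * PER_COLOUR + ["green"] * PER_COLOUR)

k = len(well) * 245 // 1000  # a 24.5% sample, as in the example

# Biased "sample": just take the top k marbles.
top_counts = Counter(well[-k:])

# Random sample of the same size (fixed seed so the run is repeatable).
rng = random.Random(0)
random_counts = Counter(rng.sample(well, k))

print("from the top:", dict(top_counts))
print("at random:   ", dict(random_counts))
```

The top-of-the-well sample is all green, while the random draw of the same size lands close to a quarter of each colour. The sample size is identical in both cases; only the selection method differs.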

So in an election, you’d have to have a sample that represented the whole population. It would have to include some voters from every precinct, proportionate to the total from each precinct. You’d want each precinct’s sample to be unbiased by every relevant criterion: not just political affiliation, but also age, gender, race, social class, etc. I think it would be harder to get an unbiased sample reflecting the whole electorate than it is to conduct the actual election.

I might be digging up this data from my admittedly rusty memory of statistics here, but it was my understanding that a sample mean p[sub]hat[/sub] would always give a 95% confidence interval of 3% (i.e., alpha = 0.03), assuming that the sample size is over 100 and much smaller than the total. This would explain the ubiquitous disclaimers on nearly every poll that “this poll contains a 3% margin of error.”
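One way to test that recollection is to turn the formula around and ask how many respondents a given margin requires. This Python sketch assumes a simple random sample, the worst case p = 0.5, and the normal approximation; `required_sample_size` is an illustrative name.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n whose normal-approximation margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z / margin) ** 2 * p * (1 - p))

# Margins of 3 points, 1 point and 0.1 points at 95% confidence.
for m in (0.03, 0.01, 0.001):
    print(f"plus or minus {100 * m:g} points needs n = {required_sample_size(m):,}")
```

Under these assumptions a 3-point margin corresponds to a sample of a little over a thousand, so the figure on poll disclaimers would reflect the usual sample size rather than a fixed limit, and the margin shrinks as the sample grows.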