Statistical Validity of an IMHO Poll

Say I make a wildly popular poll in IMHO that garners hundreds of replies.

What are the deficiencies of this polling method that make it “unscientific” or statistically invalid?

I know we have a non-random group of people, so the poll is no good for a broader generalization (“68% of people on the internet do not support the Bush administration’s policies in Iraq!”) How big a deal is the self-selection of replies to an innocuous question like “Are you wearing shoes right now?”

No one knows, and that’s why it’s a problem.

The only valid scientific polls are based on random samples from a well-defined population. The respondents' probabilities of selection are used to produce an estimate, which can then be used to draw inferences about the source population.
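To make the role of selection probabilities concrete, here is a minimal sketch of a weighted estimate in which each answer counts in proportion to 1 / (its probability of selection). The function name `weighted_share` and the numbers are made up for illustration; real survey software does considerably more than this.

```python
def weighted_share(responses, selection_probs):
    """Weight each 0/1 answer by the inverse of its selection probability.

    responses       -- list of 0/1 answers (1 = "yes")
    selection_probs -- known probability that each respondent was sampled
    """
    weights = [1.0 / p for p in selection_probs]
    weighted_yes = sum(w * y for w, y in zip(weights, responses))
    return weighted_yes / sum(weights)


# Hypothetical data: three easy-to-reach "yes" answers, one hard-to-reach "no".
responses = [1, 1, 1, 0]
selection_probs = [0.5, 0.5, 0.5, 0.1]

raw_share = sum(responses) / len(responses)          # ignores selection
adjusted = weighted_share(responses, selection_probs)  # uses selection probs
print(raw_share, adjusted)
```

The adjustment only works because `selection_probs` is known by design. In a self-selected poll there is nothing to put in that list.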

Self-selecting polls of the type in IMHO may arguably refer to some sort of population of paying SDMB members and guests, as well as undetected sock-puppets, but what is not known is the probability of selection for the respondents.

Bias is problematic in this case precisely because it is unmeasurable.
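A small arithmetic sketch of why that matters, using the shoes question. Assume (purely for illustration) that 60% of the population is wearing shoes, and that people reply at different rates depending on their answer. The function `expected_poll_share` and all the rates below are hypothetical.

```python
def expected_poll_share(true_share, respond_if_yes, respond_if_no):
    """Expected share of "yes" among the people who choose to reply."""
    yes_replies = true_share * respond_if_yes
    no_replies = (1 - true_share) * respond_if_no
    return yes_replies / (yes_replies + no_replies)


# Same true share every time; only the (unobserved) reply rates change.
scenarios = [(0.10, 0.10), (0.20, 0.05), (0.05, 0.20)]
for r_yes, r_no in scenarios:
    share = expected_poll_share(0.60, r_yes, r_no)
    print(f"reply rates yes={r_yes}, no={r_no}: poll shows {share:.3f}")
```

With equal reply rates the poll shows the true 60%. With unequal rates the same population can produce a result well above or well below that, and the poll itself gives no way to tell which scenario you are in.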

I’d say it’s only about 8% valid.

Another problem with polls in IMHO is that responses are seldom independent. In this context, independent means “not influenced by the responses of other people.”

In the case of a poll on who is wearing shoes, posts about how all shoes are icky, or posts like “I’m wearing the cutest little 4-inch heeled sandals in teal with ankle straps” are likely to encourage people with interesting answers to contribute.

This is especially a problem with questions like “what kind of shoes are you wearing?” rather than “are you wearing shoes?” but can still be a problem with the latter type of question.

As the saying goes, “the plural of anecdote is not data.”

Cerberus has it right. You’re starting with a highly self-selected population. From that you’re reaching those self-selected members who post in IMHO. (I never do, so you would never be able to include me in your poll.) From that group you get those self-selected individuals who happen to be active at the current time, and open the thread, and decide to participate and, and, and.

You cannot even extrapolate the results to the general set of IMHO posters, let alone any larger group.

The result is pure anecdotal information that, in this case, has no larger meaning.

Most anecdotal, self-selecting polls are like this. There are times in which anecdotal information is valuable: if I want to have someone pave my driveway I’m likely to talk to a few friends or neighbors for a recommendation. There are no larger scientifically valid surveys that I could turn to if I wanted to. But mass information can’t be judged by self-selected reports, however numerous. Only the best approximations of a random statistical survey can hope to be used for an approximation of current reality.

Of course this is rounded up from 7.89%.

And, as has been noted earlier, the ability of potential respondents to view other responses can affect new responses.

And having said that, the underlying model employed in the mathematical modelling of polling data is the Bernoulli/binomial model, which requires independent and identically distributed trials. So, the underlying idea in a Bernoulli model is that there is an event E with probability P, based on trials from a population T. Observing the sample proportion pE of event E in random samples drawn from T allows estimation of the probability P.
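For what it's worth, the estimate described there can be sketched in a few lines. This uses the usual normal approximation for the interval and assumes independent random draws, which is exactly what an IMHO poll does not give you. The function name `proportion_estimate` and the 68-out-of-100 figures are illustrative only.

```python
import math


def proportion_estimate(event_count, n, z=1.96):
    """Sample proportion, its standard error, and an approximate 95% interval.

    Assumes n independent, identically distributed Bernoulli trials.
    """
    p_hat = event_count / n
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    interval = (p_hat - z * std_err, p_hat + z * std_err)
    return p_hat, std_err, interval


# Hypothetical poll: event E observed in 68 of 100 randomly sampled respondents.
p_hat, std_err, interval = proportion_estimate(68, 100)
print(p_hat, round(std_err, 4), interval)
```

The interval describes sampling error only. It says nothing about bias from self-selection or from respondents influencing each other.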

The wrinkle in this is the typical use of sampling without replacement, which can be dealt with mathematically, provided that a sufficiently small sampling rate is employed.
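One standard way of dealing with it is the finite population correction, which scales the standard error by sqrt((N - n) / (N - 1)) for a sample of n from a population of N. The sketch below, with a hypothetical helper `finite_population_correction` and made-up sizes, shows why a small sampling rate makes the issue negligible.

```python
import math


def finite_population_correction(population_size, sample_size):
    """Multiplier applied to the standard error when sampling without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


# Small sampling rate: correction is close to 1 and can usually be ignored.
print(finite_population_correction(10_000, 100))

# Large sampling rate: correction shrinks the standard error noticeably.
print(finite_population_correction(1_000, 500))
```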

But yes, allowing the running totals of a poll to be viewed by prospective respondents can induce bias by, in part, encouraging more people to “pile on” to a popular response, or to “rescue” a less popular response.

<Spock> 8.142973% to be precise, Captain.</Spock>

Besides what others have answered, a poll in IMHO is not anonymous, and is likely to attract only Dopers with a special interest in the subject.