Statistical Validity of an IMHO Poll

MysteryFellow63427 · August 12, 2006, 8:03pm

Say I make a wildly popular poll in IMHO that garners hundreds of replies.

What are the deficiencies of this polling method that make it “unscientific” or statistically invalid?

I know we have a non-random group of people, so the poll is no good for a broader generalization (“68% of people on the internet do not support the Bush administration’s policies in Iraq!”). But how big a deal is the self-selection of replies to an innocuous question like “Are you wearing shoes right now?”

ultrafilter · August 12, 2006, 8:05pm

No one knows, and that’s why it’s a problem.

cerberus · August 12, 2006, 8:18pm

The only valid scientific polls are based on random samples from a well-defined population. The respondents’ probabilities of selection are used to produce an estimate, which can then be used to draw inferences about the source population.

Self-selecting polls of the type in IMHO may arguably refer to some sort of population of paying SDMB members and guests, as well as undetected sock-puppets, but what is not known is the probability of selection for the respondents.

Bias is problematic in this case precisely because it is unmeasurable.
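To make the point about unknown selection probabilities concrete, here is a minimal sketch. It assumes a hypothetical yes/no question ("are you wearing shoes?") and hypothetical response rates that differ between the two groups. None of these numbers come from the thread; they are illustration only. The poll result depends on the response rates as much as on the true proportion, and in a real self-selected poll those rates are exactly what nobody knows.

```python
import random


def expected_poll_share(true_p, respond_if_yes, respond_if_no):
    """Long-run share of 'yes' replies when each group replies at its own rate."""
    yes = true_p * respond_if_yes
    no = (1 - true_p) * respond_if_no
    return yes / (yes + no)


def simulate_poll(true_p, respond_if_yes, respond_if_no,
                  population=100_000, seed=0):
    """Simulate a self-selected poll; all rates are hypothetical inputs."""
    rng = random.Random(seed)
    yes_replies = 0
    no_replies = 0
    for _ in range(population):
        wears_shoes = rng.random() < true_p
        respond_prob = respond_if_yes if wears_shoes else respond_if_no
        if rng.random() < respond_prob:
            if wears_shoes:
                yes_replies += 1
            else:
                no_replies += 1
    total = yes_replies + no_replies
    if total == 0:
        raise ValueError("nobody replied to the poll")
    return yes_replies / total


if __name__ == "__main__":
    # Half the population wears shoes, but shoe-wearers are assumed to be
    # twice as likely to bother replying.
    print(expected_poll_share(0.5, 0.02, 0.01))
    print(simulate_poll(0.5, 0.02, 0.01))
```

In this sketch a true 50% shows up as roughly two thirds in the poll. The analyst sees only the replies, so without the selection probabilities there is no way to undo the distortion.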

Rigamarole · August 12, 2006, 9:01pm

I’d say it’s only about 8% valid.

Eureka · August 12, 2006, 9:12pm

Another problem with polls in IMHO is that responses are seldom independent. In this context, independent means “not influenced by the responses of other people.”

In the case of a poll on who is wearing shoes, posts about how all shoes are icky, or posts like “I’m wearing the cutest little 4-inch heeled sandals in teal with ankle straps” are likely to encourage people with interesting answers to contribute.

This is especially a problem with questions like “what kind of shoes are you wearing?” rather than “are you wearing shoes?” but can still be a problem with the latter type of question.

Exapno_Mapcase · August 12, 2006, 9:33pm

As the saying goes, “the plural of anecdote is not data.”

Cerberus has it right. You’re starting with a highly self-selected population. From that you’re reaching those self-selected members who post in IMHO. (I never do, so you would never be able to include me in your poll.) From that group you get those self-selected individuals who happen to be active at the current time, and open the thread, and decide to participate and, and, and.

You cannot even extrapolate the results to the general set of IMHO posters, let alone any larger group.

The result is pure anecdotal information that, in this case, has no larger meaning.

Most anecdotal, self-selecting polls are like this. There are times in which anecdotal information is valuable: if I want to have someone pave my driveway I’m likely to talk to a few friends or neighbors for a recommendation. There are no larger scientifically valid surveys that I could turn to if I wanted to. But mass information can’t be judged by self-selected reports, however numerous. Only the best approximations of a random statistical survey can hope to be used for an approximation of current reality.

Zeldar · August 12, 2006, 9:35pm

Of course this is rounded up from 7.89%.

cerberus · August 12, 2006, 11:50pm

And, as has been noted earlier, the ability of potential respondents to view other responses can affect new responses.

Having said that, the model underlying the mathematical treatment of polling data is the Bernoulli/binomial model, which requires independent and identically distributed trials. The idea is that there is an event E with probability P, defined on trials from a population T. Observing the sample proportion pE of event E in random samples drawn from T allows estimation of the probability P.

The wrinkle in this is the typical use of sampling without replacement, which can be dealt with mathematically, provided that a sufficiently small sampling rate is employed.

But yes, allowing the running totals of a poll to be viewed by prospective respondents can induce bias by, in part, encouraging more people to “pile on” to a popular response, or to “rescue” a less popular response.
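As a rough illustration of the binomial model and the without-replacement wrinkle, the sketch below computes the sample proportion, its usual standard error, and the standard finite population correction. The post gives no formulas, so these are the common textbook versions, and the function and parameter names are made up for this example. Everything here assumes a genuinely random sample, which is exactly what an IMHO poll lacks.

```python
import math


def proportion_estimate(successes, n, population_size=None):
    """Estimate P from a simple random sample of size n.

    Returns (p_hat, standard_error). If population_size is given, the
    standard error is scaled by the finite population correction for
    sampling without replacement.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if population_size is not None:
        # Correction factor sqrt((N - n) / (N - 1)); close to 1 when the
        # sampling rate n / N is small.
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return p_hat, se


if __name__ == "__main__":
    # 500 of 1,000 sampled respondents say yes.
    print(proportion_estimate(500, 1000))
    # Same sample drawn from 10 million people: correction is negligible.
    print(proportion_estimate(500, 1000, population_size=10_000_000))
    # Same sample drawn from only 2,000 people: correction matters.
    print(proportion_estimate(500, 1000, population_size=2_000))
```

When the sample is a tiny fraction of the population the correction factor is nearly 1, which is why a small sampling rate lets the with-replacement binomial arithmetic stand in for the without-replacement case.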
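The "pile on" effect can be sketched with a toy simulation. This is a deliberately crude, hypothetical model and not anything proposed in the thread: each respondent has a private answer, but with probability `herd` simply votes for whichever option is currently ahead in the visible running totals. The names `run_visible_poll`, `true_p`, and `herd` are invented for this example.

```python
import random


def run_visible_poll(true_p, herd, n=5000, seed=1):
    """Toy model of a poll whose running totals are visible.

    true_p: probability that a respondent's private answer is 'yes'.
    herd:   probability that a respondent ignores their private answer
            and votes for the current leader (no effect while tied).
    Returns the final share of 'yes' votes.
    """
    rng = random.Random(seed)
    yes = 0
    no = 0
    for _ in range(n):
        own_answer = rng.random() < true_p
        if yes != no and rng.random() < herd:
            answer = yes > no  # pile on to whichever option leads
        else:
            answer = own_answer
        if answer:
            yes += 1
        else:
            no += 1
    return yes / n


if __name__ == "__main__":
    print(run_visible_poll(0.6, herd=0.0))  # independent responses
    print(run_visible_poll(0.6, herd=0.5))  # responses follow the leader
```

With `herd` set to zero the responses are independent and the final share lands near `true_p`. With herding, the final share is pulled well away from `true_p` toward whichever option got ahead early, so the trials are no longer independent and the binomial arithmetic no longer applies.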

Voyager · August 13, 2006, 4:56am

<Spock> 8.142973% to be precise, Captain.</Spock>

Besides what others have answered, a poll in IMHO is not anonymous, and is likely to attract only Dopers with a special interest in the subject.
