Under what conditions would an internet poll be statistically sound?

erislover · March 11, 2004, 5:05pm

Part of the polling process involves random sampling, and the presumption (for internet polling) is that those polled are not a random sample: they are just the subset of people who read this webpage and volunteer their opinion.

But is this to suggest that an internet poll can never be sound? Yahoo, for example, sometimes pops up a survey of their services when I’m surfing my email. If this only comes to those with Yahoo email accounts (i.e., this is the group they want to poll), is the voluntarily-polled aspect still as significant, or can they rely on their data better than MSNBC could for their “should gay marriage be legal” poll?

ultrafilter · March 11, 2004, 5:49pm

Nope. With no information about the people who chose not to respond, there’s no reason to say that they’re not different from those who did.

aahala · March 11, 2004, 5:57pm

Straw polls are worthless, actually less than worthless because some people who see the results get some impression from them.

95% of those who responded said yes. What can we say about that? Only that 95% of those who responded said yes. Even that assumes no one voted more than once, which isn’t always the case.
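To put rough numbers on why "95% of those who responded" says little about everyone else, here is a minimal sketch. The function name and all the rates are made up for illustration; the only assumption is that the chance of responding depends on which answer a person would give.

```python
def observed_yes_share(true_yes_share, respond_rate_yes, respond_rate_no):
    """Expected share of 'yes' among respondents when the chance of
    responding depends on the answer a person would give."""
    yes_responses = true_yes_share * respond_rate_yes
    no_responses = (1.0 - true_yes_share) * respond_rate_no
    total = yes_responses + no_responses
    if total == 0:
        raise ValueError("nobody responds, so the poll has no result")
    return yes_responses / total


# Hypothetical numbers: the population is split 50/50, but 'yes' people
# are 19 times as likely to bother answering as 'no' people.
share = observed_yes_share(0.5, 0.19, 0.01)
print(f"{share:.0%} of respondents said yes")
```

With equal response rates the function just returns the true share. The catch is that a straw poll gives you no way of knowing whether the rates were equal.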

ftg · March 11, 2004, 6:07pm

You can’t collect data from people over the Internet in any reliable way. Period.

In addition to the non-randomness of who responds to a poll, there is also a far greater chance that people will lie.

There was a great quote from a web site admin a few years ago along the lines of “the majority of our web site visitors own their own corporation and make over a half million dollars a year.” (And yet more and more sites require people to provide info before access. It ain’t “data” folks, it’s all garbage. I don’t live in Schenectady, zip code 12345.)

“Trust” and “from the Internet” don’t go together.

wevets · March 12, 2004, 2:35am

Perhaps it could be considered statistically sound if your target population is the readership of a certain website, and if you state that you’re measuring “activity” rather than “people” or even “respondents.” Note how mealy-mouthed your results become…

In essence, ftg is correct.

RealityChuck · March 12, 2004, 3:20am

So why does the Harris Poll, among others, do it this way?

Granted, you can’t just have people visiting your web page and call it a valid poll, but there are ways to have a perfectly reliable poll over the Internet. The Harris Poll, for instance, has lists of people who have signed up to receive polls. Depending on their demographic data, they will poll them. The polls are good enough for them to be selling the service to companies. It’s no different from the usual screening that any phone poll or face-to-face poll goes through.

micco · March 12, 2004, 3:56pm

I’ve set up polls on the Internet for some psychological studies. In these cases, we set up a secure site which required an authenticated login. The sample pool was defined offline and members of the pool were provided with the information required to access the poll. The polling software was designed to allow each member of the pool to complete the poll exactly once. The only difference between this application and physically handing a questionnaire to each individual is that the individuals in the pool might have allowed someone else to complete the form in their place. However, in dealing with a pool of several thousand people, the authentication provided by the Internet site was comparable to what you could expect to get from human pollsters without taking extreme measures, and it allowed the poll to be completed in a much more economical way.
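A rough sketch of the "each member of the pool completes the poll exactly once" idea, assuming one access token per member handed out offline. This is not the software described above; `SingleResponsePoll` and its methods are hypothetical names for illustration only.

```python
import secrets


class SingleResponsePoll:
    """Accepts one answer per member of a pool that was defined in advance."""

    def __init__(self, member_ids):
        # One unguessable token per distinct member, to be distributed offline.
        self._token_to_member = {
            secrets.token_urlsafe(16): member
            for member in dict.fromkeys(member_ids)
        }
        self._responses = {}

    def tokens(self):
        """Map each member to their token (for the offline mailing)."""
        return {member: token for token, member in self._token_to_member.items()}

    def submit(self, token, answer):
        """Record the answer; return False if the token is unknown or already used."""
        member = self._token_to_member.get(token)
        if member is None:
            return False  # not in the sample pool
        if member in self._responses:
            return False  # this member has already answered
        self._responses[member] = answer
        return True

    def response_rate(self):
        """Fraction of the pool that has answered so far."""
        if not self._token_to_member:
            return 0.0
        return len(self._responses) / len(self._token_to_member)


poll = SingleResponsePoll(["alice", "bob"])
alice_token = poll.tokens()["alice"]
first = poll.submit(alice_token, "yes")      # accepted
second = poll.submit(alice_token, "no")      # rejected: already answered
stranger = poll.submit("made-up-token", "yes")  # rejected: not in the pool
```

As noted above, nothing here stops a member from giving their token to someone else. It only guarantees at most one response per token, and it gives you a known denominator for the response rate.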

ftg’s warning that this is impossible is a good rule of thumb to force you to think through your design, but it really isn’t true in all cases. You simply have to accept and deal with the specific issues raised by using the Internet, and the kind of polling you want to accomplish may or may not be possible.

muttrox · March 12, 2004, 4:53pm

You can validly administer a poll via the internet. The OP didn’t specify, but they probably didn’t mean that.

Any poll is worthless if it is self-selected (you choose whether to answer), lets you vote multiple times, gives you no particular reason to be truthful, is targeted at the wrong population (the people who visit a particular site usually don’t match the general population or the population the question is aimed at), or has any combination of those problems. Absolute garbage. Most internet polling has most of those issues.

erislover · March 12, 2004, 5:03pm

Thanks for the responses so far, but no, I didn’t mean the simple administering of a poll, that is, replacing a phone call with the internet or something like that.

It is the heart of the self-selection criteria that I don’t understand. Because, let’s face it, if someone calls and asks if I want to do a poll, aren’t I selecting myself? People can own multiple Yahoo email addresses, ok… but suppose that my bank, which has online banking, conducted such a poll over the internet to ask about the online banking experience… still unsound? – If so, why?

This is what I’m trying to see… under what conditions is a sound internet poll possible?

micco · March 12, 2004, 5:30pm

As do most offline polls. I’m not disagreeing with you at all, merely pointing out that most of the points raised in this thread apply to all polls whether or not they use the Internet. Designing a valid survey is hard, and interpreting results with various design problems like self-selection in mind is important.

Whether you want to rule a poll sound or unsound depends a lot on what you want to do with the data. If your Internet bank runs a poll of customers to get feedback, the statistical soundness of the sample is largely irrelevant. It’s unlikely they’re going to use their results to say something like “82.3% of users think we’re the bomb”. More likely, it’s just a method to solicit feedback and they’re not interested in the aggregate results at all. They don’t care whether the majority thought the interface was pretty, they care about the one or two respondents who said they were baffled by the options.

On the other hand, if they are trying to get statistically relevant aggregate results, then self-selection is a huge problem. Even if you do limit the possible pool to your customers, you still only get the ones who bother to respond. In most cases I’ve seen, this type of poll results in some response which is a fraction of the possible pool and you can usually justify saying that they do not represent a good sample for that very reason. But again, this problem has nothing whatsoever to do with the fact that the poll used the Internet.

viking · March 12, 2004, 5:37pm

Sure you can. In your Yahoo example, what it would require is that:
a) Yahoo is only interested in Yahoo users as the population about which they want to say something.
b) Yahoo knows exactly which users were shown the pop-up inviting them to respond.

Knowing b) means that they can calculate what % of people responded to the poll. They can also then pick a small number of those who didn’t, and poll them more intrusively by phone or snail-mail, to see if non-respondents to the initial poll differ from initial respondents. Of course, if 90% of 90% still refuse to answer the followup poll, then you can’t say much. But if your follow-up poll gets a decent response, and doesn’t differ from your main poll, then the fact that you have the option of closing the pop-up window and not responding is not novel to internet polls.
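The check described above can be sketched in a few lines: compute the response rate, then compare the main poll against the follow-up of non-respondents. The two-proportion z-test is one common way to make that comparison, not necessarily what a real pollster would use, and every count below is invented for illustration.

```python
import math


def response_rate(invited, responded):
    """Share of the people shown the poll who actually answered it."""
    return responded / invited


def two_proportion_z(yes_a, n_a, yes_b, n_b):
    """Two-sided z-test for a difference between two 'yes' proportions.

    Returns (z, p_value), using the normal approximation with a pooled
    standard error.
    """
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0:
        raise ValueError("both groups are unanimous; the test is undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area
    return z, p_value


# Hypothetical counts: 10,000 users saw the pop-up and 1,200 answered.
rate = response_rate(10_000, 1_200)

# Main poll: 780 of 1,200 said yes. Follow-up by phone with a sample of
# non-respondents: 150 answered and 96 of them said yes.
z, p = two_proportion_z(780, 1_200, 96, 150)
print(f"response rate {rate:.0%}, z = {z:.2f}, p = {p:.2f}")
```

A large p-value only means the follow-up found no detectable difference; it does not prove there is none, and it tells you nothing if most of the follow-up sample also refuses to answer.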

If the poll is “just there” for each visitor to respond to, or not, and there’s no control against multiple responses by the same person, then you can’t calculate non-response rates, and you can’t do a follow-up test for non-respondent bias, so you’re pretty hooped.

And you do need to keep in mind that honesty varies by how the poll is administered; I believe studies show that in person > phone > mail > internet, but I don’t have those studies on hand right now, so I could well be wrong about the details.

John_Mace · March 12, 2004, 5:54pm

The point is not that the sampling might or might not be random, it’s that you have absolutely no way of knowing one way or the other. You might, by chance, open an internet poll and get a random cross section of respondents (although the probability would be extremely low). But how would you know that?

Read about Gallup polling procedures here.

erislover · March 12, 2004, 8:03pm

How interesting you link to Gallup, John. Check it out, scroll to the bottom.

Wendell_Wagner · March 13, 2004, 1:48pm

A point that I don’t think anyone else has made yet is that people who use the Internet are not a random selection of the entire population. Even if you polled every adult American who ever used the Internet and asked them who they were going to vote for in an upcoming election and they were honest, etc., this would not tell you who would win the election. The adult Americans who use the Internet are not a random selection of all adult Americans. Lots of people never or rarely use the Internet.
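The arithmetic behind this is short. Assuming you somehow knew exactly what every internet user thinks, the overall figure is still only pinned down to a range, because the people who are not online could split any way at all. The function name and the numbers are hypothetical.

```python
def overall_support_bounds(online_share, support_online):
    """Lowest and highest possible overall support when only the online
    part of the population has been measured."""
    if not (0 <= online_share <= 1 and 0 <= support_online <= 1):
        raise ValueError("shares must be between 0 and 1")
    covered = online_share * support_online
    # Offline people could all say no (lower bound) or all say yes (upper bound).
    return covered, covered + (1.0 - online_share)


# Hypothetical numbers: 60% of adults are online and 55% of them back a candidate.
low, high = overall_support_bounds(0.6, 0.55)
print(f"overall support is somewhere between {low:.0%} and {high:.0%}")
```

The range straddles 50%, so in this made-up case even a perfect census of internet users could not call the election without some further assumption about everyone else.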
