# What is involved in finding "variance" in a survey?

I’m creating a small survey, mostly made up of ratings questions similar to this: “How much do you like this ______: 1 not at all, 2 somewhat dislike, 3 neither like nor dislike, 4 like somewhat, 5 like a lot”

I envision the output of the survey for this particular question to be a list of the ratings followed by the number of people who answered a specific rating perhaps followed by a mean:

5 - Like a lot - 20
4 - like somewhat - 5
3 - neither like nor dislike - 10
2 - somewhat dislike - 9
1 - not like at all - 5

Mean: 15.8

When I am asked to give the variance of answers, what exactly does that mean? And what other useful information could I show on this set of numbers?

The mean is, of course, the average of all the numbers. To find the variance, you do the following: take each reply, subtract off the average, square the result, take the average of all of the squared results, and then take the square root of the average.

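So, for example, if you surveyed ten people and your data turned out to be four 5s, two 4s, no 3s, one 2, and three 1s, the mean would be

$$\frac{5\cdot 4 + 4\cdot 2 + 3\cdot 0 + 2\cdot 1 + 1\cdot 3}{10} = 3.3$$

and the second figure would be

$$\sqrt{\frac{(5-3.3)^2\cdot 4 + (4-3.3)^2\cdot 2 + (3-3.3)^2\cdot 0 + (2-3.3)^2\cdot 1 + (1-3.3)^2\cdot 3}{10}} \approx 1.73,$$

or about 1.75 in round numbers.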
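For anyone who would rather check that arithmetic in code, here is a minimal Python sketch of the same steps. The `counts` dictionary is just the ten-person example written as rating -> number of replies, and the variable names are made up for illustration.

```python
from math import sqrt

# Illustrative tally for the ten-person example: rating -> number of replies.
counts = {5: 4, 4: 2, 3: 0, 2: 1, 1: 3}

n = sum(counts.values())

# Mean: each rating weighted by how many people chose it.
mean = sum(rating * c for rating, c in counts.items()) / n

# Average of the squared distances from the mean (dividing by n here).
mean_sq_dev = sum(c * (rating - mean) ** 2 for rating, c in counts.items()) / n

# Square root of that average.
spread = sqrt(mean_sq_dev)

print(n, round(mean, 2), round(mean_sq_dev, 2), round(spread, 2))
```

The last two numbers are the average squared deviation and its square root, both computed with a divisor of 10.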

Really, though, if it’s a short survey and you’re only using five categories for each question, you might almost be better off just showing a histogram (bar chart) for each individual question. This kind of “rank things on a scale from 1 to 5” question is subjective enough that doing averages & variances on it probably isn’t too meaningful.
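If a charting tool is not handy, a bar chart for one question can be roughed out in plain text. The sketch below uses the tallies from the question at the top of the thread; the function name `text_histogram` and the layout are only one possible choice.

```python
# Tallies from the original question: rating -> (label, number of replies).
tallies = {
    5: ("like a lot", 20),
    4: ("like somewhat", 5),
    3: ("neither like nor dislike", 10),
    2: ("somewhat dislike", 9),
    1: ("not at all", 5),
}

def text_histogram(tallies):
    """Return one line per rating, highest first, with one '#' per reply."""
    lines = []
    for rating in sorted(tallies, reverse=True):
        label, count = tallies[rating]
        lines.append(f"{rating} {label:<25} {'#' * count} {count}")
    return lines

print("\n".join(text_histogram(tallies)))
```

Each row shows the rating, its label, a bar whose length is the count, and the count itself.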

Thanks a lot MikeS, that was extremely helpful!

But what exactly does the variance represent? What does 1.75 (in your example) tell me about the information gathered?

Actually, the previous answer was not the variance. It’s the standard deviation. Remove the square root and you’ll have the variance. Also, you should divide by 9, not 10, since the results are obviously sample data.

As to your question of what the variance represents, it is a measure of dispersion in your data, i.e., how the responses are spread out. The mean measures the center of your data. The smaller the variance, the closer all the responses are to the mean.

That’s because 9 is (n-1), where n is 10.
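To see the difference between the two divisors, and between variance and standard deviation, here is a short sketch using Python's standard `statistics` module on the ten-person example. The `replies` list is that example expanded to one entry per person.

```python
import statistics

# The ten-person example, one entry per reply.
replies = [5] * 4 + [4] * 2 + [2] * 1 + [1] * 3

pop_var = statistics.pvariance(replies)  # divides by n = 10
samp_var = statistics.variance(replies)  # divides by n - 1 = 9
samp_sd = statistics.stdev(replies)      # square root of samp_var

print(round(pop_var, 3), round(samp_var, 3), round(samp_sd, 3))
```

`pvariance` treats the ten replies as the whole population, while `variance` and `stdev` treat them as a sample and use n - 1.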

That is not interval-level data (it’s ordinal), so you can’t really get an accurate picture of the data using the variance or standard deviation as the analytical tools.

The variance and standard deviation tell you a lot about the data. Say the average is 3. That could mean:

a) everyone answered 3
b) the answers were perfectly spread out (10 1s, 10 2s, 10 3s, 10 4s, and 10 5s)
c) there were 25 1s and 25 5s.

Variance helps you distinguish between those.
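A quick sketch of that point in Python: fifty replies that all average 3, with very different spreads. The "everyone answers 3" case is added here as an extra illustration, and `pvariance` (divide by n) is an arbitrary choice; the sample variance would rank the cases the same way.

```python
import statistics

# Three hypothetical sets of 50 replies, each with a mean of 3.
cases = {
    "all threes": [3] * 50,
    "even spread": [1] * 10 + [2] * 10 + [3] * 10 + [4] * 10 + [5] * 10,
    "polarised": [1] * 25 + [5] * 25,
}

summary = {
    name: (statistics.mean(data), statistics.pvariance(data))
    for name, data in cases.items()
}

for name, (mean, var) in summary.items():
    print(f"{name:<12} mean={mean} variance={var}")
```

The means are identical, so only the variance separates the three cases.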

Mechanically, it’s not even worth it to learn the formula. Just put it in Excel and use the built-in statistical functions.

I calculated the standard deviation for scored questions in a resident survey I conducted. Here’s how I described it.

I thought there was more (easy, useful stuff) to say about standard deviations … if there is a normal distribution, isn’t it the case that some portion of data points (1/3, maybe?) falls within 1 standard deviation of the mean, and another portion (2/3 ?) falls within 2 standard deviations?

(It’s been 25 years since I studied statistics.)

Yes, for a normal distribution about 68% will be within 1 standard deviation of the mean, about 95% within two, and about 99.7% within three.
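Those percentages can be checked with `statistics.NormalDist` from the Python standard library. This is only a sketch for an ideal normal distribution; the helper name `share_within` is made up for illustration, and survey ratings on a 1 to 5 scale need not follow these proportions.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")
```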

This is a very significant issue for this sort of data analysis. You can make the mean and standard deviation come out to anything you like by picking the right numbers for each response. What you really want is the coefficient of variation, which is the standard deviation divided by the mean.
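As a rough illustration, the coefficient of variation for the ten-person example can be computed as below. The function name is hypothetical, and the population standard deviation (`pstdev`) is an assumption; the sample version would work the same way. The second call multiplies every code by 10 to show that stretching the scale leaves the ratio unchanged. Shifting the codes (say 0 to 4 instead of 1 to 5) would still change it.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation divided by the mean (population version)."""
    return statistics.pstdev(data) / statistics.mean(data)

# The ten-person example, coded 1 to 5.
replies = [5] * 4 + [4] * 2 + [2] * 1 + [1] * 3

# The same replies with every code multiplied by 10 (10 to 50).
rescaled = [10 * r for r in replies]

cv = coefficient_of_variation(replies)
cv_rescaled = coefficient_of_variation(rescaled)

print(round(cv, 3), round(cv_rescaled, 3))
```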