I’m creating a small survey, mostly made up of ratings questions similar to this: “How much do you like this ______: 1 not at all, 2 somewhat dislike, 3 neither like nor dislike, 4 like somewhat, 5 like a lot”
I envision the output for this particular question as a list of the ratings, each followed by the number of people who chose that rating, and perhaps followed by a mean:
5 - Like a lot - 20
4 - like somewhat - 5
3 - neither like nor dislike - 10
2 - somewhat dislike - 9
1 - not like at all - 5
Mean: 3.53
When I am asked to give the variance of the answers, what exactly does that mean? And what other useful information could I show about this set of numbers?
The mean is, of course, the average of all the numbers. To find the variance, you do the following: take each reply, subtract off the average, square the result, take the average of all of the squared results, and then take the square root of the average.
So, for example, if you surveyed ten people and your data turned out to be
Really, though, if it’s a short survey and you’re only using five categories for each question, you might almost be better off just showing a histogram (bar chart) for each individual question. This kind of “rank things on a scale from 1 to 5” question is subjective enough that doing averages & variances on it probably isn’t too meaningful.
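If it helps, here is a rough sketch of that kind of bar chart in plain text. The counts are the ones from the example in the question; the function name text_histogram and the "#" bars are just my own choices for illustration, not anything standard.

```python
# Counts from the example in the question (rating -> number of replies).
counts = {5: 20, 4: 5, 3: 10, 2: 9, 1: 5}
labels = {
    5: "like a lot",
    4: "like somewhat",
    3: "neither like nor dislike",
    2: "somewhat dislike",
    1: "not at all",
}

def text_histogram(counts, labels):
    """Return one line per rating, with a bar of '#' as long as the count."""
    width = max(len(text) for text in labels.values())
    lines = []
    for rating in sorted(counts, reverse=True):
        bar = "#" * counts[rating]
        lines.append(f"{rating} {labels[rating]:<{width}} {bar} {counts[rating]}")
    return lines

print("\n".join(text_histogram(counts, labels)))
```

Each line shows the rating, its label, a bar, and the raw count, so the shape of the answers is visible without computing anything.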
Actually, the previous answer was not the variance. It's the standard deviation. Remove the square root and you'll have the variance. Also, you should divide by 9, not 10, since the results are obviously sample data.
As to your question of what the variance represents, it is a measure of dispersion in your data, i.e., how the responses are spread out. The mean measures the center of your data. The smaller the variance, the closer all the responses are to the mean.
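To make the difference concrete, here is a small sketch using the counts from the example in the question (49 replies in total) and Python's standard statistics module. The variable names are mine, and treating the ratings as plain numbers 1 to 5 is an assumption.

```python
import statistics

# Counts from the example in the question (rating -> number of replies).
counts = {5: 20, 4: 5, 3: 10, 2: 9, 1: 5}

# Expand the table into one number per respondent: [5, 5, ..., 1].
replies = [rating for rating, n in counts.items() for _ in range(n)]

mean = statistics.mean(replies)                       # sum of replies / n
sample_variance = statistics.variance(replies)        # divides by n - 1
population_variance = statistics.pvariance(replies)   # divides by n
sample_sd = statistics.stdev(replies)                 # square root of the sample variance

print(f"n = {len(replies)}")
print(f"mean = {mean:.2f}")
print(f"sample variance = {sample_variance:.2f}")
print(f"population variance = {population_variance:.2f}")
print(f"sample standard deviation = {sample_sd:.2f}")
```

For these counts the mean comes out at about 3.53 and the sample variance at about 2.09. The standard deviation is just the square root of the variance, which puts it back in the same units as the ratings.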
That is not interval-level data (it's ordinal), so the variance and standard deviation can't really give you an accurate picture of the data.
I thought there was more (easy, useful stuff) to say about standard deviations … if there is a normal distribution, isn’t it the case that some portion of data points (1/3, maybe?) falls within 1 standard deviation of the mean, and another portion (2/3 ?) falls within 2 standard deviations?
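For what it's worth, the fractions for a true normal distribution can be checked with the error function in Python's math module. This is only a sketch: fraction_within is a name I made up, and it assumes the data really is normal, which five-point ratings may well not be.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

That gives roughly 68% within one standard deviation, 95% within two, and 99.7% within three.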
This is a very significant issue for this sort of data analysis. You can make the mean and standard deviation come out to anything you like by picking the right numbers for each response. What you really want is the coefficient of variation, which is the standard deviation divided by the mean.
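A quick sketch of what happens under two different codings, using the counts from the question. The second coding (0, 25, 50, 75, 100) is purely hypothetical, as is the helper name summarize. It just prints the mean, standard deviation, and coefficient of variation for each coding so they can be compared.

```python
import statistics

# Counts from the example in the question (rating -> number of replies).
counts = {5: 20, 4: 5, 3: 10, 2: 9, 1: 5}

def summarize(counts, codes):
    """Return (mean, sample SD, SD / mean) after mapping each rating to codes[rating]."""
    values = [codes[rating] for rating, n in counts.items() for _ in range(n)]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean, sd, sd / mean

usual = {rating: rating for rating in counts}        # the usual 1..5 coding
stretched = {1: 0, 2: 25, 3: 50, 4: 75, 5: 100}      # hypothetical alternative coding

for name, codes in (("1..5", usual), ("0..100", stretched)):
    mean, sd, cv = summarize(counts, codes)
    print(f"{name}: mean = {mean:.2f}, SD = {sd:.2f}, CV = {cv:.2f}")
```

The responses are identical in both runs; only the numbers assigned to them differ.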