I don’t know if anyone’s asked this before; if so, :smack:. I searched and didn’t see it.
What statistical method do sites like that silly “hot or not” site use to normalize their scores? In other words, how do they compensate for people who habitually vote in the 1-5 range, or people who only vote 1 or 10?
I frequent what could best be described as a parody of that site, and their real problem is that, since they’re just using a regular mean, the scores are very volatile: some people will just be jerks and vote 10 on a decent picture.
Any advice, including cites to resources I could use, would be greatly appreciated.
I don’t know what they actually do. But it would be foolish to normalize on the inputs, what the voters are doing. A reasonable approach would be to take all the outputs (every single vote cast for every single person – all your data) and try to normalize on that.
That’s if you want to normalize it, which I see as a very questionable assumption.
Look at the bottom of IMDb’s Top 250 page for what they do to get what they claim is a “true Bayesian estimate.” I don’t think it would solve all your problems, but it would be a fair start.
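For reference, the formula they post at the bottom of that page is: weighted rating = (v / (v + m)) × R + (m / (v + m)) × C, where R is the item’s raw mean, v its vote count, m a minimum-votes threshold, and C the mean vote across the whole list. Here’s a minimal sketch of it in Python; the default values for m and C below are made-up placeholders, not IMDb’s actual numbers:

```python
def true_bayesian_estimate(item_votes, m=25, C=5.5):
    """IMDb-style weighted rating: shrink an item's raw mean toward the
    site-wide mean C until it has accumulated enough votes.

    item_votes -- the individual scores cast for one picture
    m          -- votes needed before the raw mean dominates (a tuning knob)
    C          -- the mean vote across every picture on the site
    """
    v = len(item_votes)
    R = sum(item_votes) / v  # this picture's raw mean
    return (v / (v + m)) * R + (m / (v + m)) * C
```

With only a handful of votes the score hugs C, so one jerk’s stray 10 barely moves it; as votes pile up, the picture’s own mean takes over.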
The short answer is that they shouldn’t be using any statistical technique at all. It’s disingenuous to do so, garbage in, garbage out.
The thing you are apparently referring to is “volunteer data.” One can certainly chop and dice it, but it’s still only that: no more a sample than me recording my own views on something several times a day, adding up the number of times I did so, and posting this frequency at the top of some list. So what?
Check out the vote curves for Troll 2 and The Master of Disguise. Clearly, the trend from bottom to top is disrupted by a disproportionate number of 10 (highest) votes.
aahala has it right that rigorous statistics should not be done on this data because it is ‘volunteer data’. The data is not collected properly for this.
If it were, however, one way to correct for people who use the scale differently would be to “standardize” their scores (mean of 0, std dev of 1). In a nutshell, a voter would have to rate a picture above their own average for it to count as a “positive” score.
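A minimal sketch of that standardization, assuming you can pull each voter’s full voting history:

```python
import statistics

def standardize(voter_scores):
    """Map one voter's raw scores to z-scores (mean 0, std dev 1), so a
    habitual 1-to-5 voter and a 1-or-10 voter land on a comparable scale."""
    mu = statistics.mean(voter_scores)
    sigma = statistics.stdev(voter_scores)
    if sigma == 0:  # a voter who always gives the same score carries no signal
        return [0.0] * len(voter_scores)
    return [(s - mu) / sigma for s in voter_scores]
```

A picture’s score would then be the mean of the z-scores it received, which you could map back onto a 1–10 scale if you want a familiar-looking number.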
In practice, many types of analysis are robust without standardizing but, in my experience, any analysis that segments respondents, like cluster analysis, is better with standardization.
I am fully aware that it’s volunteer data, and that there really is no way to give it statistical accuracy in the most rigorous sense. What I was trying to find out is what some of the other sites that use a self-selected sample apply as a form of statistical correction. I’m no statistics expert, but I’ve taken enough stats to know better than to assume that such a data set, even with some correction applied, is inherently unrepresentative.
Lemming, I’m not exactly sure what you are trying to do, but if you are worried about “smart-alec” answers, you could eliminate the lowest and highest 5% or 10% of the votes on each question and analyze the remaining 90% or 80%.
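That’s a trimmed mean; a quick sketch, with the trim fraction as the knob:

```python
def trimmed_mean(scores, trim=0.05):
    """Sort the votes, drop the lowest and highest `trim` fraction,
    and average the middle 90% (or 80% with trim=0.10) that remains."""
    s = sorted(scores)
    k = int(len(s) * trim)  # number of votes to discard from each end
    middle = s[k:len(s) - k] if k else s
    return sum(middle) / len(middle)
```

(scipy.stats.trim_mean does the same thing if you’d rather not roll your own.)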
I know that statistics and sampling have a bad rep, but I’ve been doing this for several years, and I can testify from experience that sampling, done properly, is very robust and can take many “hits” and still give accurate, reproducible results. That is the real key in all this: are the results reproducible?
If you suspect that many people are giving bad/wrong/smart-alec results, then you are in trouble. However, a few jerks will not mess up your study.
If the poll is more-or-less “just for fun”, they probably don’t do anything other than take the mean. Why would they bother?
If I were tasked with getting some semblance of usable data from these polls, my first thought would be to look at the data from a large number of previous such polls to get the relative fraction of the time each number 1 through 10 is selected. I could then weight each selection by its “desired” frequency (say, from a bell curve) divided by its actual frequency.
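A sketch of that reweighting; here `desired` is whatever target distribution you pick (a discretized bell curve, say), expressed as a probability for each vote value 1 through 10:

```python
from collections import Counter

def vote_weights(historical_votes, desired):
    """Weight for each vote value: its desired probability divided by the
    fraction of the time voters actually picked it in past polls.
    Assumes every value 1-10 appears at least once in the history."""
    n = len(historical_votes)
    observed = Counter(historical_votes)
    return {v: desired[v] / (observed[v] / n) for v in range(1, 11)}

def reweighted_mean(votes, weights):
    """Score one picture with each vote counted at its weight."""
    return (sum(weights[v] * v for v in votes)
            / sum(weights[v] for v in votes))
```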
Another possibility, if you’ve got a lot of 1’s and 10’s, would be to count those as votes in a two-choice poll. You could then redistribute the 1’s among, say, the 2’s, 3’s, and 4’s, and the 10’s among, say, the 7’s, 8’s, and 9’s, possibly with a lower weighting (since the resolution is lower). You might leave some of them in the ten-choice poll (e.g., 5 percent of the 1’s and 10’s count in the ten-choice poll, and the other 95 percent count in the two-choice poll).
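One way to read that scheme; the 50% discount on the folded votes is an arbitrary placeholder, and you’d want to tune both knobs against real data:

```python
def fold_extremes(votes, keep=0.05, discount=0.5):
    """Treat most 1's and 10's as two-choice thumbs-down/thumbs-up votes:
    spread 95% of them evenly over 2-4 or 7-9 at a reduced weight, and
    leave 5% in the ten-point tally at full weight."""
    pairs = []  # (vote value, weight)
    for v in votes:
        if v in (1, 10):
            pairs.append((v, keep))  # the fraction that stays put
            targets = (2, 3, 4) if v == 1 else (7, 8, 9)
            for t in targets:
                pairs.append((t, (1 - keep) * discount / len(targets)))
        else:
            pairs.append((v, 1.0))
    return sum(v * w for v, w in pairs) / sum(w for v, w in pairs)
```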
Looking at the actual data might give you some other ideas.
For an example of where someone might really want to bother with this: I used to listen to Spinner internet radio, and they’d let you rate the songs you were listening to. Presumably they used the ratings to help determine their programming. So there’s a case where they have volunteer data but would really like to get some usable information from it. I usually selected 1 or 10, on the theory that it would affect the mean most. Now I’m thinking maybe I should have stuck with 2’s and 9’s, but who knows.
You know, that’s what I would have thought, too, but my friend has an account (with hotornot), and they show you a histogram of all your votes, as well as your overall score, and it’s definitely not the mean. From memory, I’d say his chart looked to have a mean around 5 or 6, and a standard deviation of 2.5. The mode was 4. His overall score? 8.7.
Apparently, they base their normalization on the voters’ distributions, not on the distributions of the people being voted on. Perhaps they rescale the minimum and maximum votes each voter gives to correspond to 1 and 10. They could also do what I discussed above, but per voter instead.
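If it really is a min/max rescale per voter, the arithmetic would be something like this (a guess at their method, not anything hotornot has published):

```python
def rescale_voter(raw_votes):
    """Stretch one voter's personal range so their lowest vote maps to 1
    and their highest to 10, then feed the rescaled votes to the mean."""
    lo, hi = min(raw_votes), max(raw_votes)
    if lo == hi:  # a one-note voter tells you nothing
        return [5.5] * len(raw_votes)
    return [1 + 9 * (v - lo) / (hi - lo) for v in raw_votes]
```

That would be consistent with an 8.7 overall score sitting on a histogram centered near 5 or 6: if most voters only use the middle of the scale, a middling raw vote stretches into a high rescaled one.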