There are plenty of sites out there that let people vote on things by rating them, then rank the subjects by average rating. Many of these play out OK over time because they’re open-ended and every entry will eventually get a large number of votes (I guess). But what about closed-ended systems, like a contest, where not every entry is required to receive the same number of votes, and where voting ends after a short period of time?
Basically, what is the statistical method to ensure that an entry that gets one vote of “10” (for a 10.00 average) doesn’t win over an entry that has a 9.8 average but had 1,000 people vote for it?
Do you only allow entries to be ranked that receive a minimum amount of votes?
Do you pre-seed all entries with “X” # of average scores, so one vote won’t skew it overly (every entry automatically gets assigned 20 votes of 5.0, for instance)?
Or some more sophisticated method based upon total aggregate votes, or average # of votes per entry (not ratings, but raw votes), that then determines the minimum votes needed to be eligible, or applies a bonus or penalty to low-raw-vote entries?
This math stuff be way over my head
Could you give an example?
Sure, for the web project I’m working on (note that this isn’t exactly it, but a close enough facsimile).
Contest entrants create a song and upload it. Site users can listen to the songs. There’s a one-month period for people to enter the contest, followed by a two-week voting period. Users can (in theory) vote only once per song, but they can vote or not vote for any of the contest entries. They vote on a one-to-five scale.
Or if you want a real-life example, think Am I Hot Or Not. Except there you are sort of forced to vote, but with my project you won’t be forced to.
I see two ways. If r is the average rating of a song and n is the number of votes it received, sort by the quantity max(1, r - 5/sqrt(n)). Or sort by the sum of the votes per song.
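To make the first suggestion concrete, here is a minimal Python sketch of the max(1, r - 5/sqrt(n)) score on the one-to-five scale described above. The function name and the sample numbers are made up for illustration, and treating an entry with zero votes as sitting at the floor is my own assumption.

```python
import math

def penalized_score(avg_rating: float, num_votes: int) -> float:
    """Score = max(1, r - 5/sqrt(n)); the penalty shrinks as votes accumulate."""
    if num_votes <= 0:
        return 1.0  # assumption: an entry with no votes sits at the floor
    return max(1.0, avg_rating - 5.0 / math.sqrt(num_votes))

# Hypothetical entries: (average rating, number of votes)
single_vote = penalized_score(5.0, 1)    # 5 - 5/1 = 0, floored to 1
well_voted = penalized_score(4.5, 300)   # 4.5 - 5/sqrt(300), about 4.21
```

A lone perfect vote ends up at the bottom of the scale, while an entry with a few hundred votes keeps most of its average.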
Ah, I see, sort of. Not sure about the basic sum of the votes, since then you may get the opposite of what I first wrote about: a song that is bad, but because of certain circumstances gets a lot of votes (even if they are ratings of “1”), so that a bad song with an average rating of 2 but 1,000 votes beats out a good song with an average rating of 4.5 but only 300 votes.
Problem is I can’t guarantee pure randomness in how things appear on site. For instance, the latest entries will be published on home page. If they get published on the weekend, then those songs may get more viewers (who knows) and thus more votes.
Trying to walk a thin line here, I know.
Personally, I’m going to say that you have to switch your paradigm. (Heh heh, I used a buzz-word.)
You are never going to be able to claim statistical validity from such a survey, so drop that line of thought completely. What you are looking for are vote aggregation schemes, although that isn’t the technical term (if there is one), and I can assure you, without fear of valid contradiction, that the only guaranteed problem-free method of doing this is to have one person do the ranking. Google for Arrow’s Impossibility Theorem or Arrow’s Paradox for the reasons why.
I suggest that what you want is the method of aggregation known as approval voting. In approval voting, each voter is allowed a number of votes up to the number of candidates. The voter then gives one vote for every candidate for which she approves. If she votes for all candidates or no candidates, then her vote is essentially null, because she has failed to differentiate between any of the options. The winner of the election is the one with the most votes of approval.
For your program, you define approval as being a score of X or higher, where X is the number on the scale from 1 to 10 that indicates, for example, that a person would go out and buy the song or album. How you choose X will ultimately be up to you; since you don’t have a market research department, X is going to be fairly arbitrary.
Suppose you decide that a vote of 8 or higher qualifies as approval. (Or you could go with 6 or higher since it is above the midpoint and, I suppose, that would suggest that the song is good rather than bad.) The song with the most votes of 8 or higher is the song that wins the competition.
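A rough sketch of that threshold version of approval voting, in Python. The ballots, song names, and the function name are hypothetical; the threshold of 8 is just the example value from the paragraph above.

```python
from collections import Counter

APPROVAL_THRESHOLD = 8  # assumption: "8 or higher" counts as approval

def approval_counts(votes, threshold=APPROVAL_THRESHOLD):
    """Count, per song, the votes at or above the threshold.

    `votes` is an iterable of (song, rating) pairs.
    """
    counts = Counter()
    for song, rating in votes:
        counts[song] += rating >= threshold  # True adds 1, False adds 0
    return counts

# Hypothetical ballots on a 1-to-10 scale
ballots = [("Song A", 9), ("Song A", 4), ("Song A", 8),
           ("Song B", 10), ("Song B", 7), ("Song C", 3)]
ranking = approval_counts(ballots).most_common()
```

Here `ranking` lists the songs from most to fewest approvals; the actual ratings below the threshold no longer matter, which is the point of the scheme.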
This method obviously has flaws; however, a flawless system is well beyond your grasp. With the notion of approval voting, it may be better to just ask “Would you buy this?” and a yes indicates approval. Alternatively, you could ask “Did you enjoy this song?”, “Would you like to hear this song on the radio?”, or something like that.
That’s my suggestion.
Did you enjoy this post?
If I’m understanding you correctly, it sounds like a Bradley-Terry analysis might fit the bill, with a song that a person doesn’t rank treated as that person’s lowest ranking.
I am unable to explain it right now but it gives you a name of an analysis to research.
Check out the bottom of this page at IMDB, who use Bayesian estimates to overcome such problems. I’m sure there is a better explanation on the site but I can’t find it.
The formula they use is (according to the same page):
weighted rank (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C where:
R = average for the movie (mean) = (Rating)
v = number of votes for the movie = (votes)
m = minimum votes required to be listed in the Top 250 (currently 1250)
C = the mean vote across the whole report (currently 6.8)
This eliminates the problem the OP seems concerned with, which is different candidates getting widely different numbers of votes. Seven Samurai got fewer than 30,000 votes while The Shawshank Redemption received over 135,000.
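For anyone who wants to play with the formula, here is a minimal Python sketch applied to the original question (one vote of 10 versus a 9.8 average from 1,000 votes). The value m = 50 is an arbitrary assumption for a small contest, C = 6.8 is simply borrowed from the IMDB figure quoted above, and the function name is made up.

```python
def weighted_rank(R: float, v: int, m: float, C: float) -> float:
    """WR = (v / (v + m)) * R + (m / (v + m)) * C, as quoted above."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Assumed values for illustration only: m = 50 and C = 6.8 on a 1-to-10 scale
M, C_MEAN = 50, 6.8
one_perfect_vote = weighted_rank(R=10.0, v=1, m=M, C=C_MEAN)   # 350/51, about 6.86
many_votes = weighted_rank(R=9.8, v=1000, m=M, C=C_MEAN)       # 10140/1050, about 9.66
```

With few votes the result stays close to C; with many votes it approaches the entry's own average, so the single "10" no longer wins.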
Wow, thanks for all the helpful replies today.
My first take on the true Bayesian estimate was that that would be exactly what I was looking for, but I ran some sample numbers (granted, on very low vote totals) and it really seems to skew towards those that receive lots of votes, regardless of what their true average is.
             True Avg.   Bayes   # votes   Cum. score
Entry A        3.833     4.607      12         46
Entry C        3.000     4.138      11         33
Entry E        2.545     3.939      11         28
Entry D        3.500     3.909       6         21
Entry F        4.500     3.846       4         18
Entry B        5.000     3.174       2         10
All entries    3.391                46        156
What’s weird is that the entry with the lowest True Average (Entry E) actually comes in 3rd place with Bayesian, which I don’t follow.
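One thing worth checking against those numbers: the weighted rank is a weighted average of R and C, so it always lands between the two. A value such as 4.607 for Entry A (true average 3.833, overall mean 3.391) falls outside that range, which hints at a slip somewhere in the calculation. The Python sketch below recomputes the column from the vote counts and cumulative scores in the table. Note that m = 5 is an arbitrary assumption and the function name is made up. It uses the equivalent form (score + m*C) / (v + m), which is the same thing as pre-seeding every entry with m votes of the overall mean, as floated in the original question.

```python
# (votes, cumulative score) per entry, taken from the table above
entries = {
    "A": (12, 46), "B": (2, 10), "C": (11, 33),
    "D": (6, 21), "E": (11, 28), "F": (4, 18),
}
M = 5  # assumption: prior weight, i.e. number of "phantom" votes per entry

total_votes = sum(v for v, _ in entries.values())   # 46
total_score = sum(s for _, s in entries.values())   # 156
C = total_score / total_votes                       # about 3.391

def bayes_estimate(votes: int, score: int) -> float:
    """Same as (v/(v+m))*R + (m/(v+m))*C, written with raw sums."""
    return (score + M * C) / (votes + M)

ranked = sorted(entries, key=lambda e: bayes_estimate(*entries[e]), reverse=True)
```

With this m, Entry E comes out last, as its true average would suggest. How far the low-vote entries (B and F) get pulled toward the mean depends entirely on the choice of m.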
So I’m a little hesitant to use that, esp. since I really don’t know how many votes these entries will be receiving. Very short window of voting, small target audience.
Leaning now towards just a plain True Average with a minimum # of votes to be eligible.
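As a sketch of what that would look like on the sample numbers above (Python; the cutoff of 5 votes is purely an assumption to be tuned to the audience size):

```python
MIN_VOTES = 5  # assumption: eligibility cutoff

# (votes, cumulative score) per entry, from the sample table
entries = {
    "A": (12, 46), "B": (2, 10), "C": (11, 33),
    "D": (6, 21), "E": (11, 28), "F": (4, 18),
}

# Keep only entries with enough votes, then rank by plain average
eligible = {name: score / votes
            for name, (votes, score) in entries.items()
            if votes >= MIN_VOTES}
ranking = sorted(eligible, key=eligible.get, reverse=True)
```

Entries B and F drop out entirely under this cutoff, which is the trade-off: the rule is easy to explain, but a good song that few people heard gets no ranking at all.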
Guess there is no perfect system.
I would assume that people rank the songs which they think are particularly good or particularly bad, and don’t bother with the ones they think are just mediocre. So if a person doesn’t vote at all for a song, that could be interpreted as a vote of 5.5. So you could pad each song with enough votes of 5.5 that they all have the same number of votes as the most-voted song.
To get a little more sophisticated, it might be that people are more likely to vote for a good song than for a bad song, or vice versa. In this case, you could pad with an amount that would bring the overall average to 5.5. That is to say, if you have a lot of votes of 9 and 10, but few of 1 or 2, then that indicates that most of the un-votes indicate that a song is pretty bad. So the un-vote padding should be a fairly low vote.
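The two padding ideas above can be sketched in Python as follows. The song names, ratings, and function names are hypothetical, and nothing here clamps the computed pad value to the rating scale, which a real implementation might want to do.

```python
NEUTRAL = 5.5  # midpoint of a 1-to-10 scale, as in the post above

# Hypothetical data: song -> list of ratings actually cast
ratings = {"Song A": [9, 10, 9, 8], "Song B": [10], "Song C": [2, 9, 3]}

def padded_averages(ratings, pad_value):
    """Pad every song up to the largest vote count, then average."""
    target = max(len(r) for r in ratings.values())
    return {song: (sum(r) + (target - len(r)) * pad_value) / target
            for song, r in ratings.items()}

def balancing_pad_value(ratings, neutral=NEUTRAL):
    """Pad value that pulls the overall padded average to `neutral`."""
    target = max(len(r) for r in ratings.values())
    slots = target * len(ratings)
    cast = sum(len(r) for r in ratings.values())
    missing = slots - cast
    if missing == 0:
        return neutral  # nothing to pad, so the value is irrelevant
    total = sum(sum(r) for r in ratings.values())
    return (neutral * slots - total) / missing

simple = padded_averages(ratings, NEUTRAL)
adjusted = padded_averages(ratings, balancing_pad_value(ratings))
```

In this made-up data most cast votes are high, so the balancing pad value comes out low (1.5), and the single "10" for Song B counts for much less under `adjusted` than under `simple`.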
Both of these assume that any given song is equally likely to be considered. A song that just isn’t heard much (because it was entered into the contest just before deadline, say, or it was on the last page of the list) would then tend to look more average than it really is (even if it’s actually unusually good or unusually bad). In this case, it would probably be better to require some minimum number of votes for a ranking to count. If one person votes 10 on something, that might just mean that that was the guy who wrote the song, or something. But if, say, 100 people all vote 9 or 10 for something, and nobody votes lower, then you could safely say that some unbiased folks are listening to it and liking it. I’m not sure where to set the cutoff, but it would probably depend on the number of people you expect to be involved (if only 50 people are going to see the website at all, then obviously a cutoff of 100 won’t work).
I would also advise that, whatever ordering system you use, you give the later viewers as much information as feasible. So, for instance, your number 1 pick might say “<name of song 1> 9.82 (736 votes)”, followed by “<name of song 2> 9.76 (3644 votes)”, etc. If you wanted to get really fancy, you could include a little histogram image with each song, so you could tell which “average” songs everyone thought were average, and which ones some folks loved and others hated.
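A small Python sketch of that kind of display, using a text histogram in place of an image. The function names and the two sample songs are made up; both songs have the same average but very different spreads.

```python
from collections import Counter

def summary_line(name, ratings):
    """Format like '<name> 9.82 (736 votes)'."""
    avg = sum(ratings) / len(ratings)
    return f"{name} {avg:.2f} ({len(ratings)} votes)"

def text_histogram(ratings, scale=range(1, 11)):
    """One row per possible rating, with one '#' per vote."""
    counts = Counter(ratings)
    return "\n".join(f"{score:>2} | {'#' * counts[score]}" for score in scale)

# Hypothetical songs with the same average but different spreads
consensus = [5, 6, 5, 6, 5, 6]
polarizing = [1, 10, 1, 10, 1, 10]

print(summary_line("Song X", consensus))
print(text_histogram(consensus))
print(summary_line("Song Y", polarizing))
print(text_histogram(polarizing))
```

Both summary lines show the same average, and only the histograms reveal that one song is middling for everyone while the other splits the audience.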
I would strongly advise the exact opposite if the desire is to get an accurate opinion. Giving info about how previous voters had voted can bias the results.
Yes, we had already decided that we wouldn’t show ratings on a piece until after a person had voted on that piece (in other words, only s/he would see the score, and only after s/he had voted).