You’re ranking a large group of items on a scale of one to five, with one being the worst and five being the best. What principle do you use to assign numerical rankings?
I’ve heard that some people essentially use a bell curve, where most items are ranked in the middle. Rankings of one or five are rare and only used for a handful of the very best or very worst items.
Personally, I go for a more even distribution, with each number getting an approximately equal number of items. I would figure on giving every item in the top twenty percent a rank of five and every item in the bottom twenty percent a one and so on.
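To make the even-split idea concrete, here is a minimal sketch in Python. It assumes each item already has some underlying score to sort by; the `quintile_ratings` name and the sample scores are made up for illustration, not taken from any real system.

```python
def quintile_ratings(scores):
    """Map each item to 1-5 so every rating gets roughly a fifth of the items."""
    # Order item names from worst to best by their underlying score.
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    # Position i out of n falls in quintile i * 5 // n (0..4); add 1 for stars.
    return {name: i * 5 // n + 1 for i, name in enumerate(ordered)}


# Hypothetical scores, invented for illustration.
scores = {"A": 91, "B": 88, "C": 75, "D": 60, "E": 59,
          "F": 42, "G": 40, "H": 33, "I": 20, "J": 5}
ratings = quintile_ratings(scores)
print(ratings["A"], ratings["E"], ratings["J"])  # 5 3 1
```

Note that the rating depends only on an item's position in the list, not on how far its score is from its neighbours.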
I probably wouldn’t worry about the distribution at all. I’d assign some kind of standard for each level, then classify the items on that basis, and the distribution ends up being what it is.
5 Star: This is virtually perfect. It does everything it’s supposed to do, and nothing it shouldn’t do.
4 Star: This is great, except for one or two issues.
3 Star: This does the job, but doesn’t stand out in any obvious way.
2 Star: You can do the job with this thing, but it’s a pain in the ass, and the results kind of suck.
1 Star: I’d rather eat broken glass than use this product.
It’s entirely possible that everything could be 5-star or 1-star. Not likely, but possible. Of course, in practice, this would likely end up with a bell curve, because Nature just loooooooves bell curves, so what are you going to do?
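A rough sketch of that approach in Python, for anyone who wants to see it spelled out. The four yes/no criteria and the `star_rating` function are my own guess at how the five standards above could be turned into rules; they are assumptions, not anybody's official rubric.

```python
from collections import Counter


def star_rating(does_job, painful, stands_out, issues):
    """Rate one item against fixed standards, ignoring every other item."""
    if not does_job:
        return 1  # would rather eat broken glass
    if painful:
        return 2  # usable, but a pain, and the results are poor
    if not stands_out or issues > 2:
        return 3  # does the job without standing out
    return 5 if issues == 0 else 4  # great; only a flawless item gets 5


# Hypothetical batch in which every item happens to be excellent.
batch = [{"does_job": True, "painful": False, "stands_out": True, "issues": 0}] * 3
distribution = Counter(star_rating(**item) for item in batch)
print(distribution)  # Counter({5: 3})
```

Nothing in the code looks at the distribution while rating. It is only counted afterwards, so a batch of all 5s or all 1s is a legal outcome.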
The state of Arkansas has the ridiculous policy of rating employees in a group so that they fit a bell curve. That skews your data. All your employees may be excellent, or they may all be crummy, or anything in between.
Unless you are ranking things precisely and not subjectively, I’m not sure any mathematical tools are the right answer.
What I would do is an initial ranking, then maybe walk the list and do a kind of bubble sort - is Item #2 really worse than item #1? Sure. Is Item 3 really worse than item 2? If not, flip them. Repeat.
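Something like the following Python sketch, where `is_better` stands in for the human gut check on two neighbouring items. The function names and the snack scores are hypothetical. The number of passes is capped because subjective comparisons are not guaranteed to be consistent, and an uncapped loop could then go on forever.

```python
def refine_ranking(items, is_better):
    """Walk the list repeatedly, flipping neighbours that are out of order."""
    ranked = list(items)  # best item first; leave the caller's list untouched
    for _ in range(len(ranked)):  # capped number of passes, so it always stops
        swapped = False
        for i in range(len(ranked) - 1):
            # Is the lower-placed item actually better than the one above it?
            if is_better(ranked[i + 1], ranked[i]):
                ranked[i], ranked[i + 1] = ranked[i + 1], ranked[i]
                swapped = True
        if not swapped:
            break  # a full pass with no flips means the order has settled
    return ranked


# Hypothetical gut-feeling scores standing in for a person's judgment.
gut = {"pretzels": 3, "chips": 8, "crackers": 5}
initial = ["pretzels", "chips", "crackers"]
final = refine_ranking(initial, lambda a, b: gut[a] > gut[b])
print(final)  # ['chips', 'crackers', 'pretzels']
```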
If the stuff being ranked is precise, then I’d have to think about the probable distribution - anywhere from a Pareto ranking to a fat-tailed ranking.
Pedantic note: what the OP describes is not “ranking,” it’s scoring. If you were ranking items, you’d be giving each item a unique score, and placing them in order from the best to the worst. If one had five items, and a ranking scale from 5 to 1 (with 5 being the “best”), an actual ranking would yield one item that’s “5,” one item that’s “4,” etc.
What the OP is describing is assigning items individual, independent ratings on a five-point scale; in such a rating, individual items can receive the same score as each other (and could, hypothetically, all receive the same score).
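The difference is easy to see side by side. This is only an illustrative Python sketch; the five items and their underlying scores are invented, and rounding is just one simple way to turn a score into a rating.

```python
# Hypothetical underlying scores on a 1-5 scale, invented for illustration.
scores = {"A": 4.8, "B": 4.7, "C": 4.6, "D": 4.4, "E": 1.2}

# Ranking: order the items and give each one a unique position (5 = best).
ordered = sorted(scores, key=scores.get, reverse=True)
ranking = {name: len(ordered) - pos for pos, name in enumerate(ordered)}

# Rating: score each item independently, so ties are allowed.
rating = {name: round(score) for name, score in scores.items()}

print(ranking)  # {'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1}
print(rating)   # {'A': 5, 'B': 5, 'C': 5, 'D': 4, 'E': 1}
```

In the ranking, item D lands on a 2 even though it is nearly as good as A. In the rating, it keeps its 4.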
Did just this kind of thing for years as a judge at bird dog trials. I rated each dog on its own merits, and the chips fell where they fell. You might end up with a stake where none of the dogs were any good, and you withheld placements. On the flip side, there were stakes where you had so many good dogs it was difficult to decide places 1 through 4, and it would come down to splitting hairs based on your personal preferences in each performance, and you might even end up giving out awards of merit for those that didn’t make the top 4. I pretty much see everything through that lens - consider them individually, then see how they stack up against each other.
Pick any items which are commonly rated; books, movies, restaurants.
I feel equal distribution makes sense. If you have five ratings and each rating is given to approximately one fifth of the items that seems to make sense to me.
Speaking as a market research professional: when we create questionnaires with this kind of scoring/rating question, we’ll usually use some sort of descriptor for each of the points of the scale.
In the five-point scale that the OP suggests, the scale is commonly:
5 = Excellent
4 = Very Good
3 = Good
2 = Fair
1 = Poor
Of course, those adjectives are still subjective, but it helps many respondents wrap their heads around what each level of the scale means.
But it doesn’t tell you much about the actual value of anything, just relative value. In your system, a one-star item might be perfectly adequate, but it looks like crap because it’s not a five-star item. That’s the problem with demanding any kind of fixed distribution: it over-emphasizes what may be tiny differences.
I was thinking about this today at the grocery store. I was thinking about buying snacks, but all the chips I like are too expensive, and the chips that are cheap I don’t like. So I thought about buying pretzels instead, because I literally can’t see any functional difference between the “good” expensive pretzels and the “bad” cheap ones. As far as I’m concerned, all pretzels are functionally interchangeable. Forcing a distribution in their ratings serves no useful purpose.
I’m a bell curve kind of guy. It may be because I don’t have strong preferences so most things end up in a muddle of “pretty much OK” from which relatively few things differentiate themselves.
That’s not how I’d generally think of it, unless the exercise forced me into putting the items into five groups of similar size. And, it’ll depend entirely on the nature of the things I’m being asked to rate.
“Rate this list of songs by your favorite band?” Most of them are going to get a 5 or a 4 from me, and very few are going to be below a 3.
“Rate this list of songs by a band you can’t stand?” I’d probably be hard-pressed to rate anything above a 2.
Has it occurred to them that this would mean that even if all the employees in a particular area happen to be excellent, some of them would have to be ranked as terrible; and even if all of them are terrible, some of them would have to be ranked as excellent?
I abhor numerical rankings and avoid doing them whenever possible. If I’m forced to do one that allows multiple items to have the same ranking, then I’m not going to pay any attention to the distribution.
That makes no sense to me at all. Again, they might all be great, or they might all be awful.
How does that make any sense? OK, let’s say I rate the last five movies I saw from 1-5 stars (which I do on Letterboxd). I have not seen any 1-star movies, so to give one a 1-star rating makes no sense. The last five movies I have seen have 2 at 5 stars, 1 at 4 stars, 1 at 3.5 and 1 at 2.5. According to you, I should say a film I liked was complete crap because… why?
Why do you think each ranking must have an equal representation? That’s not how things actually work. Sometimes a majority are better, sometimes a majority are worse.
Are you going to rate something you bought on Amazon as 1-star, even if it did exactly what it was supposed to, because you already rated another item 5?
That raises an interesting issue. When you’re rating items, do you rate them within the context of what they are? Or do you incorporate a meta-rating which reflects your opinion of their collective identity?
For example, let’s say you like movies a lot but aren’t as big a fan of books. If you’re rating various items on Amazon, which includes books and movies, do you rate the books on what is effectively a scale of 1-4 while you rate the movies on a scale of 2-5? Or do you effectively have one scale of 1-5 for books and a separate scale of 1-5 for movies?
I feel it’s a matter of better communication, which I feel is the main purpose of rating systems.
If I restrict the rating of five for things which I judge perfect and the rating of one for things which I judge I’d rather eat broken glass than use (to borrow Horatius’ phrase) then there will be very few things I give a rating of one or five to. What I will effectively have is a system of ratings on a scale of two to four, with pretty much every item receiving one of three ratings.
I feel a system which uses five ratings conveys more information than a system that uses three ratings. Just as a system that rates things on a scale of one to ten gives more information than a binary system that rates things yes or no. The more possible ratings being used in a system, the more detailed the information it conveys.
So I feel if you have a system which includes five possible ratings and then choose to minimize the use of two of those ratings, you’re handicapping the system. You’d do a better job of communicating by using the full range that you have available.
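One way to put a number on "more information" is Shannon entropy, which measures the average bits carried by each rating given how often each value is used. The Python sketch below is only an illustration with made-up rating lists, and `entropy_bits` is a hypothetical helper name. It measures how much the ratings distinguish items from each other, not whether any individual rating is accurate.

```python
from collections import Counter
from math import log2


def entropy_bits(ratings):
    """Average information per rating, in bits (Shannon entropy)."""
    counts = Counter(ratings)
    total = len(ratings)
    return -sum(c / total * log2(c / total) for c in counts.values())


# Made-up rating lists, invented for illustration.
all_five_used = [1, 2, 3, 4, 5] * 20  # every rating used equally often
middle_three = [2, 3, 4] * 20         # ones and fives never handed out
yes_or_no = [0, 1] * 20               # binary system

for label, data in [("five ratings", all_five_used),
                    ("three ratings", middle_three),
                    ("binary", yes_or_no)]:
    print(label, round(entropy_bits(data), 2))
```

With equal use of every value, the figures come out to log2(5), log2(3) and log2(2) bits, so the five-value list carries the most per rating and the binary list the least.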