What principle do you use when ranking items?

…except that my list wouldn’t look like that. You’ve arbitrarily imposed your even distribution scheme to the 2 to 4 ratings.

My list would probably look more like this:

1 star - two restaurants
2 stars - seven restaurants
3 stars - forty eight restaurants
4 stars - thirty seven restaurants
5 stars - six restaurants

Remember my scale:

5 Star: this is virtually perfect, it does everything it’s supposed to do, and nothing it shouldn’t do
4 Star: this is great, except for one or two issues
3 Star: this does the job, but doesn’t stand out in any obvious way
2 Star: you can do the job with this thing, but it’s a pain in the ass, and the results kind of suck
1 Star: I’d rather eat broken glass than use this product.

I’d guess that a large majority of restaurants in our society would be a 3. You can eat there, not get food poisoning, and not break the bank in the process. But you’re not going to go raving to your friends about it. Think McDonalds or Olive Garden. Very few would be 1, or even 2, because a shitty restaurant can only survive under certain conditions, like it’s literally the only place around, so they just don’t give a shit.

I’d be lying if I rated items as 5 that I thought were actually 3’s or 4’s. I don’t know what other people are doing; maybe they’re convincing themselves that, as in @puzzlegal’s example, the restaurant with stale bread sandwiches belongs in the same category as the one that gives you food poisoning.

No, more information. Using your technique would require me to give incorrect information; and there’d be no way for the survey reader to tell which of the information was incorrect. So there’d be no way to distinguish between any two or three adjacent categories. You couldn’t even tell whether I liked something I’d put into category 2 better than something I’d put into 4, because what if I actually thought most of the stuff belonged in category 3? In order to meet your technique, I’d have to put some of it into 2 and some of it into 4 and possibly even some of it into 1 and/or 5, instead of being able to accurately say that I thought almost all the items were pretty so-so and belonged in 3.

It’s not like flipping coins, because movies aren’t coins. Flipped coins are indeed going to fall neatly into two groups of relatively even numbers. But there’s no law of physics saying that movies in general will be spread evenly in quality all over a scale; and there certainly isn’t one saying that the selection of them that you happen to see will be spread evenly in quality in that fashion. If most sources tell you that Movie A is great and Movie B is awful: which movie are you going to see?

(In addition: I don’t think I’ve ever seen a number rating scale that asked me to rate a thousand items; or anywhere remotely near that.)

For one thing, I can’t imagine being able to rate a thousand, or a hundred, or probably even five restaurants, or movies, or whatever, that precisely. One restaurant makes a great Reuben but has no real fried chicken, only chicken nuggets which to me taste only of the salt and whatever they added to the breading; but they have really good breads and a great cheese selection. Another has really good actual fried chicken from pasture-raised birds and the best omelets in town, but they have nothing resembling proper rye bread and don’t even try to make a Reuben, and their coffee is awful. One makes, according to all of my friends, utterly fantastic food perfectly served; but everything is based on pasta and I don’t like pasta, so all I can eat there is dessert, but it is really good dessert and I like hanging out with my pasta-loving friends. There’s no way in the world I can rank those in order. It’s nothing like flipping coins. It’s more like reaching into a bag containing all the game tokens in the world and picking out handfuls, any given handful varying between three and seventeen; with a few scattered solid gold ones mixed into the whole pile and also a few that’ll stab a spike into your hand.

But I feel that just makes the issues I described worse.

The star rating system should convey information about the quality of a restaurant. And the premise (for the sake of the argument) was that you and I agree on the qualities. So we both think that Smitty’s BBQ Grill is slightly better than Fred’s Burger Joint and slightly worse than Luigi’s Pizza Palace and so on.

This means that theoretically we could have numbered the one hundred restaurants from one to one hundred and we would have the same lists. We both would agree that the seventy-first restaurant is slightly better than the seventieth and slightly worse than the seventy-second.

But an individual numbering system would be difficult in the real world. So we use five numbers and group restaurants together under a single rating.

By making the sizes of the groups approximately equal (20-20-20-20-20), I feel this conveys the most precise information about the most restaurants. You could take any two restaurants in my system that have the same rating and you know that they are, at the maximum, within twenty places of each other.

Under the unequal system that I described (2-32-32-32-2), you have four restaurants that you have very precise information about. If a restaurant has a five-star rating, you know it’s one of the two best restaurants in town; that’s a precision of two places. The same is true for the two one-star restaurants.

But the accuracy of the ratings for the other ninety-six restaurants is worse; they’re only rated to an accuracy of thirty-two places. As I wrote, you could go to a four-star restaurant not knowing if it’s the third best restaurant in town or the thirty-fourth best restaurant in town. That’s a wide range of quality.

And the system you described (2-7-48-37-6) is even worse. Your three-star rating has a range of forty-eight. That means the forty-fourth best restaurant in town and the ninety-first best restaurant in town have the same rating. I think there would be a significant difference between a restaurant in the top half and one in the bottom ten.

The general rule is the smaller the group is, the more precise the information about the things in that group is. And the way to put the most things in the smallest possible groups, is to make the groups as close to equal in size as possible. I’ve spoken about the arbitrariness of ratings systems but what I’ve said in this paragraph isn’t arbitrary. It’s a mathematical fact.
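The arithmetic behind that claim can be sketched as follows. This is only an illustration under the thread's own premise: exactly 100 restaurants that can be ranked from 1 (best) to 100 (worst). The function names `rank_ranges` and `average_group_size` are made up for this example. It shows which ranks end up sharing each star rating, and how many restaurants, on average, share a rating with a randomly chosen restaurant.

```python
def rank_ranges(sizes_low_to_high):
    """Map star rating -> (best_rank, worst_rank), with rank 1 the best.

    sizes_low_to_high lists the group sizes from 1 star up to 5 stars.
    """
    ranges = {}
    next_rank = 1
    stars = len(sizes_low_to_high)
    # Walk from the top rating down, handing out consecutive ranks.
    for size in reversed(sizes_low_to_high):
        ranges[stars] = (next_rank, next_rank + size - 1)
        next_rank += size
        stars -= 1
    return ranges


def average_group_size(sizes):
    """Average size of the group that a randomly picked item lands in."""
    total = sum(sizes)
    return sum(size * size for size in sizes) / total


# The three distributions discussed in the thread (1 star ... 5 stars).
schemes = {
    "equal": [20, 20, 20, 20, 20],
    "narrow extremes": [2, 32, 32, 32, 2],
    "mostly threes and fours": [2, 7, 48, 37, 6],
}

for name, sizes in schemes.items():
    print(name, rank_ranges(sizes), average_group_size(sizes))
```

The equal split gives the smallest average group size (20, against 30.8 and 37.62 for the other two). That is the narrow sense in which equal groups are "most precise" about rank position. The sketch says nothing about whether rank position is what a star rating ought to convey.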

That said, maybe there are reasons outside of mathematics why a different ratings system is superior. But people making that claim need to present the argument.

BTW, there’s a name for the rating “system” that @Little_Nemo is using: it’s called a “quantile” scheme (in this case, since there are five groups, it’d be called “quintiles”). Quantiles are sort of a meta version of a ranking or rating system – one places the items being “rated” into X number of equal-sized groups, based on their actual scores in a rating or ranking.

If you’ve ever taken a standardized test, and seen a “percentile” number associated with your score, that’s another quantile – in this case, dividing all the people who took the test into 100 equal-sized groups, based on their test scores.

So, in @Little_Nemo’s example, what they want to rate as “5s” are the “top quintile,” representing the 20% of the items that are most highly rated; the “second quintile” would be the next 20%, and so on, down to the “bottom quintile,” which is the 20% of the items with the lowest rating.

Quintiles (and other quantiles) can be useful in data analysis (I use them frequently as part of my job) – if you want to know, “what are the top 20% of items?”, it’s an excellent tool. But what they don’t do well is tell you what the underlying values are that are actually used to place each item into its quintile. As several of us have pointed out, if you have a large number of your items which are all highly rated (or lowly rated), or if most items are average, a quintile system gives you an incomplete picture of the spread of scores.
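A small sketch of that limitation, using two invented sets of scores out of 100 (the numbers and the helper `quintile_labels` are hypothetical, chosen only to make the point). One set is spread across the whole scale and the other is bunched near the top, yet both produce exactly the same quintile labels.

```python
def quintile_labels(scores):
    """Label each score 1 (bottom fifth) to 5 (top fifth) by rank alone."""
    # Indices of the scores, ordered from lowest score to highest.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for position, index in enumerate(order):
        labels[index] = position * 5 // len(scores) + 1
    return labels


# Hypothetical scores, invented for illustration.
spread_out = [12, 25, 33, 41, 48, 56, 64, 72, 85, 97]
mostly_great = [88, 89, 90, 91, 92, 93, 94, 95, 96, 97]

print(quintile_labels(spread_out))
print(quintile_labels(mostly_great))
```

Both calls print the same labels, so the quintiles alone cannot tell you that every item in the second list scored in the high 80s or better.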

If I used my example of “rate the songs from my favorite band,” and I was forced to place them into quintiles, to be honest, the songs in quintiles 5 through 3 are probably all going to be great songs, IMO.

That’s not the question I asked though, is it? And it’s an important question.

Yes, you think you’re telling the truth when you use your system. And you would be lying if you didn’t use your system.

But the question is what do you think the rest of us are doing? Some of us aren’t using your system. Do you think we know your system is the one true system and we are choosing to lie by using a different one?

Or do you think we believe our system is the true one and we are doing what we think is true when we use it? And if you acknowledge that we’re sincere and not knowingly lying, think about the implications that follow.

All of us are using a system that we think is the true system and we think we are telling the truth by using it. But you argue that our system isn’t the true system. Our belief that our system is true is wrong and our system isn’t the true system even if we believe it’s true.

Now double back to yourself. You say that you’re using a true system and you are telling the truth by using it. But how do you know that your belief that your system is true isn’t just as wrong, and that your system is actually the true system rather than one you merely believe to be true?

I realize this argument usually isn’t applied to rating systems. It’s an argument usually used for religions. People argue that the God they believe in is real and they offer their belief in that God as proof that he is real. But they also acknowledge that other people believe in other Gods while dismissing the reality of those Gods. It was this line of reasoning that led me down the path of atheism.

Walk me through this because I’m not seeing it.

I understand what you’re saying. I certainly have bands who I feel have only made great songs. And I would rate all of their songs as a five.

But that would be on a scale of all songs. I raised the issue before about context. In the context of all songs, then all of the songs by my favorite band might deserve a five.

But suppose I’m specifically rating just the songs by my favorite band. Not comparing them to songs by other bands; just comparing them to each other. Can I really claim that every song is equally great? Because that means I’m saying that every song they performed is the best song they ever performed. And phrasing it that way reveals the weakness of the argument.

The thing about grouping is that giving a bunch of items the same rating doesn’t mean you’re saying they’re all equal. You’re just saying they’re all in the same range. Two items can deserve the same rating even if you feel one is clearly better than the other. What you’re saying is that these two items, even while they are not equal in quality, are closer in quality than they are to two items in a different range.

My example of the songs from my favorite band is such an example.

Using @Horatius’s example of restaurants in post 41, in which they said:

…if you’re putting those 100 restaurants into five quintiles, your “top quintile” is the top 20 restaurants, and will contain all six of the 5 star restaurants, and some (14 out of 37) 4 star restaurants. Your second quintile (the next best 20%) would all be 4-stars, and the third quintile would be some 4 stars, and some 3 stars. The fourth quintile would all be 3-stars, as would over half of your bottom quintile (which would also include all of the lousy 2-stars and 1-stars).
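That breakdown can be checked with a short sketch. It assumes the 2-7-48-37-6 star counts from the example list earlier in the thread, sorts the restaurants best first, and cuts the list into five groups of twenty. How restaurants with the same star rating are ordered among themselves is arbitrary here, which is part of the point.

```python
from collections import Counter

# Star counts from the earlier example list (100 restaurants in total).
star_counts = {5: 6, 4: 37, 3: 48, 2: 7, 1: 2}

# Expand to one star value per restaurant, best first.
ordered = [stars
           for stars in sorted(star_counts, reverse=True)
           for _ in range(star_counts[stars])]

# Cut the ordered list into five equal-sized slices (quintiles).
quintile_size = len(ordered) // 5
quintiles = [Counter(ordered[start:start + quintile_size])
             for start in range(0, len(ordered), quintile_size)]

for number, mix in enumerate(quintiles, start=1):
    print(f"quintile {number}: {dict(mix)}")
```

The top quintile comes out as six 5-stars plus fourteen 4-stars, and the bottom quintile as eleven 3-stars, seven 2-stars and two 1-stars, matching the description above.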

When you decide that “equal-sized groups” is the only thing that’s important to your analysis and rating, if the actual distribution of scores in the population isn’t equally distributed, you’re not seeing the entire picture.

Edit: I think that this discussion is crossing the streams between “ratings” and “rankings” (and I apologize if I helped to muddy the water). Ratings should be on an absolute scale, in which each item is assigned a rating based on its own merits; rankings are done relative to other items in the population.

But as I have said, there’s no objective standard. It’s not that this song is a three and I’m lying if I rate it a four.

What I’m saying is “I feel this song has a certain level of quality to it. And I am arbitrarily assigning the number three to this level of quality.”

You may come along, listen to the same song, and be in complete agreement with me about the level of quality of that song. But you might arbitrarily assign the number four to that level of quality.

And both of us group songs of different levels of quality together and assign them all the same number. Both of us would agree that some songs that have a rating of three are better than other songs that have a rating of three.

So the argument comes down to where you draw the lines. Where do you put the line dividing three from four? And I disagree with those people who feel that there is one true place where that line must be drawn and putting it anywhere else is wrong.

Sure. However, in your OP, you also said this:

So, your personal arbitrary ratings (and, yes, I agree, such ratings are arbitrary, and in cases like rating songs, or restaurants, there is no absolute “true rating”) also seem to include an additional bit of arbitrariness: that you want to have the same proportion of items in each rating “level” (i.e., quantiles).

Is this accurate?

But what’s the context you’re rating the songs in?

I don’t know what your favorite band is. But let’s say it’s the Beatles.

If you’re rating a bunch of songs on a scale of one to five and you place all of the Beatles songs in the five group, I’m fine with that.

But if you’re rating a bunch of Beatles songs on a scale of one to five and you place every song that you rated in the five group, I’d argue you did a poor job of rating.

Sure you’ve conveyed the information that you really love the Beatles. But that’s all of the information you’ve conveyed. Do you love “Yesterday” the same amount you love “I Call Your Name”?

I’d disagree, and say, “no, I did a great job of rating, because I really do think that all of those songs are ‘5s’, using the scale you gave me. You’re asking me to rank them – either that, or you should give me a more sensitive scale, with more points on it, than a 5-point scale.”

(Parenthetical: the Beatles are, indeed, one of my favorite bands, but my very favorite is Electric Light Orchestra. :wink: )

No, I don’t see any additional level at work. We are both using the same five numbers; there is no additional level.

I am saying that placing the five numbers as close to equally distant apart as possible allows you to convey more information than placing five numbers at non-equal distances apart would. We’re using the same tools but I feel I’m getting the maximum use out of them. (I offered a mathematical explanation why I feel this is true in a previous post.)

I disagree, for the reasons already stated. Your system treats the 5-4-3-2-1 scale as interval data, when, depending on the context and question, it may only be ordinal data.

At this point, I don’t think either of us is going to convince the other of much of anything. Your preferred approach makes you happy, so enjoy it.

So like Garrison Keillor you think all ELO songs are above average ELO songs?

“Rate each ELO song” is a different question than “compared to other ELO songs, is each of these songs above average, average, or below average?”

I feel that rating everything the same isn’t rating. It’s just saying something exists. Rating requires comparison.

Saying that Jeff Lynne and Wayne Newton are both singers is a fact. Stating that Jeff Lynne is a better singer than Wayne Newton is a rating.

Again, a rating does not require a comparison, except against a (likely arbitrary) scale, and each item being rated should be given its rating based solely on its own merits. What you’re asking for is something that’s relative to other items in the group, and that is actually either (a) a ranking, or (b) a grouping.

If you asked me “place all of ELO’s songs into five groups of the same size, based on how well you like each song,” I could probably do that, but at a certain level, for me, that’ll wind up forcing me to make a distinction between songs (such as, what goes into the top quintile versus the second quintile) that doesn’t actually exist in my head.

What you’re doing then is ranking the restaurants, not rating them. But there isn’t much granularity in your rankings when you do it that way.

Nah. A normal distribution plots as a bell curve, not a straight horizontal line.

It’s not like flipping coins. Not unless you’re using a 5-sided die (what?!? :stuck_out_tongue:) to randomly assign ratings.

Too late for edit: Of course, I posted before I read the entire thread. I see kenobi_65 has already discussed some similar things. Also, on re-reading, it occurs to me that opinions or ratings don’t naturally chart as a normal distribution. I think some things end up in a bell curve anyway. Most restaurants and movies are probably crowded around the quality mean.

It’s only a “mathematical” fact if you can assign precise scores to each item, that aren’t based on a personal opinion. I can assign precise scores to determine who is the best math student, because they all did a test, and got some score out of 100. We can group them into percentiles or quintiles or whatever we want, but ultimately, we can look up the underlying score to get more information.

You can’t do that with restaurants or movies, though. There’s no objective underlying score that determines the rank of every movie. It’s purely personal, and relative. So there’s no underlying score that can be evaluated outside the 5-star rating system, and the system is worthless unless we have some idea as to what criteria determine a 1 or a 5, or any other score.

If you don’t think that 5 levels is sufficiently granular, then you need to use more stars, not impose an arbitrary distribution. Of course, at the extreme of this, we end up back with just a plain ranked choice list, with the top movie out of one hundred movies getting one hundred stars, and so on.