Questions about making a ranking from submitted lists

For a classical music board I’ve prepared a ranking of composers. In total, 57 members sent in their top 30 composers, which I scored (40 points for a #1 ranking, decreasing gradually to 6 points for #30, with a somewhat steeper drop near the top) and combined into one list. I chose a cut-off point: to be ranked in the final list, a composer had to be named at least 3 times. In the end, I could create a top 100 this way. I realize that most exact rankings in this top 100 are not statistically valid, but I have two questions.
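For concreteness, this is roughly what the scoring and cut-off look like in code; the exact intermediate point values and the ballot format below are only illustrative, not the schedule I actually used.

```python
from collections import defaultdict

# Illustrative point schedule: 40 for rank 1, a steeper drop over the first
# few ranks, then a gentle slide down to 6 points for rank 30. These exact
# intermediate values are a guess, not the real schedule.
POINTS = [40, 37, 34, 32, 31, 30] + list(range(29, 5, -1))

def rank_composers(ballots, min_mentions=3, list_size=100):
    """ballots: one ordered list of composer names per member (rank 1 first)."""
    scores = defaultdict(int)
    mentions = defaultdict(int)
    for ballot in ballots:
        for rank, composer in enumerate(ballot[:30]):
            scores[composer] += POINTS[rank]
            mentions[composer] += 1
    # Cut-off: only composers named at least `min_mentions` times are ranked.
    eligible = [c for c in scores if mentions[c] >= min_mentions]
    eligible.sort(key=lambda c: scores[c], reverse=True)
    return eligible[:list_size]
```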

[1] Someone objected that by using top 30s I could not go beyond a top 30 for the results. This sounds wrong to me, but I can’t find anything to disprove (or prove) the statement.

[2] Should I have gone for a different number than 3 for the cutoff for statistical reasons?

Thanks for any help.

I’m not sure about #2, but for #1, consider a very segmented polling population. Let’s round to 56 respondents and divide them into two groups of 28, where every respondent in a given group answers in exactly the same way, and no composer from one group’s list appears on the other group’s. You then have 60 different composers, and to me it would seem more arbitrary to cut the results off at #15 from each group than to include all 60, since even the lowest-ranked composer got 168 points (6 points × 28 respondents).
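If it helps, that arithmetic is easy to check with a quick simulation; the 40-down-to-6 point schedule and the composer names here are just stand-ins for whatever was actually used.

```python
from collections import defaultdict

POINTS = [40, 37, 34, 32, 31, 30] + list(range(29, 5, -1))  # stand-in: 40 down to 6

group_a = [f"A{i}" for i in range(1, 31)]   # bloc A's identical top 30
group_b = [f"B{i}" for i in range(1, 31)]   # bloc B's identical top 30
ballots = [group_a] * 28 + [group_b] * 28   # 56 respondents, two blocs of 28

scores = defaultdict(int)
for ballot in ballots:
    for rank, name in enumerate(ballot):
        scores[name] += POINTS[rank]

ranking = sorted(scores, key=scores.get, reverse=True)
print(len(ranking))          # 60 distinct composers, each named 28 times
print(scores[ranking[-1]])   # 168: even the lowest one has 6 * 28 points
```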

We’ve had similar questions before; I’ll just make one comment here.

I like to think about “corner cases.” Suppose, hypothetically, that every member would have ranked Leonard Cohen as their #31 composer. That would mean he “should” be ranked well above #31 in the combined list. But in your scheme he doesn’t even make the top 100.

Once everything is scored and totalled, and composers with fewer than 3 votes are omitted, do you then just sort on the scores?

Do you have a way to distinguish between similar scores with different characteristics? From what you describe, Dave Mozart could score, say, 240 points from six fan-boys who all put him at the top while no one else even lists him, versus 240 points from 40 different voters who unanimously put him last (6 × 40 = 240 either way).

If this were a proper music contest, like Eurovision, then only each voter’s top 10 would get points, while everyone ranked lower would get nul points. That seems to separate the favourites from the consistent also-rans much more quickly.

Thanks for all the reactions so far. I’m really looking for an answer from a statistics point of view, though. I want to counter the people stating that of course you can’t go beyond 30 in this case, and knowing them, mental exercises like the ones sketched here will be brushed aside as irrelevant.

I’m on board with the objection to going much beyond 30 if that’s all any one person could submit. Certainly not to 100. I’m surprised, actually, that you even have 100 unique names left after your “named three times” criterion.

Consider ice cream flavors. Ask 30 people to provide their top 6, and then try to make a list of the top 20 flavors. It’s not going to make any sense. The “true” 18th, 19th, 20th-ranked flavors should end up being kinda-weird-but-not-completely-crazy stuff like cucumber or whatever, but cucumber isn’t going to appear in anyone’s top 6, so it has no way to show up in the list where it belongs. In this example, you probably just won’t end up with 20 unique flavors to fill the list, but if you somehow managed to, the poorly ranked flavors will represent individual outliers in people’s top 6 rather than any sort of consensus opinion about what should be down at those rankings.

(To be more specific: Say cucumber should be the true 20th, but nobody puts it in their top 6 [who would?]. But, one person is really keen on pineapple and ranks it 5th, and another is really keen on blueberry and also ranks it 5th, and some weirdo puts carrot in their 6th slot. Those become the flavors that can end up in 20th place, but everyone would agree that pineapple and blueberry should be around, maybe, 12th and that carrot shouldn’t make the top 20 at all. And poor cucumber never even gets a chance!)

First off, there is actually a theorem (Arrow’s impossibility theorem) which states that no ranked voting method can satisfy a handful of quite reasonable-looking criteria all at once. What I would have done is give each voter 100 points to distribute as they see fit. If they want to give Beethoven all 100 points and no one else any, so be it. Then just add up the point totals and take the top 100.
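As a rough sketch of that scheme, assuming each ballot is just a mapping of composer to points that sums to 100 (the ballots below are made up):

```python
from collections import Counter

# Made-up ballots: each voter splits exactly 100 points however they like.
ballots = [
    {"Beethoven": 100},
    {"Bach": 60, "Mozart": 30, "Haydn": 10},
    {"Mozart": 50, "Beethoven": 25, "Bach": 25},
]

totals = Counter()
for ballot in ballots:
    assert sum(ballot.values()) == 100, "each voter distributes exactly 100 points"
    totals.update(ballot)  # adds each composer's points to the running totals

# Final list: composers ordered by total points, capped at 100 entries.
top_100 = [name for name, points in totals.most_common(100)]
print(totals.most_common())
```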

The other issue, as Ludovic mentioned in the second post: if it’s a diverse group with two or more distinct preference blocs, then a single ranking list is meaningless unless you can guarantee the population was fairly sampled. (Think of a “who should be president?” poll.)

Thanks for the reactions. Food for thought here.

Your point is clear (but … cucumber?? :stuck_out_tongue: ); let me give another real-world example:

I’ll guess relatively few people would put Shawshank Redemption on their Top Five Movie list, let alone in their #1 slot. Yet there it is, sitting at the very top of IMDb’s Top 250, the only 9.2 on the list. Lots of people put Godfather way ahead of Shawshank, but others don’t like gangster flicks. Lots put Casablanca at the very top, but it’s black-and-white and WWII is ancient history for Millennials. Most put Lord of the Rings near the top, but some don’t like fantasy. **But everybody appreciates Shawshank Redemption to some extent.**

Which is why Taco Bell always wins “Best Mexican food” in readers polls out here.

Here’s how I would approach it (a rough code sketch follows the list).

  1. Make a list of every composer ranked by any member.
  2. Consider every possible pair of composers. What percentage of members preferred composer A over composer B? If a member ranked one composer in the pair but not the other, then the ranked composer is preferred. If a member ranked neither composer in the pair, then that member doesn’t contribute to that percentage.
  3. Evaluate each composer’s percentages. Their “natural” ranking is one plus the number of their pairwise percentages less than 50%. There are likely to be ties at multiple ranks, but up to this point it’s difficult to argue the process is unfair.
  4. There are many ways to break the ties. I’d use the median pairwise percentage score. The mean gives weight to the extremes, while the median estimates the center better.
  5. Showing the intermediate results would probably be interesting to the members and give some transparency to the process.
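Here is the sketch; it assumes each ballot is an ordered list of composer names (rank 1 first), and the tie-break from step 4 is applied as a secondary sort key.

```python
from itertools import combinations
from statistics import median

def pairwise_ranking(ballots):
    """ballots: one ordered list of composer names per member (rank 1 first)."""
    # Step 1: every composer ranked by any member.
    composers = sorted({c for ballot in ballots for c in ballot})
    positions = [{c: i for i, c in enumerate(ballot)} for ballot in ballots]

    # Step 2: for each pair, the share of contributing members preferring A over B.
    pct = {}
    for a, b in combinations(composers, 2):
        prefer_a = prefer_b = 0
        for pos in positions:
            if a in pos and b in pos:
                prefer_a += pos[a] < pos[b]
                prefer_b += pos[b] < pos[a]
            elif a in pos:
                prefer_a += 1          # a ranked, b not: a is preferred
            elif b in pos:
                prefer_b += 1          # b ranked, a not: b is preferred
            # members who ranked neither composer don't contribute
        total = prefer_a + prefer_b
        pct[(a, b)] = prefer_a / total
        pct[(b, a)] = prefer_b / total

    # Steps 3 and 4: natural rank, with ties broken by median pairwise percentage.
    results = []
    for c in composers:
        shares = [pct[(c, other)] for other in composers if other != c]
        natural_rank = 1 + sum(1 for s in shares if s < 0.5)
        results.append((natural_rank, -median(shares), c))
    results.sort()
    return [(rank, composer) for rank, _, composer in results]
```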

Any chance you can make available your raw data, anonymized if necessary?