Let’s say you’re trying to rank basketball players on how good they are at shooting free throws (FTs).
For each player you have a season’s worth of stats on how many FTs they took and what percentage they made. But they’ve all attempted different numbers of FTs.
You could rank them based on the percentage of FTs they made, but then someone who only took 1 FT and made it would be at the top with 100%, even though he probably isn’t the best FT shooter.
You could rank them on total FTs made, but then someone who took a lot of FTs could be at the top even if his percentage isn't that great.
You could rank them based on FT percentage but exclude anyone who hasn't taken some minimum number of shots. (This is what I've seen some sports sites do.) But it occurs to me that the person who took 100 shots and made 85% still might not really be as good as the person who took 300 shots and made 83%: it's easier to perform slightly above your true skill level over the course of 100 shots than it is to keep it up for three times as many.
I think a better idea would be to do some sort of weighting. For instance:
Free throw rank = F(N) * (your FT%) + (1 - F(N)) * (league avg. FT%)
where N is how many shots a player took and F(N) ranges from 0 to 1 as N goes from 0 to infinity. In other words, the more shots a player has taken, the more we believe that his FT% represents his true level of skill.
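To make that concrete, here's a quick Python sketch of the weighting. The specific F(N) = N / (N + C), the constant C = 50, the league average, and the player stats are all made-up numbers just for illustration, not a claim about the right choices:

```python
# Sketch of the weighted ranking idea. F(N) = N / (N + C) is one candidate;
# C, league_avg, and the player lines below are hypothetical.

C = 50            # shots at which we weight the player's own FT% at 50% (assumed)
league_avg = 0.75 # assumed league-wide FT%

# (player, FTs attempted, FTs made) -- made-up data
players = [
    ("A", 1, 1),     # 100% on a single shot
    ("B", 100, 85),  # 85% on 100 shots
    ("C", 300, 249), # 83% on 300 shots
]

def rank_score(attempts, made):
    f = attempts / (attempts + C)          # weight on the player's own FT%
    pct = made / attempts
    return f * pct + (1 - f) * league_avg  # blend toward the league average

for name, attempts, made in sorted(players, key=lambda p: -rank_score(p[1], p[2])):
    print(name, round(rank_score(attempts, made), 4))
```

With these made-up numbers the 300-shot 83% shooter edges out the 100-shot 85% shooter (0.8186 vs. 0.8167), and the 1-for-1 guy drops to the bottom, which matches my intuition above. But that outcome hinges entirely on the choice of F(N), which is the problem.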
But how to pick F(N)? It could be N / (N + 1), but really it could be g(N) / (g(N) + 1) for any monotonically increasing function g with g(N) --> infinity as N --> infinity and g(0) = 0. For instance, 5N^2 / (5N^2 + 1).
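For what it's worth, here's a quick check of how fast those two candidate curves approach 1 (again, both are just hypothetical choices):

```python
# Compare how quickly two candidate F(N) curves saturate toward 1.
for n in (1, 10, 50, 100, 300):
    f1 = n / (n + 1)                # F(N) = N / (N + 1)
    f2 = 5 * n**2 / (5 * n**2 + 1)  # F(N) = 5N^2 / (5N^2 + 1)
    print(n, round(f1, 4), round(f2, 6))
```

Both are already close to 1 after just a handful of shots, so either one would trust a 10-shot sample almost as much as a 300-shot one, which seems wrong. So the choice clearly matters.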
So how can I decide on a good choice of F(N)? Or is there some other, better way to do the ranking?