statistics question

Let’s say you’re trying to rank basketball players on how good they are at shooting free throws (FTs).

For each player you have a season’s worth of stats on how many FTs they took and what percentage they made. But they’ve all attempted different numbers of FTs.

You could rank them based on the percentage of FTs they made, but then someone who only took 1 FT and made it would be at the top with 100%, even though he probably isn’t the best FT shooter.

You could rank them on total FTs made but then someone who took a lot of FTs could be at the top even if his percentage isn’t that great.

You could rank them based on FT percentage but exclude people who haven’t taken at least a certain minimum number of shots. (This is what I’ve seen some sports sites do.) But it occurs to me that the person who took 100 shots and made 85% still might not really be as good as the person who took 300 shots and made 83% – it’s just that it’s easier to perform slightly above your true skill level over the course of 100 shots than it is to do it for three times as many shots.

I think a better idea would be to do some sort of weighting. For instance:
Free throw rank = F(N) * (your FT %) + (1- F(N)) * (league avg. FT%)
where N is how many shots you took and F(N) ranges from 0 to 1 as N ranges from 0 to infinity. In other words, the more shots they took, the more we believe that their FT% represents their true level of skill.

But how to pick F(N)? It could be N / (N + 1), but really it could be g(N) / (g(N) + 1) for any monotonically increasing function g(N) with g(N) --> infinity as N --> infinity and g(0) = 0. For instance, (5N[sup]2[/sup]) / (5N[sup]2[/sup] + 1).
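One simple member of that family, as a rough sketch rather than a recommendation: take g(N) = N/k, so that F(N) = N / (N + k). The constant k (the number of shots at which a player's own percentage gets half the weight) and the 75% league average are assumptions picked purely for illustration, not values from real data.

```python
def shrunk_ft_rate(made, attempts, league_avg=0.75, k=50.0):
    """Blend a player's FT% with the league average.

    Uses F(N) = N / (N + k). Both k and league_avg are illustrative
    assumptions, not fitted values.
    """
    if attempts == 0:
        return league_avg  # no data at all: fall back on the league average
    weight = attempts / (attempts + k)  # F(N), between 0 and 1
    own_rate = made / attempts
    return weight * own_rate + (1.0 - weight) * league_avg


# Hypothetical players: (made, attempted)
players = {"1 for 1": (1, 1), "85 for 100": (85, 100), "249 for 300": (249, 300)}
ranking = sorted(players, key=lambda name: shrunk_ft_rate(*players[name]), reverse=True)
print(ranking)
```

With these particular numbers the 83% shooter over 300 attempts edges out the 85% shooter over 100, and the 1-for-1 player lands near the league average. A much smaller k flips the first two, which is exactly the open question.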

So how can I decide on a good choice of F(N)? Or is there some other, better way to do the ranking?

You’re never going to be able to get away from this problem: even if two players take the exact same number of shots and make the same number of them, either one might actually be better than the other.

Are you looking to compare pairs of individual players, or to compile a ranked list of all players? The two problems are very different.

Another wrinkle, incidentally, is that the number of shots a player takes is not independent of how good a free-thrower the player is. Shaq, for instance, gets fouled a lot, because frankly, he sucks at free-throws, so there’s very little incentive not to foul him.

But it is also equally likely that you will perform below your “true skill level”. At best you could calculate the probability that the 83% player is actually better than the 85% player. It would never be a better than even chance though, so it would be hard to justify ranking the 83% player higher.

Well, yeah. But I still think there’s some logic to saying “a good FT% for a large number of shots is better than a slightly better FT% for fewer shots.” In the extreme example, it’s much more impressive to shoot 90% over 500 shots than it is to shoot 100% over 2 shots. But I’d like to be able to take this into account even for less drastic cases, which simply having a “minimum number of shots to be ranked” (as some sites do) won’t achieve.

The latter – I’d like to be able to rank every player in the league.

Good point. Although I guess for below average FT shooters an equation of the form I suggested above would actually penalize you more for having more shots, so I guess that kind of works out well.

I’m not so sure this is true. For an arbitrary player who’s taken 100 shots, sure, it’s just as likely they’ve done better than they usually would or that they’ve done worse than they usually would. But if we know a player is performing above league average, that suggests it’s more likely than not he’s had some luck on his side (since less than 50% of the players would be able to perform above average without some luck), and this becomes more true the further above average they are. That’s why the further someone’s performance over a particular timespan is above (or below) the mean, the more likely it is that they will perform somewhat closer to the mean in the future. (regression to the mean)
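A quick simulation makes the regression-to-the-mean point concrete. This is only a sketch: the spread of true skill (a Beta(30, 10) distribution, mean 75%), the 100-shot sample and the 85% cutoff are all assumptions chosen for illustration.

```python
import random


def regression_demo(n_players=2000, shots=100, cutoff=0.85, seed=1):
    """Simulate players, keep those whose observed FT% is at least `cutoff`,
    and return (mean observed rate, mean true skill, count) for that group."""
    rng = random.Random(seed)
    observed, true_skill = [], []
    for _ in range(n_players):
        skill = rng.betavariate(30, 10)  # assumed true-skill distribution
        made = sum(rng.random() < skill for _ in range(shots))
        if made / shots >= cutoff:
            observed.append(made / shots)
            true_skill.append(skill)
    # Assumes at least one simulated player clears the cutoff.
    n = len(observed)
    return sum(observed) / n, sum(true_skill) / n, n


mean_observed, mean_true, count = regression_demo()
print(count, round(mean_observed, 3), round(mean_true, 3))
```

Among the simulated players who shot 85% or better, the average true skill comes out below the average observed percentage: as a group they were good and also a bit lucky.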

Consider the extreme case: one player is a 100% FT shooter with one shot, and one is an 83% FT shooter with 300 shots. Would you still say “there’s a better than 50% chance that the guy who is 1 for 1 is the better shooter?” Most of the people in the league can go 1 for 1 more than half the time, but they can’t go 83% over 300 shots.

I think the best way to present the info is strictly by FT% (shots made/shots taken) in descending order and in parentheses next to each % indicate how many shots were actually taken. That way, the reader can easily eliminate those %'s that are deemed irrelevant based on the number of shots taken. It’s certainly the least misleading way to present the information. It’s up to the reader to decide if a 90% shooter based on 30 shots is as good as an 82% shooter based on 11 shots.

I think that if you rank them by free throws made minus free throws not made, you’ll end up with something that behaves reasonably.

If you want something “pure” (e.g., no arbitrary cutoffs, etc.), I don’t think you can avoid a prior probability distribution. Let’s call the God-given underlying probability distribution of FT success rates D(r). How one might construct D(r) is discussed at the end.

Since D(r) is a p.d.f.:
Int[sub]0[/sub][sup]1[/sup] D(r)dr = 1

Consider two players with unknown rates r[sub]1[/sub] and r[sub]2[/sub]. We want to know:
P(r[sub]1[/sub]>r[sub]2[/sub]|observations)
or, the probability that r[sub]1[/sub] is greater than r[sub]2[/sub] given the FT data we have available. If this is greater than 50%, Player 1 is ranked higher. Applying Bayes’ theorem:
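P(r[sub]1[/sub]>r[sub]2[/sub]|obs) = 0.5 P(obs|r[sub]1[/sub]>r[sub]2[/sub]) / P(obs)
where I have used the fact that the prior probability P(r[sub]1[/sub]>r[sub]2[/sub]) must be 0.5, since either player could be the better one. The normalization is P(obs) = 0.5 P(obs|r[sub]1[/sub]>r[sub]2[/sub]) + 0.5 P(obs|r[sub]1[/sub]<r[sub]2[/sub]).

To calculate the RHS of the above expression, we must integrate:
P(obs|r[sub]1[/sub]>r[sub]2[/sub]) = 2 Int[sub]0[/sub][sup]1[/sup] dr[sub]2[/sub] {Int[sub]r[sub]2[/sub][/sub][sup]1[/sup] dr[sub]1[/sub] {P(obs|r[sub]1[/sub],r[sub]2[/sub]) D(r[sub]1[/sub]) D(r[sub]2[/sub])}}
where the factor of 2 is 1/P(r[sub]1[/sub]>r[sub]2[/sub]) and the probability in the integrand is given by the product of two binomial probabilities. (The other term, P(obs|r[sub]1[/sub]<r[sub]2[/sub]), is the same integral with r[sub]1[/sub] running from 0 to r[sub]2[/sub].)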

Assuming this comparison is transitive (I think it should be, but I haven’t actually thought about it), then the list is now formed.
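For anyone who wants to try the pairwise comparison numerically, here is a minimal sketch that does the integrals on a grid. The uniform default for D(r), the grid size, and the function names are assumptions made for illustration; any prior that is positive on (0, 1) can be passed in instead.

```python
import math


def posterior_on_grid(made, attempts, prior=lambda r: 1.0, n=2000):
    """Posterior weights for a player's true rate r on a midpoint grid.

    Proportional to r^made * (1 - r)^(attempts - made) * D(r); the binomial
    coefficient cancels on normalization. `prior` must be positive on (0, 1).
    """
    rs = [(i + 0.5) / n for i in range(n)]
    log_w = [made * math.log(r) + (attempts - made) * math.log(1.0 - r)
             + math.log(prior(r)) for r in rs]
    top = max(log_w)  # subtract the max so exp() does not underflow
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def prob_first_better(player1, player2, prior=lambda r: 1.0, n=2000):
    """Approximate P(r1 > r2 | obs) for two (made, attempts) records."""
    post1 = posterior_on_grid(*player1, prior=prior, n=n)
    post2 = posterior_on_grid(*player2, prior=prior, n=n)
    prob, mass2_below = 0.0, 0.0
    for p1, p2 in zip(post1, post2):
        # Player 2 strictly below this cell, plus half of the same-cell mass.
        prob += p1 * (mass2_below + 0.5 * p2)
        mass2_below += p2
    return prob


p = prob_first_better((1, 1), (249, 300))
print(round(p, 2))
```

Under the flat prior, the 1-for-1 player has well under a 50% chance of being the better shooter than the player who made 249 of 300, so he would be ranked lower.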

Constructing the prior, D(r)
We still need a way to know D(r). Two options:

  1. Put in a cutoff (“10 FTs attempted”) and form a distribution of observed rates calculated straight from the data (i.e., made/attempted). A smoothing process would not be unreasonable, as only a small set of rational numbers will be represented.

  2. If a reasonable functional form can be assumed for D(r), perhaps one motivated by the shape of the distribution formed in (1), simply fit D(r) to the observations using a maximum likelihood method or similar.

In either case, one would want to test the robustness of the ordering under reasonable variations of the prior. In Option (2), this comes right out of the fitted function parameters and their associated errors. That is, vary the functional parameters according to their (possibly correlated) uncertainties, and see if any rankings change. For those that do, one could declare a tie or (better) assign the higher rank to the player more often ranked higher across all variations or (better) report the probability that the ranking is correct, based on how often the higher-ranked player is ranked higher under variations. (In other words, treat the prior’s functional parameters as a source of systematic uncertainty, and propagate the uncertainty through to the rankings.)
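As a rough illustration of Option (2), assume D(r) is a Beta(alpha, beta) distribution and fit it by matching the mean and variance of the observed rates (a cruder stand-in for maximum likelihood, and one that ignores the binomial noise in each observed rate). The Beta form, the cutoff value and the function names are all assumptions, not something prescribed above.

```python
def fit_beta_prior(players, min_attempts=10):
    """Method-of-moments Beta fit to observed FT rates.

    `players` is a list of (made, attempts). Needs at least two players
    above the cutoff whose observed rates are not all identical.
    """
    rates = [m / n for m, n in players if n >= min_attempts]
    mean = sum(rates) / len(rates)
    var = sum((x - mean) ** 2 for x in rates) / (len(rates) - 1)
    common = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * common, (1.0 - mean) * common


def posterior_mean(made, attempts, alpha, beta):
    """Posterior mean of a player's rate under the Beta prior."""
    return (alpha + made) / (alpha + beta + attempts)


# Hypothetical data: three regular shooters and one 1-for-1 player.
data = [(70, 100), (80, 100), (75, 100), (1, 1)]
alpha, beta = fit_beta_prior(data)
print(alpha, beta, posterior_mean(1, 1, alpha, beta))
```

Under this assumed prior the posterior mean can be rewritten as F(N) * (your FT%) + (1 - F(N)) * (prior mean) with F(N) = N / (N + alpha + beta), so the weighting idea from the original question falls out as a special case.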

Okay, this is maybe beefier than you were looking for. :)

Only if the league free-throw average is 50%, I think. Your method would give the same score to the guy who made 501 out of 1001 free-throws as to the guy who made 1 out of 1. The guy who made 501 out of 1001, we can be pretty confident that his true rate is very close to 50%, but the guy who’s gone 1 for 1, our best bet is to say that he’s probably around average, or maybe a little better. So if the average is, say, 75%, then we should conclude that the guy who went 1 for 1 is probably better than the guy who went 501 for 1001.
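The tie is easy to check directly. A tiny sketch (the function name is just for illustration):

```python
def net_makes(made, attempts):
    """Free throws made minus free throws missed."""
    missed = attempts - made
    return made - missed


score_volume = net_makes(501, 1001)  # about a 50% shooter over 1001 attempts
score_single = net_makes(1, 1)       # 1 for 1
print(score_volume, score_single)
```

Both players get the same score, even though the first has shown fairly convincingly that he is near 50% and the second has shown almost nothing.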

I’m not sure what you’re doing with this, but I would want to present estimates and confidence intervals for each player. It might look something like this for two players:



   -
   |
   |
   |    -
   o    |
   |    o
   |    |
   |    -
   -


The one on the right made a lower fraction of his shots, but the estimate is tighter because he took more shots.

This strikes me as a good idea.

There may not be a “best” way to rank the players, but if you ranked them according to the lower limits of their confidence intervals (the “I’m pretty sure he’s at least this good” number), that seems as good a way as any to me.
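A sketch of that lower-limit ranking, using the Wilson score interval as one possible choice. The choice of interval and the 95% level (z = 1.96) are assumptions; nothing above says which interval or level to use.

```python
import math


def wilson_lower(made, attempts, z=1.96):
    """Lower end of the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly a 95% two-sided interval (an assumed level).
    """
    if attempts == 0:
        return 0.0  # no shots taken: nothing can be claimed
    p = made / attempts
    z2 = z * z
    centre = p + z2 / (2 * attempts)
    spread = z * math.sqrt(p * (1 - p) / attempts + z2 / (4 * attempts ** 2))
    return (centre - spread) / (1 + z2 / attempts)


# Hypothetical players: (made, attempted)
players = {"1 for 1": (1, 1), "85 for 100": (85, 100), "249 for 300": (249, 300)}
ranking = sorted(players, key=lambda name: wilson_lower(*players[name]), reverse=True)
print(ranking)
```

At this level the 83% shooter over 300 attempts ranks above the 85% shooter over 100, and the 1-for-1 player drops to the bottom. A different confidence level can change the order, so the level is a real choice here and not a detail.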

The best estimator of the actual proportion is, of course, observed successes over observed attempts. The “lower limit of their confidence level” (which seems to assume there is some one confidence level which is privileged over all others) is not some sort of “worst-case scenario” statistic. Moreover, confidence levels are determined in part by the size of the sample, and it seems unlikely that the OP wants this to distort his ranking schema (but then again, who can tell, given the OP’s addled remarks about how an 83% success rate over 300 tries is a better record than an 85% success rate over 100).

What really is at issue here is the ordinary “math mysticism” of amateur scientists (which this board has got in spades). The OP’s fundamental confusion seems to be predicated on the belief that empirical observation can be dispensed with if you just have the right statistical armaments. If you want to know who is likely to sink more free throws, you’ve just got to watch them try to sink free throws. No math formula is going to take its place.

That’s the point, really.

I’m a bit offended by this.

First, I’ve got a Ph.D. in physics (just finished it earlier this year). I’ve crossed over to working in software development, so I’m not exactly a professional scientist, but it’s hardly fair to characterize me as an “amateur scientist” either (at least in the condescending way you seem to mean it).

What I’m definitely not, though, is a statistician, although I’ve taken a course or two in the subject. And in my experience this board actually has quite a few professional scientists, mathematicians, etc. who may know more about the subject than me.

At any rate, if you think that real science is all about making observations without ever trying to predict anything, you’re dead wrong. That’s what I’m asking for here: given unequal amounts of data on different players, what is the best way to predict who is most likely to make more FTs in the future. I phrased this as “what’s the best way to rank them”, but it amounts to the same thing.

Whether or not you find it “addled”, I’m quite sure that sometimes having a slightly lower FT % over a much larger sample is better evidence of FT making ability than having a higher FT% over a smaller sample. Again, the extreme example illustrates this: Making FT’s at a rate of 100% proves very little if you’ve only taken one shot.

Now that I’ve got that off my chest, thanks to the rest of you for your helpful suggestions. I’ll look over them a bit more this weekend and possibly follow up with another reply.

[bolding mine]

That could easily be a different question than we’ve been discussing. If you want to predict how many points a player will score from the free-throw line, you need more than simply his rate of success. You can get a success rate and general range with the confidence intervals, but they won’t tell you much of anything about how many points actually go on the board.

To predict actual points, I’d start with playing minutes, especially if you’re looking at a higher-level league. In the NBA, for example, I’d expect a much greater variance in playing time than in free throw efficiency. If you’re looking at a kids league or something like that, it might be a different story.

To clarify, I’m not really concerned about being able to say “Player A will make N free throws” or even “Player A will make X% of his free throws”. What I would like to do is rank all the players based on “how good they ‘really are’ at shooting free throws.” But I think “how good they ‘really are’” can be stated more concretely as “who would make the highest percentage of FTs in the future over the long haul”.

In other words, I’d like to predict who is more likely to make a higher % of their FTs in the future, but not necessarily what their FT % will be in the future.

I think Pasta is on the right track. If you have no reason to think any particular ‘true’ shooting percentage is more likely than any other, then the only way to predict future performance ranking is ranking by the percentage made previously. Now, you’ll be more confident in your predictions for some players, but you can’t change the ordering. For example, if player A made 744 of 1000 shots, player B made 6 of 10 and player C made 580 of 1000, the only possible ranking (without guessing what the distribution of ‘true’ abilities is) is A>B>C. In this case, you’ll be pretty sure that A>C over the long run while not being very sure at all where B will end up, but the only way to rank B is between A and C. (Or, in other words, if you have to bet, bet that A>B>C, but put a lot of money on A>C and as little as possible on A>B or B>C).

Now, if you follow Pasta’s idea and come up with some idea of what the league average is, and how players tend to fall around it, you can, with at least some kind of mathematical precision, say things like “Player Z made 0 of 3 shots, but that’s such a small sample that it’s more likely that they’re close to the league average in percentage than that they’re actually a zero percent shooter. So we’ll rank player Z above player S who only made 2,146 of 11,784 free throws and has demonstrated that they’re almost certainly well below league average.”

This bullshit “get out of your mom’s basement and just watch the games” argument is getting tiresome.