Bookies & Statisticians: How are odds assigned?

All of you have contributed some great thinking on this, and have advanced my approach to the problem. Many thanks. I’ve gone back to the booklet in which I found the 6 hits sucker bet. It’s Beating the Bookie; Huey Mahl; Gambler’s Book Club; 1975. There’s a fairly good analysis of bookies’ lines and vig. On the most popular line, the 20-cent line, the average vig for the bookie hovers around 0.0345. So it’s likely that a sucker bet will profit a bookie more - probably much more. Let’s say that 9 to 1 are the true odds on a bet paying 8 to 1. Now .125 - .111 = .014 is not a very good payoff. But if the true odds were 12 to 1, then an 8 to 1 payoff should get the bookie .0417. That’s close to the average vig on a 20-cent line, but probably not enough to qualify as a sucker bet. My guess is the true odds for the 6 hit sucker bet are around 15:1.

One way to solve this is to go to the almanacs, take a few years of box scores and, without looking at the performances, take all the hitters for each day and try various selection strategies (e.g., the three hitters with the best avgs, the three hitters with the best avgs who are fourth in the batting rotation, the three hitters with the best avgs who are first in the batting rotation, hitters who are in a slump (and therefore “due”), hitters who are on a streak); then see how they did. From this, we should be able to come up with an optimal strategy. The 8:1 payoff has to presume that the optimal selection strategy will be employed (perhaps what makes this a sucker bet is that the optimal strategy is counter-intuitive). This bet was around before the computer age, but it should be easy to write a program that can apply multiple selection strategies to all the hitters playing on a given day and rate those strategies based on their ability to pick three hitters who will get six hits among them.

Ideas?

I’d take the top 10 hits-per-game hitters and the top 100 hits-allowed pitchers as a starting point. Compare each hitter to the starter and reliever they’re most likely to face, weighting the starter’s score and the reliever’s score by 7:2 (assuming reliever comes in for last two innings).

A player averaging 1.4 or better could do it by getting 7 hits in 5 games, and is going to be your strongest bet. He can do it many different ways, but let’s assume your hitter won’t get more than 3 hits per game to simplify things:

3-3-1-0-0, 3-2-1-1-0, 3-1-1-1-1, 2-2-1-1-1, 2-2-2-1-0 are the ways he can have an average series of five games. The odds of the four different outcomes are then:

3 hits: 4/25
2 hits: 6/25
1 hit: 11/25
0 hits: 4/25
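For anyone who wants to check the tallies, here is a rough Python sketch that counts the zeroes, ones, twos and threes across the five series listed above. The names (`LISTED_SERIES`, `tally_games`, `all_series`) are made up for illustration. It also enumerates every way to spread 7 hits over 5 games with at most 3 per game, and that enumeration turns up one series, 3-2-2-0-0, that is not among the five listed. The 4/25, 6/25, 11/25, 4/25 figures come from the five series as written, and like the list itself they treat every series as equally likely, which is an assumption.

```python
from collections import Counter
from itertools import combinations_with_replacement

# The five series listed above (hits in each of five games, 7 hits total).
LISTED_SERIES = [
    (3, 3, 1, 0, 0),
    (3, 2, 1, 1, 0),
    (3, 1, 1, 1, 1),
    (2, 2, 1, 1, 1),
    (2, 2, 2, 1, 0),
]


def tally_games(series_list):
    """Count how many games of each hit total appear across all series."""
    counts = Counter()
    for series in series_list:
        counts.update(series)
    return counts


def all_series(total_hits=7, games=5, max_hits=3):
    """Every unordered way to spread total_hits over games, capped at max_hits."""
    return [
        tuple(sorted(combo, reverse=True))
        for combo in combinations_with_replacement(range(max_hits + 1), games)
        if sum(combo) == total_hits
    ]


counts = tally_games(LISTED_SERIES)
total_games = sum(counts.values())  # 5 series x 5 games
for hits in (3, 2, 1, 0):
    print(f"{hits} hits: {counts[hits]}/{total_games}")

# Series found by exhaustive enumeration but absent from the list above.
missing = sorted(set(all_series()) - set(LISTED_SERIES))
print("not in the list:", missing)
```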

Three hitters with a 1.4 or better average can get you six (or more) hits in one night by hitting any of the following combinations:



3-3-(0 or more) = 2.5%
3-2-(1 or more) = 3.2%
2-2-(2 or more) = 2.3%
------------------------------
                  8.0%


Which suggests the odds for the very best batters out there are still only 1 in 12.5. It might be a good idea to examine ways that 4 or 5 hits could come up in a night and redo the analysis. I maintain that your best bet is to pick three who are averaging better than 1.4 hits-per-game and find a night when they’re each facing weak pitchers.

please explain why hitters winning combinations would not also include:

0-3-3; 3-0-3; 1-2-3; 2-1-3

Because in my assumptions, I stated that you could only hit 0,1,2, or 3 hits per game, so 0-3-3 is the same as 3-3-0. Likewise, 1-2-3 and 2-1-3 are both equal to 3-2-(1 or more). I’ve redone my calculations to include the possibility of getting up to five hits per game, and posted a spreadsheet here.

For four player averages (1.25, 1.4, 1.5, and 2) I created a table of possible ways to get that average over a series of games. Then I looked at what frequency you’d have to hit each of the hits-per-game, and calculated the odds that three players with the same average would reach six (or more) hits in one game. Then I had MS Excel put a trend line on the results, so the large centered table in the spreadsheet shows what your odds are if you pick three players with the same average.

I’ve assumed that

  • You know the player is playing
  • The game goes into the books (isn’t rained out)
  • No player hits more than 5 hits per game
  • All three of your players have faced average pitchers all season, and are facing average pitchers tonight

It’s possible, but much harder, to calculate the odds of three players with different averages. However it’s kind of useless: unless you’ve got 3 players with an average over 1.45 hits per game, 8-to-1 is not a good payoff.

But you’ve only counted 3-3-0, and not 3-0-3 and 0-3-3, and hence undercounted, by a factor of 3, cases of 6 hits with one batter getting no hits.

The distribution of the total number of hits can be obtained by repeatedly convolving the distribution for one player with itself (this assumes independence). Doing it for your 4:11:6:4 distribution yields the following distribution for the total:

0 0.0040960
1 0.0337920
2 0.1113600
3 0.1988480
4 0.2346240
5 0.2058240
6 0.1274880
7 0.0614400
8 0.0184320
9 0.0040960

The probability of 6 or more is 0.21146.

You can do this for any single-player distributions that you want to, even if they’re different (convolution just sums all the relevant products).
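In case it helps anyone who hasn't done a convolution before, here is a minimal Python sketch of the calculation just described. The function name `convolve` and the variable names are mine, not from any library, and the whole thing assumes the three batters are independent and share the 4:11:6:4 distribution for 0, 1, 2 and 3 hits.

```python
def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q.

    p[i] is the probability of i hits; the result has len(p) + len(q) - 1 entries.
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b  # every (i, j) pair contributes to total i + j
    return out


# Single-batter distribution for 0, 1, 2, 3 hits (the 4:11:6:4 split out of 25).
single = [4 / 25, 11 / 25, 6 / 25, 4 / 25]

# Convolve the distribution with itself twice to cover three batters.
total = convolve(convolve(single, single), single)

for hits, prob in enumerate(total):
    print(f"{hits} {prob:.7f}")

p_six_or_more = sum(total[6:])
print(f"P(6 or more) = {p_six_or_more:.5f}")
```

To use different distributions for each batter, pass three different lists to `convolve` instead of reusing `single`.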

I’m pretty sure the other cases are redundant. I’m assuming three batters with identical hits-per-game distributions to simplify the problem. So it doesn’t matter which pair of batters hit three hits – it’s only relevant that if one batter gets exactly three, and the next gets three or more, that the third one needn’t get any (but may get as many as he likes). I used 3-3-0 as shorthand, but it would be more accurate to call it (3,3+,0) if that helps you see it more clearly. It’s backed up by the fact that probabilities of events that must happen together (“X and Y and Z”) are calculated by multiplying the individual probabilities, and multiplication is commutative (XYZ = YZX). I’m saying that the odds of a (3,3+,0+) night are equal to:

(Odds that A, B, or C will get 3 hits) * (Odds that one of the remaining pair will get 3 or more hits) * (Odds that the third batter gets zero or more hits)

The third term is clearly 100%. I went back and re-calculated the distribution assuming a maximum of 5 hits per game after checking the records; since 6 hits per game is rare enough to be in the record books, I felt that 5 was a better simplification. My new distribution is below, and it yields odds of ~10% that a batter with an average of 1.4 hits per game will hit exactly 3 hits in a game, and odds of 20% that another batter will hit either 3,4, or 5 hits.

That means that the odds of three batters combining for six or more hits, with one batter getting exactly three hits, are (10% * 20% * 100%) = 2%. That’s only one of the possible scenarios, but it is irrelevant for our purposes which batter gets exactly three, which one gets three-or-more, and which one can sleep through the game. What is important is that we must add the odds of all the possible winning scenarios to discover the odds of winning.

Your solution gives this one scenario a probability of 21%, better than one in five. If that were the case, why would any bookie offer 8-to-1 odds that none of the other winning scenarios would come up?



New Distribution for a batter with 1.4 hits per game
hits   exactly N   at least N
5      4%          4%
4      6%          10%
3      10%         20%
2      16%         36%
1      34%         70%
0      30%         100%
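A quick sanity check on the table above, as a small Python sketch: it confirms that the "exactly N" column averages out to 1.4 hits per game and rebuilds the "at least N" column from it. The dictionary `exactly` and the helper `at_least` are just illustrative names, and the probabilities are the hypothesized ones from the table, not measured data.

```python
# "Exactly N hits" column from the table above, as probabilities.
exactly = {0: 0.30, 1: 0.34, 2: 0.16, 3: 0.10, 4: 0.06, 5: 0.04}

# Mean hits per game implied by the distribution.
mean_hits = sum(hits * prob for hits, prob in exactly.items())


def at_least(n):
    """Probability of n or more hits, i.e. the 'at least N' column."""
    return sum(prob for hits, prob in exactly.items() if hits >= n)


print(f"mean hits per game: {mean_hits:.2f}")
for n in range(5, -1, -1):
    print(f"{n}  exactly {exactly[n]:.0%}  at least {at_least(n):.0%}")
```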


I’m certain that you’re not computing the probability properly. Consider just the probability that some batter will get exactly one hit, another will get two, and the remaining one will get three. You seem, in #22, to be reasoning that this probability is the product of the single-batter probabilities of 1, 2, and 3 hits, or 11/25 * 6/25 * 4/25 = 0.016896. But that’s just the probability that batter A gets 1, B gets 2, and C gets 3. With equal probability, A gets 2, B gets 1, and C gets 3. And so on. There are 3! = 6 permutations, so the total probability is 0.10138. These possibilities alone exceed the 8% that you claimed.
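The permutation argument can be checked by brute force. The sketch below (names are illustrative) walks every ordered assignment of hits to batters A, B and C under the 4:11:6:4 distribution, using exact fractions so there is no rounding to argue about. It assumes the three batters are independent and identically distributed, as in the posts above.

```python
from fractions import Fraction
from itertools import permutations, product

# Single-batter probabilities for 0, 1, 2, 3 hits (the 4:11:6:4 split).
P = {0: Fraction(4, 25), 1: Fraction(11, 25), 2: Fraction(6, 25), 3: Fraction(4, 25)}


def prob(assignment):
    """Probability that batters A, B, C get exactly these hit counts, in order."""
    a, b, c = assignment
    return P[a] * P[b] * P[c]


# All orderings of "one batter gets 1, one gets 2, one gets 3".
orderings = set(permutations((1, 2, 3)))
p_one_two_three = sum(prob(o) for o in orderings)

# Every ordered outcome whose total is six or more.
p_six_plus = sum(prob(t) for t in product(P, repeat=3) if sum(t) >= 6)

print(len(orderings), float(prob((1, 2, 3))), float(p_one_two_three))
print(float(p_six_plus))
```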

My answer is correct given the assumptions that you made: a certain distribution of the number of hits that a single batter gets, plus independence among batters. We have no reason to believe that this distribution is correct. Even if we accept the average number of hits that you’re assuming, you’ve picked one of many possible distributions.

The probability distribution of the sum of independent random variables is the convolution of their distributions. If you do the convolutions, using the distribution you hypothesized, you’ll get the same answer.

You’ve got me dead to rights here – I accounted for all of the different ways a batter could average 1.4 hits per game (all the ways to get 7 hits in 5 games), assumed they’re all equally likely (they’re not), and looked at the relative number of zeroes, ones, twos, threes, and so on. It’s absolutely possible that a hitter averaging 1.4 hits per game will simply get one- and two-hit games over and over.

Yep. I don’t see any good way to assume a distribution without reams of statistical data on each player, though.

I’d love to, but I never learned how to do convolutions. I’m looking at last night’s box scores to come at the problem from a different direction – how many combinations of three batters were winning combinations vs. losing combinations – so we’ll see if that branch of reasoning yields more fruit.

Keep in mind that the bet is that three players will get at least six hits among them - not that they will get exactly six hits. If one selected hitter gets four hits and another gets three hits, that’s enough to satisfy the requirement of the bet. It doesn’t matter in this case what the third selected hitter gets. He could get none or he could get five. It makes no difference to the outcome. So whatever mathematical representation of this bet we come up with must accommodate scenarios like this.

do you actually know anything about baseball?

I know the rules and I played for three seasons as a child. I know what constitutes a “hit” and what doesn’t. I know what is required for there to be a “game” (in the “hits per game” calculation) and what things can cancel a game. I don’t think I need to have a team jersey or season tickets to help solve this statistics problem. I’ve admitted the error in my initial assumption – did you show up just to throw rocks, or are you here to help?

On the other hand, I haven’t watched a game since a friend dragged me to the Orioles vs. Sox game early this season, and before that, I think I caught Game 6 and Game 7 of Yankees vs. Sox the year the Red Sox won the World Series, and I don’t think I could tell you with any certainty what year that was. If you’re looking for someone who knows, say, which team plays in Detroit, or who the best left-hander is for the Padres, or whether a fielder is more likely to throw to second or third in a given situation, you’ll have to find someone else. Is any of that likely to be relevant to this problem?

Here’s the distribution, by the way: 339 players batted on Friday night. Between all 339, they delivered 320 hits. 133 of them had no hits, 117 had one hit, 66 had two, 21 had three, and 2 had four (both in the same game, a 13-3 slaughter where the pitcher gave up a total of 23 hits). Assuming Friday night was average, it should be easy enough to figure out how many combinations of batters could have been picked who would have won money for the bettor.



Hits    Freq    Hits * Freq
0       133     0
1       117     117
2       66      132
3       21      63
4       2       8
5       0       0
6       0       0
---------------------------
Total   339     320
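Counting the winning combinations from these frequencies is straightforward, so here is a rough Python sketch of one way to do it. The names `FRIDAY` and `winning_triples` are made up for illustration. It counts unordered picks of three different batters whose combined hits reach six, and it treats every possible trio as equally likely to be chosen, which is an assumption (a real bettor picks on purpose). It also rests on a single night of data.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

# Friday night frequencies from the table above: hits -> number of batters.
FRIDAY = {0: 133, 1: 117, 2: 66, 3: 21, 4: 2}


def winning_triples(freq, target=6):
    """Number of unordered trios of distinct batters with target or more hits."""
    wins = 0
    # Walk each multiset of three hit totals, e.g. (1, 2, 3) or (2, 2, 2).
    for hits in combinations_with_replacement(sorted(freq), 3):
        if sum(hits) < target:
            continue
        ways = 1
        # Choose the required number of batters from each hit-total group.
        for value, multiplicity in Counter(hits).items():
            ways *= comb(freq[value], multiplicity)
        wins += ways
    return wins


players = sum(FRIDAY.values())
total_triples = comb(players, 3)
wins = winning_triples(FRIDAY)
print(f"{wins} winning trios out of {total_triples} ({wins / total_triples:.2%})")
```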


It was a serious question. i actually offered my opinion already and for some reason you seem to be determined that there is an arithmetic solution to the problem. this leads me to believe, though i could be mistaken, that you may know far more about math than i, but you know relatively little about baseball. the approach you are using or suggesting is dependent on there being a relative equivalency among hitters. that is decidedly not the case. you also say something about ichiro having certain statistics and that if we had 3 hitters like ichiro etc. Well there is ichiro and everyone else.

i think there are many baseball factors that enter into the equation that are not being considered, some i have mentioned and some which i am not aware of. But as limited as my knowledge is of statistics, i do know that a sample of 1 has no statistical merit whatsoever and there is no reason in the world to think that any one day is representative of anything at all.
sorry if you felt maligned, i meant no harm.

And 320/339 = 0.944 hits per person on average. It turns out that the number of players at each hit count comes out very close to what the Poisson model predicts (see post #8):

P(n) = avg^n * e^(-avg) / n!

…and the expected number of players H(n) with n hits, being the number of trials (339) times the probability P(n):

H(0) = 132
H(1) = 125
H(2) = 59
H(3) = 18
H(4) = 4
H(5) = 1
H(6) < 1
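Here is a short Python sketch that reproduces the H(n) figures from the Poisson formula and prints them next to the observed Friday counts. The function name `poisson_pmf` is my own; the only inputs are the 339 batters and 320 hits quoted above.

```python
from math import exp, factorial

players, hits = 339, 320
avg = hits / players  # about 0.944 hits per batter per game


def poisson_pmf(n, lam):
    """Poisson probability of exactly n events when the mean is lam."""
    return lam**n * exp(-lam) / factorial(n)


# Expected number of batters with n hits, n = 0..6.
expected = [players * poisson_pmf(n, avg) for n in range(7)]

# Observed Friday night counts for 0..6 hits.
observed = [133, 117, 66, 21, 2, 0, 0]

for n, (exp_count, obs_count) in enumerate(zip(expected, observed)):
    print(f"H({n}) = {exp_count:6.1f}   observed {obs_count}")
```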

I’d clarify your language above to read “hits per person per game” but it looks promising. Does that mean we can assume that most batters (many batters? some batters?) follow a Poisson model more-or-less, and so we can use their hits-per-game average and the formula given to predict the frequency of their hits per game? I’m still a little wary of a model that assumes an “average” pitcher (and a consistent batter), but it may be the best we can do.

Okay, so here we go. We assume:

(1) A batter’s season-to-date average for hits per game is representative of how he will continue to play, and how he will continue to be pitched to. With 80+ games gone by, we’re probably not yet on solid ground, but I like it better than using a batter’s career data. denquixote might be able to make a good argument for using career data instead, but for now let’s use current season hits-per-game as the mean.

(2) The number of hits per batter per game is a Poisson process and can be modeled as in Punoqllads’s post above: a game is a “trial” and a hit is an “arrival” or “event”, so average hits-per-game corresponds to the Poisson term lambda (average). Notice that this is a leap from what he proved above, which is that on a Friday night, all batters in all games matched the Poisson distribution. I’m willing to take some flak from smarter math-heads on this assumption.

(3) We can model the output of any three batters by assuming a Poisson distribution with lambda equal to the sum of their hits-per-game averages. For any number of combined hits “N or more” we can subtract {P(N-1)+P(N-2)+…+P(0)} from 100% to get the probability that these three hitters will combine for N-or-more hits, if each is given one trial. I’ve read up on the Poisson distribution and I think this assumption proceeds from (2), but again, point out if you think I’ve gone astray.

With those three assumptions, we have the probability of seeing N hits (N across the top) given a group of three batters with a given HPG total (HPG down the left). The winning case, for N >= 6, is the last column.



HPG	0	1	2	3	4	5	6	7	8	9		6 or more
3	4.98%	14.94%	22.40%	22.40%	16.80%	10.08%	5.04%	2.16%	0.81%	0.27%		8.39%
3.05	4.74%	14.44%	22.03%	22.39%	17.08%	10.42%	5.30%	2.31%	0.88%	0.30%		8.90%
3.1	4.50%	13.97%	21.65%	22.37%	17.33%	10.75%	5.55%	2.46%	0.95%	0.33%		9.43%
3.15	4.29%	13.50%	21.26%	22.32%	17.58%	11.08%	5.81%	2.62%	1.03%	0.36%		9.98%
3.2	4.08%	13.04%	20.87%	22.26%	17.81%	11.40%	6.08%	2.78%	1.11%	0.40%		10.54%
3.25	3.88%	12.60%	20.48%	22.18%	18.02%	11.72%	6.35%	2.95%	1.20%	0.43%		11.12%
3.3	3.69%	12.17%	20.08%	22.09%	18.23%	12.03%	6.62%	3.12%	1.29%	0.47%		11.71%
3.35	3.51%	11.75%	19.69%	21.98%	18.41%	12.34%	6.89%	3.30%	1.38%	0.51%		12.32%
3.4	3.34%	11.35%	19.29%	21.86%	18.58%	12.64%	7.16%	3.48%	1.48%	0.56%		12.95%
3.45	3.17%	10.95%	18.89%	21.73%	18.74%	12.93%	7.43%	3.66%	1.58%	0.61%		13.58%
3.5	3.02%	10.57%	18.50%	21.58%	18.88%	13.22%	7.71%	3.85%	1.69%	0.66%		14.24%
3.55	2.87%	10.20%	18.10%	21.42%	19.01%	13.50%	7.99%	4.05%	1.80%	0.71%		14.91%
3.6	2.73%	9.84%	17.71%	21.25%	19.12%	13.77%	8.26%	4.25%	1.91%	0.76%		15.59%
3.65	2.60%	9.49%	17.31%	21.06%	19.22%	14.03%	8.54%	4.45%	2.03%	0.82%		16.28%
3.7	2.47%	9.15%	16.92%	20.87%	19.31%	14.29%	8.81%	4.66%	2.15%	0.89%		16.99%
3.75	2.35%	8.82%	16.54%	20.67%	19.38%	14.53%	9.08%	4.87%	2.28%	0.95%		17.71%
3.8	2.24%	8.50%	16.15%	20.46%	19.44%	14.77%	9.36%	5.08%	2.41%	1.02%		18.44%
3.85	2.13%	8.19%	15.77%	20.24%	19.48%	15.00%	9.62%	5.29%	2.55%	1.09%		19.19%
3.9	2.02%	7.89%	15.39%	20.01%	19.51%	15.22%	9.89%	5.51%	2.69%	1.16%		19.94%
3.95	1.93%	7.61%	15.02%	19.78%	19.53%	15.43%	10.16%	5.73%	2.83%	1.24%		20.71%
4	1.83%	7.33%	14.65%	19.54%	19.54%	15.63%	10.42%	5.95%	2.98%	1.32%		21.49%
4.05	1.74%	7.06%	14.29%	19.29%	19.53%	15.82%	10.68%	6.18%	3.13%	1.41%		22.27%
4.1	1.66%	6.79%	13.93%	19.04%	19.51%	16.00%	10.93%	6.40%	3.28%	1.50%		23.07%
4.15	1.58%	6.54%	13.58%	18.78%	19.48%	16.17%	11.18%	6.63%	3.44%	1.59%		23.87%
4.2	1.50%	6.30%	13.23%	18.52%	19.44%	16.33%	11.43%	6.86%	3.60%	1.68%		24.69%
4.25	1.43%	6.06%	12.88%	18.25%	19.39%	16.48%	11.67%	7.09%	3.77%	1.78%		25.51%
4.3	1.36%	5.83%	12.54%	17.98%	19.33%	16.62%	11.91%	7.32%	3.93%	1.88%		26.33%
4.35	1.29%	5.61%	12.21%	17.71%	19.26%	16.75%	12.15%	7.55%	4.10%	1.98%		27.17%
4.4	1.23%	5.40%	11.88%	17.43%	19.17%	16.87%	12.37%	7.78%	4.28%	2.09%		28.01%
4.45	1.17%	5.20%	11.56%	17.15%	19.08%	16.98%	12.60%	8.01%	4.45%	2.20%		28.86%
4.5	1.11%	5.00%	11.25%	16.87%	18.98%	17.08%	12.81%	8.24%	4.63%	2.32%		29.71%


It shows that if you pick the three best hitters (per game) in the league (Suzuki, Jeter, and Young; combined avg = 4.12), you have a 23% chance of getting your payout and a 77% chance of losing your money. It looks like the break-even point (12.5% expectation of success) is when the batters have a combined average between 3.35 and 3.4 hits per game, and there’s a pretty steep drop-off even within the top twenty hitters, so if you’re arbitrarily picking batters you’re practically guaranteed to lose.
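For anyone who wants to reproduce or extend the "6 or more" column, a minimal Python sketch follows. It takes the three assumptions above at face value (in particular that the trio's combined hits are Poisson with lambda equal to the summed hits-per-game), and the function names are mine.

```python
from math import exp, factorial


def poisson_pmf(n, lam):
    """Poisson probability of exactly n hits when the combined mean is lam."""
    return lam**n * exp(-lam) / factorial(n)


def prob_at_least(n, lam):
    """P(N or more) = 100% minus {P(0) + P(1) + ... + P(N-1)}."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n))


# A few combined hits-per-game totals, including the 4.12 quoted for the top three.
for hpg in (3.0, 3.35, 3.4, 4.0, 4.12, 4.5):
    print(f"HPG {hpg:<5} 6 or more: {prob_at_least(6, hpg):.2%}")
```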

I was surprised to see this result because it implies that either the bookies are setting the odds wrong or the punters pick their batters badly. I’d be interested to have someone with a stronger math background examine my third assumption - that we can sum the averages and count the three parallel trials as a single trial of the collective.
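On the third assumption: if each batter's hits really are independent Poisson variables, then their sum is Poisson with the summed lambda, and that part can be checked numerically. The Python sketch below convolves three Poisson distributions and compares the result with a single Poisson at the combined mean. The individual averages (1.50, 1.40, 1.22) are made-up values chosen only so that they add up to 4.12. Note what this does not test: whether a single batter's hits are Poisson in the first place (hits are capped by at-bats, so it can only be an approximation), and whether batters are independent of one another.

```python
from math import exp, factorial


def poisson_pmf(n, lam):
    """Poisson probability of exactly n events when the mean is lam."""
    return lam**n * exp(-lam) / factorial(n)


def convolve(p, q):
    """Distribution of the sum of two independent counts with distributions p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


# Hypothetical per-batter averages; only their sum (4.12) comes from the thread.
lams = (1.50, 1.40, 1.22)
K = 30  # compare probabilities for 0..K hits

combined = [1.0]  # distribution of "zero batters": total is 0 with certainty
for lam in lams:
    combined = convolve(combined, [poisson_pmf(n, lam) for n in range(K + 1)])

# Entries 0..K of the convolution are exact, since they only use terms up to K.
direct = [poisson_pmf(n, sum(lams)) for n in range(K + 1)]
max_gap = max(abs(combined[n] - direct[n]) for n in range(K + 1))
print(f"largest difference for 0..{K} hits: {max_gap:.2e}")
```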