Probably an easy probability question

dgrdfd · August 21, 2014, 11:50pm

I am interested in converting Vegas NCAAFB spreads to win percentages using historical data, and then using said data to look at the probabilities of a collection of teams winning N games (ie, winning no games, 1 game, etc).

In particular,

There are six games.G1…G6.


Game	Win %
1	12%
2	89%
3	19%
4	5%
5	50%
6	29%

The odds of winning all six games is 0.290.500.050.190.89*0.12 ~= 0.015%.

The odds of winning no games is similarly (1-0.29) * (1-0.50) * …* (1-0.12) ~= 2.6%

Now how would I find the odds of winning 1, 2, or 3 games? The obvious brute force solution to this is to look at the probability of winning each game times the odds of losing every other game…and repeat this for each game and sum up the probabilities. That’s a lot of work and figured there had to be an easier approach. Thanks

Furthermore, is there a general approach? Suppose I have N events (E1…En) each with probability of occurring (P1…Pn) and they are each independent. What is the probability of some number Y from 0 to N occurring?

Thudlow_Boink · August 22, 2014, 12:17am

It is a lot of work, but it’s the kind of work that could readily be done by a computer. As far as I know, there is no easier way; but I’m not an expert.

If the probability were the same for each game, it’d be a straightfprward application of the binomial distribution.

dgrdfd · August 22, 2014, 12:59am

Thank you for the response. I could code something up in Python, but since I have two kids screaming in my ear and this is sort of a one-off I did it in Excel. What I ended up doing was basically creating a 64 row (2^6) table of 0s and 1s representing a “win” or “loss.” Then I could translate into each row the corresponding probabilities for a win/loss and multiply it out to determine the probability of a given scenario occurring. I could then sum of the bits in each row to find out how many teams won for each of the 64 scenarios. Then a simply sumif in excel to add in all the probabilities for ever 1 win scenario, two win scenario, etc. The final results are shown below and line up with my 0 win and 6 win hand computations:


#games	Odds
0	2.64%
1	26.24%
2	41.99%
3	23.26%
4	5.35%
5	0.50%
6	0.01%

Lance_Turbo · August 22, 2014, 1:06am

You could do this with generating functions.

For each game write p(loss) + p(win) x and multiply these together.

(0.88 + 0.12 x)(0.11 + 0.89 x)(0.81 + 0.19 x)(0.95 + 0.05 x)(0.50 + 0.50 x)(0.71 + 0.29 x)

Then use Wolfram Alpha or the like to expand this polynomial.

0.000147117 x^6 + 0.00502675 x^5 + 0.0535149 x^4 + 0.23258 x^3 + 0.419895 x^2 + 0.262393 x + 0.0264431

The coefficient of x^n is the probability of winning n games.

Saint_Cad · August 22, 2014, 1:56am

Like dgrdfd, I’d use a tree diagram. You would have 2^6 = 64 paths which wouldn’t be that difficult to setup in Excel where each edge’s weight is the probability of that event and the path’s weight is the product of the edges’ weights. Count the paths with the desired numbers of wins and add their weights.

Now that I think about it, this would make an awesome programming activity for students.

septimus · August 22, 2014, 11:47am

Go to the Wolfram Calculater and enter the following:

(1+0.29 (x-1))(1+0.50 (x-1))(1+0.05 (x-1))(1+0.19 (x-1))(1+0.89 (x-1))(1+0.12 *(x-1))

This is simply the list of six numbers you gave, with some parentheses and other symbols interjected.

Click the ‘=’ sign in the Wolfram input bar, wait a long while, and then scroll down to “Expanded form.” In the polynomial displayed, the chance of winning exactly K games appears as the coefficient of x[sup]K[/sup].

The Wolfram calculater does much more than just this simplification. Is there a way to turn the excess work off? Google calculater also accepts the expression, but it shows only a graph, not the simple expansion we seek.

ETA: Oops. Lynch me for skimming. Lance Turbo already gave a similar answer.

dgrdfd · August 22, 2014, 12:13pm

Thanks for responses. How does the generating function work here? I am not seeing a logical leap to get there…

Cabbage · August 22, 2014, 2:12pm

About the generating function: Think of each factor is being made of two possibilities. For example, in the factor (0.88 + 0.12 x) we have two terms: One corresponding to losing the first game (.88) and one corresponding to winning the first game (.12x). The coefficients, of course, give the probability of each event.

Now take the first three factors (only the first three since that should be enough to illustrate the point):

(0.88 + 0.12 x)(0.11 + 0.89 x)(0.81 + 0.19 x)

This corresponds to: (lose or win 1st)(lose or win 2nd)(lose or win 3rd)

Now think about polynomial multiplication. The terms from each factor are multiplied in every possible combination. Some of these products will be of degree 3 (x^3), degree 2 (x^2), degree 1 (x), or degree 0 (no x).

Let’s get all the ones of degree 1 (having just 1 power of x). The x must come from the first, second, or third factor:

(.12x * .11 * .81) + (.88 * .89x * .81) + (.88 * .11 * .19x)

This means exactly: (You win only the first game) or (you win only the second) or (you win only the third)

The arithmetic on all the coefficients will give you the total probability (total probability of winning exactly one game = coefficient of x^1).

Does that help?

Lance_Turbo · August 22, 2014, 5:59pm

You can enter, “Expand[(0.88 + 0.12 x)(0.11 + 0.89 x)(0.81 + 0.19 x)(0.95 + 0.05 x)(0.50 + 0.50 x)(0.71 + 0.29 x)]” into Wolfram Alpha and it doesn’t do quite as much stuff as just entering, “(0.88 + 0.12 x)(0.11 + 0.89 x)(0.81 + 0.19 x)(0.95 + 0.05 x)(0.50 + 0.50 x)(0.71 + 0.29 x)”. In either case it didn’t seem to take all that much time.

septimus · August 22, 2014, 8:57pm

I liked my formula, where the user doesn’t have to do the subtractions to get .88 = 1 - .12, etc.

The Wolfram site begins its answer soon, but took 10+ seconds before the Expanded form appears. Maybe it was weird of me to suggest that’s a long wait. :smack: … But your way is much faster.

I’ve never used Wolfram’s webpage before and am now rather amazed what you can do there for free!

As to OP’s question about understanding generating function, I’d suggest (just for conceptual understanding) replacing (.88 + .12 x) with (.88 Lose + .12 Win). The formula then adds and multiplies probabilities in the normal way. Replacing the two formal variables {Win,Lose} with a single variable is just a shortcut.

As another example of using generating functions, a Mathematica program to calculate the probabilities a bridge hand has x=0,1,2,3,… (Milton Work) High Card points is something like
In[1]:= z1 = ((1+xy^4) (1+xy^3) (1+xy^2) (1+xy) (1+x)^9)^4 ;
In[2]:= z2 = Coefficient [z1, x, 13] / Binomial [52, 13] ;
In[3]:= Table [Coefficient [z2, y, p], {p, 0, 37}]
Out[3]=

Can Wolfram’s free webpage do multiple-step procedures like that?

Dr.Strangelove · August 22, 2014, 9:17pm

Wolfram Alpha doesn’t seem to like long variables, but you can certainly do this:
(0.88l + 0.12w)(0.11l + 0.89w)(0.81l + 0.19w)(0.95l + 0.05w)(0.50l + 0.50w)(0.71l + 0.29w)

This approach also makes it more obvious how to add extra states, like:
(.34w + .54l + .12t)(.57w + .35l + .08t)

Lance_Turbo · August 22, 2014, 9:34pm

You can enter, “Table [Coefficient [Coefficient [((1+xy^4) (1+xy^3) (1+xy^2) (1+xy) (1+x)^9)^4 , x, 13], y, p], {p, 0, 37}]”.

This yields:

{2310789600, 5006710800, 8611542576, 15636342960, 24419055136, 32933031040, 41619399184, 50979441968, 56466608128, 59413313872, 59723754816, 56799933520, 50971682080, 43906944752, 36153374224, 28090962724, 21024781756, 14997082848, 10192504020, 6579838440, 4086538404, 2399507844, 1333800036, 710603628, 354993864, 167819892, 74095248, 31157940, 11790760, 4236588, 1396068, 388196, 109156, 22360, 4484, 624, 60, 4}

Link

Lance_Turbo · August 22, 2014, 9:40pm

Whoops I left out, “/ Binomial[52,13]”.

New link.

Lance_Turbo · August 22, 2014, 9:43pm

One more thing. You can tighten it up a little bit with CoefficientList.

CoefficientList [Coefficient [((1+xy^4) (1+xy^3) (1+xy^2) (1+xy) (1+x)^9)^4 , x, 13] / Binomial [52, 13] , y]

septimus · August 22, 2014, 10:12pm

Wow!! I am very impressed!

Indistinguishable · August 24, 2014, 8:34pm

If you like you can think of the calculation this way: Let P[sub]n/sub be the probability of winning m games out of the first n. We’re ultimately interested in the particular case where n is the total number of games, but it is convenient to build up to this by considering other n.

Why is that? Well, once we know all the values of P[sub]n/sub [that is, the values for m from 0 through n, the probability of course being zero elsewhere], we can readily compute from these all the values of P[sub]n + 1/sub, by the observation that P[sub]n + 1/sub = P[sub]n/sub * the probability of losing the (n + 1)th game + P[sub]n[/sub](m - 1) * the probability of winning the (n+th)th game.

In this way, starting with the trivial P[sub]0/sub = 1, we can in N many iterations build up to the distribution we want, P[sub]N[/sub].

This amounts to the same thing as taking the polynomial expression mentioned before and multiplying in each new factor one at a time in the straightforward way (distributing that one factor over the previously computed product of the previous factors and collecting the resulting terms before proceeding to the next factor). As P[sub]n[/sub] contains about n values, and it takes about 2n multiplications and n additions to update this to P[sub]n + 1[/sub], getting to P[sub]N[/sub] this way takes about N[sup]2[/sup] multiplications and N[sup]2[/sup]/2 additions.

Whereas the original naive approach of first considering every possible series of wins and losses separately and then grouping probabilities together amounts to taking the polynomial expression, first completely distributing it out, and only then collecting terms. Which means adding 2[sup]N[/sup] terms, each a product of N factors; thus, about 2[sup]N[/sup] * N multiplications followed by about 2[sup]N[/sup] additions.

Lance_Turbo · August 25, 2014, 7:07am

You can verify Indistinguishable’s observations on the order of the algorithms with the following Python code snippets.

I’m sure both could be improved upon, but as is they should serve as a proof of concept.

Polynomial method.



def polymethod(winprobs):
    answer = [1]
    for i in range(len(winprobs)):
        temp = [0] * (len(answer) + 1)
        for j in range(len(answer)):
            temp[j] += answer[j] * (1 - winprobs*)
            temp[j + 1] += answer[j] * (winprobs*)
        answer = temp
    return answer

Tree method.



def treemethod(winprobs):
    answer = [0] * (len(winprobs) + 1)
    for i in range(2**len(winprobs)):
        num = i
        prod = 1
        wins = 0
        for j in range(len(winprobs)):
            if num%2 == 1:
                prod *= winprobs[j]
                wins += 1
            else:
                prod *= (1 - winprobs[j])
            num = num>>1
        answer[wins] += prod
    return answer

Here’s the code to make them race to solve the OP’s problem.



import time

probs = [0.12, 0.89, 0.19, 0.05, 0.50, 0.29]

start = time.time()
out = polymethod(probs)
end = time.time()
print(out)
print(end - start)

start = time.time()
out = treemethod(probs)
end = time.time()
print(out)
print(end - start)

They will be pretty close to each other for just six games. However, if you expand the list to something like twenty games the difference will be noticeable. For thirty games, I don’t know if you want to wait for the tree method to finish.

It looks like coding up a polynomial class would be a better activity for students.

Topic		Replies	Views
NCAA Odds Question Factual Questions	4	2658	April 6, 2003
Math Question - 3 Outcome Probability Factual Questions	10	1124	December 6, 2008
A (simple ?) probability question. Factual Questions	54	6712	April 25, 2013
Simple math question Factual Questions	2	758	April 13, 2006
Probability question Factual Questions	10	1600	August 21, 2009

Probably an easy probability question

Related topics