Yet another probability/combinatorics question involving dice...

Or at least, it can involve dice, depending on how you look at it.

Here’s the basic situation:

If I roll N S-sided dice (where N>=S), what is the probability that every face value (from 1 to S, of course) will be represented at least once? Preferably, I’d like to be able to actually count (but nut enumerate :eek:) the number of ways that this can happen for arbitrary values of N and S.

Another way of looking at the problem is the happy meal dilemma. Suppose that a national burger chain is giving out a free toy with each kid’s meal. There are S different toys in the set. Each time that you buy a kid’s meal, the particular toy that you receive is selected at random (that is, with equal probability) from the set of toys. Now suppose that I have a bratty kid that insists on not just getting a particular toy, but on getting all of the toys. If I purchase N kid’s meals, what is the probability that I will have obtained (at least one of) every toy?

And now for the “weird” interpretation which is probably more familiar to math students:

I have S distinguishable objects (ex. numbered, labeled balls) and N distinguishable bins. How many ways can I place the balls into the bins such that no bin is empty?

This is a deceptively simple-sounding problem, probably because it’s so similar to a variety of very easy to solve problems. But every attempt I’ve made to solve it quickly turns into an epic quagmire of combinatorics.

Here’s the general train of my thought so far. For the sake of example, let’s say that I’m rolling 12 normal dice (6-sided). I want to know how many of the possible outcomes will contain at least one 1, one 2, one 3, …, and one 6.

Solving for the number of ways that this can happen without respect to order is surprisingly simple. This is really the same problem as: I have 12 (unnumbered) ping-pong balls and 6 numbered boxes. How many ways can I place the ping-pong balls into the boxes, leaving no empty boxes? Which, in turn, can be conceptualized as laying out the ping-pong balls in a line, and then partitioning the line into 6 parts using 5 dividers. There are 11 places that I can put each of the 5 dividers. Therefore, there are C(11,5) = 462 ways of doing this. However, a “way” in this case doesn’t respect which balls go into which boxes, just the total number of balls in each numbered box. This is great if you want to know how many distributions there are for rolling N S-sided dice, but it doesn’t give you the number of ways that each distribution can occur, which you would need to know to solve this problem.

Similarly, it’s easy to determine the number of ways that we can place N distinguishable objects into S distinguishable boxes, where the exact number of objects in each of the bins is known. That is, if I have an exact distribution in mind – like 1 one, 4 twos, 2 threes, 2 fours, 1 five, and 2 sixes – I can work out the number of ways for that happen pretty easily. In the example just given, this is simply:

12! / (1! * 4! * 2! * 2! * 1! * 2!) = 2,494,800

Which is about 0.115% of all possible outcomes for rolling 12 6-sided dice.

I’ve been trying to figure out some way to fuse these two tools together into something that I can leverage against this beast. That is, to somehow count the number of distributions, and from there, count the number of permutations for each distribution. But it just gets way too complex for anything but the smallest values of S and N. Certainly nothing I’ve come up with will provide a general approach for solving this for any arbitrary value of S and N.

Have I missed something really obvious? It seems like such a simple-sounding problem should have a simple solution.

Isn’t it relatively easy to compute the probability that each possible die value doesn’t appear? Compute the number of rolls that a 1 does not appear, and so on for all S faces. Then subtract the overlap of two numbers not appearing, add in the overlap of three appearing because you’ve over counted the number to subtract, and so on up to rolling the same face all N times. Subtract the total from the total number of possible rolls, and you’re done.

My probability skills have diminished over the year but I can run simulations:

For a six sided dice, 10,000 iterations:

At least all the numbers come up at least once:

7 dice 548
8 dice 1145
9 dice 1819
10 dice 2703
11 dice 3588
12 dice 4423

Here’s one ugly way to tackle it:

Let J(p, N, q) be the number of ways to make a sequence of length N, each of whose terms has N many possible values, with the property that “the first” q of those many possible values each appears somewhere in the sequence. (It doesn’t really matter that I say “the first”; any fixed subset will do)

Recursively, we have that J(p, N+1, q+1) = p*J(p, N, q), no matter what p, N, and q are. [We have p many possibilities for the first term of the sequence, and once we’ve achieved that, we don’t need to worry about achieving it again throughout the rest of the sequence].

We also have that J(p, N, 0) = p^N [if we don’t have to worry about making any specific values appear, we’re just making N many choices from p many possibilities] and that J(p, 0, q+1) = 0 [if we have to make a specific value appear, but don’t get any chances to do so, then we’re fucked].

Thus, stepping through the recursion, we have that J(p, q+d, q) = p^d * J(p, d, 0) = p^d * p^d = p^(2d).

In particular, if N>= S, then J(S, N, S) = S^(2(N-S)). That’s how many ways there are to accomplish this task; for the probability, we have to divide by the total number of possible dice outcomes, which is just S^N.

Thus, the final probability is S^(N - 2S). [Unless I’ve fucked up, which is a strong possibility]

Hm… I’ve clearly fucked up, since that formula can go out of range. It did seem far too easy. I shall have to ruminate on my mistake.

Ah, I see my mistake.

We don’t have that J(p, N+1, q+1) = p*J(p, N, q). Rather, we have that there are two possibilities: the first term of the sequence gives us one of the values we’re still looking to hit, or one of the values we’ve already taken care of. So, we get J(p, N+1, q+1) = (q+1)*J(p, N, q) + [p - (q+1)]*J(p, N, q+1).

Which makes the recursion “fork”, and thus, at least a priori, much nastier. Alas. Sorry to have wasted your time.

Inclusion/Exclusion

This yields the following formula (sorry for awful typesetting):

In Mathematica


nDsProb[n_, s_] := 
 Plus @@ Table[(-1)^i *Binomial[s, i]* ((s - i)/s)^n, {i, 0, s}]

Which means summation from i = 0 to s of (-1) to the i times s choose i times s-i to the n over s to the n.

Probabilities for 7d6 to 12d6:

7d6: 0.0540123
8d6: 0.114026
9d6: 0.189043
10d6: 0.271812
11d6: 0.356206
12d6: 0.437816

These match notfrommensa’s sims pretty well.

Hmmmm…I think I see where you’re going with that. It’s worth a try.

Let’s attempt it with 6 dice that have 4 sides each:

There are 4^6 = 4096 possible combinations.

Number of combinations which do not contain 1 = C(3,1)^6 = 729
Number of combinations which do not contain 2 = C(3,1)^6 = 729
Number of combinations which do not contain 3 = C(3,1)^6 = 729
Number of combinations which do not contain 4 = C(3,1)^6 = 729

Number of combinations which do not contain 1 or 2 = C(2,1)^6 = 64
Number of combinations which do not contain 1 or 3 = C(2,1)^6 = 64
Number of combinations which do not contain 1 or 4 = C(2,1)^6 = 64
Number of combinations which do not contain 2 or 3 = C(2,1)^6 = 64
Number of combinations which do not contain 2 or 4 = C(2,1)^6 = 64
Number of combinations which do not contain 3 or 4 = C(2,1)^6 = 64

Number of combinations which do not contain 1, 2, or 3 = C(1,1)^6 = 1
Number of combinations which do not contain 1, 2, or 4 = C(1,1)^6 = 1
Number of combinations which do not contain 1, 3, or 4 = C(1,1)^6 = 1
Number of combinations which do not contain 2, 3, or 4 = C(1,1)^6 = 1

Ok, there’s a certain logic to this. There are C(4,1) of the single-dice exclusions, C(4,2) of the double-dice exclusions, and C(4,3) of the triple-dice exclusions. Or to be more general: There are C(S,1) sets of combinations which exclude a single face, each of which contains C(S-1,1)^N combinations. There are C(S,2) sets of combinations which exclude a particular pairing of face values, each of which contains C(S-2,1)^N combinations … There are C(S,S-1)^N sets which exclude a particular combination of S-1 face values, each of which contains C(S-[S-1],1)^N = C(1,1)^N combinations.

So we can make a sort of series out of this where each term is:

C(S,i) * C(S-i,1)^N

The series runs from i=1 to i=S-1. Each term in the series tells you the number of combinations at each level of the tree, so to speak.

Now the hard part is figuring out how to operate on these terms in a way that handles all the overlaps properly.

I don’t believe there’s any closed form known, so the recursion’s as good as you’re going to get.

Blargh.

Obviously Lance Turbo is lot faster at this than I am. I was still busy composing my partial solution when he posted that. At least it looks like I was on the right track.

I’m going to have to ponder his equation a little longer to figure out how it relates to what I was chasing, but I’m pretty sure they’re related. But right now I need to go home!

A little manipulotorics reveals that my formula is identical to:


nDsProb2[n_, s_] := StirlingS2[n, s]*Factorial[s]/s^n;

Where StirlingS2[n, s] is the stirling number of the second kind usually denoted S(n,s).

Also I’m pretty sure the answer to this and similar questions can be found at this link.

Oh, very nice. I’m not sure what the “manipulotorics” you went through to get that were, but having reached it, it’s sort of head-smackingly clear that it’s right (at least, once one is reminded what the definition of Stirling numbers of the second kind is).

[Of course, “Stirling numbers of the second kind” is just a convenient name for something defined directly as very close to the solution of this problem anyway, and thus to express the solution in that form doesn’t help compute the answer any more efficiently than your original sum (i.e., there’s not generally considered to be more of a “closed form” expression for Stirling numbers of the second kind, as noted by ultrafilter. [But then, what counts as a closed form expression is all very language-relative anyway; one could imagine the man who complained that factorials didn’t count to him as closed-form!]). But realizing the connection does give the closure that’s found by tying back in a clear way to well-explored mathematics, instead of stumbling around and wondering “What if–?”, and thus achieves, for this problem, the best in the way of a solution which can be given.]